
Python Hotel Similarity Recommender System

Date: 2024-03-10 18:23:42

import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import re
import random

import cufflinks
from plotly.offline import iplot
cufflinks.go_offline()

# Load the dataset
data = pd.read_table('./Seattle_Hotels.txt',encoding="latin-1",sep = ',')
data

     name                                     address                                          desc
0    Hilton Garden Seattle Downtown           1821 Boren Avenue, Seattle Washington 98101 USA  Located on the southern tip of Lake Union, the...
1    Sheraton Grand Seattle                   1400 6th Avenue, Seattle, Washington 98101 USA   Located in the city's vibrant core, the Sherat...
2    Crowne Plaza Seattle Downtown            1113 6th Ave, Seattle, WA 98101                  Located in the heart of downtown Seattle, the ...
3    Kimpton Hotel Monaco Seattle             1101 4th Ave, Seattle, WA98101                   What?s near our hotel downtown Seattle locatio...
4    The Westin Seattle                       1900 5th Avenue, Seattle, Washington 98101 USA   Situated amid incredible shopping and iconic a...
...  ...                                      ...                                              ...
147  The Halcyon Suite Du Jour                1125 9th Ave W, Seattle, WA 98119                Located in Queen Anne district, The Halcyon Su...
148  Vermont Inn                              2721 4th Ave, Seattle, WA 98121                  Just a block from the world famous Space Needl...
149  Stay Alfred on Wall Street               2515 4th Ave, Seattle, WA 98121                  Stay Alfred on Wall Street resides in the hear...
150  Pike's Place Lux Suites by Barsala       2nd Ave and Stewart St, Seattle, WA 98101        The perfect marriage of heightened convenience...
151  citizenM Seattle South Lake Union hotel  201 Westlake Ave N, Seattle, WA 98109            Yes, it's true. Every room at citizenM is the ...

152 rows × 3 columns


data.shape
(152, 3)

data['desc'][100]
'On a budget in Seattle or looking for something different? The historic charm and "home away from home" atmosphere of The Baroness will be sure to make you feel like one of the family. Conveniently located on First Hill, we are proud to be part of the Virginia Mason Hospital campus and only minutes from Harborview Medical Center and Swedish Hospital. The Baroness Hotel is a great option for short or long term medical, patient or family stays. Whether you are visiting the area\'s world-class medical facilities or on a budget vacation, our goal is to ensure a wonderful stay. Guest Amenities: Complimentary Internet access, Two twin, one or two queen studios with mini fridge and microwave, Two twin or one queen suites with full kitchens, Laundry facilities available, Flat screen cable television with HBO, Complimentary local calls, Ice and vending machines located in the lobby, Coffee maker and hairdryers in all guestrooms, Room service available seven days a week from the Rhododendron Cafe, Limited wheelchair accessibility, Guest library and business center, Printing & fax services available, 100% non-smoking and pet free, Rooms are not air conditioned - fans are available, Self-parking available at Virginia Mason hospital for a fee.'

Let's look at which words dominate the hotel descriptions.

vec = CountVectorizer().fit(data['desc'])
bag_of_words = vec.transform(data['desc'])

bag_of_words.toarray()
array([[0, 1, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 1, 0, 0]], dtype=int64)

bag_of_words.shape
(152, 3200)

sum_words = bag_of_words.sum(axis=0)

sum_words
matrix([[ 1, 11, 11, ...,  2,  6,  2]], dtype=int64)
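
The 152 x 3200 matrix above is hard to read at a glance, so here is a minimal sketch of the same two steps (build the bag of words, then sum each column) on a tiny corpus. The three sentences and the names toy_docs, toy_vec, toy_bow and toy_counts are invented for illustration and are not part of the hotel data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for illustration (not taken from Seattle_Hotels.txt).
toy_docs = [
    "hotel near downtown seattle",
    "downtown hotel with free parking",
    "free breakfast near the lake",
]

toy_vec = CountVectorizer().fit(toy_docs)
toy_bow = toy_vec.transform(toy_docs)  # sparse matrix, one row per document

# vocabulary_ maps each word to its column index; list the words in column order.
vocab = sorted(toy_vec.vocabulary_, key=toy_vec.vocabulary_.get)

# Summing over axis 0 adds up each column, giving corpus-wide counts per word.
totals = np.asarray(toy_bow.sum(axis=0)).ravel()
toy_counts = {word: int(count) for word, count in zip(vocab, totals)}

print(toy_bow.shape)
print(toy_counts)
```

Each row of toy_bow is one document and each column is one vocabulary word, which is the same layout as bag_of_words for the hotel descriptions.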





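def get_top_n_words(corpus, n=None):
    """Return the n most frequent words in corpus as (word, count) pairs."""
    vec = CountVectorizer().fit(corpus)
    bag_of_words = vec.transform(corpus)
    # Sum each column to get the total count of every vocabulary word.
    sum_words = bag_of_words.sum(axis=0)
    # vocabulary_ maps word -> column index; store counts as plain Python ints.
    word_freqs = [(word, int(sum_words[0, idx])) for word, idx in vec.vocabulary_.items()]
    # Sort by count, most frequent first.
    word_freqs = sorted(word_freqs, key=lambda x: x[1], reverse=True)
    return word_freqs[:n]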



common_words = get_top_n_words(data['desc'],20)
common_words
[('the', 1258),
 ('and', 1062),
 ('of', 536),
 ('seattle', 533),
 ('to', 471),
 ('in', 449),
 ('our', 359),
 ('you', 304),
 ('hotel', 295),
 ('with', 280),
 ('is', 271),
 ('at', 231),
 ('from', 224),
 ('for', 216),
 ('your', 186),
 ('or', 161),
 ('center', 151),
 ('are', 136),
 ('downtown', 133),
 ('on', 129)]
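
The list above is dominated by English stop words such as "the", "and" and "of". One possible way to filter them, sketched here as an assumption rather than as the author's method, is to pass stop_words="english" to CountVectorizer. The function name get_top_n_words_no_stop and the sample sentences are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def get_top_n_words_no_stop(corpus, n=None):
    """Hypothetical variant of get_top_n_words that skips English stop words."""
    vec = CountVectorizer(stop_words="english").fit(corpus)
    bag_of_words = vec.transform(corpus)
    # Total count of every remaining vocabulary word across the corpus.
    totals = np.asarray(bag_of_words.sum(axis=0)).ravel()
    word_freqs = [(word, int(totals[idx])) for word, idx in vec.vocabulary_.items()]
    return sorted(word_freqs, key=lambda x: x[1], reverse=True)[:n]


# Sample sentences invented for illustration.
sample = [
    "the hotel is in the heart of downtown seattle",
    "our hotel is close to the seattle center",
    "stay with us in seattle",
]
top_sample = get_top_n_words_no_stop(sample, 2)
print(top_sample)
```

On the real data the call would be get_top_n_words_no_stop(data['desc'], 20). scikit-learn's built-in English list is a general-purpose one, so domain words such as "seattle" and "hotel" still appear and may need a custom list.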


df1 = pd.DataFrame(common_words, columns=['desc', 'counts'])
df1.head()
      desc  counts
0      the    1258
1      and    1062
2       of     536
3  seattle     533
4       to     471



df1.groupby('desc').sum()['counts'].sort_values().iplot(
    kind='barh', yTitle='counts', linecolor='black',
    title='Top 20 words before removing stop words')
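
The imports at the top include TfidfVectorizer and linear_kernel, which the steps above never use. Given the title, one plausible use is to turn each description into a TF-IDF vector and rank hotels by cosine similarity. The sketch below shows that idea under this assumption, on an invented three-row table; the hotel names, descriptions and the recommend helper are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy stand-in for the hotel table; names and descriptions are invented.
toy = pd.DataFrame({
    "name": ["Harbor Inn", "Pier Hotel", "Summit Lodge"],
    "desc": [
        "waterfront hotel with harbor views near the ferry terminal",
        "hotel on the waterfront with harbor views and a seafood restaurant",
        "mountain lodge with ski rental and a fireplace lounge",
    ],
})

tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(toy["desc"])

# TfidfVectorizer L2-normalizes each row by default, so the plain dot product
# computed by linear_kernel equals the cosine similarity.
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)


def recommend(name, top_n=2):
    """Hypothetical helper: names of the top_n hotels most similar to `name`."""
    idx = toy.index[toy["name"] == name][0]
    scores = pd.Series(cosine_sim[idx], index=toy["name"]).drop(name)
    return scores.sort_values(ascending=False).head(top_n).index.tolist()


print(recommend("Harbor Inn", top_n=1))
```

For the real data, toy would be replaced by data and the descriptions would usually be cleaned first (lowercasing, stop word removal) before vectorizing.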

From: https://www.cnblogs.com/guodong1789/p/18064537
