Python Web Scraping: Crawling and Visualizing Travel Destinations (2)

Posted: 2023-06-10 22:24:15

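1. Background

China's tourism industry has been growing rapidly. The pandemic turned domestic travel into a new trend, and once domestic restrictions were lifted, China became one of the first countries to reopen for travel.

This project visualizes domestic travel data to find out which times and destinations are well suited for a trip.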

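2. Design plan

   1. Send a request to the target site.
   2. Get the data: the page source.
   3. Filter the page source for the data we need.
   4. Filter the data and collect the results.
   5. Use a for loop to get the data on each page.
   6. Extract the fields: departure date, number of days, cost per person, companions, activities, and so on.
   7. Save the data to a CSV file.
   8. Crawl multiple pages.
   9. Visualize and analyze the data.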
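The steps above can be sketched as a few small functions. This is a minimal sketch, not the article's exact code: the two URL patterns are the ones that appear in the complete code later in this post, the helper names (`list_page_url`, `detail_url`, `save_rows`) are hypothetical, and the row passed to `save_rows` is made-up sample data. Network requests are left out so the sketch runs offline.

```python
import csv
import io

# URL patterns taken from the complete code in this post
LIST_URL = 'https://travel.qunar.com/travelbook/list.htm?page={page}&order=hot_heat'
NOTE_URL = 'https://travel.qunar.com/travelbook/note/{note_id}'

# A few of the column names from the CSV header in this post (trimmed for brevity)
FIELDS = ['地点', '浏览量', '日期', '天数', '人均消费', '详情页']


def list_page_url(page):
    """Steps 1 and 8: build the URL of one list page."""
    return LIST_URL.format(page=page)


def detail_url(href):
    """Step 5: turn a '/youji/<id>' link from the list page into a note URL."""
    return NOTE_URL.format(note_id=href.replace('/youji/', ''))


def save_rows(rows, out):
    """Step 7: write dict rows to CSV; DictWriter puts each value under its header."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample row standing in for the fields extracted in step 6
sample = {'地点': '成都', '浏览量': '1200', '日期': '2023/05/01',
          '天数': '4', '人均消费': '2000',
          '详情页': detail_url('/youji/1234567')}
buffer = io.StringIO()
save_rows([sample], buffer)
print(list_page_url(1))
print(buffer.getvalue())
```

Writing rows as dictionaries keyed by column name is one way to avoid values ending up under the wrong header when the column order changes.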

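Import the required libraries

# Fetching, parsing and saving the data
import requests
import parsel
import csv
import time
import random
import pandas as pd
from os import path

# Word segmentation and word clouds
import jieba
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from PIL import Image

# Data visualization
import matplotlib.pyplot as plt
from pyecharts.charts import Map  # pyecharts 1.0+; older releases used `from pyecharts import Map`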

2. Fetching the data
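The fetching code is not shown in this section, so here is a minimal offline sketch of the parsing half of that step: pulling the note links out of a list page. The article uses `parsel` for this; the sketch swaps in the standard library's `html.parser` so it runs without extra packages. The HTML fragment is a made-up stand-in for the real page, `NoteLinkParser` and `extract_note_links` are hypothetical names, and the `/youji/` link prefix is taken from the complete code.

```python
from html.parser import HTMLParser


class NoteLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at travel notes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        # Keep each note link once, in page order
        if href.startswith('/youji/') and href not in self.links:
            self.links.append(href)


def extract_note_links(html):
    """Return the unique '/youji/...' links found in an HTML string."""
    parser = NoteLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical list-page fragment, not the real markup of the site
sample_html = '''
<ul>
  <li><h2><a href="/youji/1111111">成都四日游</a></h2></li>
  <li><h2><a href="/youji/2222222">厦门三日游</a></h2></li>
  <li><a href="/about">关于</a></li>
  <li><a href="/youji/1111111">成都四日游</a></li>
</ul>
'''
note_links = extract_note_links(sample_html)
print(note_links)
```

With `parsel`, a comparable selection could likely be written with a CSS attribute selector such as `a[href^="/youji/"]::attr(href)`, though the right selector depends on the page's actual markup.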

 

3. Topic word cloud

list_all = []
# pandas reads empty cells as float NaN, so keep only the real values
for i in title_list:
    if isinstance(i, float):
        continue
    list_all.append(str(i))
txt = " ".join(list_all)

# Background image whose colors are reused for the words
backgroud_Image = plt.imread('C:/Users/wdsa/Desktop/阳光.jpg')
print('Image loaded!')
w = WordCloud(
    font_path="msyh.ttc",  # a font with Chinese glyphs
    width=1000,
    height=800,
    background_color="white",
    stopwords=STOPWORDS,
    max_font_size=150,
)
print('Generating the word cloud')
w.generate(txt)
img_colors = ImageColorGenerator(backgroud_Image)
w.recolor(color_func=img_colors)
plt.imshow(w)
plt.axis('off')
plt.show()
# w.to_file('C:/Users/wdsa/Desktop/wordcloud.jpg')  # uncomment to save the image
print('Word cloud generated!')
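A word cloud draws more frequent words larger, so counting the words first is a quick way to check what the picture should show. A minimal sketch with the standard library, using made-up destination names in place of `title_list`:

```python
from collections import Counter

# Hypothetical sample standing in for title_list; float('nan') mimics an empty CSV cell
sample_titles = ['成都', '厦门', '成都', float('nan'), '三亚', '成都', '厦门']

# Same filter as the word-cloud code: drop float NaN values
words = [t for t in sample_titles if not isinstance(t, float)]

# Count how often each name occurs in the space-joined text
freq = Counter(" ".join(words).split())
top_words = freq.most_common(2)
print(top_words)
```

`WordCloud` also provides `generate_from_frequencies`, which accepts a word-to-count mapping like this one directly.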

4. Top five topics by view count
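The code for this step is not included in the post. Below is a minimal sketch of one way to pick the five most-viewed topics, assuming the view counts can be converted with `int()`; if the site reports counts in another format, the conversion would need adjusting. The rows are made-up sample data and `top_n_by_views` is a hypothetical helper.

```python
def top_n_by_views(titles, counts, n=5):
    """Pair each title with its view count and return the n largest."""
    pairs = []
    for title, count in zip(titles, counts):
        try:
            pairs.append((title, int(count)))
        except (TypeError, ValueError):
            # Skip rows whose count is missing or not a plain integer
            continue
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]


# Hypothetical sample data standing in for title_list and count_list
titles = ['成都', '厦门', '三亚', '西安', '大理', '青岛', '桂林']
counts = ['1200', '980', None, '4300', '2100', '150', '3050']
top_five = top_n_by_views(titles, counts)
print(top_five)
```

The result can be split back into two sequences with `zip(*top_five)` and passed to `plt.bar` as labels and heights.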

 

6. Travel mode word cloud

list_all_1 = []
# GO_list holds the '人物' (travel companions) column
for j in GO_list:
    # Skip missing values: float NaN from pandas, or the literal string 'nan'
    if isinstance(j, float) or j == 'nan':
        continue
    list_all_1.append(j)
txt_1 = " ".join(list_all_1)

backgroud_Image = plt.imread('C:/Users/wdsa/Desktop/阳光.jpg')
print('Image loaded!')
pose = WordCloud(
    font_path="msyh.ttc",
    width=1000,
    height=800,
    background_color="white",
    stopwords=STOPWORDS,
    max_font_size=150,
)
print('Generating the word cloud')
pose.generate(txt_1)
img = ImageColorGenerator(backgroud_Image)
# Recolor this cloud (the original recolored `w`, the first word cloud, by mistake)
pose.recolor(color_func=img)
plt.imshow(pose)
plt.axis('off')
plt.show()
print('Word cloud generated!')
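Both word-cloud steps need to drop missing values before joining the text. A small helper makes that filter reusable; this is a sketch with a hypothetical name (`clean_labels`) and made-up sample values. Unlike the loop above, it keeps numeric values that are not NaN by converting them to text.

```python
import math


def clean_labels(values):
    """Return the usable strings from a column, dropping missing entries."""
    cleaned = []
    for value in values:
        # pandas represents empty cells as float NaN
        if isinstance(value, float) and math.isnan(value):
            continue
        text = str(value).strip()
        # Also drop empty strings and the literal text 'nan'
        if not text or text.lower() == 'nan':
            continue
        cleaned.append(text)
    return cleaned


# Hypothetical sample standing in for GO_list
sample = ['三五好友', float('nan'), 'nan', ' 情侣 ', '', '独自一人']
labels = clean_labels(sample)
print(" ".join(labels))
```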

Complete code

import csv
import random
import time
from os import path

import jieba
import matplotlib.pyplot as plt
import pandas as pd
import parsel
import requests
from PIL import Image
from pyecharts.charts import Map  # pyecharts 1.0+; older releases used `from pyecharts import Map`
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Part 1: crawl the travel notes and save them to CSV.
# Open with "w" so that re-running the script does not append a second header row.
csv_qne = open('C:/Users/wdsa/Desktop/去哪儿.csv', "w", encoding="utf-8", newline="")
csv_writer = csv.writer(csv_qne)
# Columns: destination, views, short comment, date, companions, days,
# cost per person, activities, detail page
csv_writer.writerow(['地点', '浏览量', '短评', '日期', '人物', '天数', '人均消费', '玩法', '详情页'])

for page in range(1, 5):
    url = f'https://travel.qunar.com/travelbook/list.htm?page={page}&order=hot_heat'
    response = requests.get(url=url)
    print(response)
    selector = parsel.Selector(response.text)
    # Assumption: the original never defines url_list. The detail links start with
    # '/youji/' (see replace() below), so they are selected by that prefix here.
    hrefs = selector.css('a[href^="/youji/"]::attr(href)').getall()
    url_list = list(dict.fromkeys(hrefs))  # drop duplicates, keep page order
    for href in url_list:
        detail_id = href.replace('/youji/', '')
        datail_url = 'https://travel.qunar.com/travelbook/note/' + detail_id
        response_1 = requests.get(url=datail_url)
        selector_1 = parsel.Selector(response_1.text)
        title = selector_1.css('.b_crumb_cont *:nth-child(3)::text').get()
        comment = selector_1.css('.title.white::text').get()
        count = selector_1.css('.view_count::text').get()
        data = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.when > p > span.data::text').get()
        days = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.howlong > p > span.data::text').get()
        # Assumption: 'who' holds the companions and 'howmuch' the cost per person;
        # the original used 'howlong' and 'who' here, which do not match the field names.
        character = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.who > p > span.data::text').get()
        money = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.howmuch > p > span.data::text').get()
        play_list = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.how > p > span.data::text').get()
        # Keep the values in the same order as the header row
        csv_writer.writerow([title, count, comment, data, character, days, money, play_list, datail_url])
        time.sleep(1)  # pause between requests
csv_qne.close()

# Part 2: load the CSV back into lists.
af = pd.read_csv('C:/Users/wdsa/Desktop/去哪儿.csv')
title_list = af['地点'].tolist()
speake = af['短评'].tolist()
count_list = af['浏览量'].tolist()
days_list = af['日期'].tolist()
happer_day = af['天数'].tolist()
GO_list = af['人物'].tolist()
meony_list = af['人均消费'].tolist()
url_list_to = af['详情页'].tolist()

# Part 3: topic word cloud.
# pandas reads empty cells as float NaN, so keep only the real values
list_all = [str(i) for i in title_list if not isinstance(i, float)]
txt = " ".join(list_all)
backgroud_Image = plt.imread('C:/Users/wdsa/Desktop/阳光.jpg')
print('Image loaded!')
w = WordCloud(
    font_path="msyh.ttc",
    width=1000,
    height=800,
    background_color="white",
    stopwords=STOPWORDS,
    max_font_size=150,
)
print('Generating the word cloud')
w.generate(txt)
img_colors = ImageColorGenerator(backgroud_Image)
w.recolor(color_func=img_colors)
plt.imshow(w)
plt.axis('off')
plt.show()
print('Word cloud generated!')

# Part 4: bar chart and line chart of view counts.
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False
plt.figure(figsize=(17, 15))
bar_width = 0.25
# Use the same slice for labels and values so each bar shows its own row's count
plt.bar(title_list[:20:4],
        count_list[:20:4],
        bar_width,
        align="center",
        color="red",
        label="浏览量",
        alpha=0.5)
plt.show()

plt.figure(figsize=(17, 15))
plt.plot(title_list[:20:4],
         count_list[:20:4],
         color="red",
         label='浏览量',
         marker='*')
plt.show()

# Part 5: travel mode word cloud, built from the '人物' (companions) column.
list_all_1 = []
for j in GO_list:
    # Skip missing values: float NaN from pandas, or the literal string 'nan'
    if isinstance(j, float) or j == 'nan':
        continue
    list_all_1.append(j)
txt_1 = " ".join(list_all_1)
backgroud_Image = plt.imread('C:/Users/wdsa/Desktop/阳光.jpg')
print('Image loaded!')
pose = WordCloud(
    font_path="msyh.ttc",
    width=1000,
    height=800,
    background_color="white",
    stopwords=STOPWORDS,
    max_font_size=150,
)
print('Generating the word cloud')
pose.generate(txt_1)
img = ImageColorGenerator(backgroud_Image)
pose.recolor(color_func=img)  # the original recolored `w` here by mistake
plt.imshow(pose)
plt.axis('off')
plt.show()
# pose.to_file('C:/Users/wdsa/Desktop/wordcloud.jpg')  # uncomment to save the image
print('Word cloud generated!')

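Summary

Using travel notes from Qunar (去哪儿网), we analyzed and ranked domestic travel destinations. The results can help people make better-informed choices about where to go, and they also reflect how travel to each city looks now that domestic pandemic restrictions have been lifted.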

 

From: https://www.cnblogs.com/lukunting/p/17472076.html
