
Python web scraping: crawling and visualizing travel destinations

Date: 2023-06-10 22:26:02

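1. Background

China's tourism industry has been growing rapidly. The pandemic turned domestic travel into the new trend, and once domestic restrictions were lifted, China was among the first countries to reopen for tourism.

This project visualizes domestic travel data to work out which times and destinations are best suited for a trip.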

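2. Design

   1. Send a request to the target site

   2. Get the response data (the page source)

   3. Locate the data we need in the page source

   4. Filter the page source and extract the data

   5. Use a for loop to collect the data on every page

   6. Extract the fields: departure date, number of days, cost per person, travel companions, activities, ...

   7. Save the data

   8. Crawl multiple pages

   9. Visualize and analyze the data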
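Steps 5 and 8 come down to building one list-page URL per page number and turning each travelogue link into a detail-page URL. The following is a minimal sketch, assuming the URL patterns that appear in the complete code of this post; the helper names `list_page_urls` and `detail_url_from_href` are hypothetical and exist only for illustration.

```python
# Sketch only: the URL patterns are copied from the complete code in this post;
# the function names are hypothetical helpers, not part of any library.
LIST_URL = "https://travel.qunar.com/travelbook/list.htm?page={page}&order=hot_heat"
DETAIL_PREFIX = "https://travel.qunar.com/travelbook/note/"


def list_page_urls(first_page, last_page):
    """Return the list-page URLs for first_page..last_page (inclusive)."""
    return [LIST_URL.format(page=p) for p in range(first_page, last_page + 1)]


def detail_url_from_href(href):
    """Turn a relative link such as '/youji/123' into a detail-page URL."""
    detail_id = href.replace("/youji/", "")
    return DETAIL_PREFIX + detail_id


if __name__ == "__main__":
    for url in list_page_urls(1, 4):
        print(url)
    print(detail_url_from_href("/youji/123"))
```

Keeping URL construction in small pure functions makes it possible to check the page range and the ID handling without sending any request.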

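Import the required libraries

import csv
import random
import time

import jieba
import matplotlib.pyplot as plt  # data visualization
import pandas
import parsel  # filters data out of the page source
import requests  # sends the HTTP requests
from os import path
from PIL import Image
# pyecharts >= 1.0 exposes charts here; the 0.5.x releases used `from pyecharts import Map`
from pyecharts.charts import Map
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator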

 2. Fetching the data
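Each fetched record has to land in the CSV under the right column. One way to keep values and headers aligned is to write rows by column name instead of by position. The sketch below uses only the standard library; the column names are the ones in this post's CSV header, while `write_rows` and the sample record are made up for illustration.

```python
import csv
import io

# Column names taken from the CSV header used in this post.
FIELDS = ['地点', '浏览量', '短评', '日期', '人物', '天数', '人均消费', '详情页']


def write_rows(handle, rows):
    """Write a header plus one line per dict; values are matched by key."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Missing keys are written as empty cells; unknown keys raise ValueError.
        writer.writerow(row)


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = {
        '地点': 'Chengdu', '浏览量': '1024', '短评': 'sample note',
        '日期': '2023/05/01', '人物': 'friends', '天数': '3',
        '人均消费': '2000',
        '详情页': 'https://travel.qunar.com/travelbook/note/123',
    }
    buffer = io.StringIO()
    write_rows(buffer, [sample])
    print(buffer.getvalue())
```

With a real file, the handle would be opened with `newline=""` and `encoding="utf-8"`, as the post's own code does.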

 

3. Word cloud of topics

list_all = []

# title_list holds the '地点' column loaded from the CSV (see the complete code)
for i in title_list:
    # pandas reads empty cells as float NaN, so skip float entries
    if not isinstance(i, float):
        list_all.append(i)

txt = " ".join(list_all)

# The image must be at least as large as the word cloud canvas (1000 x 800)
backgroud_Image = plt.imread('C:/Users/wdsa/Desktop/阳光.jpg')
print('Image loaded')

w = WordCloud(
    font_path="msyh.ttc",  # a font with Chinese glyphs
    width=1000,
    height=800,
    background_color="white",
    stopwords=STOPWORDS,
    max_font_size=150,
)
w.generate(txt)
print('Text loaded')

# Recolor the words with the colors of the background image
img_colors = ImageColorGenerator(backgroud_Image)
w.recolor(color_func=img_colors)

plt.imshow(w)
plt.axis('off')
plt.show()

# To save the result: w.to_file('C:/Users/wdsa/Desktop/wordcloud.jpg')
print('Word cloud generated')
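The filtering step before `" ".join(...)` matters because pandas represents empty cells as `float('nan')`, which cannot be joined as text. A minimal sketch of that cleaning step as a reusable function follows; `join_titles` is a hypothetical name, and treating the literal string `'nan'` as missing is an assumption.

```python
import math


def join_titles(values):
    """Join entries with single spaces, skipping NaN and empty values."""
    kept = []
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            continue  # pandas represents an empty cell as float('nan')
        text = str(value).strip()
        # Assumption: a literal 'nan' string also means "missing"
        if text and text.lower() != "nan":
            kept.append(text)
    return " ".join(kept)


if __name__ == "__main__":
    print(join_titles(["成都", float("nan"), "nan", " 西安 "]))
```

The returned string can be passed directly to `WordCloud.generate`.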

 4. Top five topics by view count
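Picking the top five requires numeric view counts, but the scraped values are text. The sketch below shows one possible way to convert and rank them. It assumes the counts look like `1234` or `1.5万` (万 = 10,000); the real format on the site is not shown in this post, and `parse_views` and `top_n` are hypothetical helpers.

```python
def parse_views(text):
    """Convert a view-count string to an int; return 0 if it cannot be parsed.

    Assumption: counts look like '1234' or '1.5万' (万 = 10,000).
    """
    text = str(text).strip()
    multiplier = 1
    if text.endswith("万"):
        multiplier = 10_000
        text = text[:-1]
    try:
        return round(float(text) * multiplier)
    except (ValueError, OverflowError):
        # Covers unparsable text as well as NaN and infinity
        return 0


def top_n(titles, counts, n=5):
    """Pair titles with parsed counts and return the n largest pairs."""
    pairs = [(title, parse_views(count)) for title, count in zip(titles, counts)]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


if __name__ == "__main__":
    # Made-up values, for illustration only
    print(top_n(["A", "B", "C"], ["10", "1.5万", "300"], n=2))
```

Sorting explicitly avoids relying on the order in which rows happened to be scraped.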

 

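6. Word cloud of travel modes

list_all_1 = []

# GO_list holds the '人物' (travel companions) column loaded from the CSV
for j in GO_list:
    # Skip missing values: the string 'nan' and float NaN
    if j == 'nan':
        pass
    elif isinstance(j, float):
        pass
    else:
        list_all_1.append(j)

txt_1 = " ".join(list_all_1)

# The image must be at least as large as the word cloud canvas (1000 x 800)
backgroud_Image = plt.imread('C:/Users/wdsa/Desktop/阳光.jpg')
print('Image loaded')

pose = WordCloud(
    font_path="msyh.ttc",
    width=1000,
    height=800,
    background_color="white",
    stopwords=STOPWORDS,
    max_font_size=150,
)
pose.generate(txt_1)
print('Text loaded')

# Recolor this word cloud (pose) with the colors of the background image
img = ImageColorGenerator(backgroud_Image)
pose.recolor(color_func=img)

plt.imshow(pose)
plt.axis('off')
plt.show()

print('Word cloud generated')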
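A word cloud sizes each word by how often it occurs, so the same picture can be checked numerically with a frequency count. This is a minimal standard-library sketch; `count_labels` is a hypothetical helper and the sample values are made up.

```python
from collections import Counter


def count_labels(values):
    """Count how often each non-missing text label occurs."""
    labels = []
    for value in values:
        if not isinstance(value, str):
            continue  # skips float NaN produced by pandas for empty cells
        text = value.strip()
        if text and text.lower() != "nan":
            labels.append(text)
    return Counter(labels)


if __name__ == "__main__":
    # Made-up values, for illustration only
    counts = count_labels(["情侣", "家庭", float("nan"), "情侣", "nan"])
    for label, number in counts.most_common():
        print(label, number)
```

`Counter.most_common()` returns the labels ordered from most to least frequent, which is the ranking the word cloud shows visually.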

Complete code

import csv
import random
import time

import jieba
import matplotlib.pyplot as plt
import pandas as pd
import parsel
import requests
from os import path
from PIL import Image
# pyecharts >= 1.0 exposes charts here; the 0.5.x releases used `from pyecharts import Map`
from pyecharts.charts import Map
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Open the output CSV; "w" starts a fresh file so the header is written only once
csv_qne = open('C:/Users/wdsa/Desktop/去哪儿.csv', "w", encoding="utf-8", newline="")
csv_writer = csv.writer(csv_qne)
# '玩法' (activities) is added so that every value written below has a column
csv_writer.writerow(['地点', '浏览量', '短评', '日期', '人物', '天数', '人均消费', '玩法', '详情页'])

for page in range(1, 5):
    url = f'https://travel.qunar.com/travelbook/list.htm?page={page}&order=hot_heat'
    response = requests.get(url=url)
    print(response)
    data_html = response.text
    selector = parsel.Selector(data_html)
    print(selector)

    # The original post omits the line that builds url_list. The selector below is an
    # ASSUMPTION (travelogue links in the list page); verify it against the live page.
    url_list = selector.css('li h2 a::attr(href)').getall()

    for href in url_list:
        detail_id = href.replace('/youji/', '')
        detail_url = 'https://travel.qunar.com/travelbook/note/' + detail_id
        response_1 = requests.get(url=detail_url)
        data_html_1 = response_1.text
        selector_1 = parsel.Selector(data_html_1)

        title = selector_1.css('.b_crumb_cont *:nth-child(3)::text').get()
        comment = selector_1.css('.title.white::text').get()
        count = selector_1.css('.view_count::text').get()
        # Departure date
        date = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.when > p > span.data::text').get()
        # Number of days (class name typo 'howloog' fixed)
        days = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.howlong > p > span.data::text').get()
        # Travel companions
        character = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.who > p > span.data::text').get()
        # Cost per person; the class name 'howmuch' is an ASSUMPTION, the original reused 'who' here
        money = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.howmuch > p > span.data::text').get()
        # Activities (tag typo 'dix' fixed to 'div')
        play_list = selector_1.css('#js_mainleft > div.b_foreword > ul > li.f_item.how > p > span.data::text').get()

        # Values are written in the same order as the header row
        csv_writer.writerow([title, count, comment, date, character, days, money, play_list, detail_url])
        time.sleep(1)  # pause between requests

csv_qne.close()

title_list = []     # '地点' destinations
speake = []         # '短评' short comments
happer_day = []     # '天数' number of days
count_list = []     # '浏览量' view counts
days_list = []      # '日期' departure dates
GO_list = []        # '人物' travel companions
meony_list = []     # '人均消费' cost per person
url_list_to = []    # '详情页' detail-page URLs

af = pd.read_csv('C:/Users/wdsa/Desktop/去哪儿.csv')

for i in af['地点']:
    title_list.append(i)

for i in af['短评']:
    speake.append(i)

for i in af['浏览量']:
    count_list.append(i)

for i in af['日期']:
    days_list.append(i)

for i in af['天数']:
    happer_day.append(i)

for i in af['人物']:
    GO_list.append(i)

for i in af['人均消费']:
    meony_list.append(i)

for i in af['详情页']:
    url_list_to.append(i)

df = pd.DataFrame(af)

# Word cloud of topics
list_all = []

for i in title_list:
    # pandas reads empty cells as float NaN, so skip float entries
    if not isinstance(i, float):
        list_all.append(i)

txt = " ".join(list_all)

# The image must be at least as large as the word cloud canvas (1000 x 800)
backgroud_Image = plt.imread('C:/Users/wdsa/Desktop/阳光.jpg')
print('Image loaded')

w = WordCloud(
    font_path="msyh.ttc",
    width=1000,
    height=800,
    background_color="white",
    stopwords=STOPWORDS,
    max_font_size=150,
)
w.generate(txt)
print('Text loaded')

# Recolor the words with the colors of the background image
img_colors = ImageColorGenerator(backgroud_Image)
w.recolor(color_func=img_colors)

plt.imshow(w)
plt.axis('off')
plt.show()

print('Word cloud generated')

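# Use a font that can render Chinese labels, and keep the minus sign readable
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Bar chart: view counts of the first five topics.
# The original paired title_list[:20:4] with count_list[:5], which matches titles
# to the wrong counts; both slices now cover the same rows.
# Assumes the '浏览量' column was parsed as numbers.
plt.figure(figsize=(17, 15))
bar_width = 0.25
plt.bar(title_list[:5],
        count_list[:5],
        bar_width,
        align="center",
        color="red",
        label="views",
        alpha=0.5)
plt.legend()
plt.show()

# Line chart of the same data
plt.figure(figsize=(17, 15))
plt.plot(title_list[:5],
         count_list[:5],
         color="red",
         label="views",
         marker='*')
plt.legend()
plt.show()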

# Word cloud of travel modes
list_all_1 = []

for j in GO_list:
    # Skip missing values: the string 'nan' and float NaN
    if j == 'nan':
        pass
    elif isinstance(j, float):
        pass
    else:
        list_all_1.append(j)

txt_1 = " ".join(list_all_1)

# The image must be at least as large as the word cloud canvas (1000 x 800)
backgroud_Image = plt.imread('C:/Users/wdsa/Desktop/阳光.jpg')
print('Image loaded')

pose = WordCloud(
    font_path="msyh.ttc",
    width=1000,
    height=800,
    background_color="white",
    stopwords=STOPWORDS,
    max_font_size=150,
)
pose.generate(txt_1)
print('Text loaded')

# Recolor this word cloud (pose) with the colors of the background image
img = ImageColorGenerator(backgroud_Image)
pose.recolor(color_func=img)

plt.imshow(pose)
plt.axis('off')
plt.show()

# To save the result: pose.to_file('C:/Users/wdsa/Desktop/wordcloud.jpg')
print('Word cloud generated')

 

 

 

 

 

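Summary

Using data from Qunar (去哪儿网), we analyzed and ranked domestic travel destinations to some extent. The results give travelers a more informed basis for choosing a trip and also reflect how travel to each city has fared since domestic pandemic restrictions were lifted.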

 

From: https://www.cnblogs.com/lukunting/p/17472064.html
