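Target site: https://www.zhihu.com/hot
Task: collect the title, excerpt, and heat (popularity) value of each entry.
1. Open the page, press F12 and switch to the Network tab (refresh the page if nothing is listed), then copy the Cookie and User-Agent from the request headers.
2. Import the requests module and request the page to get its HTML source.
3. Inspect the page source to see which tags contain the required information.
4. Write code to extract the contents of those tags.
5. Store the results in a list.
6. Write the list to an .xls spreadsheet (when writing the xls file, move zhihus = [] outside the loop; otherwise only one row is written).
7. Confirm that the file was written successfully.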
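The note in step 6 is easy to reproduce without touching the network. The sketch below uses made-up sample titles and two hypothetical helper functions (`collect_wrong`, `collect_right`) that are not part of the original script; it only illustrates why the list has to be created before the loop.

```python
# Made-up sample data standing in for the scraped titles.
titles = ["question A", "question B", "question C"]


def collect_wrong(titles):
    """Creates the list inside the loop, so it is reset on every pass."""
    for title in titles:
        rows = []  # a fresh, empty list each iteration
        rows.append({"title": title})
    return rows  # only the last item survives


def collect_right(titles):
    """Creates the list once, before the loop, so items accumulate."""
    rows = []
    for title in titles:
        rows.append({"title": title})
    return rows


print(len(collect_wrong(titles)))  # 1
print(len(collect_right(titles)))  # 3
```

With the list inside the loop, the spreadsheet would receive only the final entry, which matches the "only one row" symptom described in step 6.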
Optimized source code (redundant print calls removed):
import requests
import xlwt
from bs4 import BeautifulSoup

url = "https://www.zhihu.com/hot"
# User-Agent and Cookie copied from the browser's Network tab (step 1);
# replace them with the values from your own session.
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0',
    'Cookie': '_zap=99f5c205-e7ef-4daa-a8e3-29932e423cf3; q_c1=4750584755814320bba915a18840a5b3|1711178919000|1647004465000; _xsrf=orXaE1NjOPAlcNLmChwhrWCvTRMjqGR3; __zse_ck=003_b7=jsXkzPa91C/PsE9EITBjHKe/C3rYyrPFzVPps/L2Rl8kcjitGhkhA6rANiU3zuMRz/qCL9VNUzUMx1tghhxbvu6n=NdA24PXRBL+3Xj40; edu_user_uuid=edu-v1|e4e1bfef-1123-4e91-ad1a-6451e41342f7; d_c0=ACDSoWlzrxmPTvI3uQJuwQs89iikaNLQrAk=|1734051035; z_c0=2|1:0|10:1734051039|4:z_c0|80:MS4xY1BoSklnQUFBQUFtQUFBQVlBSlZUZDdTU0dqUkFWYTN4M0VReFIzcC0ydnZGXzM1bWlIcERnPT0=|27ef6bc70a40f5a31170a4b0a550e4017718394160b79236681529831e39e1ad; SESSIONID=LS0v4zfbDRVHsGRCWUV6kHq2hMG36tJZYrcA755V32D; JOID=W1EQBUNC_9MrSUpXDE9-T-NIwjwUDqDgU3wwPW0HrYlgNAwFQ_9GvUNIRlgBxCSQsEXMy8rBRcxglUYH7aS4r4k=; osd=VF4QAUxN8NMvRkVYDEtxQOxIxjMbAaDkXHM_PWkIooZgMAMKTP9CskxHRlwOyyuQtErDxMrFSsNvlUII4qu4q4Y=; tst=h',
}

# Request the page and decode the response body as UTF-8
response = requests.get(url=url, headers=headers)
contents = response.content.decode('UTF-8')
# print(contents)

# Parse the HTML and select every hot-list entry
bs = BeautifulSoup(contents, 'html.parser')
div_tag = bs.find_all('div', class_='HotItem-content')

# Created outside the loop so that every entry is kept
zhihus = []
for div in div_tag:
    # Title
    title = div.find('h2', class_='HotItem-title').get_text()
    # Excerpt (not every entry has one)
    p_tag = div.find('p', class_='HotItem-excerpt')
    if p_tag:
        zhailu = p_tag.get_text()
    else:
        zhailu = ""
    # Heat
    redu = div.find('div', class_='HotItem-metrics HotItem-metrics--bottom').get_text()
    zhihu = {
        '标题': title,   # title
        '摘录': zhailu,  # excerpt
        '热度': redu,    # heat
    }
    zhihus.append(zhihu)
# print(zhihus)

# Create the workbook
workbook = xlwt.Workbook(encoding='utf-8')
# Create the worksheet
worksheet = workbook.add_sheet('知乎热榜')
# Write the header row (title, excerpt, heat)
headers = ['标题', '摘录', '热度']
for col, header in enumerate(headers):
    worksheet.write(0, col, header)
# Write the data rows, starting below the header
for row, zhihu in enumerate(zhihus, start=1):
    worksheet.write(row, 0, zhihu['标题'])
    worksheet.write(row, 1, zhihu['摘录'])
    worksheet.write(row, 2, zhihu['热度'])
# Save the file
workbook.save('知乎热榜数据.xls')
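The tag-extraction part of the script can be tried offline before sending any request. The following is a minimal sketch, not the author's code: `SAMPLE_HTML` is a hand-written fragment that only imitates the class names used above (the real page markup may differ or change), `parse_hot_items` is a hypothetical helper, and matching the heat block on the single class `HotItem-metrics` plus the `None` guards are assumptions added for robustness.

```python
from bs4 import BeautifulSoup

# Hand-written fragment imitating the class names used in the script above.
# This is an assumption for illustration, not real page output.
SAMPLE_HTML = """
<div class="HotItem-content">
  <h2 class="HotItem-title">First question</h2>
  <p class="HotItem-excerpt">A short excerpt.</p>
  <div class="HotItem-metrics HotItem-metrics--bottom">1234 万热度</div>
</div>
<div class="HotItem-content">
  <h2 class="HotItem-title">Second question</h2>
  <div class="HotItem-metrics HotItem-metrics--bottom">567 万热度</div>
</div>
"""


def text_or_empty(tag):
    """Return the stripped text of a tag, or "" when the tag is missing."""
    return tag.get_text(strip=True) if tag else ""


def parse_hot_items(html):
    """Extract title, excerpt, and heat text from each HotItem-content block."""
    soup = BeautifulSoup(html, "html.parser")
    items = []  # created once, outside the loop
    for div in soup.find_all("div", class_="HotItem-content"):
        items.append({
            "title": text_or_empty(div.find("h2", class_="HotItem-title")),
            "excerpt": text_or_empty(div.find("p", class_="HotItem-excerpt")),
            # class_ with a single name matches any element carrying that class
            "heat": text_or_empty(div.find("div", class_="HotItem-metrics")),
        })
    return items


for item in parse_hot_items(SAMPLE_HTML):
    print(item)
```

Returning an empty string for a missing tag mirrors how the script treats a missing excerpt, and keeps one malformed entry from stopping the whole run.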
Suggestions for improvement are welcome.
From: https://blog.csdn.net/2301_81525789/article/details/144587956