Extracting data with bs4 CSS selectors
Code
from bs4 import BeautifulSoup
import requests
from fake_useragent import UserAgent


def bs_extract():
    # Target URL
    url = 'https://www.maoyan.com/films'
    # Set the request headers, using a Firefox User-Agent string
    headers = {'User-Agent': UserAgent().firefox}
    # Send the request
    resp = requests.get(url, headers=headers)
    # Save the response to disk, then read the same file back for parsing
    # (the original wrote 'maoyan1.txt' but read 'maoyan.txt')
    with open('maoyan.txt', 'w', encoding='utf-8') as f:
        f.write(resp.text)
    with open('maoyan.txt', 'r', encoding='utf-8') as f:
        html = f.read()
    # Parse the response with bs4
    soup = BeautifulSoup(html, 'lxml')
    # Extract the titles
    names = [a.text for a in soup.select('div[class="channel-detail movie-item-title"]>a')]
    # Extract the ratings
    scores = [div.text for div in soup.select('div[class="channel-detail channel-detail-orange"]')]
    # Print the results
    for n, s in zip(names, scores):
        print(f'{n}: {s}')


if __name__ == '__main__':
    bs_extract()
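The selectors above match the full class attribute string, so they only hit when the classes appear in exactly that order. Separately, zip() pairs titles and ratings by position, which could misalign if an item were missing one of the two. Below is a minimal sketch of an alternative that uses class selectors and extracts both fields per item. The sample markup, the dd wrapper, and the extract_pairs helper are assumptions made for illustration, not the real page structure, and html.parser is used only so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that reuses the class names from the selectors above.
# The real page structure may differ.
SAMPLE_HTML = """
<dl>
  <dd>
    <div class="channel-detail movie-item-title"><a href="/films/1">Film A</a></div>
    <div class="channel-detail channel-detail-orange">9.1</div>
  </dd>
  <dd>
    <div class="movie-item-title channel-detail"><a href="/films/2">Film B</a></div>
    <div class="channel-detail channel-detail-orange">暂无评分</div>
  </dd>
</dl>
"""


def extract_pairs(html):
    """Return (title, rating_text) tuples, one per item container."""
    soup = BeautifulSoup(html, 'html.parser')
    pairs = []
    # Assumption: each film sits in its own <dd> container.
    for item in soup.select('dd'):
        # Class selectors match regardless of class order.
        name = item.select_one('div.movie-item-title > a')
        score = item.select_one('div.channel-detail-orange')
        if name is None:
            continue
        rating = score.get_text(strip=True) if score else None
        pairs.append((name.get_text(strip=True), rating))
    return pairs


if __name__ == '__main__':
    for title, rating in extract_pairs(SAMPLE_HTML):
        print(f'{title}: {rating}')
```

In this sample the second item lists its classes in the opposite order, so the exact attribute selector from the original code would skip it, while the class selector still finds it.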
Output
浴火之路: 9.1
只此青绿: 9.5
749局: 8.6
哈利·波特与魔法石: 暂无评分
志愿军:存亡之战: 9.7
出走的决心: 9.5
野孩子: 9.2
变形金刚:起源: 9.4
熊猫计划: 9.4
里斯本丸沉没: 9.6
名侦探柯南:百万美元的五棱星: 9.1
危机航线: 9.4
异形:夺命舰: 8.7
爆款好人: 9.0
荒野机器人: 9.5
姥姥的外孙: 9.4
哈利·波特与密室: 暂无评分
绑架游戏: 暂无评分
新大头儿子和小头爸爸6:迷你大冒险: 9.3
一雪前耻: 8.8
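Several entries in the output above show 暂无评分 ("no rating yet") instead of a number. If the ratings are going to be sorted or averaged later, they may need converting first. Here is a minimal sketch of one way to do that; parse_score is a hypothetical helper, not part of the original script.

```python
def parse_score(text):
    """Return the rating as a float, or None when the text is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        # Covers placeholder text such as '暂无评分'.
        return None


# Two rows taken from the output above.
rows = [('浴火之路', '9.1'), ('哈利·波特与魔法石', '暂无评分')]
parsed = [(name, parse_score(score)) for name, score in rows]

# Keep only films that actually have a numeric rating.
rated = [(name, score) for name, score in parsed if score is not None]
```

Returning None rather than 0 keeps unrated films distinguishable from films with a genuinely low rating.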
From: https://www.cnblogs.com/qyly/p/18454463