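This is the Bilibili anime (番剧) index page. Press F12 to open the browser's developer tools and find the URL of the request that loads the listing.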
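Next, compare the URL of the second page with that of the first. The difference is the page parameter (page=<page number>), so changing it is enough to move from page to page. If anything is unclear, feel free to ask and we can figure it out together. I'm new to this, so thanks for the support, and suggestions for improvement are welcome too.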
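The pagination idea can be sketched as follows. This is only an illustration: the endpoint and query parameters are the ones from the request in this post, while the names `BASE_URL`, `FIXED_PARAMS`, and `build_page_url` are hypothetical helpers introduced here.

```python
from urllib.parse import urlencode

# Endpoint and query parameters as seen in the request in the developer tools.
BASE_URL = "https://api.bilibili.com/pgc/season/index/result"
FIXED_PARAMS = {
    "st": 1, "order": 3, "season_version": -1, "spoken_language_type": -1,
    "area": -1, "is_finish": -1, "copyright": -1, "season_status": -1,
    "season_month": -1, "year": -1, "style_id": -1, "sort": 0,
    "season_type": 1, "pagesize": 20, "type": 1,
}


def build_page_url(page: int) -> str:
    """Return the listing URL for one page; only `page` varies (hypothetical helper)."""
    params = dict(FIXED_PARAMS, page=page)
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # The two URLs differ only in the value of the page parameter.
    print(build_page_url(1))
    print(build_page_url(2))
```

Keeping the fixed parameters in a dict makes it easy to see that the page number is the only thing that changes between requests.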
Reference code:

    import json

    import pymongo
    import requests

    # Connects to the local MongoDB instance with pymongo's default settings
    mongo_conn = pymongo.MongoClient()

    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
    }

    for page in range(1, 193):
        url = f'https://api.bilibili.com/pgc/season/index/result?st=1&order=3&season_version=-1&spoken_language_type=-1&area=-1&is_finish=-1&copyright=-1&season_status=-1&season_month=-1&year=-1&style_id=-1&sort=0&page={page}&season_type=1&pagesize=20&type=1'
        response = requests.get(url, headers=headers)
        # print(response.status_code)
        json_str = response.content.decode()
        # Deserialize the JSON text
        data = json.loads(json_str)
        for item in data['data']['list']:
            print(page, item['title'], item['order'])
            # Store the record in MongoDB (database "bilibili", collection "season")
            mongo_conn.bilibili.season.insert_one({'page': page, 'title': item['title'], 'order': item['order']})
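One possible refinement, offered only as a sketch: the reference code assumes every response contains `data` and `list`, and would raise an error on a page where they are missing. The helper below (the name `extract_rows` is hypothetical, not part of the original code) pulls out the same three fields but returns an empty list in that case. The sample payload uses made-up placeholder values, not real API output.

```python
import json


def extract_rows(payload: dict, page: int) -> list:
    """Pull page/title/order out of one decoded API response.

    Returns an empty list when the expected data -> list path is missing,
    instead of raising KeyError or TypeError.
    """
    data = payload.get("data") or {}
    items = data.get("list") or []
    return [
        {"page": page, "title": item.get("title"), "order": item.get("order")}
        for item in items
    ]


if __name__ == "__main__":
    # Placeholder payload with made-up values, shaped like the response read above.
    sample = json.loads(
        '{"data": {"list": [{"title": "example title", "order": "example order"}]}}'
    )
    print(extract_rows(sample, page=1))
    print(extract_rows({"code": -1}, page=1))  # no "data" key, so no rows
```

The returned dictionaries have the same keys as the documents inserted by the reference code, so they could be passed to `insert_one` one at a time.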
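Don't forget to start the MongoDB service before connecting to the database. After the script has run, the stored data can be viewed in the season collection of the bilibili database.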
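As a quick way to check that the service is up before running the scraper, something like the following could be used. It is a rough sketch that assumes MongoDB is listening on its default address, localhost:27017 (which is also what `pymongo.MongoClient()` connects to when called without arguments). The function name `mongo_port_open` is hypothetical, and an open port only suggests that MongoDB is running; it does not prove it.

```python
import socket


def mongo_port_open(host: str = "localhost", port: int = 27017,
                    timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if mongo_port_open():
        print("Port 27017 is accepting connections; MongoDB is probably running.")
    else:
        print("Nothing is listening on localhost:27017; start the MongoDB service first.")
```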
Afterwards, the data can also be analyzed and visualized.
From: https://blog.csdn.net/m0_65088713/article/details/143702626