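This article uses a web crawler to find out which technologies are developing quickly and which issues are widely discussed among developers, as a reference for learning and research.
Using a Python Crawler to Analyze the Latest Technology Trends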
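- 1. Crawl Target
- 2. Code Environment
- 2.1 Programming Language
- 2.2 Third-Party Libraries
- 2.3 Environment Setup
- 3. Hands-On Code
- 3.1 API Analysis
- 3.2 API Parameter Analysis
- Endpoint
- Request method
- Description
- Request parameters
- Response
- Example request
- Example response
- Error handling
- 3.3 Writing the Crawler
- 3.4 Crawling Through Proxy IPs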
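1. Crawl Target
In today's fast-changing technology landscape, getting up-to-date industry information and technology trends is essential. China's major IT communities host a large number of technical articles and experience shared by developers. Our goal is to scrape hot-article data to see which content and trends are getting the most attention in the technical community right now. This data helps us understand which technologies are developing quickly and which issues are widely discussed among developers, providing a useful reference for learning and research.
The ideas and code in this article are for learning and research only, and they comply with the relevant websites' terms of use.
2. Code Environment
The code environment, dependencies and configuration used in this article are listed below.
2.1 Programming Language
- Python: Python was chosen for its strong library ecosystem and concise syntax, which make it particularly well suited to data scraping and processing. For installation and setup, see my other article, "[Python] Python Download and Installation Tutorial for Beginners: A Step-by-Step Guide to Setting Up Python on Windows, macOS and Linux".
2.2 Third-Party Libraries
- Requests: sends HTTP requests and retrieves web content. It is simple to use, supports all common HTTP methods and is the go-to library for network requests in Python.
- BeautifulSoup: parses HTML and XML documents and extracts data from web pages. BeautifulSoup supports several parsers and handles complex HTML structures easily.
- lxml: a high-performance XML and HTML parsing library built on libxml2 and libxslt. It processes large documents efficiently and suits demanding extraction tasks.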
Install the libraries listed above with the following command:
pip install requests beautifulsoup4 lxml
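2.3 Environment Setup
- Python version: Python 3.7 or later is recommended for compatibility and performance.
- IDE: an integrated development environment such as PyCharm or VSCode is recommended. Both offer good support for editing, debugging and running code.
- PyCharm: offers powerful code completion, debugging and project management, and is well suited to large projects.
- VSCode: a lightweight editor with a rich plugin ecosystem that can be configured flexibly as needed.
3. Hands-On Code
3.1 API Analysis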
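Open the hot articles page in Chrome and press F12 to open Developer Tools. Switch to the Network tab as shown in the figure below, then select Fetch/XHR.
Once this setup is done, refresh the page. As shown in the figure below, there is a request named hot-rank.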
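3.2 API Parameter Analysis
Click the hot-rank request to view its details.
As shown in the figure above, the full request URL is https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=25&type=
As shown in the figure above, switching to the Response tab shows the endpoint's response, reproduced below. It contains 25 hot-list articles. Each entry includes the current ranking period, the hot-rank score, the author's nickname and username, the article URL, and the view, like and comment counts.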
{
"code": 200,
"message": "success",
"traceId": "8d606612-0994-479f-96a0-2629e91fb57d",
"data": [
{
"period": "2024-10-29-12",
"hotRankScore": "21673",
"pcHotRankScore": "2.2w",
"loginUserIsFollow": false,
"nickName": "平凡程序猿~",
"avatarUrl": "https://profile-avatar.csdnimg.cn/2ee6469786394599b70d5357c55e4e4a_2302_81410974.jpg!1",
"userName": "2302_81410974",
"articleTitle": "深入探索:深度学习在时间序列预测中的强大应用与实现",
"articleDetailUrl": "https://blog.csdn.net/2302_81410974/article/details/143285977",
"commentCount": "75",
"favorCount": "90",
"viewCount": "1151",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/b229b47116c7421b9487dc4d9640b8d8.jpeg"
],
"isNew": null,
"productId": "143285977",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "19924",
"pcHotRankScore": "2.0w",
"loginUserIsFollow": true,
"nickName": "诡异森林。",
"avatarUrl": "https://profile-avatar.csdnimg.cn/1071f5d6f89d42d6ae79f0b04ab9ac4d_m0_74068921.jpg!1",
"userName": "m0_74068921",
"articleTitle": "Docker:技术架构的演进之路",
"articleDetailUrl": "https://blog.csdn.net/m0_74068921/article/details/143252466",
"commentCount": "64",
"favorCount": "92",
"viewCount": "1914",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/a473a9ccc3444e92b4bd76e3ab238c54.png"
],
"isNew": null,
"productId": "143252466",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "19525",
"pcHotRankScore": "2.0w",
"loginUserIsFollow": true,
"nickName": "程序猿进阶",
"avatarUrl": "https://profile-avatar.csdnimg.cn/6f2e9637836e45a884cab077791f1478_zhengzhaoyang122.jpg!1",
"userName": "zhengzhaoyang122",
"articleTitle": "web3.0 开发实践",
"articleDetailUrl": "https://blog.csdn.net/zhengzhaoyang122/article/details/143272424",
"commentCount": "67",
"favorCount": "71",
"viewCount": "2063",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/2e02bd7403b643048df326467be6dc30.png"
],
"isNew": null,
"productId": "143272424",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "18719",
"pcHotRankScore": "1.9w",
"loginUserIsFollow": false,
"nickName": "nantangyuxi",
"avatarUrl": "https://profile-avatar.csdnimg.cn/bcce2794bbfb446180c3734639950b78_xiaoxingkongyuxi.jpg!1",
"userName": "xiaoxingkongyuxi",
"articleTitle": "MATLAB实现基于CNN-BiLSTM卷积双向长短期记忆神经网络的时间序列预测",
"articleDetailUrl": "https://blog.csdn.net/xiaoxingkongyuxi/article/details/143263563",
"commentCount": "1",
"favorCount": "22",
"viewCount": "801",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/5f55baeb85aa4c2b93c19fdd7ec1ea7d.jpeg"
],
"isNew": null,
"productId": "143263563",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "17018",
"pcHotRankScore": "1.7w",
"loginUserIsFollow": false,
"nickName": "虞书欣的6",
"avatarUrl": "https://profile-avatar.csdnimg.cn/8f9e3e31879a4c1f93bc7caca17f3ffc_cxh666888_.jpg!1",
"userName": "cxh666888_",
"articleTitle": "Python小游戏14——雷霆战机",
"articleDetailUrl": "https://blog.csdn.net/cxh666888_/article/details/143270011",
"commentCount": "2",
"favorCount": "32",
"viewCount": "1959",
"hotComment": null,
"picList": [
"/i/ll/?i=adc77680bd174e358868931dd9720f49.jpg"
],
"isNew": null,
"productId": "143270011",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "14736",
"pcHotRankScore": "1.5w",
"loginUserIsFollow": true,
"nickName": "小ᶻZ࿆",
"avatarUrl": "https://profile-avatar.csdnimg.cn/fcce75c8147f48338d288437f2ce7f8e_2201_75539691.jpg!1",
"userName": "2201_75539691",
"articleTitle": "【AIGC】从CoT到BoT:AGI推理能力提升24%的技术变革如何驱动ChatGPT未来发展",
"articleDetailUrl": "https://blog.csdn.net/2201_75539691/article/details/143277081",
"commentCount": "110",
"favorCount": "100",
"viewCount": "1286",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/392caf61274b4c2b87aec87b7cec5b09.jpeg"
],
"isNew": null,
"productId": "143277081",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "12787",
"pcHotRankScore": "1.3w",
"loginUserIsFollow": true,
"nickName": "2的n次方_",
"avatarUrl": "https://profile-avatar.csdnimg.cn/06acfd433ab0467f918983afeee463dd_2202_76097976.jpg!1",
"userName": "2202_76097976",
"articleTitle": "【Spring MVC】请求参数的传递",
"articleDetailUrl": "https://blog.csdn.net/2202_76097976/article/details/143088438",
"commentCount": "84",
"favorCount": "80",
"viewCount": "1353",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/ec823407c72f4d1eba176532fa523a60.png"
],
"isNew": null,
"productId": "143088438",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "12517",
"pcHotRankScore": "1.3w",
"loginUserIsFollow": true,
"nickName": "程序边界",
"avatarUrl": "https://profile-avatar.csdnimg.cn/ea275d892df44c5fb722c5756f8ba98b_qq_32682301.jpg!1",
"userName": "qq_32682301",
"articleTitle": "AIGC时代的数据盛宴:R语言引领数据分析新风尚",
"articleDetailUrl": "https://blog.csdn.net/qq_32682301/article/details/143290891",
"commentCount": "40",
"favorCount": "56",
"viewCount": "738",
"hotComment": null,
"picList": [
"/i/ll/?i=img_convert/1556a4ddabef4ac5b908fba2a47d3eee.png"
],
"isNew": null,
"productId": "143290891",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "12246",
"pcHotRankScore": "1.2w",
"loginUserIsFollow": true,
"nickName": "小林熬夜学编程",
"avatarUrl": "https://profile-avatar.csdnimg.cn/54f236b0001841bd8615c85217b2ec93_2201_75584283.jpg!1",
"userName": "2201_75584283",
"articleTitle": "【Linux系统编程】第三十八弹---信号世界探索:从生活到技术的全面解析",
"articleDetailUrl": "https://blog.csdn.net/2201_75584283/article/details/142580264",
"commentCount": "91",
"favorCount": "104",
"viewCount": "592",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/9f6ddbb1af0440a09d928581b47a29f2.png"
],
"isNew": null,
"productId": "142580264",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "12180",
"pcHotRankScore": "1.2w",
"loginUserIsFollow": false,
"nickName": "小码农叔叔",
"avatarUrl": "https://profile-avatar.csdnimg.cn/2a38fa4448434cf9be6818697cdaf229_zhangcongyi420.jpg!1",
"userName": "zhangcongyi420",
"articleTitle": "【大数据】Flink + Kafka 实现通用流式数据处理详解",
"articleDetailUrl": "https://blog.csdn.net/zhangcongyi420/article/details/143027849",
"commentCount": "146",
"favorCount": "88",
"viewCount": "2092",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/10919f87476e4db59d559260198a1925.png"
],
"isNew": null,
"productId": "143027849",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "12115",
"pcHotRankScore": "1.2w",
"loginUserIsFollow": true,
"nickName": "Kwan的解忧杂货铺@新空间代码工作室",
"avatarUrl": "https://profile-avatar.csdnimg.cn/7b22c422c2df41c9aa22ff208e9cb96d_qyj19920704.jpg!1",
"userName": "qyj19920704",
"articleTitle": "本地Docker部署开源WAF雷池并实现异地远程登录管理界面",
"articleDetailUrl": "https://blog.csdn.net/qyj19920704/article/details/143309113",
"commentCount": "85",
"favorCount": "64",
"viewCount": "931",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/78b08acbdf86436f868d0b827fc038ca.jpeg"
],
"isNew": null,
"productId": "143309113",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "11432",
"pcHotRankScore": "1.1w",
"loginUserIsFollow": false,
"nickName": "Mr.Winter`",
"avatarUrl": "https://profile-avatar.csdnimg.cn/055065db1f8b4d4bb2b90751efcd4e07_frigidwinter.jpg!1",
"userName": "FRIGIDWINTER",
"articleTitle": "轨迹规划 | 基于差速运动学的有模型PID算法(附ROS C++仿真)",
"articleDetailUrl": "https://blog.csdn.net/FRIGIDWINTER/article/details/143258906",
"commentCount": "28",
"favorCount": "45",
"viewCount": "1122",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/8baf090b363a46b982678d5dee8fd3c5.gif"
],
"isNew": null,
"productId": "143258906",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "10661",
"pcHotRankScore": "1.1w",
"loginUserIsFollow": false,
"nickName": "凯子坚持C",
"avatarUrl": "https://profile-avatar.csdnimg.cn/125c0a68206144a6a3869aeddf9a9cbc_2301_80863610.jpg!1",
"userName": "2301_80863610",
"articleTitle": "AI创作者与人类创作者的协作模式",
"articleDetailUrl": "https://blog.csdn.net/2301_80863610/article/details/143305703",
"commentCount": "83",
"favorCount": "78",
"viewCount": "1109",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/ce0c12726e154ac08a26f62929d6c837.gif"
],
"isNew": null,
"productId": "143305703",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "9709",
"pcHotRankScore": "1.0w",
"loginUserIsFollow": false,
"nickName": "CoderJia_",
"avatarUrl": "https://profile-avatar.csdnimg.cn/3de6e5c240924736b5dda23e0d74cbd3_u014390502.jpg!1",
"userName": "u014390502",
"articleTitle": "重学SpringBoot3-Spring WebFlux之SSE服务器发送事件",
"articleDetailUrl": "https://blog.csdn.net/u014390502/article/details/143275309",
"commentCount": "42",
"favorCount": "60",
"viewCount": "1601",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/b41deddffd8d416fbb67e296f0143932.webp"
],
"isNew": null,
"productId": "143275309",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "9559",
"pcHotRankScore": "1.0w",
"loginUserIsFollow": false,
"nickName": "颜淡慕潇",
"avatarUrl": "https://profile-avatar.csdnimg.cn/2febcd43c21442509a2bf41fa63d7b67_weixin_36755535.jpg!1",
"userName": "weixin_36755535",
"articleTitle": "【K8S系列】Kubernetes 中 Service IP 地址和端口不匹配问题及解决方案【已解决】",
"articleDetailUrl": "https://blog.csdn.net/weixin_36755535/article/details/143271989",
"commentCount": "39",
"favorCount": "30",
"viewCount": "1898",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/d49227ce7645436aad653e460cfbdd86.png"
],
"isNew": null,
"productId": "143271989",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "8973",
"pcHotRankScore": "8973",
"loginUserIsFollow": false,
"nickName": "DARLING Zero two♡",
"avatarUrl": "https://profile-avatar.csdnimg.cn/00d1491d64c44cdbbd15500178786292_zero_vpn.jpg!1",
"userName": "Zero_VPN",
"articleTitle": "关于我、重生到500年前凭借C语言改变世界科技vlog.11——深入理解指针(1)",
"articleDetailUrl": "https://blog.csdn.net/Zero_VPN/article/details/143288823",
"commentCount": "94",
"favorCount": "55",
"viewCount": "1246",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/3d6ea5738c17443e9969873cf82c5e8b.png"
],
"isNew": null,
"productId": "143288823",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "8843",
"pcHotRankScore": "8843",
"loginUserIsFollow": true,
"nickName": "Jupiter·",
"avatarUrl": "https://profile-avatar.csdnimg.cn/036f6cff17d74f91abb8a599504fe014_2301_77509762.jpg!1",
"userName": "2301_77509762",
"articleTitle": "深入理解数据链路层:以太网帧格式、MAC地址、交换机、MTU及ARP协议详解与ARP欺骗探究",
"articleDetailUrl": "https://blog.csdn.net/2301_77509762/article/details/142486419",
"commentCount": "28",
"favorCount": "41",
"viewCount": "588",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/e0c2599816294c55a3aca8018727b542.png"
],
"isNew": null,
"productId": "142486419",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "8700",
"pcHotRankScore": "8700",
"loginUserIsFollow": true,
"nickName": "小强在此",
"avatarUrl": "https://profile-avatar.csdnimg.cn/b1d9c6e6380a41898d74591dc3406e7c_zcy_c.jpg!1",
"userName": "ZCY_c",
"articleTitle": "机器学习【学校智慧食堂及其应用】",
"articleDetailUrl": "https://blog.csdn.net/ZCY_c/article/details/143310696",
"commentCount": "45",
"favorCount": "47",
"viewCount": "681",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/5301d509f3154664944736e615200d97.jpeg"
],
"isNew": null,
"productId": "143310696",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "8690",
"pcHotRankScore": "8690",
"loginUserIsFollow": false,
"nickName": "deephub",
"avatarUrl": "https://profile-avatar.csdnimg.cn/0054ba55863d44d98d9ba189a2401027_m0_46510245.jpg!1",
"userName": "m0_46510245",
"articleTitle": "深度学习中的学习率调度:循环学习率、SGDR、1cycle 等方法介绍及实践策略研究",
"articleDetailUrl": "https://blog.csdn.net/m0_46510245/article/details/143280635",
"commentCount": "1",
"favorCount": "23",
"viewCount": "3668",
"hotComment": null,
"picList": [
"/i/ll/?i=img_convert/037199363e3dd7cfa9764901ee3734e8.jpeg"
],
"isNew": null,
"productId": "143280635",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "8681",
"pcHotRankScore": "8681",
"loginUserIsFollow": false,
"nickName": "景天科技苑",
"avatarUrl": "https://profile-avatar.csdnimg.cn/89baaea9472b4fd7a2058d0e79ee2bc9_littlefun591.jpg!1",
"userName": "littlefun591",
"articleTitle": "【Golang】Go语言中如何进行包管理",
"articleDetailUrl": "https://blog.csdn.net/littlefun591/article/details/143300406",
"commentCount": "50",
"favorCount": "46",
"viewCount": "1234",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/947ad3e9d18b4e66a55dc31ecd7c1109.jpeg"
],
"isNew": null,
"productId": "143300406",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "8412",
"pcHotRankScore": "8412",
"loginUserIsFollow": true,
"nickName": "盼小辉丶",
"avatarUrl": "https://profile-avatar.csdnimg.cn/40d49e89aeed4d50b8b32680f49e3ac1_lovemy134611.jpg!1",
"userName": "LOVEmy134611",
"articleTitle": "遗传算法与深度学习实战(20)——使用进化策略自动超参数优化",
"articleDetailUrl": "https://blog.csdn.net/LOVEmy134611/article/details/143283052",
"commentCount": "19",
"favorCount": "34",
"viewCount": "431",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/412d30ce3bda4f70845f524b724c35e3.png"
],
"isNew": null,
"productId": "143283052",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "8255",
"pcHotRankScore": "8255",
"loginUserIsFollow": true,
"nickName": "中草药z",
"avatarUrl": "https://profile-avatar.csdnimg.cn/0e3cd82612e7400c9dfcef891e713698_2302_79806056.jpg!1",
"userName": "2302_79806056",
"articleTitle": "【Spring】Ioc&DI",
"articleDetailUrl": "https://blog.csdn.net/2302_79806056/article/details/143271383",
"commentCount": "39",
"favorCount": "61",
"viewCount": "675",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/752811eac90a465fa9b2c7b5f94b3b3d.png"
],
"isNew": null,
"productId": "143271383",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "8057",
"pcHotRankScore": "8057",
"loginUserIsFollow": false,
"nickName": "川川菜鸟",
"avatarUrl": "https://profile-avatar.csdnimg.cn/171da61cd5e74ebfa9ab08e29de51894_weixin_46211269.jpg!1",
"userName": "weixin_46211269",
"articleTitle": "数学建模学习(131):使用Python基于VIKOR算法的多准则决策分析",
"articleDetailUrl": "https://blog.csdn.net/weixin_46211269/article/details/143268220",
"commentCount": "0",
"favorCount": "12",
"viewCount": "816",
"hotComment": null,
"picList": [
"/i/ll/?i=img_convert/fad536e972e14ce4b37803185dc3b00c.png"
],
"isNew": null,
"productId": "143268220",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "7904",
"pcHotRankScore": "7904",
"loginUserIsFollow": true,
"nickName": "java李杨勇",
"avatarUrl": "https://profile-avatar.csdnimg.cn/ec7f24ace4574ddf9f7859b8d43ec482_weixin_39709134.jpg!1",
"userName": "weixin_39709134",
"articleTitle": "基于大数据爬虫+Hive+SpringBoot+的歌曲筛选推荐与可视化大屏平台设计和实现(源码+论文+部署讲解等)",
"articleDetailUrl": "https://blog.csdn.net/weixin_39709134/article/details/143314219",
"commentCount": "25",
"favorCount": "28",
"viewCount": "1927",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/43134b0303c1466383ec0a6b8e19fd1d.png"
],
"isNew": null,
"productId": "143314219",
"productType": "blog",
"recommendType": "ali",
"report_data": null
},
{
"period": "2024-10-29-12",
"hotRankScore": "7143",
"pcHotRankScore": "7143",
"loginUserIsFollow": false,
"nickName": "小码农<^_^>",
"avatarUrl": "https://profile-avatar.csdnimg.cn/3480896bab63498188fad036782a48b3_2401_87257864.jpg!1",
"userName": "2401_87257864",
"articleTitle": "优选算法精品课--双指针算法(2)",
"articleDetailUrl": "https://blog.csdn.net/2401_87257864/article/details/143302883",
"commentCount": "55",
"favorCount": "67",
"viewCount": "991",
"hotComment": null,
"picList": [
"https://i-blog.csdnimg.cn/direct/758a49d491dc40c297a510d18e15849a.png"
],
"isNew": null,
"productId": "143302883",
"productType": "blog",
"recommendType": "ali",
"report_data": null
}
]
}
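Before writing the crawler, it can help to turn these entries into typed records. The sketch below is an illustration and not part of the original code. The `HotArticle` dataclass and the `parse_hot_rank` function are hypothetical names, and the sketch assumes the field layout shown above.

It reads `hotRankScore` rather than `pcHotRankScore` because the latter is a rounded display value. For example, it shows "2.2w" for 21673, where w stands for 万 (ten thousand), and it only switches to plain digits below 10,000.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class HotArticle:
    nick_name: str
    title: str
    url: str
    hot_rank_score: int
    view_count: int
    comment_count: int
    favor_count: int


def parse_hot_rank(payload: Dict[str, Any]) -> List[HotArticle]:
    """Convert the raw hot-rank JSON into HotArticle records."""
    if payload.get("code") != 200:
        raise ValueError(f"unexpected response code: {payload.get('code')!r}")
    articles = []
    for entry in payload.get("data") or []:
        articles.append(HotArticle(
            nick_name=entry.get("nickName", ""),
            title=entry.get("articleTitle", ""),
            url=entry.get("articleDetailUrl", ""),
            # Counts and scores arrive as strings such as "21673"; missing values become 0
            hot_rank_score=int(entry.get("hotRankScore") or 0),
            view_count=int(entry.get("viewCount") or 0),
            comment_count=int(entry.get("commentCount") or 0),
            favor_count=int(entry.get("favorCount") or 0),
        ))
    return articles


# Abridged copy of the first entry in the captured response
sample = {
    "code": 200,
    "message": "success",
    "data": [{
        "nickName": "平凡程序猿~",
        "articleTitle": "深入探索:深度学习在时间序列预测中的强大应用与实现",
        "articleDetailUrl": "https://blog.csdn.net/2302_81410974/article/details/143285977",
        "hotRankScore": "21673",
        "pcHotRankScore": "2.2w",
        "viewCount": "1151",
        "commentCount": "75",
        "favorCount": "90",
    }],
}

top = parse_hot_rank(sample)[0]
print(top.title, top.hot_rank_score)
```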
Based on the information above, our analysis of this hot-list article API is as follows:
Endpoint
https://blog.csdn.net/phoenix/web/blog/hot-rank
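The endpoint and its query parameters can be pulled out of the captured URL with Python's standard library. This is a minimal sketch, and `split_captured_url` is a hypothetical helper name:

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlsplit

# Request URL copied from the DevTools request details
CAPTURED_URL = "https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=25&type="


def split_captured_url(url: str) -> Tuple[str, Dict[str, str]]:
    """Split a captured request URL into its endpoint and query parameters."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True keeps "type=" even though its value is empty
    raw = parse_qs(parts.query, keep_blank_values=True)
    params = {key: values[0] for key, values in raw.items()}
    return endpoint, params


endpoint, params = split_captured_url(CAPTURED_URL)
print(endpoint)  # https://blog.csdn.net/phoenix/web/blog/hot-rank
print(params)    # {'page': '0', 'pageSize': '25', 'type': ''}
```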
Request method
- GET
Description
This endpoint returns CSDN's hot-list articles. The caller can specify pagination and the type of ranking list.
Request parameters
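- page (integer, optional)
  - Description: the page number of results to return.
  - Default: 0 (the value seen in the captured request)
  - Example: page=1
- pageSize (integer, optional)
  - Description: the number of articles returned per page.
  - Default: 25 (the value seen in the captured request)
  - Example: pageSize=50 (untested; the server may cap or ignore larger values)
- type (string, optional)
  - Description: the kind of hot list (for example, daily or weekly). Valid values depend on CSDN's internal configuration. The captured request leaves this parameter empty.
  - Example: type=daily (an assumed value, not confirmed against the API)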
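A request URL can also be rebuilt from these parameters. The sketch below uses `urllib.parse.urlencode`. `build_hot_rank_url` is a hypothetical helper, and parameter values other than the captured ones (page=0, pageSize=25, empty type) have not been verified against the API.

```python
from urllib.parse import urlencode

HOT_RANK_ENDPOINT = "https://blog.csdn.net/phoenix/web/blog/hot-rank"


def build_hot_rank_url(page: int = 0, page_size: int = 25, rank_type: str = "") -> str:
    """Assemble a hot-rank request URL; defaults mirror the captured request."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    # urlencode keeps empty values, so rank_type="" becomes "type="
    query = urlencode({"page": page, "pageSize": page_size, "type": rank_type})
    return f"{HOT_RANK_ENDPOINT}?{query}"


print(build_hot_rank_url())
# https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=25&type=
print(build_hot_rank_url(page=1))
```

With requests, passing `params={"page": 0, "pageSize": 25, "type": ""}` to `requests.get(HOT_RANK_ENDPOINT, ...)` should produce the same query string.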
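Response
- Content type: application/json
- Response structure: field names are taken from the captured response above. Counts and scores are returned as strings.
{ "code": 200, "message": "success", "traceId": "request trace ID",
  "data": [ { "period": "ranking period, e.g. 2024-10-29-12", "hotRankScore": "hot-rank score", "pcHotRankScore": "rounded display score, e.g. 2.2w",
              "nickName": "author nickname", "userName": "author username", "articleTitle": "article title", "articleDetailUrl": "article URL",
              "commentCount": "comment count", "favorCount": "like count", "viewCount": "view count", "picList": [ "cover image URL" ], ... }, ... ] }
- The response does not echo page or pageSize, and it does not include a total article count.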
Example request
GET /phoenix/web/blog/hot-rank?page=0&pageSize=25&type= HTTP/1.1
Host: blog.csdn.net
Example response
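Abridged from the captured response (one entry, main fields only):
{
  "code": 200,
  "message": "success",
  "traceId": "8d606612-0994-479f-96a0-2629e91fb57d",
  "data": [
    {
      "period": "2024-10-29-12",
      "hotRankScore": "21673",
      "pcHotRankScore": "2.2w",
      "nickName": "平凡程序猿~",
      "userName": "2302_81410974",
      "articleTitle": "深入探索:深度学习在时间序列预测中的强大应用与实现",
      "articleDetailUrl": "https://blog.csdn.net/2302_81410974/article/details/143285977",
      "commentCount": "75",
      "favorCount": "90",
      "viewCount": "1151"
    }
  ]
}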
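This endpoint was found by inspecting browser traffic rather than from published documentation, so its response shape may change without notice. A small check like the one below can flag such changes before parsing. It is a sketch that assumes the fields used by the crawler in section 3.3 are the ones that matter. `find_schema_problems` and `REQUIRED_ENTRY_FIELDS` are hypothetical names.

```python
from typing import Any, Dict, List

# Fields the section 3.3 crawler relies on, chosen from the captured response
REQUIRED_ENTRY_FIELDS = ("nickName", "articleTitle", "pcHotRankScore", "articleDetailUrl")


def find_schema_problems(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks as expected."""
    problems = []
    if payload.get("code") != 200:
        problems.append(f"code is {payload.get('code')!r}, expected 200")
    data = payload.get("data")
    if not isinstance(data, list):
        problems.append("data is missing or not a list")
        return problems
    for index, entry in enumerate(data):
        missing = [field for field in REQUIRED_ENTRY_FIELDS if field not in entry]
        if missing:
            problems.append(f"entry {index} is missing {', '.join(missing)}")
    return problems


ok = {"code": 200, "message": "success",
      "data": [{"nickName": "a", "articleTitle": "b", "pcHotRankScore": "2.2w",
                "articleDetailUrl": "https://blog.csdn.net/x"}]}
print(find_schema_problems(ok))  # []
print(find_schema_problems({"code": 500, "data": None}))
# ['code is 500, expected 200', 'data is missing or not a list']
```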
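Error handling
- HTTP status codes (generic HTTP meanings, not confirmed for this endpoint):
  - 200 OK: the request succeeded.
  - 400 Bad Request: the request is invalid, usually because of bad parameters.
  - 500 Internal Server Error: an error occurred on the server.
- The JSON body also carries its own code field (200 in the captured response), so a client should check both the HTTP status and code.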
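Transient failures such as timeouts or 5xx responses are often worth retrying. A 400, by contrast, usually means the parameters are wrong, and retrying will not help.

The sketch below shows a generic retry helper with exponential backoff and random jitter. `with_retries` is a hypothetical name, and the delays (1 s, 2 s, 4 s and so on, plus up to 0.5 s) are arbitrary choices.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on `retry_on` exceptions with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2**attempt seconds plus up to 0.5 s of random jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise RuntimeError("attempts must be >= 1")


calls = {"n": 0}


def flaky() -> str:
    # Fails twice, then succeeds, to simulate a transient network problem
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(with_retries(flaky, sleep=lambda seconds: None))  # ok
```

With requests, one option is to pass `retry_on=(requests.ConnectionError, requests.Timeout)`. This retries only network-level failures. Passing `requests.RequestException` instead would also retry the 4xx errors raised by `raise_for_status()`.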
3.3 Writing the Crawler
Based on the API analysis in the previous section, we can simulate the hot-list request in Python:
import requests

# Hot-list request URL captured in section 3.2
csdn_hot_rank_url = "https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=25&type="


# Fetch the hot list and print the author, title and hot-rank score of each article
def fetch_hot_rank():
    headers = {
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Charset": "utf-8",
    }
    try:
        # Send the request, giving up after 10 seconds
        response = requests.get(csdn_hot_rank_url, headers=headers, timeout=10)
        # Raise HTTPError for 4xx/5xx status codes
        response.raise_for_status()
        # Parse the JSON body (raises ValueError if the body is not valid JSON)
        data = response.json()
    except (requests.RequestException, ValueError) as e:
        print(f"Request failed: {e}")
        return

    # The body carries its own status code; 200 means success
    if data.get("code") == 200:
        for entry in data.get("data") or []:
            nick_name = entry.get("nickName", "N/A")
            article_title = entry.get("articleTitle", "N/A")
            pc_hot_rank_score = entry.get("pcHotRankScore", "N/A")
            print(f"Author: {nick_name}, Title: {article_title}, Hot score: {pc_hot_rank_score}")
    else:
        print("Failed to fetch data. Server responded with error.")


# Entry point
if __name__ == "__main__":
    fetch_hot_rank()
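The output is shown below; 25 entries were retrieved in total.
Comparing them with the first 8 articles on the hot-list page shows a one-to-one match.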
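Since the goal is to spot technology trends, a natural next step is to tally topics across the titles. The sketch below does this with substring matching. The `KEYWORDS` tuple and the `count_keywords` function are hypothetical, and the sample titles are taken from the response captured in section 3.2.

```python
from collections import Counter
from typing import Iterable, Sequence

# Hypothetical keyword list; extend it with whatever topics you want to track
KEYWORDS = ("Python", "Java", "Spring", "Docker", "AIGC", "深度学习", "Linux", "K8S")


def count_keywords(titles: Iterable[str], keywords: Sequence[str] = KEYWORDS) -> Counter:
    """Count how many titles mention each keyword (case-insensitive, at most once per title)."""
    counts = Counter()
    for title in titles:
        lowered = title.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts


titles = [
    "深入探索:深度学习在时间序列预测中的强大应用与实现",
    "Docker:技术架构的演进之路",
    "【Spring MVC】请求参数的传递",
    "重学SpringBoot3-Spring WebFlux之SSE服务器发送事件",
    "本地Docker部署开源WAF雷池并实现异地远程登录管理界面",
]
print(count_keywords(titles).most_common(3))
```

Substring matching is crude. "Java" would also match "JavaScript", while "Spring" deliberately matches "SpringBoot". Comparing counts across several `period` snapshots would say more about trends than a single list.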
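3.4 Crawling Through Proxy IPs
When crawling the web, using proxy IPs is an effective way to make a crawler more stable and efficient. The main benefits are:
- Avoiding IP bans: frequent requests to the same site may get your IP temporarily or permanently banned. Rotating through different proxy IPs lowers that risk.
- Higher throughput: with multiple proxy IPs you can send requests in parallel, which can speed up crawling considerably.
- Accessing restricted resources: some sites restrict access from IPs in certain regions. A proxy IP located in an allowed region can get around such restrictions.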
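The rotation idea from the first point can be sketched as a generator of requests-style `proxies` dicts. This is an illustration built on several assumptions. `proxy_rotation` is a hypothetical helper, and the addresses are documentation-only IPs (RFC 5737) rather than real proxies. It also assumes username/password authentication, as in the example later in this section.

```python
from itertools import cycle
from typing import Dict, Iterator, List
from urllib.parse import quote


def proxy_rotation(addresses: List[str], user: str, password: str) -> Iterator[Dict[str, str]]:
    """Return an endless iterator of `proxies` dicts cycling through host:port addresses."""
    if not addresses:
        raise ValueError("need at least one proxy address")
    # Percent-encode credentials in case they contain characters such as '@' or ':'
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_urls = [f"http://{credentials}@{address}" for address in addresses]
    return ({"http": url, "https": url} for url in cycle(proxy_urls))


# Hypothetical pool using documentation-only addresses
pool = proxy_rotation(["203.0.113.10:8000", "203.0.113.11:8000"], "user", "p@ss")
print(next(pool)["https"])  # http://user:p%40ss@203.0.113.10:8000
print(next(pool)["https"])  # http://user:p%40ss@203.0.113.11:8000
```

Each dict can be passed as `proxies=next(pool)` to `requests.get`, so consecutive requests leave through different addresses.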
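Coincidentally, a few days ago a reader who is learning web scraping asked me to recommend a good proxy IP service, because he wanted to scrape data from overseas e-commerce platforms. Based on my experience, I recommended Qingguo Network (青果网络) proxy IPs, which I use regularly and which cover my everyday needs:
- A free trial of over a thousand proxy IPs for up to 6 hours, which is well suited to testing.
- Coverage of both mainland China and overseas regions, suitable for global use.
- Low prices and good value for money.
- Ready-to-use SDKs and sample code for mainstream languages, so integration costs are close to zero.
The following shows how to write the crawler with a proxy IP: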
from urllib.parse import quote

import requests

# A proxy IP I extracted manually; in practice you can fetch proxies programmatically, which is more flexible.
proxyAddr = "125.77.162.121:20062"
# Replace with the Authkey and Authpwd from your own proxy key
authKey = "YOUR_AUTHKEY"
password = "YOUR_AUTHPWD"
# Username/password authentication, so no IP whitelist is required.
# quote() percent-encodes characters such as '@' or ':' that would otherwise break the URL.
proxyUrl = f"http://{quote(authKey, safe='')}:{quote(password, safe='')}@{proxyAddr}"
proxies = {
    "http": proxyUrl,
    "https": proxyUrl,
}

# Hot-list request URL
csdn_hot_rank_url = "https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=25&type="


# Fetch the hot list through the proxy and print each article
def fetch_hot_rank():
    headers = {
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Charset": "utf-8",
    }
    try:
        # Passing proxies routes this request through the proxy IP
        response = requests.get(csdn_hot_rank_url, headers=headers, proxies=proxies, timeout=10)
        response.raise_for_status()
        # Parse the JSON body
        data = response.json()
    except (requests.RequestException, ValueError) as e:
        print(f"Request failed: {e}")
        return

    # The body carries its own status code; 200 means success
    if data.get("code") == 200:
        for entry in data.get("data") or []:
            nick_name = entry.get("nickName", "N/A")
            article_title = entry.get("articleTitle", "N/A")
            pc_hot_rank_score = entry.get("pcHotRankScore", "N/A")
            print(f"Author: {nick_name}, Title: {article_title}, Hot score: {pc_hot_rank_score}")
    else:
        print("Failed to fetch data. Server responded with error.")


# Entry point
if __name__ == "__main__":
    fetch_hot_rank()
As shown in the figure below, the request still succeeds through the proxy:
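The parallel-request benefit listed earlier in this section can be sketched with a thread pool. `fetch_pages_in_parallel` and `fake_fetch` are hypothetical names. In real use, `fetch_page` would wrap `requests.get` with a page-specific URL and proxy. Whether pages beyond page=0 return different entries has not been verified here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_pages_in_parallel(fetch_page: Callable[[int], List[dict]], pages: int,
                            max_workers: int = 4) -> Dict[int, List[dict]]:
    """Call fetch_page(page) for pages 0..pages-1 concurrently and collect results by page."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and re-raises any worker exception
        results = pool.map(fetch_page, range(pages))
        return dict(zip(range(pages), results))


def fake_fetch(page: int) -> List[dict]:
    # Stand-in for a real request so the sketch runs offline
    return [{"page": page, "rank": i} for i in range(2)]


by_page = fetch_pages_in_parallel(fake_fetch, pages=3)
print(sorted(by_page))  # [0, 1, 2]
```

If workers share one proxy generator (for example, one built on `itertools.cycle`), be aware that a Python generator cannot safely be advanced from several threads at once. Picking `proxy_list[page % len(proxy_list)]` inside `fetch_page` avoids that. Keep `max_workers` small so the site is not flooded with requests.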