Using a Python Crawler to Scrape Popular Articles and Analyze the Latest Technology Trends

Date: 2024-11-01 14:17:12
Tags: blog, cn, python, crawler, scraping, csdnimg, https, null, data

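This article uses a crawler to find out which technologies are developing quickly and which problems are being widely discussed among developers, as a reference for study and research.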

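Using a Python Crawler to Analyze the Latest Technology Trends

  • 1. Crawl Target
  • 2. Code Environment
    • 2.1 Programming Language
    • 2.2 Third-Party Libraries
    • 2.3 Environment Configuration
  • 3. Code Walkthrough
    • 3.1 Interface Analysis
    • 3.2 Interface Parameter Analysis
      • Endpoint URL
      • Request Method
      • Description
      • Request Parameters
      • Response
      • Example Request
      • Example Response
      • Error Handling
    • 3.3 Writing the Crawler Code
    • 3.4 Crawling Through a Proxy IP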

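1. Crawl Target

In today's fast-changing technology landscape, keeping up with industry news and technology trends is essential. China's major IT communities host a large volume of technical articles and experience reports written by developers. Our goal is to scrape the hot-article data so that we can see which content and trends are getting the most attention in the community right now. This data shows which technologies are developing quickly and which problems are being widely discussed among developers, which makes it a useful reference for study and research.

The ideas and code in this article are for learning and research only, and should be used in accordance with the terms of use of the website involved.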

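2. Code Environment

The dependencies and configuration used in this article are listed below.

2.1 Programming Language

  • Python (see 2.3 for the recommended version)

2.2 Third-Party Libraries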

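  • Requests: sends HTTP requests and retrieves page content. It is simple to use, supports the common HTTP methods, and is one of the most widely used Python libraries for network requests.
  • BeautifulSoup: parses HTML and XML documents and extracts data from web pages. It can work with several parsers and copes well with complex HTML structures.
  • lxml: a high-performance XML and HTML parsing library built on libxml2 and libxslt. It processes large documents efficiently and suits more demanding extraction tasks.

The libraries can be installed with the following command (the code in section 3 only imports requests):

pip install requests beautifulsoup4 lxml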

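2.3 Environment Configuration

  • Python version: Python 3.7 or later is recommended for compatibility and performance.
  • IDE: an integrated development environment such as PyCharm or VSCode is recommended, since both provide good support for editing, debugging, and running code.
    • PyCharm: offers strong code completion, debugging, and project management, and is well suited to large projects.
    • VSCode: a lightweight editor with a rich plugin ecosystem that can be configured as needed.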
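
Before moving on, the environment can be checked with a few lines of standard-library code. This is a minimal sketch: `check_environment` is a hypothetical helper name, and `requests`, `bs4`, and `lxml` are assumed to be the import names of the three libraries listed in 2.2.

```python
import importlib.util
import sys

# Import names of the libraries listed in section 2.2 (assumed).
REQUIRED_MODULES = ("requests", "bs4", "lxml")


def check_environment(min_version=(3, 7), modules=REQUIRED_MODULES):
    """Return (python_ok, {module_name: is_installed})."""
    python_ok = sys.version_info >= min_version
    installed = {
        name: importlib.util.find_spec(name) is not None for name in modules
    }
    return python_ok, installed


if __name__ == "__main__":
    python_ok, installed = check_environment()
    print(f"Python >= 3.7: {python_ok}")
    for name, ok in installed.items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

Any module reported as missing can be installed with pip before running the crawler.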

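3. Code Walkthrough

3.1 Interface Analysis

Open the hot-articles page in Chrome, press F12 to bring up the developer tools, switch to the Network panel, and select the Fetch/XHR filter.
Once that is set up, refresh the page. A request named hot-rank appears in the request list.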

3.2 Interface Parameter Analysis

Click the hot-rank request to see its details.
The full request URL is https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=25&type=
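
To make the structure of this URL explicit, it can be split into the endpoint and its query parameters. The sketch below uses only the standard library; `split_request_url` is a hypothetical helper name introduced for illustration.

```python
from urllib.parse import parse_qs, urlsplit

HOT_RANK_URL = (
    "https://blog.csdn.net/phoenix/web/blog/hot-rank"
    "?page=0&pageSize=25&type="
)


def split_request_url(url):
    """Return (endpoint, params) for a URL that carries a query string."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True preserves the empty `type=` parameter.
    raw = parse_qs(parts.query, keep_blank_values=True)
    params = {key: values[0] for key, values in raw.items()}
    return endpoint, params


if __name__ == "__main__":
    endpoint, params = split_request_url(HOT_RANK_URL)
    print(endpoint)
    print(params)
```

The request therefore carries three parameters, `page`, `pageSize`, and `type`, with `type` left empty in the captured request.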
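Switch to the Response tab to see the result returned by the interface, reproduced below. It contains 25 hot-rank articles, and each entry includes the current ranking period, the hot score, the author's nickname and username, the article URL, and the view, like, and comment counts.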

{
    "code": 200,
    "message": "success",
    "traceId": "8d606612-0994-479f-96a0-2629e91fb57d",
    "data": [
        {
            "period": "2024-10-29-12",
            "hotRankScore": "21673",
            "pcHotRankScore": "2.2w",
            "loginUserIsFollow": false,
            "nickName": "平凡程序猿~",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/2ee6469786394599b70d5357c55e4e4a_2302_81410974.jpg!1",
            "userName": "2302_81410974",
            "articleTitle": "深入探索:深度学习在时间序列预测中的强大应用与实现",
            "articleDetailUrl": "https://blog.csdn.net/2302_81410974/article/details/143285977",
            "commentCount": "75",
            "favorCount": "90",
            "viewCount": "1151",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/b229b47116c7421b9487dc4d9640b8d8.jpeg"
            ],
            "isNew": null,
            "productId": "143285977",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "19924",
            "pcHotRankScore": "2.0w",
            "loginUserIsFollow": true,
            "nickName": "诡异森林。",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/1071f5d6f89d42d6ae79f0b04ab9ac4d_m0_74068921.jpg!1",
            "userName": "m0_74068921",
            "articleTitle": "Docker:技术架构的演进之路",
            "articleDetailUrl": "https://blog.csdn.net/m0_74068921/article/details/143252466",
            "commentCount": "64",
            "favorCount": "92",
            "viewCount": "1914",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/a473a9ccc3444e92b4bd76e3ab238c54.png"
            ],
            "isNew": null,
            "productId": "143252466",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "19525",
            "pcHotRankScore": "2.0w",
            "loginUserIsFollow": true,
            "nickName": "程序猿进阶",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/6f2e9637836e45a884cab077791f1478_zhengzhaoyang122.jpg!1",
            "userName": "zhengzhaoyang122",
            "articleTitle": "web3.0 开发实践",
            "articleDetailUrl": "https://blog.csdn.net/zhengzhaoyang122/article/details/143272424",
            "commentCount": "67",
            "favorCount": "71",
            "viewCount": "2063",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/2e02bd7403b643048df326467be6dc30.png"
            ],
            "isNew": null,
            "productId": "143272424",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "18719",
            "pcHotRankScore": "1.9w",
            "loginUserIsFollow": false,
            "nickName": "nantangyuxi",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/bcce2794bbfb446180c3734639950b78_xiaoxingkongyuxi.jpg!1",
            "userName": "xiaoxingkongyuxi",
            "articleTitle": "MATLAB实现基于CNN-BiLSTM卷积双向长短期记忆神经网络的时间序列预测",
            "articleDetailUrl": "https://blog.csdn.net/xiaoxingkongyuxi/article/details/143263563",
            "commentCount": "1",
            "favorCount": "22",
            "viewCount": "801",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/5f55baeb85aa4c2b93c19fdd7ec1ea7d.jpeg"
            ],
            "isNew": null,
            "productId": "143263563",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "17018",
            "pcHotRankScore": "1.7w",
            "loginUserIsFollow": false,
            "nickName": "虞书欣的6",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/8f9e3e31879a4c1f93bc7caca17f3ffc_cxh666888_.jpg!1",
            "userName": "cxh666888_",
            "articleTitle": "Python小游戏14——雷霆战机",
            "articleDetailUrl": "https://blog.csdn.net/cxh666888_/article/details/143270011",
            "commentCount": "2",
            "favorCount": "32",
            "viewCount": "1959",
            "hotComment": null,
            "picList": [
                "/i/ll/?i=adc77680bd174e358868931dd9720f49.jpg"
            ],
            "isNew": null,
            "productId": "143270011",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "14736",
            "pcHotRankScore": "1.5w",
            "loginUserIsFollow": true,
            "nickName": "小ᶻZ࿆",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/fcce75c8147f48338d288437f2ce7f8e_2201_75539691.jpg!1",
            "userName": "2201_75539691",
            "articleTitle": "【AIGC】从CoT到BoT:AGI推理能力提升24%的技术变革如何驱动ChatGPT未来发展",
            "articleDetailUrl": "https://blog.csdn.net/2201_75539691/article/details/143277081",
            "commentCount": "110",
            "favorCount": "100",
            "viewCount": "1286",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/392caf61274b4c2b87aec87b7cec5b09.jpeg"
            ],
            "isNew": null,
            "productId": "143277081",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12787",
            "pcHotRankScore": "1.3w",
            "loginUserIsFollow": true,
            "nickName": "2的n次方_",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/06acfd433ab0467f918983afeee463dd_2202_76097976.jpg!1",
            "userName": "2202_76097976",
            "articleTitle": "【Spring MVC】请求参数的传递",
            "articleDetailUrl": "https://blog.csdn.net/2202_76097976/article/details/143088438",
            "commentCount": "84",
            "favorCount": "80",
            "viewCount": "1353",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/ec823407c72f4d1eba176532fa523a60.png"
            ],
            "isNew": null,
            "productId": "143088438",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12517",
            "pcHotRankScore": "1.3w",
            "loginUserIsFollow": true,
            "nickName": "程序边界",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/ea275d892df44c5fb722c5756f8ba98b_qq_32682301.jpg!1",
            "userName": "qq_32682301",
            "articleTitle": "AIGC时代的数据盛宴:R语言引领数据分析新风尚",
            "articleDetailUrl": "https://blog.csdn.net/qq_32682301/article/details/143290891",
            "commentCount": "40",
            "favorCount": "56",
            "viewCount": "738",
            "hotComment": null,
            "picList": [
                "/i/ll/?i=img_convert/1556a4ddabef4ac5b908fba2a47d3eee.png"
            ],
            "isNew": null,
            "productId": "143290891",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12246",
            "pcHotRankScore": "1.2w",
            "loginUserIsFollow": true,
            "nickName": "小林熬夜学编程",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/54f236b0001841bd8615c85217b2ec93_2201_75584283.jpg!1",
            "userName": "2201_75584283",
            "articleTitle": "【Linux系统编程】第三十八弹---信号世界探索:从生活到技术的全面解析",
            "articleDetailUrl": "https://blog.csdn.net/2201_75584283/article/details/142580264",
            "commentCount": "91",
            "favorCount": "104",
            "viewCount": "592",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/9f6ddbb1af0440a09d928581b47a29f2.png"
            ],
            "isNew": null,
            "productId": "142580264",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12180",
            "pcHotRankScore": "1.2w",
            "loginUserIsFollow": false,
            "nickName": "小码农叔叔",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/2a38fa4448434cf9be6818697cdaf229_zhangcongyi420.jpg!1",
            "userName": "zhangcongyi420",
            "articleTitle": "【大数据】Flink + Kafka 实现通用流式数据处理详解",
            "articleDetailUrl": "https://blog.csdn.net/zhangcongyi420/article/details/143027849",
            "commentCount": "146",
            "favorCount": "88",
            "viewCount": "2092",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/10919f87476e4db59d559260198a1925.png"
            ],
            "isNew": null,
            "productId": "143027849",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "12115",
            "pcHotRankScore": "1.2w",
            "loginUserIsFollow": true,
            "nickName": "Kwan的解忧杂货铺@新空间代码工作室",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/7b22c422c2df41c9aa22ff208e9cb96d_qyj19920704.jpg!1",
            "userName": "qyj19920704",
            "articleTitle": "本地Docker部署开源WAF雷池并实现异地远程登录管理界面",
            "articleDetailUrl": "https://blog.csdn.net/qyj19920704/article/details/143309113",
            "commentCount": "85",
            "favorCount": "64",
            "viewCount": "931",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/78b08acbdf86436f868d0b827fc038ca.jpeg"
            ],
            "isNew": null,
            "productId": "143309113",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "11432",
            "pcHotRankScore": "1.1w",
            "loginUserIsFollow": false,
            "nickName": "Mr.Winter`",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/055065db1f8b4d4bb2b90751efcd4e07_frigidwinter.jpg!1",
            "userName": "FRIGIDWINTER",
            "articleTitle": "轨迹规划 | 基于差速运动学的有模型PID算法(附ROS C++仿真)",
            "articleDetailUrl": "https://blog.csdn.net/FRIGIDWINTER/article/details/143258906",
            "commentCount": "28",
            "favorCount": "45",
            "viewCount": "1122",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/8baf090b363a46b982678d5dee8fd3c5.gif"
            ],
            "isNew": null,
            "productId": "143258906",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "10661",
            "pcHotRankScore": "1.1w",
            "loginUserIsFollow": false,
            "nickName": "凯子坚持C",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/125c0a68206144a6a3869aeddf9a9cbc_2301_80863610.jpg!1",
            "userName": "2301_80863610",
            "articleTitle": "AI创作者与人类创作者的协作模式",
            "articleDetailUrl": "https://blog.csdn.net/2301_80863610/article/details/143305703",
            "commentCount": "83",
            "favorCount": "78",
            "viewCount": "1109",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/ce0c12726e154ac08a26f62929d6c837.gif"
            ],
            "isNew": null,
            "productId": "143305703",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "9709",
            "pcHotRankScore": "1.0w",
            "loginUserIsFollow": false,
            "nickName": "CoderJia_",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/3de6e5c240924736b5dda23e0d74cbd3_u014390502.jpg!1",
            "userName": "u014390502",
            "articleTitle": "重学SpringBoot3-Spring WebFlux之SSE服务器发送事件",
            "articleDetailUrl": "https://blog.csdn.net/u014390502/article/details/143275309",
            "commentCount": "42",
            "favorCount": "60",
            "viewCount": "1601",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/b41deddffd8d416fbb67e296f0143932.webp"
            ],
            "isNew": null,
            "productId": "143275309",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "9559",
            "pcHotRankScore": "1.0w",
            "loginUserIsFollow": false,
            "nickName": "颜淡慕潇",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/2febcd43c21442509a2bf41fa63d7b67_weixin_36755535.jpg!1",
            "userName": "weixin_36755535",
            "articleTitle": "【K8S系列】Kubernetes 中 Service IP 地址和端口不匹配问题及解决方案【已解决】",
            "articleDetailUrl": "https://blog.csdn.net/weixin_36755535/article/details/143271989",
            "commentCount": "39",
            "favorCount": "30",
            "viewCount": "1898",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/d49227ce7645436aad653e460cfbdd86.png"
            ],
            "isNew": null,
            "productId": "143271989",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8973",
            "pcHotRankScore": "8973",
            "loginUserIsFollow": false,
            "nickName": "DARLING Zero two♡",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/00d1491d64c44cdbbd15500178786292_zero_vpn.jpg!1",
            "userName": "Zero_VPN",
            "articleTitle": "关于我、重生到500年前凭借C语言改变世界科技vlog.11——深入理解指针(1)",
            "articleDetailUrl": "https://blog.csdn.net/Zero_VPN/article/details/143288823",
            "commentCount": "94",
            "favorCount": "55",
            "viewCount": "1246",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/3d6ea5738c17443e9969873cf82c5e8b.png"
            ],
            "isNew": null,
            "productId": "143288823",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8843",
            "pcHotRankScore": "8843",
            "loginUserIsFollow": true,
            "nickName": "Jupiter·",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/036f6cff17d74f91abb8a599504fe014_2301_77509762.jpg!1",
            "userName": "2301_77509762",
            "articleTitle": "深入理解数据链路层:以太网帧格式、MAC地址、交换机、MTU及ARP协议详解与ARP欺骗探究",
            "articleDetailUrl": "https://blog.csdn.net/2301_77509762/article/details/142486419",
            "commentCount": "28",
            "favorCount": "41",
            "viewCount": "588",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/e0c2599816294c55a3aca8018727b542.png"
            ],
            "isNew": null,
            "productId": "142486419",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8700",
            "pcHotRankScore": "8700",
            "loginUserIsFollow": true,
            "nickName": "小强在此",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/b1d9c6e6380a41898d74591dc3406e7c_zcy_c.jpg!1",
            "userName": "ZCY_c",
            "articleTitle": "机器学习【学校智慧食堂及其应用】",
            "articleDetailUrl": "https://blog.csdn.net/ZCY_c/article/details/143310696",
            "commentCount": "45",
            "favorCount": "47",
            "viewCount": "681",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/5301d509f3154664944736e615200d97.jpeg"
            ],
            "isNew": null,
            "productId": "143310696",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8690",
            "pcHotRankScore": "8690",
            "loginUserIsFollow": false,
            "nickName": "deephub",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/0054ba55863d44d98d9ba189a2401027_m0_46510245.jpg!1",
            "userName": "m0_46510245",
            "articleTitle": "深度学习中的学习率调度:循环学习率、SGDR、1cycle 等方法介绍及实践策略研究",
            "articleDetailUrl": "https://blog.csdn.net/m0_46510245/article/details/143280635",
            "commentCount": "1",
            "favorCount": "23",
            "viewCount": "3668",
            "hotComment": null,
            "picList": [
                "/i/ll/?i=img_convert/037199363e3dd7cfa9764901ee3734e8.jpeg"
            ],
            "isNew": null,
            "productId": "143280635",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8681",
            "pcHotRankScore": "8681",
            "loginUserIsFollow": false,
            "nickName": "景天科技苑",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/89baaea9472b4fd7a2058d0e79ee2bc9_littlefun591.jpg!1",
            "userName": "littlefun591",
            "articleTitle": "【Golang】Go语言中如何进行包管理",
            "articleDetailUrl": "https://blog.csdn.net/littlefun591/article/details/143300406",
            "commentCount": "50",
            "favorCount": "46",
            "viewCount": "1234",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/947ad3e9d18b4e66a55dc31ecd7c1109.jpeg"
            ],
            "isNew": null,
            "productId": "143300406",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8412",
            "pcHotRankScore": "8412",
            "loginUserIsFollow": true,
            "nickName": "盼小辉丶",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/40d49e89aeed4d50b8b32680f49e3ac1_lovemy134611.jpg!1",
            "userName": "LOVEmy134611",
            "articleTitle": "遗传算法与深度学习实战(20)——使用进化策略自动超参数优化",
            "articleDetailUrl": "https://blog.csdn.net/LOVEmy134611/article/details/143283052",
            "commentCount": "19",
            "favorCount": "34",
            "viewCount": "431",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/412d30ce3bda4f70845f524b724c35e3.png"
            ],
            "isNew": null,
            "productId": "143283052",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8255",
            "pcHotRankScore": "8255",
            "loginUserIsFollow": true,
            "nickName": "中草药z",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/0e3cd82612e7400c9dfcef891e713698_2302_79806056.jpg!1",
            "userName": "2302_79806056",
            "articleTitle": "【Spring】Ioc&DI",
            "articleDetailUrl": "https://blog.csdn.net/2302_79806056/article/details/143271383",
            "commentCount": "39",
            "favorCount": "61",
            "viewCount": "675",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/752811eac90a465fa9b2c7b5f94b3b3d.png"
            ],
            "isNew": null,
            "productId": "143271383",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "8057",
            "pcHotRankScore": "8057",
            "loginUserIsFollow": false,
            "nickName": "川川菜鸟",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/171da61cd5e74ebfa9ab08e29de51894_weixin_46211269.jpg!1",
            "userName": "weixin_46211269",
            "articleTitle": "数学建模学习(131):使用Python基于VIKOR算法的多准则决策分析",
            "articleDetailUrl": "https://blog.csdn.net/weixin_46211269/article/details/143268220",
            "commentCount": "0",
            "favorCount": "12",
            "viewCount": "816",
            "hotComment": null,
            "picList": [
                "/i/ll/?i=img_convert/fad536e972e14ce4b37803185dc3b00c.png"
            ],
            "isNew": null,
            "productId": "143268220",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "7904",
            "pcHotRankScore": "7904",
            "loginUserIsFollow": true,
            "nickName": "java李杨勇",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/ec7f24ace4574ddf9f7859b8d43ec482_weixin_39709134.jpg!1",
            "userName": "weixin_39709134",
            "articleTitle": "基于大数据爬虫+Hive+SpringBoot+的歌曲筛选推荐与可视化大屏平台设计和实现(源码+论文+部署讲解等)",
            "articleDetailUrl": "https://blog.csdn.net/weixin_39709134/article/details/143314219",
            "commentCount": "25",
            "favorCount": "28",
            "viewCount": "1927",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/43134b0303c1466383ec0a6b8e19fd1d.png"
            ],
            "isNew": null,
            "productId": "143314219",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        },
        {
            "period": "2024-10-29-12",
            "hotRankScore": "7143",
            "pcHotRankScore": "7143",
            "loginUserIsFollow": false,
            "nickName": "小码农<^_^>",
            "avatarUrl": "https://profile-avatar.csdnimg.cn/3480896bab63498188fad036782a48b3_2401_87257864.jpg!1",
            "userName": "2401_87257864",
            "articleTitle": "优选算法精品课--双指针算法(2)",
            "articleDetailUrl": "https://blog.csdn.net/2401_87257864/article/details/143302883",
            "commentCount": "55",
            "favorCount": "67",
            "viewCount": "991",
            "hotComment": null,
            "picList": [
                "https://i-blog.csdnimg.cn/direct/758a49d491dc40c297a510d18e15849a.png"
            ],
            "isNew": null,
            "productId": "143302883",
            "productType": "blog",
            "recommendType": "ali",
            "report_data": null
        }
    ]
}
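
Note that the numeric fields in this response are strings, and `pcHotRankScore` uses an abbreviated form such as "2.2w". The sketch below shows one way to normalize an entry. The helper names are hypothetical, and reading the "w" suffix as 万 (10,000) is an assumption inferred from the pair "21673" / "2.2w" in the first entry.

```python
import json
from decimal import Decimal

# A trimmed copy of the first entry from the response above.
SAMPLE = """
{
    "code": 200,
    "message": "success",
    "data": [
        {
            "hotRankScore": "21673",
            "pcHotRankScore": "2.2w",
            "nickName": "平凡程序猿~",
            "articleTitle": "深入探索:深度学习在时间序列预测中的强大应用与实现",
            "commentCount": "75",
            "favorCount": "90",
            "viewCount": "1151"
        }
    ]
}
"""


def parse_display_score(text):
    """Convert a display score such as "2.2w" or "8973" to an int.

    Assumption: the "w" suffix stands for 10,000.
    """
    if text.endswith("w"):
        return int(Decimal(text[:-1]) * 10000)
    return int(text)


def normalize_entry(entry):
    """Pick the fields of interest and convert numeric strings to ints."""
    return {
        "author": entry["nickName"],
        "title": entry["articleTitle"],
        "hot_score": int(entry["hotRankScore"]),
        "views": int(entry["viewCount"]),
        "favors": int(entry["favorCount"]),
        "comments": int(entry["commentCount"]),
    }


if __name__ == "__main__":
    payload = json.loads(SAMPLE)
    for item in payload["data"]:
        print(normalize_entry(item))
```

Working from `hotRankScore` rather than `pcHotRankScore` avoids the rounding that the abbreviated form introduces.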

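Putting this together, our analysis of the hot-rank query API is as follows:

Endpoint URL

  • https://blog.csdn.net/phoenix/web/blog/hot-rank

Request Method

  • GET

Description

This API endpoint returns CSDN's list of hot-rank articles. The caller can specify pagination and the ranking type.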

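Request Parameters

  • page (integer)

    • Description: the page number of the results to fetch.
    • Value in the captured request: 0
    • Example: page=1
  • pageSize (integer)

    • Description: the number of articles returned per page.
    • Value in the captured request: 25
    • Example: pageSize=50
  • type (string)

    • Description: the type of hot-rank list. The accepted values depend on CSDN's internal configuration and are not confirmed in this article.
    • Value in the captured request: empty (type=)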
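
The three query parameters can also be assembled in code rather than hard-coded into the URL. This is a minimal sketch using only the standard library: `build_hot_rank_url` is a hypothetical helper, its defaults mirror the values seen in the captured request, and whether the server honors other values has not been verified here.

```python
from urllib.parse import urlencode

HOT_RANK_ENDPOINT = "https://blog.csdn.net/phoenix/web/blog/hot-rank"


def build_hot_rank_url(page=0, page_size=25, rank_type=""):
    """Build the request URL from the three query parameters."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    query = urlencode({"page": page, "pageSize": page_size, "type": rank_type})
    return f"{HOT_RANK_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Reproduces the URL captured in the browser.
    print(build_hot_rank_url())
    # A request for the next page (behaviour of the server not verified).
    print(build_hot_rank_url(page=1))
```

Building the query with `urlencode` also takes care of escaping if a parameter value ever contains special characters.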

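Response

  • Content type: application/json

  • Response structure (taken from the captured response above; only the main fields are listed):

    {
        "code": 200,
        "message": "success",
        "traceId": "request trace ID",
        "data": [
            {
                "period": "ranking period",
                "hotRankScore": "hot score",
                "pcHotRankScore": "abbreviated hot score, e.g. 2.2w",
                "nickName": "author nickname",
                "userName": "author username",
                "articleTitle": "article title",
                "articleDetailUrl": "article URL",
                "commentCount": "number of comments",
                "favorCount": "number of likes",
                "viewCount": "number of views"
            },
            ...
        ]
    }
    

Example Request

GET /phoenix/web/blog/hot-rank?page=0&pageSize=25&type= HTTP/1.1
Host: blog.csdn.net

Example Response

{
    "code": 200,
    "message": "success",
    "traceId": "8d606612-0994-479f-96a0-2629e91fb57d",
    "data": [
        {
            "period": "2024-10-29-12",
            "hotRankScore": "21673",
            "pcHotRankScore": "2.2w",
            "nickName": "平凡程序猿~",
            "userName": "2302_81410974",
            "articleTitle": "深入探索:深度学习在时间序列预测中的强大应用与实现",
            "articleDetailUrl": "https://blog.csdn.net/2302_81410974/article/details/143285977",
            "commentCount": "75",
            "favorCount": "90",
            "viewCount": "1151"
        }
    ]
}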

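Error Handling

  • HTTP status codes (standard semantics):
    • 200 OK: the request succeeded.
    • 400 Bad Request: the request was invalid, usually because of a bad parameter.
    • 500 Internal Server Error: an error occurred on the server.
  • Response body: the code field is 200 in the successful response captured above.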
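
These checks can be collected in one place so that the rest of the crawler only ever sees a valid list of entries. The sketch below is one possible arrangement: `HotRankError` and `extract_entries` are hypothetical names, and the check on the body relies on the `code` field seen in the captured response in this section.

```python
class HotRankError(Exception):
    """Raised when the hot-rank response cannot be used."""


def extract_entries(http_status, payload):
    """Return the list of entries, or raise HotRankError.

    http_status: the HTTP status code of the response.
    payload: the decoded JSON body as a dict.
    """
    if http_status != 200:
        raise HotRankError(f"unexpected HTTP status: {http_status}")
    if payload.get("code") != 200:
        message = payload.get("message", "unknown error")
        raise HotRankError(f"API reported failure: {message}")
    data = payload.get("data")
    if not isinstance(data, list):
        raise HotRankError("response has no 'data' list")
    return data


if __name__ == "__main__":
    ok_body = {"code": 200, "message": "success", "data": [{"articleTitle": "demo"}]}
    print(extract_entries(200, ok_body))
```

Keeping the transport-level check (HTTP status) separate from the application-level check (the `code` field) makes it easier to see which layer failed.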

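3.3 Writing the Crawler Code

Based on the API analysis in the previous section, we can use Python to simulate the hot-rank request. The code is as follows: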

import requests

# Hot-rank request URL
csdn_hot_rank_url = "https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=25&type="


# Fetch the hot-rank list and print author, title, and hot score
def fetch_hot_rank():
    # The User-Agent mimics a desktop Chrome browser
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Charset": "utf-8"
    }
    try:
        # Send the request and fail early on a non-2xx HTTP status
        response = requests.get(csdn_hot_rank_url, headers=headers, timeout=10)
        response.raise_for_status()

        # Parse the JSON body
        data = response.json()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return
    except ValueError as e:
        # response.json() raises a ValueError subclass when the body is not valid JSON
        print(f"Invalid JSON in response: {e}")
        return

    # Pick out and print the fields we need
    if data.get("code") == 200:
        for entry in data.get("data") or []:
            nick_name = entry.get("nickName", "N/A")
            article_title = entry.get("articleTitle", "N/A")
            pc_hot_rank_score = entry.get("pcHotRankScore", "N/A")
            print(f"Author: {nick_name}, Title: {article_title}, Hot score: {pc_hot_rank_score}")
    else:
        print("Failed to fetch data. Server responded with error.")


# Entry point
if __name__ == "__main__":
    fetch_hot_rank()

Running the script returns 25 entries in total.
Comparing the output with the first 8 articles on the hot-rank page shows that they match one to one.
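
With the titles in hand, a first rough look at trends is to count how often certain topics appear. The sketch below is one simple approach and not part of the original script: the keyword list is a hypothetical choice, and the titles are copied from the sample response in section 3.2.

```python
from collections import Counter

# Hypothetical keyword list; extend it to match the topics you care about.
KEYWORDS = ["深度学习", "Docker", "Spring", "Python", "AIGC", "Kubernetes"]

# A few titles copied from the sample response in section 3.2.
TITLES = [
    "深入探索:深度学习在时间序列预测中的强大应用与实现",
    "Docker:技术架构的演进之路",
    "【Spring MVC】请求参数的传递",
    "本地Docker部署开源WAF雷池并实现异地远程登录管理界面",
    "深度学习中的学习率调度:循环学习率、SGDR、1cycle 等方法介绍及实践策略研究",
    "【Spring】Ioc&DI",
    "数学建模学习(131):使用Python基于VIKOR算法的多准则决策分析",
]


def count_keywords(titles, keywords=KEYWORDS):
    """Count how many titles mention each keyword (case-insensitive)."""
    counts = Counter()
    for title in titles:
        lowered = title.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts


if __name__ == "__main__":
    for keyword, n in count_keywords(TITLES).most_common():
        print(f"{keyword}: {n}")
```

Plain substring matching is crude. For Chinese titles, a word-segmentation step would give better results, but that is beyond this sketch.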

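3.4 Crawling Through a Proxy IP

Using proxy IPs is an effective strategy for web crawling that can improve a crawler's stability and efficiency, mainly in the following ways:

  • Avoiding IP bans: frequent requests to the same site can get an IP banned temporarily or permanently. Switching between proxy IP addresses lowers that risk.
  • Improving crawl efficiency: multiple proxy IPs allow requests to run in parallel, which can significantly speed up data collection.
  • Accessing restricted resources: some sites restrict access from IPs in certain regions. A proxy IP can get around such restrictions.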
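
The idea of switching between proxy addresses can be sketched as a small generator that hands out a different `proxies` mapping for each request. Everything here is illustrative: `proxy_rotation` is a hypothetical helper, and the addresses come from a documentation-only IP range, not from a real proxy pool.

```python
from itertools import cycle

# Hypothetical proxy addresses (documentation-only IP range, RFC 5737).
PROXY_POOL = ["192.0.2.10:8000", "192.0.2.11:8000", "192.0.2.12:8000"]


def proxy_rotation(addresses, user, password):
    """Yield `proxies` dicts for requests.get(), cycling through the pool."""
    for address in cycle(addresses):
        proxy_url = f"http://{user}:{password}@{address}"
        yield {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    rotation = proxy_rotation(PROXY_POOL, "user", "secret")
    for _ in range(4):
        # Each dict could be passed as requests.get(url, proxies=...).
        print(next(rotation)["http"])
```

Round-robin rotation is the simplest policy. A real crawler would usually also drop addresses that fail repeatedly.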

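As it happens, a couple of days ago a follower who is learning web crawling asked me to recommend a good proxy IP service, because he wants to learn to scrape data from overseas e-commerce platforms. Based on my own experience, I recommended the 青果网络 (Qingguo Network) proxy IP service, which I have been using and which covers most of my day-to-day needs:

  1. It offers a free trial of up to 6 hours with more than a thousand proxy IPs, which is well suited to testing and validation.
  2. It covers both China and overseas regions, so it works for global applications.
  3. It is inexpensive and good value for money.
  4. It provides ready-to-use SDKs and sample code for mainstream languages, so the integration cost is close to zero.

The following shows how to write the crawler code once a proxy IP is in use: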

import requests

# A proxy address extracted by hand; in practice it can be requested automatically in code, which is more flexible.
proxyAddr = "125.77.162.121:20062"

# Replace with your own credentials
authKey = "Authkey from your secret key"
password = "Authpwd from your secret key"

# Username/password mode; no IP whitelist needs to be configured
proxyUrl = f"http://{authKey}:{password}@{proxyAddr}"

proxies = {
    "http": proxyUrl,
    "https": proxyUrl,
}

# Hot-rank request URL
csdn_hot_rank_url = "https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=25&type="


# Fetch the hot-rank list through the proxy and print author, title, and hot score
def fetch_hot_rank():
    # The User-Agent mimics a desktop Chrome browser
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Charset": "utf-8"
    }
    try:
        # Send the request; the proxies argument routes it through the proxy IP
        response = requests.get(csdn_hot_rank_url, headers=headers, proxies=proxies, timeout=10)
        response.raise_for_status()

        # Parse the JSON body
        data = response.json()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return
    except ValueError as e:
        # response.json() raises a ValueError subclass when the body is not valid JSON
        print(f"Invalid JSON in response: {e}")
        return

    # Pick out and print the fields we need
    if data.get("code") == 200:
        for entry in data.get("data") or []:
            nick_name = entry.get("nickName", "N/A")
            article_title = entry.get("articleTitle", "N/A")
            pc_hot_rank_score = entry.get("pcHotRankScore", "N/A")
            print(f"Author: {nick_name}, Title: {article_title}, Hot score: {pc_hot_rank_score}")
    else:
        print("Failed to fetch data. Server responded with error.")


# Entry point
if __name__ == "__main__":
    fetch_hot_rank()

Running this version shows that the request still returns results when it goes through the proxy.
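
Proxies reduce the risk of bans, but pacing requests and retrying transient failures matter just as much. The sketch below shows a generic retry wrapper with exponential backoff and random jitter. `fetch_with_retry` is a hypothetical helper, and it assumes the wrapped function raises an exception on failure instead of printing the error itself.

```python
import random
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch: a zero-argument callable that raises on failure.
    sleep: injectable so the delay can be skipped in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # narrow this to network errors in real code
            last_error = error
            if attempt < attempts - 1:
                # Exponential backoff plus a little random jitter.
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
                sleep(delay)
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to demonstrate the retry loop.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("temporary failure")
        return "ok"

    print(fetch_with_retry(flaky, base_delay=0.1))
```

The jitter spreads retries out so that several workers that fail at the same moment do not all hit the server again at once.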

From: https://blog.csdn.net/g310773517/article/details/143189616
