
Fetching TapTap posts with a Python crawler

Date: 2022-10-26 23:00:13
Tags: tap, python, UA, crawler, content, moment, data, id

1. Fetching TapTap post data

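In the code, cookie is the cookie captured from the page's network traffic after logging in. The detail page needs one of three URL forms, chosen from the item's link: the first when the link contains topic, the second when it contains moment, and the third for videos, when it contains video.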
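The three-way choice described above can be sketched as a small helper. This is only an illustration: `detail_url` is a hypothetical name, the link shape `/complaint?id=...&type=...` is inferred from the `replace` calls in the script, and the `X-UA` query parameter that the script appends to every request is left out for brevity.

```python
from urllib.parse import parse_qs, urlsplit

BASE = "https://www.taptap.cn/webapiv2"


def detail_url(complaint_url, moment_id):
    """Pick the detail endpoint for one feed item (hypothetical helper)."""
    # Assumed link shape: "/complaint?id=<id>&type=<topic|video|moment>"
    query = parse_qs(urlsplit(complaint_url).query)
    kind = query.get("type", [""])[0]
    target_id = query.get("id", [moment_id])[0]
    if kind == "topic":
        return f"{BASE}/topic/v1/detail?id={target_id}"
    if kind == "video":
        return f"{BASE}/video/v2/detail?id={target_id}"
    # Anything else is treated as a moment and uses the moment's own id_str.
    return f"{BASE}/moment/v2/detail?id={moment_id}"


print(detail_url("/complaint?id=123&type=topic", "999"))
```

Parsing the query string instead of chaining `replace` calls keeps working if the parameters ever appear in a different order.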

import requests
import json
import time

for offset in range(0, 20, 10):  # pagination: every +10 in `from` moves to the next page
     url = 'https://www.taptap.cn/webapiv2/feed/v6/by-group?from={}&group_id=61080&limit=10&sort=created&type=feed&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D93%26VN%3D0.1.0%26LOC%3DCN%26PLT%3DPC%26DS%3DAndroid%26UID%3D8c933580-fddc-48ac-ad5f-86caf48af0d8%26VID%3D119295298%26DT%3DPC'.format(offset)
     # url = 'https://www.taptap.com/webapiv2/feed/v6/by-group?from={}&group_id=61080&limit=10&sort=created&type=feed&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D92%26VN%3D0.1.0%26LOC%3DCN%26PLT%3DPC%26DS%3DAndroid%26UID%3Db60535e1-e107-4196-a819-8a37bdfdc90b%26VID%3D119295298%26DT%3DPC%26OS%3DWindows%26OSV%3D10'.format(data)
     headers = {"accept": "application/json, text/plain, */*",
                "cookie": "",  # paste the cookie captured after logging in
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3928.4 Safari/537.36"

               }
     data = {
            "group_id": "61080",
            "type": "feed",
            "sort": "created",
            # "X-UA": "V=1&PN=WebApp&LANG=zh_CN&VN_CODE=92&VN=0.1.0&LOC=CN&PLT=PC&DS=Android&UID=b60535e1-e107-4196-a819-8a37bdfdc90b&VID=119295298&DT=PC&OS=Windows&OSV=10",
            "X-UA": "V=1&PN=WebApp&LANG=zh_CN&VN_CODE=93&VN=0.1.0&LOC=CN&PLT=PC&DS=Android&UID=e71df365-69b0-4860-b76b-719ebd46ecd8&VID=119295298&DT=PC&OS=Windows&OSV=10"

            }
     json_ids = requests.get(url=url, headers=headers, data=data).json()

     for dic in json_ids['data']['list']:
         content_list = []
         content = {}
         timeStamp = dic['moment']['created_time']
         # for timeStamp in timeStamp_list:
         # timeStamp = 1665801067  # 10-digit timestamp (seconds)
         # timeStamp_13 = 1381419600234  # 13-digit timestamp (milliseconds)
         timeArray = time.localtime(timeStamp)  # convert to local time
         otherStyleTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)  # format as a string
         content['date'] = otherStyleTime
         content['athourr'] = dic['moment']['author']['user']['name']
         # content['contentatr'] = dic['moment']['contents']['raw_text']
         adresss = dic['moment']['author']['user']['id']
         content['adress'] = str(adresss)
         idstrs = dic['moment']['id_str']
         personurl = 'https://www.taptap.com/moment/' + idstrs
         content['url'] = personurl
         idstress = dic['moment']['complaint']['web_url']

         if "topic" in idstress:
             # url1 = 'https://www.taptap.com/webapiv2/moment/v2/detail?id='+idstrs+'&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D92%26VN%3D0.1.0%26LOC%3DCN%26PLT%3DPC%26DS%3DAndroid%26UID%3Db60535e1-e107-4196-a819-8a37bdfdc90b%26VID%3D119295298%26DT%3DPC%26OS%3DWindows%26OSV%3D10'
             idstresss = idstress.replace('/complaint?id=','').replace('&type=topic','')
             idstressss = str(idstresss)
             print(idstressss)
             url1 = 'https://www.taptap.cn/webapiv2/topic/v1/detail?id='+idstressss+'&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D93%26VN%3D0.1.0%26LOC%3DCN%26PLT%3DPC%26DS%3DAndroid%26UID%3De71df365-69b0-4860-b76b-719ebd46ecd8%26VID%3D119295298%26DT%3DPC%26OS%3DWindows%26OSV%3D10'
         elif "video" in idstress:
             idstrel = idstress.replace('/complaint?id=', '').replace('&type=video', '')
             idstresl = str(idstrel)
             print(idstresl)
             url1 ='https://www.taptap.cn/webapiv2/video/v2/detail?id='+idstresl+'&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D93%26VN%3D0.1.0%26LOC%3DCN%26PLT%3DPC%26DS%3DAndroid%26UID%3D8c933580-fddc-48ac-ad5f-86caf48af0d8%26VID%3D119295298%26DT%3DPC'

         else:
             # url1 = 'https://www.taptap.cn/webapiv2/topic/v1/detail?id='+idstress+'&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D93%26VN%3D0.1.0%26LOC%3DCN%26PLT%3DPC%26DS%3DAndroid%26UID%3De71df365-69b0-4860-b76b-719ebd46ecd8%26VID%3D119295298%26DT%3DPC%26OS%3DWindows%26OSV%3D10'
             url1 = 'https://www.taptap.cn/webapiv2/moment/v2/detail?id=' + idstrs + '&X-UA=V%3D1%26PN%3DWebApp%26LANG%3Dzh_CN%26VN_CODE%3D93%26VN%3D0.1.0%26LOC%3DCN%26PLT%3DPC%26DS%3DAndroid%26UID%3De71df365-69b0-4860-b76b-719ebd46ecd8%26VID%3D119295298%26DT%3DPC%26OS%3DWindows%26OSV%3D10'
             # print(url1)
         headers = {"accept": "application/json, text/plain, */*",
                    "cookie": "",
                    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3928.4 Safari/537.36"

                    }
         datas = {
             "id": idstrs,
             "X-UA": "V=1&PN=WebApp&LANG=zh_CN&VN_CODE=92&VN=0.1.0&LOC=CN&PLT=PC&DS=Android&UID=b60535e1-e107-4196-a819-8a37bdfdc90b&VID=119295298&DT=PC&OS=Windows&OSV=10",
                    }
         result1 = requests.get(url=url1, headers=headers, data=datas).json()
         print(url1)
         content['atricalll'] = result1['data']['moment']['sharing']['title'].replace('\n','').replace(' ','')
         content['atricallll'] = result1['data']['moment']['sharing']['description'].replace('\n','').replace(' ','')
         # print(atricalll,atricallll)
         content_list.append(content)
         print(content_list)

         with open('taptap.csv','a', encoding='utf-8') as f:
             for content in content_list:
                 f.write(content['date'] + ',' + content['athourr']+ ','+content['adress'] + ',' + content['url'] + ','+content['atricalll'] + ',' + content['atricallll']+ '\n')
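Joining fields with `','` by hand produces misaligned rows whenever a title or description itself contains a comma. A minimal sketch of the same write step using the standard `csv` module follows. `FIELDS` and `append_rows` are hypothetical names, the sample row is made up, and an in-memory buffer stands in for `taptap.csv`.

```python
import csv
import io

# Column order mirrors the f.write call in the script; key names are kept as-is.
FIELDS = ["date", "athourr", "adress", "url", "atricalll", "atricallll"]


def append_rows(f, rows):
    """Write dict rows to an open text file, quoting fields where needed."""
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writerows(rows)


# Made-up sample row whose title contains a comma.
sample = {
    "date": "2022-10-26 23:00:13",
    "athourr": "alice",
    "adress": "42",
    "url": "https://www.taptap.com/moment/1",
    "atricalll": "Hello, world",
    "atricallll": "desc",
}

buf = io.StringIO()
append_rows(buf, [sample])
print(buf.getvalue())
```

In the script, the buffer would be replaced by `open('taptap.csv', 'a', encoding='utf-8', newline='')`; the `newline=''` argument is what the `csv` documentation recommends for files handed to a writer.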

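The long request URLs in the script carry the `X-UA` value in percent-encoded form. As a sketch, the feed URL can be assembled from plain values with `urllib.parse.urlencode`, which does that encoding itself. `feed_url` is a hypothetical helper; the parameter values are copied from the script's first URL.

```python
from urllib.parse import urlencode

FEED_ENDPOINT = "https://www.taptap.cn/webapiv2/feed/v6/by-group"

# Plain-text X-UA value, as it appears decoded in the script's first URL.
X_UA = (
    "V=1&PN=WebApp&LANG=zh_CN&VN_CODE=93&VN=0.1.0&LOC=CN&PLT=PC&DS=Android"
    "&UID=8c933580-fddc-48ac-ad5f-86caf48af0d8&VID=119295298&DT=PC"
)


def feed_url(offset, group_id="61080", limit=10):
    """Build the feed URL for one page (hypothetical helper)."""
    params = {
        "from": offset,
        "group_id": group_id,
        "limit": limit,
        "sort": "created",
        "type": "feed",
        "X-UA": X_UA,  # urlencode turns '=' into %3D and '&' into %26
    }
    return FEED_ENDPOINT + "?" + urlencode(params)


for offset in range(0, 20, 10):
    print(feed_url(offset))
```

Keeping `X_UA` in one place also avoids the mismatched `UID` and `VN_CODE` values that creep in when the encoded string is pasted into several URLs.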
2. Output after running the script

 

From: https://www.cnblogs.com/icekele/p/16830499.html
