How beginners can build a video scraper in Python

Posted: 2023-01-06 10:01:36

Tags: __ pcursor typename python crawler url json beginner data

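For beginners learning web scraping, hands-on practice is what matters most. Scrapers can be written in many languages. Here we use Python to download the short videos on a Kuaishou user's profile. The script posts a GraphQL query to https://www.kuaishou.com/graphql and saves each returned video into a local video/ folder. It then requests the next page using the pcursor value, and stops when the server returns "no_more". The code is below for reference.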

# coding=utf-8

import json
import os.path
import pprint

import requests


def get_page(pcursor):
    path = 'video/'
    # Create the output directory if it does not exist yet
    os.makedirs(path, exist_ok=True)
    # Target to scrape: 'https://www.kuaishou.com/profile/3xhv7zhkfr3rqag'
    # Ctrl+R: batch find-and-replace in the IDE
    # Example video pages from this profile:
    # https://www.kuaishou.com/short-video/3xw5fmcf9jdap29?authorId=3xhv7zhkfr3rqag&streamSource=profile&area=profilexxnull
    # https://www.kuaishou.com/short-video/3xf98wc5q2cuxtq?authorId=3xhv7zhkfr3rqag&streamSource=profile&area=profilexxnull

    url = 'https://www.kuaishou.com/graphql'
    headers = {
        'content-type': 'application/json',
        'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_72314bf978cb158dd7034b2370d2ae70',
        'Host': 'www.kuaishou.com',
        'Origin': 'https://www.kuaishou.com',
        'Referer': 'https://www.kuaishou.com/short-video/3x6v3xmcjsd5cki?authorId=3xhv7zhkfr3rqag&streamSource=profile&area=profilexxnull',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    }
    data = {
        "operationName": "visionProfilePhotoList",
        "query": "query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n    result\n    llsid\n    webPageArea\n    feeds {\n      type\n      author {\n        id\n        name\n        following\n        headerUrl\n        headerUrls {\n          cdn\n          url\n          __typename\n        }\n        __typename\n      }\n      tags {\n        type\n        name\n        __typename\n      }\n      photo {\n        id\n        duration\n        caption\n        likeCount\n        realLikeCount\n        coverUrl\n        coverUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrl\n        liked\n        timestamp\n        expTag\n        animatedCoverUrl\n        stereoType\n        videoRatio\n        profileUserTopPhoto\n        __typename\n      }\n      canAddComment\n      currentPcursor\n      llsid\n      status\n      __typename\n    }\n    hostName\n    pcursor\n    __typename\n  }\n}\n",
        "variables": {"userId": "3xhv7zhkfr3rqag", "pcursor": pcursor, "page": "detail", "webPageArea": "profilexxnull"}

    }
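    # POST the GraphQL query; time out instead of hanging on a stalled connection
    rsp = requests.post(url=url, json=data, headers=headers, timeout=10)
    # Fail loudly on HTTP errors (e.g. when the request is rejected)
    rsp.raise_for_status()
    # Option 1: parse the body with the json module
    # json_data = json.loads(rsp.text)
    # Option 2 (used here): let requests parse it
    json_data = rsp.json()
    # print(json_data, type(json_data))
    photo_list = json_data['data']['visionProfilePhotoList']
    url_list = photo_list['feeds']
    # Cursor for the next page; the server returns "no_more" on the last page
    pcursor = photo_list['pcursor']
    # print(url_list)
    # pprint.pprint(url_list)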

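    for key in url_list:
        # Video title (caption)
        title = key['photo']['caption'] or ''
        # Drop characters that are illegal in file names (Windows is the strictest),
        # keep the name short, and fall back to the video id if nothing is left
        safe_title = ''.join(c for c in title if c not in '\\/:*?"<>|\r\n').strip()[:80] or key['photo']['id']
        # Video URL
        new_url = key['photo']['photoUrl']
        print(f'======================= Downloading Kuaishou short video: {title} =======================')
        # Send the request for the video file
        content_data = requests.get(url=new_url, timeout=30).content
        # Save into the output directory
        with open(os.path.join(path, f'{safe_title}.mp4'), mode='wb') as f:
            f.write(content_data)
    # Fetch the next page until the server reports there is none
    if pcursor != "no_more":
        get_page(pcursor)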


if __name__ == '__main__':
    # Start from an empty cursor, i.e. the first page of the profile
    get_page("")
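get_page calls itself once per page, so every page adds a stack frame. A profile with a very large number of pages could in principle run into Python's default recursion limit of 1000. The same pcursor pagination can be written as a plain loop. The sketch below is a minimal version under stated assumptions. It assumes a hypothetical fetch_page(cursor) callable that returns (feeds, next_cursor), such as a thin wrapper around the requests.post call in get_page. The max_pages guard is also a hypothetical safety limit and is not part of the original code.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical signature: given a cursor, return (feeds on this page, next cursor).
FetchPage = Callable[[str], Tuple[List[Dict], str]]


def iter_feeds(fetch_page: FetchPage, start_cursor: str = "",
               max_pages: int = 1000) -> Iterator[Dict]:
    """Yield every feed item page by page, following the pcursor value.

    Stops when the server answers "no_more" (the sentinel the original
    code checks) or after max_pages requests, whichever comes first.
    """
    cursor = start_cursor
    for _ in range(max_pages):
        feeds, cursor = fetch_page(cursor)
        yield from feeds
        if cursor == "no_more":
            return


if __name__ == "__main__":
    # Fake in-memory pages standing in for the GraphQL responses
    fake_pages = {
        "": ([{"photo": {"caption": "a"}}, {"photo": {"caption": "b"}}], "page2"),
        "page2": ([{"photo": {"caption": "c"}}], "no_more"),
    }
    captions = [feed["photo"]["caption"] for feed in iter_feeds(lambda c: fake_pages[c])]
    print(captions)  # ['a', 'b', 'c']
```

In the scraper, fetch_page would return json_data['data']['visionProfilePhotoList']['feeds'] together with ['pcursor']. The download loop from get_page would then iterate over iter_feeds(fetch_page) without recursing. The memory use of the pagination stays constant however many pages the profile has.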

Source: https://blog.51cto.com/u_13488918/5992184
