
Pulling a Kuaishou live video stream with Python and slicing it into fixed-length segments

Published: 2024-10-30 10:20:55

The idea: request the Kuaishou live-room web page, extract the JSON blob describing the room from a script tag inside the page, and finally pull the HLS address of the live video stream and other live-room information out of that JSON.
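Before the full script, here is the core extraction step in isolation, as a minimal sketch (the helper name extract_initial_state and the bare headers argument are illustrative, not part of the script below):

import json
import re

import requests


def extract_initial_state(page_url, headers):
    # Fetch the live-room page and parse the state JSON embedded in a script tag.
    html = requests.get(page_url, headers=headers, timeout=10).text
    match = re.search(r'window\.__INITIAL_STATE__=(.*?);', html)
    if not match:
        raise ValueError("window.__INITIAL_STATE__ not found in the page")
    # "undefined" is valid JavaScript but not valid JSON, so patch it first.
    raw = match.group(1).replace("undefined,", '"",')
    return json.loads(raw)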

The full script follows.

import json
import random
import re
import subprocess
import sys
import time
from enum import Enum
from urllib.parse import urlparse
from urllib.parse import urlunparse

import requests
from bs4 import BeautifulSoup

from CookieUtil import CookieUtil


class LivingStatus(Enum):
    Living = 1  # stream is live and play URLs were extracted
    STOP = 2    # stream has ended
    ERROR = 3   # page error or stream info could not be extracted


def generate_did():
    # Fabricate a "_did" device-id cookie value in the form the web client
    # uses: "web_" + a random integer + 7 random hex characters.
    random_number = int(random.random() * 1e9)
    hex_chars = "0123456789ABCDEF"
    random_hex = ''.join(random.choice(hex_chars) for _ in range(7))
    return "web_" + str(random_number) + random_hex

def get_stream_url(user_agent, pc_live_url):
    did = generate_did()
    print("did: \n", did)

    headers = {
        'referer': "https://live.kuaishou.com/",
        'User-Agent': user_agent,
        "Cookie": f"_did={did}",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"
    }

    # First request: let the server issue its cookies for this fake device id.
    response = requests.get(pc_live_url, headers=headers, allow_redirects=True)

    cookie_dict = CookieUtil.cookies_to_dict(response.cookies)
    cookie_content = CookieUtil.cookies_to_string(cookie_dict)
    print("cookie_content: \n", cookie_content)

    headers['Cookie'] = cookie_content

    # Second request: resend with the issued cookies to get the real page content.
    response = requests.get(pc_live_url, headers=headers, allow_redirects=True)

    html_str = response.text

    soup = BeautifulSoup(html_str, 'html.parser')
    scripts = soup.find_all('script')

    result = []

    for script in scripts:
        target_str = script.string
        if target_str is not None and "liveStream" in target_str:
            if "undefined," in target_str:
                target_str = target_str.replace("undefined,", '"",')
            match = re.search(r'window\.__INITIAL_STATE__=(.*?);', target_str)

            if match:
                extracted_content = match.group(1)
                print("extracted_content:\n", extracted_content)
                data = json.loads(extracted_content)

                live_room = data['liveroom']
                if live_room is not None:
                    play_list = live_room['playList']
                    if play_list is not None and len(play_list) > 0:
                        play_item = play_list[0]
                        if "errorType" in play_item:
                            error_msg = play_item['errorType']['title']
                            print(error_msg)
                            return [], LivingStatus.ERROR.value
                        if "isLiving" in play_item:
                            status = play_item['isLiving']
                            print("living status: ", status)
                            if not status:
                                print("直播已经结束!")
                                return [], LivingStatus.STOP.value
                        if "liveStream" in play_item:
                            live_stream = play_item['liveStream']
                            if live_stream is not None and "playUrls" in live_stream:
                                play_urls = live_stream['playUrls']
                                if play_urls is not None:
                                    for play_url in play_urls:
                                        result.extend(play_url['adaptationSet']['representation'])
                                    filtered_list = [{'name': item['shortName'], 'url': item['url']} for item in result]
                                    return filtered_list, LivingStatus.Living.value
                                else:
                                    print("play_urls missing")
                            else:
                                print("live_stream missing")
                    else:
                        print("play_list missing")
                else:
                    print("live_room missing")
            else:
                print("No window.__INITIAL_STATE__ match found")
    return [], LivingStatus.ERROR.value

def save_video_slice(user_agent, stream_data):
    # Record from the first play URL in the extracted list.
    real_url = stream_data[0]['url']

    analyzeduration = "20000000"  # microseconds ffmpeg spends analyzing the stream
    probesize = "10000000"  # bytes ffmpeg reads while probing
    bufsize = "8000k"
    max_muxing_queue_size = "1024"

    # Input-side (protocol/demuxer) options must appear before "-i"; note that
    # ffmpeg AVOptions such as the reconnect flags take explicit values.
    ffmpeg_command = [
        'ffmpeg', "-y",
        "-loglevel", "error",
        "-hide_banner",
        "-rw_timeout", "30000000",  # I/O timeout in microseconds (30 s)
        "-user_agent", user_agent,
        "-protocol_whitelist", "rtmp,crypto,file,http,https,tcp,tls,udp,rtp",
        "-thread_queue_size", "1024",
        "-analyzeduration", analyzeduration,
        "-probesize", probesize,
        "-fflags", "+discardcorrupt",  # drop corrupt packets instead of aborting
        "-reconnect_delay_max", "60",
        "-reconnect_streamed", "1",
        "-reconnect_at_eof", "1",
        "-correct_ts_overflow", "1",
        "-i", real_url,
        # Output-side options follow.
        "-bufsize", bufsize,
        "-sn", "-dn",  # discard subtitle and data streams
        "-max_muxing_queue_size", max_muxing_queue_size,
    ]

    now = time.strftime("%Y-%m-%d_%H-%M-%S", time.localtime())
    # The segment muxer expands %03d to 000, 001, ... for successive slices.
    save_file_path = f"{now}_%03d.mp4"
    command = [
        "-c:v", "copy",  # copy the video stream as-is (no re-encode)
        "-c:a", "aac",  # re-encode audio to AAC for MP4 compatibility
        "-map", "0",  # keep all streams from the input
        "-f", "segment",  # split the output into fixed-length files
        "-segment_time", "20",  # target slice length in seconds
        "-segment_time_delta", "0.01",
        "-segment_format", "mp4",
        "-reset_timestamps", "1",  # restart timestamps at 0 in each slice
        "-pix_fmt", "yuv420p",  # has no effect while the video is stream-copied
        save_file_path,
    ]

    ffmpeg_command.extend(command)
    print("开始拉取数据流...")

    result = ' '.join(ffmpeg_command)
    print("result: \n", result)
    _output = subprocess.check_output(ffmpeg_command, stderr=subprocess.STDOUT)
    # 以下代码理论上不会执行
    print(_output)

if __name__ == '__main__':
    # https://live.kuaishou.com/u/3xf2ed9vrbqzr49
    # url = input("Enter the Kuaishou live-room URL: ")
    # url = "https://live.kuaishou.com/u/3xf2ed9vrbqzr49"
    # url = "https://live.kuaishou.com/u/3xj6wf7ksgs2uru"
    url = "https://live.kuaishou.com/u/DD5221273500"
    # url = "https://live.kuaishou.com/u/haiwangqi"
    parsed_url = urlparse(url)
    # Strip any query parameters from the URL
    url_without_query = urlunparse(parsed_url._replace(query=""))

    user_agent = "这里填写你的浏览器的user-agent,也可以伪造"

    try_times = 0
    while True:
        stream_url_list, ret_flag = get_stream_url(user_agent, url_without_query)
        if ret_flag == LivingStatus.STOP.value:
            print("The live stream has ended")
            sys.exit(0)  # a plain break would reach save_video_slice with an empty list
        if ret_flag == LivingStatus.Living.value:
            print(stream_url_list)
            break
        try_times += 1
        if try_times > 10:
            print("Failed to get the live stream URL")
            sys.exit(-1)

    save_video_slice(user_agent, stream_url_list)
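Because subprocess.check_output blocks until ffmpeg exits, stopping the script with Ctrl+C can truncate the slice currently being written. If you want a manual stop that still finalizes the current segment, one option is to drive ffmpeg through subprocess.Popen instead; a minimal sketch (the wrapper name run_ffmpeg_until_interrupted is illustrative, not part of the original script):

import subprocess

def run_ffmpeg_until_interrupted(ffmpeg_command):
    # Run ffmpeg as a child process; on Ctrl+C send 'q' on stdin, which asks
    # ffmpeg to quit gracefully and finish writing the current segment.
    proc = subprocess.Popen(ffmpeg_command, stdin=subprocess.PIPE)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        proc.communicate(input=b"q")
        return proc.returncode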

requirements.txt

requests
fake_useragent
beautifulsoup4
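requirements.txt lists fake_useragent even though the script above hardcodes the User-Agent string. If you would rather generate one, a minimal sketch using that package:

from fake_useragent import UserAgent

# Pick a random, plausible browser User-Agent instead of hardcoding one.
user_agent = UserAgent().random
print(user_agent)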

CookieUtil.py

from http.cookies import SimpleCookie

class CookieUtil:

    @staticmethod
    def cookies(session_cookies, latest_cookies):
        # Merge cookies seen earlier in the session into the latest set;
        # merge_cookies fills the gaps in latest_cookies, so return that dict.
        old_cookies = CookieUtil.cookies_from_headers(session_cookies)
        CookieUtil.merge_cookies(old_cookies, latest_cookies)
        return latest_cookies

    @staticmethod
    def cookies_from_headers(session_cookies):
        # Flatten a cookie jar (an iterable of Cookie objects) into a plain dict.
        cookies = {}
        for i in session_cookies:
            cookies[i.name.strip()] = i.value.strip()

        return cookies

    @staticmethod
    def cookies_to_string(cookies):
        return "; ".join([f"{key}={value}" for key, value in cookies.items()])

    @staticmethod
    def merge_cookies(old_cookies, new_cookies):
        # Copy old values into new_cookies without overwriting newer ones.
        for key, value in old_cookies.items():
            new_cookies.setdefault(key, value)

    @staticmethod
    def cookies_to_dict(cookie_source):
        # SimpleCookie.load accepts a raw Cookie header string or a mapping,
        # including the requests cookie jar passed in by get_stream_url.
        cookie = SimpleCookie()
        cookie.load(cookie_source)
        cookie_dict = {key: morsel.value for key, morsel in cookie.items()}
        return cookie_dict
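A quick round trip through the helpers, assuming a raw Cookie header string as input:

if __name__ == '__main__':
    raw = "did=web_123; userId=42"
    as_dict = CookieUtil.cookies_to_dict(raw)
    print(as_dict)                                # {'did': 'web_123', 'userId': '42'}
    print(CookieUtil.cookies_to_string(as_dict))  # did=web_123; userId=42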

From: https://blog.csdn.net/sh_moranliunian/article/details/143322938
