
02-requests

Posted: 2024-02-11 09:56:34
Tags: 02 url res json https import requests

This section covers the most common operations of the requests module in web scraping.

1.URL parameters

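Whether you send a GET or a POST request, the URL may carry query parameters, for example: http://www.5xclass.cn?age=19&name=wupeiqi. You can write them into the URL yourself, or pass them as a dict through params and let requests build the query string: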

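import requests

# Option 1: write the query string into the URL yourself
res = requests.get(
    url="https://www.5xclass.cn?age=19&name=wupeiqi"
)

# Option 2: pass a dict via params; requests appends ?age=19&name=wupeiqi
res = requests.get(
    url="https://www.5xclass.cn",
    params={
        "age": 19,
        "name": "wupeiqi"
    }
)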
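To check that the two forms are equivalent without sending anything, you can build a request object and inspect the URL requests would use. A minimal sketch: `requests.Request(...).prepare()` only prepares the request and performs no network I/O.

```python
import requests

# Build (but do not send) both variants and compare the final URLs
inline = requests.Request("GET", "https://www.5xclass.cn?age=19&name=wupeiqi").prepare()
with_params = requests.Request(
    "GET",
    "https://www.5xclass.cn",
    params={"age": 19, "name": "wupeiqi"},
).prepare()

# requests normalizes the empty path to "/", so both end up identical
print(inline.url)
print(with_params.url)
```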

Example: Huaban (花瓣) search for "美女" ("beautiful women")

# @Course    : 爬虫逆向实战课 (Web Scraping & Reverse Engineering in Practice)
# @Instructor: 武沛齐 (Wu Peiqi)
# @Materials : wupeiqi666

import requests
import json

res = requests.get(
    url="https://api.huaban.com/search/file?text=%E7%BE%8E%E5%A5%B3&sort=all&limit=40&page=1&position=search_pin&fields=pins:PIN,total,facets,split_words,relations"
)

data_dict = json.loads(res.text)
pin_list = data_dict["pins"]
for item in pin_list:
    print(item['user']['username'], item['raw_text'])

The same request, with the query parameters passed through params:

import requests
import json

res = requests.get(
    url="https://api.huaban.com/search/file",
    params={
        "text": "美女",
        "sort":"all",
        "limit": 40,
        "page": 1,
        "position": "search_pin",
        "fields": "pins:PIN,total,facets,split_words,relations"
    }
)

data_dict = json.loads(res.text)
pin_list = data_dict["pins"]
for item in pin_list:
    print(item['user']['username'], item['raw_text'])
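The hand-written URL and the params dict describe the same query even though the encoded strings differ: requests passes params through `urllib.parse.urlencode`, which percent-encodes `:` and `,` as well as the Chinese keyword, while the copied URL leaves `:` and `,` as they are. A stdlib-only sketch that decodes both query strings and compares them (this approximates, rather than reproduces, what requests does internally):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

handwritten = (
    "https://api.huaban.com/search/file?text=%E7%BE%8E%E5%A5%B3&sort=all&limit=40"
    "&page=1&position=search_pin&fields=pins:PIN,total,facets,split_words,relations"
)
params = {
    "text": "美女",
    "sort": "all",
    "limit": 40,
    "page": 1,
    "position": "search_pin",
    "fields": "pins:PIN,total,facets,split_words,relations",
}

# Roughly what requests does with params= (it calls urlencode internally)
encoded = urlencode(params)

# Decode both query strings into {name: [values]} and compare the meaning
same_query = parse_qs(urlsplit(handwritten).query) == parse_qs(encoded)
print(encoded)
print(same_query)
```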

2.Request body formats

When sending a POST request, the body usually comes in one of two common formats:

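  • Form format (example site: 抽屉新热榜 / Chouti)

    name=wupeiqi&age=18&size=99

    How to recognize it:
        1. In Chrome DevTools, the body appears under Form Data.
        2. Request header: Content-Type: application/x-www-form-urlencoded; charset=UTF-8

  • JSON format (example site: 腾讯课堂 / Tencent Classroom)

    {"name":"wupeiqi","age":18,"size":99}

    How to recognize it:
        1. In Chrome DevTools, the body appears under Request Payload.
        2. Request header: Content-Type: application/json;charset=utf-8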
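Both formats can be produced from the same dict with the standard library. A small sketch; note that form values always come back as strings, while JSON keeps numbers as numbers:

```python
import json
from urllib.parse import parse_qs, urlencode

payload = {"name": "wupeiqi", "age": 18, "size": 99}

# Form format: key=value pairs joined by "&" (application/x-www-form-urlencoded)
form_body = urlencode(payload)

# JSON format: compact separators to match the example above (application/json)
json_body = json.dumps(payload, separators=(",", ":"))

print(form_body)
print(json_body)

# Decoding back shows the type difference: the form yields ["18"], JSON yields 18
form_age = parse_qs(form_body)["age"]
json_age = json.loads(json_body)["age"]
```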
    

2.1 Form format

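import requests

# Option 1: a ready-made form string; requests adds no Content-Type for a str body, so set it yourself
res = requests.post(
    url="...",
    data="name=wupeiqi&age=18&size=99",
    headers={
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
    }
)

# Option 2: a dict; requests form-encodes it and would set
# application/x-www-form-urlencoded itself, so the header here is optional
res = requests.post(
    url="...",
    data={
        "name": "wupeiqi",
        "age": 18,
        "size": 99
    },
    headers={
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
    }
)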
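Why set Content-Type explicitly for the string variant? Preparing both requests offline shows the difference: for a dict, requests form-encodes it and sets Content-Type on its own; for a plain string it sends the string unchanged and adds no Content-Type. A sketch; https://example.com/ is a placeholder URL and nothing is sent.

```python
import requests

URL = "https://example.com/"  # placeholder: the requests are only prepared, never sent

from_dict = requests.Request(
    "POST", URL, data={"name": "wupeiqi", "age": 18, "size": 99}
).prepare()
from_str = requests.Request("POST", URL, data="name=wupeiqi&age=18&size=99").prepare()

# Same body bytes on the wire, but only the dict version gets a Content-Type
print(from_dict.body, from_dict.headers.get("Content-Type"))
print(from_str.body, from_str.headers.get("Content-Type"))
```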

Example: Fuzhou (福州) government site search

https://www.fuzhou.gov.cn/ssp/main/search.html?siteId=402849946077df37016077eea95e002f

import requests

res = requests.post(
    url="https://www.fuzhou.gov.cn/ssp/search/api/search?time=1701392708022",
    data="siteType=1&mainSiteId=402849946077df37016077eea95e002f&siteId=402849946077df37016077eea95e002f&type=0&page=1&rows=10&historyId=48908a988b85d1ee018c22e8a6e242c0&sourceType=SSP_ZHSS&isChange=0&fullKey=N&wbServiceType=13&fileType=&fileNo=&pubOrg=&themeType=&searchTime=&startDate=&endDate=&sortFiled=RELEVANCE&searchFiled=&dirUseLevel=&issueYear=&issueMonth=&allKey=&fullWord=&oneKey=&notKey=&totalIssue=&chnlName=&zfgbTitle=&zfgbContent=&zfgbPubOrg=&zwgkPubDate=&zwgkDoctitle=&zwgkDoccontent=&zhPubOrg=1&keyWord=%E7%BC%96%E7%A8%8B&pubOrgType=&zhuTiIdList=&feaTypeName=&jiGuanList=&publishYear=",
    headers={
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    }
)
print(res.text)

The same request, with the form data written as a dict:

import requests

res = requests.post(
    url="https://www.fuzhou.gov.cn/ssp/search/api/search?time=1701392708022",
    data={
        "siteType": "1",
        "mainSiteId": "402849946077df37016077eea95e002f",
        "siteId": "402849946077df37016077eea95e002f",
        "type": "0",
        "page": "1",
        "rows": "10",
        "historyId": "48908a988b85d1ee018c22e8a6e242c0",
        "sourceType": "SSP_ZHSS",
        "isChange": "0",
        "fullKey": "N",
        "wbServiceType": "13",
        "fileType": "",
        "fileNo": "",
        "pubOrg": "",
        "themeType": "",
        "searchTime": "",
        "startDate": "",
        "endDate": "",
        "sortFiled": "RELEVANCE",
        "searchFiled": "",
        "dirUseLevel": "",
        "issueYear": "",
        "issueMonth": "",
        "allKey": "",
        "fullWord": "",
        "oneKey": "",
        "notKey": "",
        "totalIssue": "",
        "chnlName": "",
        "zfgbTitle": "",
        "zfgbContent": "",
        "zfgbPubOrg": "",
        "zwgkPubDate": "",
        "zwgkDoctitle": "",
        "zwgkDoccontent": "",
        "zhPubOrg": "1",
        "keyWord": "编程",
        "pubOrgType": "",
        "zhuTiIdList": "",
        "feaTypeName": "",
        "jiGuanList": "",
        "publishYear": ""
    },
    headers={
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    }
)
print(res.text)
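Converting the long Form Data string into a dict by hand is tedious. One way, assuming you copied the URL-encoded form string from DevTools, is `urllib.parse.parse_qsl` with `keep_blank_values=True`, so empty fields such as `fileType=` are kept. The helper name `form_string_to_dict` is hypothetical:

```python
from urllib.parse import parse_qsl


def form_string_to_dict(form_string):
    """Convert 'a=1&b=&c=%E7%BC%96' into {'a': '1', 'b': '', 'c': '...'}.

    keep_blank_values=True keeps empty fields; percent-escapes are decoded.
    If a key appears more than once, only its last value survives in the dict.
    """
    return dict(parse_qsl(form_string, keep_blank_values=True))


# A short excerpt of the Fuzhou form string
sample = "siteType=1&fileType=&keyWord=%E7%BC%96%E7%A8%8B"
form_dict = form_string_to_dict(sample)
```

The result can be passed straight to `requests.post(url, data=form_dict, ...)`.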

2.2 JSON format

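import json
import requests

# Option 1: serialize the dict yourself and set Content-Type explicitly
res = requests.post(
    url="...",
    data=json.dumps({"name": "wupeiqi", "age": 18, "size": 99}),
    headers={
        "Content-Type": "application/json;charset=utf-8"
    }
)

# Option 2: json= serializes the dict and sets Content-Type: application/json for you
res = requests.post(
    url="...",
    json={"name": "wupeiqi", "age": 18, "size": 99},
)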
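A quick offline check of the two variants, using a placeholder URL: `json=` produces a JSON body and sets `Content-Type: application/json`, while `data=json.dumps(...)` sends the same JSON text but leaves Content-Type unset unless you add it yourself.

```python
import json
import requests

URL = "https://example.com/"  # placeholder: requests are prepared, never sent
payload = {"name": "wupeiqi", "age": 18, "size": 99}

via_json = requests.Request("POST", URL, json=payload).prepare()
via_data = requests.Request("POST", URL, data=json.dumps(payload)).prepare()

# Both bodies decode to the same dict; only json= adds the header
print(via_json.headers.get("Content-Type"), via_json.body)
print(via_data.headers.get("Content-Type"), via_data.body)
```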

Example: Tencent Classroom (腾讯课堂) course list

import json
import requests

res = requests.post(
    url="https://ke.qq.com/cgi-proxy/course_list/search_course_list?bkn=&r=0.1649",
    data=json.dumps({"mt":"1001","st":"2056","page":"2","visitor_id":"9340935770592473","finger_id":"a9d61dde57ac8f4694860da1e9952a3b","platform":3,"source":"search","count":24,"need_filter_contact_labels":1}),
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Referer":"https://ke.qq.com/course/list?mt=1001&quicklink=1&st=2056&page=2",
        "Content-Type":"application/json;charset=utf-8"
    }
)

print(res.text)

The same request using json=, which serializes the dict and sets Content-Type: application/json automatically:

import requests


res = requests.post(
    url="https://ke.qq.com/cgi-proxy/course_list/search_course_list?bkn=&r=0.1649",
    json={"mt":"1001","st":"2056","page":"2","visitor_id":"9340935770592473","finger_id":"a9d61dde57ac8f4694860da1e9952a3b","platform":3,"source":"search","count":24,"need_filter_contact_labels":1},
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Referer":"https://ke.qq.com/course/list?mt=1001&quicklink=1&st=2056&page=2"
    }
)

print(res.text)
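Every example here repeats the same User-Agent. One option, sketched below, is a `requests.Session` with default headers, so each request only adds what is specific to it (such as Referer). `session.prepare_request` lets you inspect the merged headers without sending anything; to actually send, call `session.post(...)` with the same arguments.

```python
import requests

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
)

session = requests.Session()
session.headers.update({"User-Agent": USER_AGENT})  # sent with every request

req = requests.Request(
    "POST",
    "https://ke.qq.com/cgi-proxy/course_list/search_course_list?bkn=&r=0.1649",
    json={"mt": "1001", "st": "2056", "page": "2", "visitor_id": "9340935770592473",
          "finger_id": "a9d61dde57ac8f4694860da1e9952a3b", "platform": 3,
          "source": "search", "count": 24, "need_filter_contact_labels": 1},
    headers={"Referer": "https://ke.qq.com/course/list?mt=1001&quicklink=1&st=2056&page=2"},
)
prepared = session.prepare_request(req)  # merges session headers with request headers
# res = session.send(prepared)  # this line would actually send it
```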

3.Cookie

A cookie is essentially a key-value pair stored by the browser, typically used to hold user credentials such as a login session.

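  • When the browser sends a request, the backend can return cookies in the response (the browser saves them automatically).
  • On later requests, the browser automatically sends those cookies back.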
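In requests, a `Session` plays the role of the browser here: cookies the server sets are kept in `session.cookies` and sent back automatically on later requests to a matching domain. The sketch below sets a cookie by hand (standing in for one the server would set) and prepares requests offline to show which of them carry it; the cookie value is taken from the examples in this section.

```python
import requests

session = requests.Session()

# Stand-in for a Set-Cookie from the server: store it scoped to .bilibili.com
session.cookies.set(
    "buvid3", "8427E089-F4D7-CCF7-4997-0087D04B3C9810575infoc", domain=".bilibili.com"
)

# Prepare (not send) two requests and check which one carries the cookie
to_bilibili = session.prepare_request(requests.Request("GET", "https://www.bilibili.com/"))
to_other = session.prepare_request(requests.Request("GET", "https://example.com/"))

print(to_bilibili.headers.get("Cookie"))  # the stored cookie is attached
print(to_other.headers.get("Cookie"))     # None: different domain
```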


Reading the cookies returned by the server:

import requests

res = requests.get(
    url="https://www.bilibili.com/"
)
cookie_dict = res.cookies.get_dict()
print(cookie_dict)   # e.g. {"v1": "123", "v3": "456"}: cookie names mapped to string values

Sending cookies with a request, either as a raw Cookie header or through the cookies parameter:

import requests

# Option 1: write the Cookie header yourself ("name=value" pairs joined by "; ")
res = requests.get(
    url="https://www.bilibili.com/",
    headers={
        "Cookie": "innersign=0; buvid3=8427E089-F4D7-CCF7-4997-0087D04B3C9810575infoc"
    }
)

# Option 2: pass a dict; requests builds the Cookie header from it
res = requests.get(
    url="https://www.bilibili.com/",
    cookies={
        "innersign": "0",
        "buvid3": "8427E089-F4D7-CCF7-4997-0087D04B3C9810575infoc"
    }
)
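Both options end up as the same Cookie header. Preparing the dict version offline (a sketch; nothing is sent) shows the header requests builds from it, compared order-insensitively with the hand-written one:

```python
import requests

cookies = {
    "innersign": "0",
    "buvid3": "8427E089-F4D7-CCF7-4997-0087D04B3C9810575infoc",
}
prepared = requests.Request("GET", "https://www.bilibili.com/", cookies=cookies).prepare()

built_header = prepared.headers["Cookie"]
print(built_header)

# Compare as sets of "name=value" pairs so the pair order does not matter
manual_header = "innersign=0; buvid3=8427E089-F4D7-CCF7-4997-0087D04B3C9810575infoc"
same_cookies = set(built_header.split("; ")) == set(manual_header.split("; "))
```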

Example: Bilibili (B站) account information

import requests

res = requests.get(
    url="https://api.bilibili.com/x/member/web/account?web_location=333.33",
    headers={
        "Cookie": "your own cookie",  # paste the Cookie header from your logged-in browser session
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    }
)

print(res.text)
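If you prefer the `cookies=` dict form over pasting the whole header, a small helper can convert a Cookie header copied from DevTools (`k1=v1; k2=v2`) into a dict. `cookie_header_to_dict` is a hypothetical name, and the `token=a=b` pair below is a made-up value showing why the split happens on the first `=` only:

```python
def cookie_header_to_dict(cookie_header):
    """Convert a raw 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';' or doubled separators
        name, _, value = pair.partition("=")  # values may themselves contain '='
        cookies[name.strip()] = value.strip()
    return cookies


header = "innersign=0; buvid3=8427E089-F4D7-CCF7-4997-0087D04B3C9810575infoc; token=a=b;"
print(cookie_header_to_dict(header))
```

The dict can then be passed as `requests.get(url, cookies=cookie_header_to_dict(my_cookie), ...)`.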

4.Response body formats

After you send a request with requests, everything the server returned is wrapped in the res object (a Response), for example:

import requests
import json

res = requests.get(
    url="https://api.huaban.com/search/file?text=%E7%BE%8E%E5%A5%B3&sort=all&limit=40&page=1&position=search_pin&fields=pins:PIN,total,facets,split_words,relations"
)

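# Raw response body (bytes)
res.content

# Body decoded from bytes into a str, using res.encoding
res.text

# If the body is JSON, parse it into Python objects; equivalent to json.loads(res.text).
# This only works for a JSON body such as {"xxx":123}; an HTML body such as <html>...</html> raises an error.
data = res.json()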
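Conceptually, `res.text` is `res.content` decoded with `res.encoding`, and `res.json()` parses that text as JSON. The stdlib-only sketch below mirrors that pipeline (the helper name `parse_json_body` is hypothetical) and shows what happens with an HTML body: `json.loads` raises `json.JSONDecodeError`, a subclass of `ValueError`, and `res.json()` likewise raises a `ValueError` subclass, so catching `ValueError` covers both.

```python
import json


def parse_json_body(content, encoding="utf-8"):
    """bytes -> str -> Python object, like res.content -> res.text -> res.json().

    Returns None if the body is not valid JSON (for example an HTML page).
    """
    text = content.decode(encoding)
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None


print(parse_json_body(b'{"xxx":123}'))                      # {'xxx': 123}
print(parse_json_body(b"<html><body>error</body></html>"))  # None
```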

4.1 Raw bytes

Typically used when downloading images, files, videos and other binary content, where you need the raw bytes.

https://huaban.com/boards/32310016

import requests

res = requests.get(
    url="https://gd-hbimg.huaban.com/b93fcc5bb4751934bbd56918bdab8184966dca2974df1-bo7qSF",
)
print(res.content)

with open("v1.png", mode='wb') as f:
    f.write(res.content)

https://www.douyin.com/video/7291625134530579747?modeFrom=

import requests

res = requests.get(
    url="https://v3-web.douyinvod.com/4f17c475df0a484a41fa1abe00f43aaa/65695cf6/video/tos/cn/tos-cn-ve-15c001-alinc2/oIrIAZg9ChRRquVyAYQdESIxNWAQzBACtzJemf/?a=6383&ch=0&cr=0&dr=0&er=0&cd=0%7C0%7C0%7C0&cv=1&br=1354&bt=1354&cs=0&ds=4&ft=GN7rKGVVyw3XRZ_8emo~xj7ScoAp9656EvrK-iBTkto0g3&mime_type=video_mp4&qs=0&rc=ZWU7NTo3NDc0ZDM2ODQ6OkBpM2c8ODw6Zm9qbjMzNGkzM0AxNS9eY2BjNTQxYGIvYDMwYSNgZzVpcjRnamxgLS1kLS9zcw%3D%3D&btag=e00030000&dy_q=1701400015&feature_id=46a7bb47b4fd1280f3d3825bf2b29388&l=20231201110655C26B991C0F271857AB12",
)
# print(res.content)

with open("v1.mp4", mode='wb') as f:
    f.write(res.content)
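`res.content` holds the whole file in memory, which is fine for an image but can be heavy for a video. A common alternative is `stream=True` with `res.iter_content()`, writing the body to disk chunk by chunk. A sketch under these assumptions: the helper names `download` and `save_chunks` are hypothetical, and the chunk size and timeout are arbitrary choices. Signed video links like the one above carry timestamps in the query string and may stop working after a while.

```python
import requests


def save_chunks(chunks, path):
    """Write an iterable of bytes chunks to path; return the number of bytes written."""
    total = 0
    with open(path, mode="wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                total += len(chunk)
    return total


def download(url, path, headers=None, chunk_size=64 * 1024):
    """Stream a large file (e.g. a video) to disk instead of holding res.content in memory."""
    with requests.get(url, headers=headers, stream=True, timeout=30) as res:
        res.raise_for_status()  # fail loudly on 403/404 instead of saving an error page
        return save_chunks(res.iter_content(chunk_size=chunk_size), path)
```

Usage would be `download(video_url, "v1.mp4")`, where `video_url` is the link from the example above.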

4.2 Plain text

import requests

res = requests.get(
	url="https://www.5xclass.cn?age=19&name=wupeiqi"
)
print(res.text)

# Output
<!DOCTYPE html>
<html lang="en">
<head>
...

import requests

res = requests.get(
    url="https://movie.douban.com/j/search_subjects?type=movie&tag=%E8%B1%86%E7%93%A3%E9%AB%98%E5%88%86&page_limit=50&page_start=0",
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
    }
)

print(res.text)

# Output
{"subjects":[{"episodes_info":"","rate":"9.7","cover_x":2000,"title":"肖申克的救赎"...
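`res.text` decodes `res.content` using `res.encoding`, which requests takes from the Content-Type header. If a `text/*` response declares no charset, requests falls back to ISO-8859-1, and Chinese text such as the title above comes out garbled. The usual fix is to set `res.encoding = "utf-8"` (or `res.encoding = res.apparent_encoding`, which guesses from the bytes) before reading `res.text`. A stdlib sketch of what goes wrong and why re-decoding fixes it:

```python
title = "肖申克的救赎"

raw = title.encode("utf-8")         # what res.content would hold for a UTF-8 page
garbled = raw.decode("iso-8859-1")  # what res.text gives if res.encoding is ISO-8859-1
repaired = raw.decode("utf-8")      # what res.text gives after res.encoding = "utf-8"

print(garbled == title)   # False
print(repaired == title)  # True
```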

4.3 Converting JSON

For a JSON body, convert it into Python dicts and lists so the nested elements are easy to access.

import requests

res = requests.get(
    url="https://movie.douban.com/j/search_subjects?type=movie&tag=%E8%B1%86%E7%93%A3%E9%AB%98%E5%88%86&page_limit=50&page_start=0",
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
    }
)

# Manual conversion
import json
data_dict = json.loads(res.text)

# Built-in conversion (same result)
data_dict = res.json()
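Once converted, the data is ordinary dicts and lists. Based on the sample output in 4.2 (a "subjects" list whose items have "title" and "rate" keys, with the rate given as a string), here is a sketch that pulls out titles at or above a rating threshold; the helper name and the 9.0 threshold are illustrative choices:

```python
def high_rated_titles(data_dict, min_rate=9.0):
    """Return (title, rate) pairs from data_dict["subjects"] with rate >= min_rate."""
    results = []
    for subject in data_dict.get("subjects", []):
        rate = float(subject.get("rate") or 0)  # "rate" is a string such as "9.7"
        if rate >= min_rate:
            results.append((subject.get("title"), rate))
    return results


# Trimmed sample shaped like the Douban response shown earlier
sample = {"subjects": [{"episodes_info": "", "rate": "9.7", "cover_x": 2000, "title": "肖申克的救赎"}]}
titles = high_rated_titles(sample)
```

With the live response this becomes `high_rated_titles(res.json())`.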

From: https://www.cnblogs.com/fuminer/p/18013179
