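Preface
When writing crawlers, cookies are among the things we deal with most often. Because the difficulty varies from one stage of a crawl to the next, we often mix tools such as a browser and requests to balance collection efficiency against development effort. As a result, we frequently need to convert cookies from one representation to another.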
Cookie conversions
- Converting a cookie string to a dict
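from http.cookies import SimpleCookie
import json

# Raw cookie string in "name=value; name=value" form
cookie_str = """PHPSESSID=ufl7bh3adse15vvks0kusmgt92; ezoadgid_55920=-1; ezoref_55920=google.com"""
# SimpleCookie parses the string into name -> Morsel pairs
cookie = SimpleCookie()
cookie.load(cookie_str)
# Keep only each Morsel's value to get a plain dict
cookie_dict = {k: v.value for k, v in cookie.items()}
print(json.dumps(cookie_dict, indent=2))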
Output:
{
  "PHPSESSID": "ufl7bh3adse15vvks0kusmgt92",
  "ezoadgid_55920": "-1",
  "ezoref_55920": "google.com"
}
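The reverse direction comes up just as often, for example when a dict has to be placed back into a Cookie request header. The following is a minimal sketch, not a definitive implementation: the helper names cookie_str_to_dict and cookie_dict_to_str are hypothetical, and no escaping of unusual characters is attempted.

```python
from http.cookies import SimpleCookie


def cookie_str_to_dict(cookie_str):
    """Parse a "name=value; name=value" string into a plain dict."""
    cookie = SimpleCookie()
    cookie.load(cookie_str)
    return {name: morsel.value for name, morsel in cookie.items()}


def cookie_dict_to_str(cookie_dict):
    """Join a name -> value dict back into a Cookie header string.

    Hypothetical helper: values are written as-is, without quoting.
    """
    return "; ".join(f"{name}={value}" for name, value in cookie_dict.items())


cookie_str = "PHPSESSID=ufl7bh3adse15vvks0kusmgt92; ezoadgid_55920=-1; ezoref_55920=google.com"
cookie_dict = cookie_str_to_dict(cookie_str)
round_tripped = cookie_dict_to_str(cookie_dict)
print(round_tripped)
```

For simple values like these, the round trip should give back the original string, which makes the pair convenient for moving cookies between a copied header and code.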
- Converting selenium's name-value cookies to a dict
from selenium import webdriver
import json

browser = webdriver.Chrome()
browser.get("https://www.baidu.com")
# get_cookies() returns a list of dicts, each with "name" and "value" keys
browser_cookie = browser.get_cookies()
dict_cookie = {}
for c in browser_cookie:
    dict_cookie[c['name']] = c['value']
print(json.dumps(dict_cookie, indent=2))
Output:
{
  "ZFY": "g3:AuUuEFuAF1L0Zpt9:B3:BSIIYEoAucw7cHt4QJFly9s:C",
  "BAIDUID_BFESS": "64BC7B5CD127208685D64C19DB5A01FA:FG=1",
  "BA_HECTOR": "2pak8l2k000g2gck80000lj81htjm6j1l",
  "H_PS_PSSID": "36549_38105_38094_37907_37989_37800_37925_38086_26350_38101_38008_37881",
  "BAIDUID": "64BC7B5CD127208685D64C19DB5A01FA:FG=1",
  "BIDUPSID": "64BC7B5CD1272086BBA474D012BB5674",
  "PSTM": "1675221202",
  "BD_UPN": "12314753",
  "BD_HOME": "1"
}
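A browser session may hold cookies for several domains, so it can help to filter before flattening to a dict. The sketch below runs without a browser: sample_cookies is made-up data that only imitates the list-of-dicts shape returned by get_cookies(), and selenium_cookies_to_dict with its domain_suffix parameter is a hypothetical helper, not part of selenium.

```python
def selenium_cookies_to_dict(cookies, domain_suffix=None):
    """Flatten selenium-style cookie dicts into a name -> value dict.

    If domain_suffix is given, keep only cookies whose "domain"
    ends with that suffix (hypothetical filtering rule).
    """
    result = {}
    for c in cookies:
        domain = c.get("domain", "")
        if domain_suffix and not domain.endswith(domain_suffix):
            continue
        result[c["name"]] = c["value"]
    return result


# Made-up sample imitating the shape of browser.get_cookies()
sample_cookies = [
    {"name": "BD_HOME", "value": "1", "domain": "www.baidu.com", "path": "/"},
    {"name": "PSTM", "value": "1675221202", "domain": ".baidu.com", "path": "/"},
    {"name": "other", "value": "x", "domain": ".example.com", "path": "/"},
]

baidu_only = selenium_cookies_to_dict(sample_cookies, domain_suffix="baidu.com")
everything = selenium_cookies_to_dict(sample_cookies)
print(baidu_only)
```

Note that flattening to a dict drops the domain, path and expiry information, which is usually acceptable when the cookies are only replayed against a single site.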
- Converting a cookie dict to a requests cookie_jar
from requests.cookies import cookiejar_from_dict

dict_cookie = {
    "ZFY": "g3:AuUuEFuAF1L0Zpt9:B3:BSIIYEoAucw7cHt4QJFly9s:C",
    "BAIDUID_BFESS": "64BC7B5CD127208685D64C19DB5A01FA:FG=1",
    "BA_HECTOR": "2pak8l2k000g2gck80000lj81htjm6j1l",
    "H_PS_PSSID": "36549_38105_38094_37907_37989_37800_37925_38086_26350_38101_38008_37881",
    "BAIDUID": "64BC7B5CD127208685D64C19DB5A01FA:FG=1",
    "BIDUPSID": "64BC7B5CD1272086BBA474D012BB5674",
    "PSTM": "1675221202",
    "BD_UPN": "12314753",
    "BD_HOME": "1"
}
cookie_jar = cookiejar_from_dict(dict_cookie)
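Once the jar exists, one way to use it is to attach it to a requests.Session. The sketch below assumes the requests package is installed and uses a shortened dict for brevity. It only prepares a request and never sends it, so the resulting Cookie header can be inspected without network access.

```python
import requests
from requests.cookies import cookiejar_from_dict
from requests.utils import dict_from_cookiejar

# Shortened example dict; any name -> value mapping works the same way
dict_cookie = {"BD_UPN": "12314753", "BD_HOME": "1"}
cookie_jar = cookiejar_from_dict(dict_cookie)

# Attach the jar to a session so every request made through it carries the cookies
session = requests.Session()
session.cookies = cookie_jar

# Prepare (but do not send) a request to see which Cookie header would go out
request = requests.Request("GET", "https://www.baidu.com/")
prepared = session.prepare_request(request)
cookie_header = prepared.headers.get("Cookie")
print(cookie_header)

# The jar can be turned back into a plain dict when needed
restored = dict_from_cookiejar(cookie_jar)
print(restored)
```

Cookies created this way carry no domain restriction, so the session would send them to any host it talks to. That is worth keeping in mind when one session is reused across several sites.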
From: https://www.cnblogs.com/dingnosakura/p/17083255.html