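First, a caveat: after Playwright modifies the parameters of a GET/POST request, the Network panel of the browser (Chromium) still shows the original parameters, even though the parameters actually sent to the server have changed. To verify this, I first set up a Flask service that returns whatever parameters it receives, at "http://127.0.0.1:8083".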
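The post does not show the Flask service itself. As a rough stand-in, the sketch below uses only Python's standard library (http.server instead of Flask, which is a substitution and not the author's code) to echo back the parameters the server actually receives. The names echo_payload, EchoHandler, and run_echo_server, and the JSON response shape, are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlsplit


def echo_payload(method, raw_params):
    """Build the echo response from a query string or form-encoded body."""
    return {"method": method, "params": parse_qs(raw_params)}


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the Flask service: replies with what it received."""

    def _reply(self, raw_params):
        body = json.dumps(echo_payload(self.command, raw_params)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Parameters arrive in the query string, e.g. /?key1=value1&key2=value2
        self._reply(urlsplit(self.path).query)

    def do_POST(self):
        # Parameters arrive as a form-encoded body, e.g. key1=value1&key2=value2
        length = int(self.headers.get("Content-Length", 0))
        self._reply(self.rfile.read(length).decode("utf-8"))


def run_echo_server(port=8083):
    """Serve on 127.0.0.1:<port> until interrupted (blocks the caller)."""
    ThreadingHTTPServer(("127.0.0.1", port), EchoHandler).serve_forever()
```

Calling run_echo_server() in a separate terminal gives the Playwright script something to talk to. Because the response reflects what reached the server, it shows the modified values even when the Network panel does not.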
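Next, Playwright sends one GET request and one POST request, both with the parameters {"key1": "value1", "key2": "value2"}, and does the following:
1. Change the value of key1 in the GET request to "GET";
2. Change the value of key1 in the POST request to "POST".
First, we need a function that processes the GET/POST parameters and modifies the value of key1: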
from urllib import parse


async def handle_route(route):
    request = route.request
    if request.method == "GET":
        print(f"GET request URL: {request.url}")
        bits = list(parse.urlparse(request.url))
        qs = parse.parse_qs(bits[4])
        qs["key1"] = ["GET"]  # replace the value of key1 here
        bits[4] = parse.urlencode(qs, True)
        url = parse.urlunparse(bits)
        print(f"URL after modification: {url}\n")
        await route.continue_(url=url)
    elif request.method == "POST" and request.post_data:
        print(f"POST request body: {request.post_data}")
        text_list = request.post_data.split("&")
        for i, text_item in enumerate(text_list):
            # startswith avoids matching other keys that merely end in "key1"
            if text_item.startswith("key1="):
                text_list[i] = "key1=POST"  # replace the value of key1 here
        post_data = "&".join(text_list)
        print(f"Data after modification: {post_data}\n")
        await route.continue_(post_data=post_data)
    else:
        # Let every other request through unchanged so that it does not hang.
        await route.continue_()
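The handler above edits the POST body by splitting it on "&", which works for this demo but ignores URL encoding. A slightly more defensive variant, sketched here under the assumption that the body is form-encoded, parses both the query string and the body with urllib.parse. The names replace_param and handle_route_safe are hypothetical, and unlike the original GET branch this version only replaces key1 when it is already present.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def replace_param(encoded, name, value):
    """Return the query string / form body with every `name` set to `value`.

    Pairs with other names are kept in order; nothing is added if `name`
    does not occur.
    """
    pairs = parse_qsl(encoded, keep_blank_values=True)
    return urlencode([(k, value if k == name else v) for k, v in pairs])


async def handle_route_safe(route):
    """Route handler for Playwright's async API (hypothetical name)."""
    request = route.request
    if request.method == "GET":
        parts = urlsplit(request.url)
        new_query = replace_param(parts.query, "key1", "GET")
        await route.continue_(url=urlunsplit(parts._replace(query=new_query)))
    elif request.method == "POST" and request.post_data:
        # Assumes an application/x-www-form-urlencoded body.
        await route.continue_(post_data=replace_param(request.post_data, "key1", "POST"))
    else:
        await route.continue_()
```

Keeping the rewriting logic in a plain function makes it easy to check without launching a browser. Note that urlencode re-encodes all values, so a body may come back with equivalent but differently escaped characters.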
Then the requests can be intercepted and processed with Playwright's context.route or page.route. The code is as follows:
import asyncio

from playwright.async_api import async_playwright


async def main():
    url = "http://127.0.0.1:8083"
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()
        # Intercept every request made from this context.
        await context.route("**/*", handle_route)
        await page.goto(f"{url}?key1=value1&key2=value2")
        await page.wait_for_load_state("networkidle")

        print("-- Open a new page and send the POST request via JS --")
        page = await context.new_page()
        await page.evaluate(
            """
            // Submit a POST form so that the page navigates to the given URL
            function httpPost(URL, PARAMS) {
                var temp = document.createElement("form");
                temp.action = URL;
                temp.method = "post";
                temp.style.display = "none";
                for (var x in PARAMS) {
                    var opt = document.createElement("textarea");
                    opt.name = x;
                    opt.value = PARAMS[x];
                    temp.appendChild(opt);
                }
                document.body.appendChild(temp);
                temp.submit();
                return temp;
            }
            httpPost('""" + url + """', {"key1": "value1", "key2": "value2"})
            """
        )
        await page.wait_for_timeout(1000)
        input("Press Enter to close the browser")
        await browser.close()
        # No explicit p.stop() here: the `async with` block stops Playwright on exit.


if __name__ == "__main__":
    asyncio.run(main())
Intercepting and modifying network requests
You can listen for request and response events with page.on("request") and page.on("response").
from playwright.sync_api import sync_playwright as playwright


def run(pw):
    browser = pw.webkit.launch()
    page = browser.new_page()
    # Subscribe to "request" and "response" events.
    page.on("request", lambda request: print(">>", request.method, request.url))
    page.on("response", lambda response: print("<<", response.status, response.url))
    page.goto("https://example.com")
    browser.close()


with playwright() as pw:
    run(pw)
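For the properties and methods of request and response, see the documentation: https://playwright.dev/python/docs/api/class-request
With context.route you can also mock, modify, or abort requests. For example, abort all image requests to reduce bandwidth usage: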
import re

# `browser` is a browser launched with the sync API.
context = browser.new_context()
page = context.new_page()
# route() takes a glob pattern by default; a compiled regular expression also works.
context.route("**/*.{png,jpg,jpeg}", lambda route: route.abort())
context.route(re.compile(r"(\.png$)|(\.jpg$)"), lambda route: route.abort())
page.goto("https://example.com")
browser.close()
For the properties and methods of the route object, see the documentation: https://playwright.dev/python/docs/api/class-route
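One caveat with the regular expression above: because it anchors on "$", an image URL that carries a query string (for example "a.png?v=2") is not matched. The sketch below shows one possible way to handle that, assuming the sync API; the names IMAGE_URL, is_image_url, and block_images are hypothetical, and the handler additionally checks request.resource_type, which is a choice made here rather than something the post prescribes.

```python
import re

# Matches .png/.jpg/.jpeg at the end of the path, with an optional query string.
IMAGE_URL = re.compile(r"\.(png|jpe?g)(\?.*)?$", re.IGNORECASE)


def is_image_url(url):
    """Return True if the URL looks like a PNG/JPEG resource."""
    return IMAGE_URL.search(url) is not None


def block_images(route):
    """Route handler for the sync API: abort image requests, pass the rest."""
    request = route.request
    if request.resource_type == "image" or is_image_url(request.url):
        route.abort()
    else:
        route.continue_()
```

It could be registered with context.route("**/*", block_images). Matching on the resource type also catches images served from URLs without a file extension.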
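Flexible proxy configuration
Playwright also makes it easy to set a proxy. With Puppeteer, the proxy cannot be changed once the browser has been launched, which is very unfriendly to crawler-type applications. Playwright, by contrast, can set a proxy per Context, which is lightweight: there is no need to restart the browser just to switch proxies.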
context = browser.new_context(
    proxy={
        "server": "http://example.com:3128",
        "bypass": ".example.com",
        "username": "",
        "password": "",
    }
)
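Since the proxy is attached to the context, switching proxies can amount to opening a new context. As a small sketch of that idea, assuming the sync API and proxies given as URLs, the helper below turns a proxy URL into the dictionary shape used above. The function names proxy_settings and open_contexts and the example hosts are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def proxy_settings(proxy_url, bypass=None):
    """Convert e.g. 'http://user:pw@host:3128' into a Playwright proxy dict."""
    parts = urlsplit(proxy_url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    settings = {"server": server}
    # Credentials are optional; only include them when present in the URL.
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    if bypass:
        settings["bypass"] = bypass
    return settings


def open_contexts(browser, proxy_urls):
    """Open one browser context per proxy on an already launched browser."""
    return [browser.new_context(proxy=proxy_settings(u)) for u in proxy_urls]
```

Each returned context can then create its own pages, and closing a context releases that proxy's session without restarting the browser.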
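In the interception experiment at the start of this post, the server ends up receiving the GET request with key1 set to "GET" and the POST request with key1 set to "POST"; key2 keeps the value "value2" in both.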
From: https://www.cnblogs.com/Im-Victor/p/17786741.html