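First of all, I'm not a developer, so I used AI to generate code that captures XHR requests from a web page, namely https://www.oddsportal.com/football/brazil/serie-a/bragantino-athletico-pr-xx0ujiJ5/ (this is just an example). I want to get the score from an XHR request on that page, not by other methods such as locating it by class names.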
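What I find most interesting is this. When you open the page, the network request I want does not appear. It only shows up after you click another tab or the desktop and then return to the target page. The response to that request gives me the match score data.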
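The "click away and come back" behavior can be approximated from Selenium by making another tab current and then returning. This is a minimal sketch. It assumes, without confirmation from the site, that the page requests the score feed when it regains focus or visibility. It also assumes that switching tabs through WebDriver fires those events; whether `switch_to.window()` really brings a tab to the foreground depends on the browser and driver, so test it. `switch_away_and_back` and its `pause` parameter are hypothetical names, and `driver` is any Selenium WebDriver instance such as the one in the script below.

```python
import time


def switch_away_and_back(driver, pause=2.0):
    """Make a helper tab current, then return to the original tab.

    Hypothetical helper. Assumption: the page requests the score feed when
    it regains focus/visibility, and switching tabs via WebDriver causes that.
    Returns the handle of the (now closed) helper tab.
    """
    original = driver.current_window_handle
    before = set(driver.window_handles)

    # Open a blank helper tab, the same way the question's script does.
    driver.execute_script("window.open('about:blank', '_blank');")
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("window.open() did not create a new tab")
    helper = new_handles[0]

    driver.switch_to.window(helper)    # move away from the target page
    time.sleep(pause)
    driver.close()                     # closes the current (helper) tab
    driver.switch_to.window(original)  # come back to the target page
    time.sleep(pause)
    return helper
```

Identifying the helper tab by comparing handle sets, rather than using `window_handles[1]` or `[-1]`, avoids switching to an unrelated tab when the browser already has several open.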
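Here is the Selenium code. It captures some requests, but not the one I'm looking for, and it shows the situation I'm in. Because of how accessible the site is from my country, I have to use the Opera browser with Chrome DevTools. I mention this for completeness, but I don't believe it's the cause, since the code does capture some requests and lists them in the terminal. If anyone can reproduce this, please help me.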
Below is the code of my .py file. I expect it to capture the request named 1-xx0ujiJ5-yj3ae.dat, whose URL is https://www.oddsportal.com/feed/postmatch-score/1-xx0ujiJ5-yj3ae.dat
```python
import json
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

# Paths
driver_path = r'C:\Users\tugbe\Desktop\basit_oddsportal\drivers\chromedriver-win64\chromedriver.exe'
binary_path = r'C:\Users\tugbe\AppData\Local\Programs\Opera\opera.exe'

# Set up Chrome options for Opera
options = Options()
options.binary_location = binary_path
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
options.add_argument("--auto-open-devtools-for-tabs")  # Automatically open DevTools
options.debugger_address = "127.0.0.1:9222"  # Connect to the existing Opera instance

# Enable Performance logging
options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})

# Initialize the WebDriver service
service = Service(driver_path)

# Initialize the WebDriver with options and capabilities
driver = webdriver.Chrome(service=service, options=options)

# Enable Network logging
driver.execute_cdp_cmd('Network.enable', {})

def capture_network_logs(duration=60):
    start_time = time.time()
    captured_urls = set()
    print(f"Capturing network logs for {duration} seconds...")
    while time.time() - start_time < duration:
        logs = driver.get_log('performance')
        for entry in logs:
            log = json.loads(entry['message'])['message']
            if log['method'] == 'Network.requestWillBeSent':
                request = log['params']['request']
                if 'url' in request and request['url'] not in captured_urls:
                    print(f"Request URL: {request['url']}")
                    print(f"Request Method: {request['method']}")
                    print(f"Request Headers: {json.dumps(request['headers'], indent=2)}")
                    print("=" * 80)
                    captured_urls.add(request['url'])
        time.sleep(1)  # Sleep for a short while before capturing logs again

try:
    # Open a new tab in the existing Opera window
    driver.execute_script("window.open('about:blank', '_blank');")
    driver.switch_to.window(driver.window_handles[-1])

    # Open the target URL in the new tab
    url = 'https://www.oddsportal.com/football/brazil/serie-a/bragantino-athletico-pr-xx0ujiJ5/'
    driver.get(url)

    # Wait until the page is loaded
    time.sleep(10)  # Wait for 10 seconds to ensure the page is fully loaded

    # Inspect an element to trigger the DevTools
    driver.execute_script("document.querySelector('body').click();")

    # Wait a bit to ensure the element is inspected
    time.sleep(5)

    # Refresh the page to ensure all requests are captured
    driver.refresh()

    # Wait until the page is loaded again
    time.sleep(10)

    # Click on the second opened tab
    driver.switch_to.window(driver.window_handles[1])

    # Wait a bit in the second tab
    time.sleep(5)

    # Switch back to the target tab
    driver.switch_to.window(driver.window_handles[-1])

    # Wait a bit to ensure the switch is complete
    time.sleep(5)

    # Capture and print network logs for 1 minute (60 seconds)
    capture_network_logs(duration=60)
finally:
    # Close the WebDriver
    driver.quit()
```
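The loop above prints every request, including images, scripts and stylesheets, which makes the feed request easy to miss. The sketch below narrows the output to XHR and Fetch requests. It relies on the optional `type` field of the DevTools Protocol `Network.requestWillBeSent` event, on the same entry layout the script already parses (`entry['message']` is a JSON string with the CDP event under `'message'`), and on the assumption that the score feed is sent as an XHR or fetch call, as the question describes. `xhr_requests` is a hypothetical helper name.

```python
import json


def xhr_requests(entries):
    """Yield (requestId, method, url) for XHR/Fetch requests.

    `entries` has the shape returned by driver.get_log('performance').
    """
    for entry in entries:
        event = json.loads(entry['message'])['message']
        if event.get('method') != 'Network.requestWillBeSent':
            continue
        params = event.get('params', {})
        # 'type' is optional in the CDP event; entries without it are skipped.
        if params.get('type') not in ('XHR', 'Fetch'):
            continue
        request = params.get('request', {})
        yield params.get('requestId'), request.get('method'), request.get('url')
```

For example, `for request_id, method, url in xhr_requests(driver.get_log('performance')):` followed by a check such as `if '/feed/postmatch-score/' in url:` would print only feed requests. Keeping the `requestId` matters because it is what later CDP calls use to refer to the request.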
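The problem is one of timing. You have correctly identified that the XHR request containing the score data is not made until you click away from the page and then come back.
Your current code opens the page, waits, and only then starts listening for network requests. The code needs to be restructured so that it starts listening for requests before you interact with the page to trigger the request.
Here is how you can modify the code to capture the XHR request you need: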
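One single-threaded way to listen while triggering is sketched below: call a trigger function, then keep polling the performance log until a request whose URL contains a given fragment appears or a timeout expires. WebDriver buffers log entries between `get_log()` calls, so polling right after the trigger does not lose events. `wait_for_request`, its parameters and the `trigger` callable are hypothetical; the driver is assumed to have performance logging enabled as in the question.

```python
import json
import time


def wait_for_request(driver, fragment, trigger, timeout=30.0, poll=0.5):
    """Call trigger(), then poll the performance log for a matching request.

    Returns (requestId, url) for the first Network.requestWillBeSent event
    whose URL contains `fragment`, or None once `timeout` seconds pass.
    Entries buffered before the trigger are checked too, because get_log()
    returns everything collected since its previous call.
    """
    trigger()
    deadline = time.monotonic() + timeout
    while True:
        for entry in driver.get_log('performance'):
            event = json.loads(entry['message'])['message']
            if event.get('method') != 'Network.requestWillBeSent':
                continue
            url = event['params']['request'].get('url', '')
            if fragment in url:
                return event['params']['requestId'], url
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

Here `fragment` would be `'1-xx0ujiJ5-yj3ae.dat'`, and `trigger` whatever interaction makes the request appear, for example switching to another tab and back. Polling in the same thread avoids sharing one WebDriver object between threads.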
```python
import json
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

# ... (existing paths, options and service setup) ...

driver = webdriver.Chrome(service=service, options=options)
driver.execute_cdp_cmd('Network.enable', {})

def capture_network_logs(duration=60):
    start_time = time.time()
    captured_urls = set()
    print(f"Capturing network logs for {duration} seconds...")
    while time.time() - start_time < duration:
        logs = driver.get_log('performance')
        for entry in logs:
            log = json.loads(entry['message'])['message']
            if log['method'] == 'Network.requestWillBeSent':
                request = log['params']['request']
                if 'url' in request and request['url'] not in captured_urls:
                    print(f"Request URL: {request['url']}")
                    print(f"Request Method: {request['method']}")
                    print(f"Request Headers: {json.dumps(request['headers'], indent=2)}")
                    print("=" * 80)
                    captured_urls.add(request['url'])
                    # Check whether this is the target request
                    if '1-xx0ujiJ5-yj3ae.dat' in request['url']:
                        # Handle the request here, e.g. extract the score data
                        print("Found the target request!")
        time.sleep(1)

try:
    # ... (existing code that opens the tab and loads the URL) ...

    # *** Before switching tabs ***, start capturing network logs.
    # Note: this call blocks for `duration` seconds, so the code that
    # follows only runs after capturing has finished.
    capture_network_logs()

    # ... (existing code that switches tabs and interacts with the page) ...
finally:
    driver.quit()
```
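The changes in this code are:
- The `capture_network_logs()` function is started before you interact with the page to trigger the XHR request. This ensures you are listening when the request is made.
- Inside `capture_network_logs()`, a condition was added that checks whether a captured request is the one you are looking for. If it is, you can go on to read its response, which is where the score data is.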
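The `Network.requestWillBeSent` event carries only the request (URL, method, headers); the score data is in the response body. With a Chromium-based driver, the body can be read through the DevTools Protocol command `Network.getResponseBody`, passing the `requestId` from the request event, via Selenium's `execute_cdp_cmd()` (already used in the question for `Network.enable`). This is a minimal sketch: the retry loop is an assumption meant to cover the body not being available until the response has finished loading, and since the format of the `.dat` payload is not known here, it is returned as plain text. `get_response_body` is a hypothetical helper name.

```python
import base64
import time


def get_response_body(driver, request_id, retries=10, delay=0.5):
    """Return the response body for `request_id` as text.

    Uses the DevTools Protocol command Network.getResponseBody. The body is
    usually unavailable until the response has finished loading, so failed
    attempts are retried (an assumption about timing, not a guarantee).
    """
    last_error = None
    for _ in range(retries):
        try:
            result = driver.execute_cdp_cmd(
                'Network.getResponseBody', {'requestId': request_id})
        except Exception as exc:  # Selenium raises WebDriverException here
            last_error = exc
            time.sleep(delay)
            continue
        body = result.get('body', '')
        if result.get('base64Encoded'):
            # Non-text payloads come back base64-encoded.
            body = base64.b64decode(body).decode('utf-8', errors='replace')
        return body
    raise RuntimeError(f"no body for request {request_id}") from last_error
```

The exception is caught broadly so that the sketch does not need to import Selenium; in a real script, catching `selenium.common.exceptions.WebDriverException` is narrower.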
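By starting to listen for network requests before you interact with the page to trigger the request, you should be able to capture the XHR request you need with Selenium.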
Tags: python, selenium-webdriver, xmlhttprequest. From: 78790854