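I want to set up a script that exports a link from a website, where the link is generated after a URL is entered. The website in question is pagespeed.web.dev. I have zero knowledge of this, so although I know it is not the best option, I turned to ChatGPT for help. Everything seemed to work fine with just 1 URL, but as soon as I tried 5 URLs it broke. Note: as far as I understand, I am not scraping data; you just enter a URL in the box, click "Analyze", and then use a button to copy the link.
Here is the code itself: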
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pyperclip # For clipboard operations
# Replace with the path to your GeckoDriver
geckodriver_path = r'C:\Users\*****.OSHS\Documents\geckodriver.exe'
# Replace with the URL of the website performance tool
website_url = 'https://pagespeed.web.dev/'
# Replace with the placeholder text for the input box and the export button
input_box_placeholder = 'Enter a web page URL'
analyze_button_xpath = '/html/body/c-wiz/div[2]/div/div[2]/form/div[2]/button/span' # Ensure this XPath correctly identifies the button
copy_button_xpath = '/html/body/header/span/div[1]/button/span' # Adjust this XPath if needed
# Replace with your website URL
website_to_monitor = [
    'https://www.ohiostatewaterproofing.com',
    'https://www.basementwaterproofing.com',
    'https://www.everdrywaterproofinglouisville.com/',
    'https://www.stablwall.com',
    'https://www.everdrycolumbus.com'
]
# Initialize the WebDriver for Firefox
service = Service(executable_path=geckodriver_path)
driver = webdriver.Firefox(service=service)
# File to save the results
results_file = 'website_performance_reports.txt'
def get_report_link(driver, website_url):
    try:
        # Open the website performance tool
        driver.get(website_url)
        # Initialize WebDriverWait
        wait = WebDriverWait(driver, 30)
        # Wait for the input box by placeholder text and then find it
        input_box = wait.until(EC.presence_of_element_located((By.XPATH, f'//input[@placeholder="{input_box_placeholder}"]')))
        input_box.send_keys(website_to_monitor)
        # Wait for the analyze button to be clickable and then find it
        analyze_button = wait.until(EC.element_to_be_clickable((By.XPATH, analyze_button_xpath)))
        analyze_button.click()
        # Wait for the analysis to complete (adjust the sleep duration as needed)
        time.sleep(30) # Adjust this as needed based on the website's performance
        # Simulate the click to copy the link
        copy_button_xpath = '/html/body/header/span/div[1]/button/span' # Replace with the actual XPath for the copy button
        copy_button = wait.until(EC.element_to_be_clickable((By.XPATH, copy_button_xpath)))
        copy_button.click()
        # Get the copied link from the clipboard
        report_link = pyperclip.paste()
        return report_link
    except Exception as e:
        print(f"Error retrieving report for {website_url}: {e}")
        return None

def save_results(results):
    with open(results_file, 'w') as file:
        for url, report_link in results:
            file.write(f"Website: {url}\n")
            file.write(f"Report Link: {report_link}\n")
            file.write("----\n")

def main():
    # Initialize the WebDriver for Firefox
    service = Service(executable_path=geckodriver_path)
    driver = webdriver.Firefox(service=service)
    results = []
    for website in website_to_monitor:
        print(f"Processing {website}...")
        report_link = get_report_link(driver, website)
        results.append((website, report_link))
    # Save all results to a file
    save_results(results)
    driver.quit()
    print(f"All reports saved to {results_file}")

if __name__ == "__main__":
    main()
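I have masked my username, but the path works.
I tried using ChatGPT and kept applying its suggestions to fix the problem, but none of the fixes seemed to help. This is the error I get from the terminal: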
Error retrieving report for https://www.ohiostatewaterproofing.com: Message:
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:193:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:511:5
dom.find/</<@chrome://remote/content/shared/DOM.sys.mjs:136:16
Error from the code without the try/except statement:
File "C:\Users\*****.OSHS\OSW-Test.py", line 31, in <module>
driver = webdriver.Firefox(service=service)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\*****.OSHS\AppData\Roaming\Python\Python312\site-packages\selenium\webdriver\firefox\webdriver.py", line 57, in __init__
if finder.get_browser_path():
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\*****.OSHS\AppData\Roaming\Python\Python312\site-packages\selenium\webdriver\common\driver_finder.py", line 47, in get_browser_path
return self._binary_paths()["browser_path"]
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\*****.OSHS\AppData\Roaming\Python\Python312\site-packages\selenium\webdriver\common\driver_finder.py", line 78, in _binary_paths
raise NoSuchDriverException(msg) from err
selenium.common.exceptions.NoSuchDriverException: Message: Unable to obtain driver for firefox; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location
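The error message "selenium.common.exceptions.NoSuchDriverException: Message: Unable to obtain driver for firefox" means that Selenium cannot find GeckoDriver. Here is how to resolve it:
1. Download GeckoDriver
* Download GeckoDriver from the official releases page: https://github.com/mozilla/geckodriver/releases
* Choose the build that matches your system (Windows, macOS, or Linux).
* After downloading, extract the archive.
2. Add GeckoDriver to your system PATH
The folder containing the GeckoDriver executable needs to be listed in your system's PATH environment variable. There are two ways to do this:
- Option 1: Move GeckoDriver into a directory that is already on the PATH. For example, you can move geckodriver.exe into the C:\Windows\System32 folder.
- Option 2: Add the GeckoDriver folder to the PATH environment variable. On Windows:
  - Search for "environment variables" in the Windows search bar.
  - Select "Edit the system environment variables".
  - Click the "Environment Variables" button.
  - In the "System variables" section, select the "Path" variable and click "Edit".
  - Click "New" and enter the full path of the folder that GeckoDriver was extracted to, for example C:\path\to\folder (the folder itself, not geckodriver.exe).
  - Click "OK" to save the changes.
3. Restart your IDE or text editor
After completing the steps above, restart any IDE or text editor you are using so that the change takes effect.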
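To confirm that the PATH change from step 2 is visible to Python, a small check along these lines may help. This is a minimal sketch using only the standard library; the helper name `find_on_path` is made up for illustration and is not part of Selenium.

```python
import shutil


def find_on_path(executable_name):
    """Return the full path of an executable found via PATH, or None."""
    return shutil.which(executable_name)


# On Windows, shutil.which also tries the extensions listed in PATHEXT
# (such as .exe), so the bare name "geckodriver" is enough here.
location = find_on_path("geckodriver")
if location is None:
    print("geckodriver is not on PATH for this process")
else:
    print(f"geckodriver found at {location}")
```

If this prints that geckodriver is not on PATH even after editing the environment variable, the terminal or IDE running the script was probably started before the change and needs a restart, as step 3 describes.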
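Other suggestions:
* Check the GeckoDriver version: Make sure the GeckoDriver version you downloaded is compatible with your Firefox version. You can see your browser version by opening about:support in Firefox.
* Use WebDriverManager: WebDriverManager is a library that automatically manages WebDriver executables. You can install it with pip: pip install webdriver-manager
* Path in the code: Make sure the GeckoDriver path specified in your code is correct. It should be the full path to the GeckoDriver executable, including the file name (geckodriver.exe).
After completing the steps above, rerun your code; Selenium should then be able to find GeckoDriver.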
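The last two suggestions could be combined roughly as follows. This is only a sketch under a few assumptions: the helper names `resolve_driver_path` and `make_firefox_driver` are invented for illustration, and it assumes the `selenium` and `webdriver-manager` packages are installed. It uses the path given in the code when that file really exists, and otherwise lets webdriver-manager download a driver.

```python
from pathlib import Path


def resolve_driver_path(explicit_path=None):
    """Return a GeckoDriver path to hand to Selenium.

    If explicit_path points to an existing file it is used as-is;
    otherwise webdriver-manager is asked to download a driver.
    """
    if explicit_path and Path(explicit_path).is_file():
        return str(explicit_path)
    # Imported lazily so the path check works without the package installed.
    from webdriver_manager.firefox import GeckoDriverManager
    return GeckoDriverManager().install()


def make_firefox_driver(explicit_path=None):
    """Start Firefox using the driver path chosen by resolve_driver_path."""
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service

    service = Service(executable_path=resolve_driver_path(explicit_path))
    return webdriver.Firefox(service=service)
```

In the script from the question, `driver = make_firefox_driver(geckodriver_path)` would take the place of the two lines that build `Service` and `webdriver.Firefox` by hand.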
Tags: python, python-3.x, selenium-webdriver From: 78818298