
Python Selenium quick-reference notes

Date: 2024-04-02 17:33:55  Views: 31
Tags: webdriver python selenium driver element import quick-reference find

1. Installation and configuration

pip install selenium

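Selenium is mostly used here to load dynamically rendered page content for web scraping, so PhantomJS is commonly used alongside it. (PhantomJS development was suspended in 2018 and Selenium 4 no longer includes a PhantomJS driver; headless Chrome or Firefox is the usual substitute.)

To set up PhantomJS on macOS, list the directories on your PATH and symlink the binary into one of them: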

echo $PATH

ln -s <path to phantomjs> <any directory on PATH>

ChromeDriver is set up the same way. Download it from:

https://sites.google.com/a/chromium.org/chrom...
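
Before starting a browser it can help to confirm that the driver binary really is reachable through PATH. The following is a minimal sketch using only the standard library; the helper name find_driver is made up for illustration, and geckodriver is included only as an assumed example of another driver executable.

```python
import shutil


def find_driver(names=("chromedriver", "geckodriver", "phantomjs")):
    """Return {name: absolute path or None} for each driver executable.

    shutil.which searches the directories listed in PATH, which is
    the same lookup the shell performs after the `ln -s` step.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_driver().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A value of None for a driver means the symlink step above has not taken effect for that binary.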

2. Code sample

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC # available since 2.26.0

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# go to the google home page
driver.get("http://www.google.com")

# the page is ajaxy so the title is originally this:
print(driver.title)

# find the element whose name attribute is q (the google search box)
# (find_element_by_name was removed in Selenium 4.3; use the By locator form)
inputElement = driver.find_element(By.NAME, "q")

# type in the search
inputElement.send_keys("cheese!")

# submit the form (although google automatically searches now without submitting)
inputElement.submit()

try:
    # we have to wait for the page to refresh, the last thing that seems to be updated is the title
    WebDriverWait(driver, 10).until(EC.title_contains("cheese!"))

    # You should see "cheese! - Google Search"
    print(driver.title)

finally:
    driver.quit()
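
To make the waiting step in the sample less opaque, here is a rough sketch of the polling idea behind WebDriverWait(driver, 10).until(...). It is a simplification for illustration, not Selenium's actual implementation: the function name wait_until is hypothetical, and it raises the built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it is truthy or time runs out.

    Returns the first truthy result. Raises TimeoutError if the
    condition is still falsy once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

With a live driver, the title check from the sample would look roughly like wait_until(lambda: "cheese!" in driver.title).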

3. API quick reference

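3.1 Locating elements

Each subsection shows the legacy find_element_by_* helper first, then the equivalent find_element(By.X, value) call. The find_element_by_* helpers were removed in Selenium 4.3, so on current versions only the By form works.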

3.1.1 Find by id:

element = driver.find_element_by_id("coolestWidgetEvah")

or

from selenium.webdriver.common.by import By
element = driver.find_element(by=By.ID, value="coolestWidgetEvah")

3.1.2 Find by class name

cheeses = driver.find_elements_by_class_name("cheese")

or

from selenium.webdriver.common.by import By
cheeses = driver.find_elements(By.CLASS_NAME, "cheese")

3.1.3 Find by tag name

target_div = driver.find_element_by_tag_name("div")

or

from selenium.webdriver.common.by import By
target_div = driver.find_element(By.TAG_NAME, "div")

3.1.4 Find by name attribute

btn = driver.find_element_by_name("input_btn")

or

from selenium.webdriver.common.by import By
btn = driver.find_element(By.NAME, "input_btn")

3.1.5 Find by link text

next_page = driver.find_element_by_link_text("下一页")

or

from selenium.webdriver.common.by import By
next_page = driver.find_element(By.LINK_TEXT, "下一页")

3.1.6 Find by partial link text

next_page = driver.find_element_by_partial_link_text("下一页")

or

from selenium.webdriver.common.by import By
next_page = driver.find_element(By.PARTIAL_LINK_TEXT, "下一页")

3.1.7 Find by CSS selector

cheese = driver.find_element_by_css_selector("#food span.dairy.aged")

or

from selenium.webdriver.common.by import By
cheese = driver.find_element(By.CSS_SELECTOR, "#food span.dairy.aged")

3.1.8 Find by XPath

inputs = driver.find_elements_by_xpath("//input")

or

from selenium.webdriver.common.by import By
inputs = driver.find_elements(By.XPATH, "//input")
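
When a page can be matched by more than one of the strategies above, the locators can be tried in order. The sketch below is one possible helper, not part of Selenium: find_first is a hypothetical name, and it relies on find_elements returning an empty list when nothing matches. The strategies are written as plain strings ("id", "css selector"), which are the values of By.ID and By.CSS_SELECTOR; with Selenium imported you would normally pass the By constants instead.

```python
def find_first(driver, locators):
    """Return the first element matched by any locator, or None.

    `locators` is an iterable of (strategy, value) pairs, tried in
    order. find_elements is used because it returns an empty list
    instead of raising when there is no match.
    """
    for strategy, value in locators:
        matches = driver.find_elements(strategy, value)
        if matches:
            return matches[0]
    return None
```

For example, find_first(driver, [("id", "coolestWidgetEvah"), ("css selector", "#food span.dairy.aged")]) would return the widget if it exists and fall back to the CSS match otherwise.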

3.1.9 Find via JavaScript

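from selenium.webdriver.common.by import By
# collect every <label>, then use JavaScript to look up the element each label's "for" attribute points to
labels = driver.find_elements(By.TAG_NAME, "label")
inputs = driver.execute_script(
    "var labels = arguments[0], inputs = []; for (var i=0; i < labels.length; i++){" +
    "inputs.push(document.getElementById(labels[i].getAttribute('for'))); } return inputs;", labels)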

3.2 Getting an element's text

from selenium.webdriver.common.by import By
element = driver.find_element(By.ID, "element_id")
print(element.text)  # the element's visible text

3.3 Changing the User-Agent

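# Selenium 4 style: set the preference on FirefoxOptions instead of passing a FirefoxProfile positionally
options = webdriver.FirefoxOptions()
options.set_preference("general.useragent.override", "some UA string")
driver = webdriver.Firefox(options=options)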
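
One way to check that the override took effect is to ask the browser itself. This is a small sketch; the helper names current_user_agent and user_agent_matches are made up for illustration, and the only Selenium call assumed is driver.execute_script.

```python
def current_user_agent(driver):
    """Return the User-Agent string the browser reports to pages."""
    return driver.execute_script("return navigator.userAgent")


def user_agent_matches(driver, expected):
    """True if the browser reports exactly the expected User-Agent."""
    return current_user_agent(driver) == expected
```

After creating the driver as above, user_agent_matches(driver, "some UA string") should be True if the preference was applied.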

3.4 Cookies

# Go to the correct domain
driver.get("http://www.example.com")

# Now set the cookie. Here's one for the entire domain
# the cookie name here is 'key' and its value is 'value'
driver.add_cookie({'name':'key', 'value':'value', 'path':'/'})
# additional keys that can be passed in are:
# 'domain' -> String,
# 'secure' -> Boolean,
# 'expiry' -> Integer, seconds since the Unix epoch at which the cookie expires.

# And now output all the available cookies for the current URL
for cookie in driver.get_cookies():
    print("%s -> %s" % (cookie['name'], cookie['value']))

# You can delete cookies in 2 ways
# By name
driver.delete_cookie("CookieName")
# Or all of them
driver.delete_all_cookies()
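
A common use of these calls is carrying a session from one run to the next. The following is a minimal sketch under the assumption that the cookies are stored as JSON; save_cookies and load_cookies are hypothetical helper names, and only get_cookies and add_cookie from the section above are used.

```python
import json


def save_cookies(driver, path):
    """Write all cookies for the current page to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path):
    """Add every cookie from the JSON file; return how many were added.

    As in the example above, the driver must already be on the
    matching domain before add_cookie is called.
    """
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

Typical use would be to call driver.get(...) for the target domain first, then load_cookies(driver, "cookies.json"), then reload the page.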

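Finally, here is a sample of my own. It finds the search box, types a search keyword, clicks the search button, then opens each search result and prints the page source.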

# coding=utf-8
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

# Create a new instance of the Chrome driver
driver = webdriver.Chrome()

# go to the home page
driver.get("http://www.zjcredit.gov.cn")

# remember the handle of the current (main) window
nowhandle = driver.current_window_handle

print(driver.title)
# find the element whose name attribute is qymc (the search box)
inputElement = driver.find_element(By.NAME, "qymc")
print(inputElement)

# type in the search keyword
inputElement.send_keys("同花顺")

# unlike the Google example, this search box is not a standard form and cannot be submitted,
# so click the search button instead of calling inputElement.submit()
driver.find_element(By.NAME, "imageField").click()

try:
    # scroll to the bottom first; otherwise the last link sits under
    # an unrelated element and cannot be clicked
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # find all result links and click each one
    for item in driver.find_elements(By.XPATH, '//*[@id="pagetest2"]/div/table/tbody/tr/td/a'):
        item.click()
        time.sleep(10)
    # get the handles of all open windows
    allhandles = driver.window_handles
    # visit every window that is not the main one
    for handle in allhandles:
        if handle != nowhandle:
            # switch into the popup and print its source to show we really entered it
            driver.switch_to.window(handle)
            print(driver.page_source)
            # return to the main window
            driver.switch_to.window(nowhandle)

finally:
    driver.quit()
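
The switch-and-return pattern in the loop above can be wrapped so that the driver always ends up back in the original window, even if reading a popup fails. This is a sketch of one way to do it; in_window and popup_sources are hypothetical names, and the only driver members assumed are current_window_handle, window_handles, switch_to.window and page_source.

```python
from contextlib import contextmanager


@contextmanager
def in_window(driver, handle):
    """Switch to `handle` for the with-block, then switch back."""
    original = driver.current_window_handle
    driver.switch_to.window(handle)
    try:
        yield driver
    finally:
        # runs even if the block raises, so focus is never left in a popup
        driver.switch_to.window(original)


def popup_sources(driver):
    """Return {handle: page source} for every window except the current one."""
    main = driver.current_window_handle
    sources = {}
    for handle in driver.window_handles:
        if handle != main:
            with in_window(driver, handle):
                sources[handle] = driver.page_source
    return sources
```

Returning the sources as a dictionary instead of printing them inside the loop makes it easier to save or parse each result page afterwards.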

From: https://www.cnblogs.com/tianyihao/p/18111120
