Finding tags with XPath

Date: 2024-02-22 15:15:00

Tags: xpath body href tag-finding html following node

The syntax is as follows:

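tagname   # selects child elements with this tag name (relative to the current node)
/         # one level only: direct children (at the start of a path, the document root)
//        # any depth: children, grandchildren and so on
.         # the current node
..        # the parent node (one level up)
@attr     # selects the attribute itself; [@attr] keeps only tags that have it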
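A minimal sketch of these six pieces of syntax, assuming lxml is installed. The small HTML snippet is a made-up example, not the article's data; note that the HTML parser wraps it in `<html><body>`.

```python
from lxml import etree

# Hypothetical snippet: one <span> directly under <div>, one nested inside <b>
root = etree.HTML("<div id='box'><span class='x'>one</span><b><span>two</span></b></div>")
div = root.xpath('//div')[0]

child_spans = div.xpath('span')            # tagname: direct children of <div> only
box_id = root.xpath('/html/body/div/@id')  # /: walk down one level at a time
all_spans = root.xpath('//span')           # //: matches at any depth
self_tag = div.xpath('.')[0].tag           # .: the current node
parent_tag = div.xpath('..')[0].tag        # ..: the parent node
classes = root.xpath('//span/@class')      # @attr: the attribute values
with_class = root.xpath('//span[@class]')  # [@attr]: tags that have the attribute

print([e.text for e in child_spans], box_id, len(all_spans))
print(self_tag, parent_tag, classes, len(with_class))
```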

Preparing the data

doc='''
<html>
 <head>
  <base href='http://example.com/' />
  <title>Example website</title>
 </head>
 <body>
  <div id='images'>
   <a href='image1.html' id='id_a' name='lqz'>Name: My image 1 <br /><img src='image1_thumb.jpg' /></a>
   <a href='image2.html'>Name: My image 2 <br /><img src='image2_thumb.jpg' /></a>
   <a href='image3.html'>Name: My image 3 <br /><img src='image3_thumb.jpg' /></a>
   <a href='image4.html'  class='li'>Name: My image 4 <br /><img src='image4_thumb.jpg' /></a>
   <a href='image5.html' class='li li-item' name='items'>Name: My image 5 <br /><img src='image5_thumb.jpg' /></a>
   <a href='image6.html' name='items'><span><h5>test</h5></span>Name: My image 6 <br /><img src='image6_thumb.jpg' /></a>
  </div>
 </body>
</html>
'''

Usage

# Import the module
from lxml import etree

# Parse the document
html=etree.HTML(doc) # load from a string
# html=etree.parse('search.html',etree.HTMLParser()) # load from a file
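Both loading styles can be sketched side by side, assuming lxml is installed. Here `io.BytesIO` stands in for the file `search.html` so the example runs without a file on disk; `etree.HTML` returns the root element, while `etree.parse` returns an ElementTree, and both support `.xpath()`.

```python
import io
from lxml import etree

fragment = "<a href='image1.html'>Name: My image 1</a>"

# etree.HTML: string in, root element out (the fragment gets wrapped in <html><body>)
root = etree.HTML(fragment)
root_hrefs = root.xpath('/html/body/a/@href')

# etree.parse: filename or file-like object in, ElementTree out
tree = etree.parse(io.BytesIO(fragment.encode()), etree.HTMLParser())
tree_texts = tree.xpath('//a/text()')

print(root.tag, root_hrefs, tree_texts)
```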

1 All nodes

a=html.xpath('//*')  # every element in the document
a=html.xpath('/*')   # only the root element (<html>)
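To see the difference between `//*` and `/*`, a small sketch (assuming lxml) that lists the tag names each one returns for a tiny made-up document:

```python
from lxml import etree

root = etree.HTML("<div><p>a</p><p>b</p></div>")

all_tags = [e.tag for e in root.xpath('//*')]  # every element, in document order
top_tags = [e.tag for e in root.xpath('/*')]   # only children of the document root

print(all_tags)  # ['html', 'body', 'div', 'p', 'p']
print(top_tags)  # ['html']
```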

2 A specific node (the result is a list)

a=html.xpath('//head')
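Because a node query always returns a list, even for a single match, it is worth guarding before indexing. A sketch assuming lxml; the `//table` query is just an illustration of a query with no match.

```python
from lxml import etree

root = etree.HTML("<html><head><title>T</title></head><body><p>x</p></body></html>")

heads = root.xpath('//head')    # list with one element
tables = root.xpath('//table')  # empty list: nothing matches

# Check the list before taking [0] so a missing tag does not raise IndexError
head = heads[0] if heads else None
print(type(heads).__name__, len(heads), head.tag, tables)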

3 Children and descendants

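a=html.xpath('//div/a')    # <a> elements that are direct children of a <div>
a=html.xpath('//body/a')   # no results: the <a> tags sit inside <div>, not directly under <body>
a=html.xpath('//body//a')  # <a> elements at any depth under <body>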
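The same three queries on a trimmed-down version of the sample data, as a sketch assuming lxml, make the `/` versus `//` difference concrete:

```python
from lxml import etree

root = etree.HTML("<div id='images'><a href='1'>x</a><a href='2'>y</a></div>")

direct = root.xpath('//div/a')    # <a> directly under <div>
skipped = root.xpath('//body/a')  # []: <div> sits between <body> and <a>
deep = root.xpath('//body//a')    # <a> at any depth under <body>

print(len(direct), len(skipped), len(deep))  # 2 0 2
```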

4 Parent node

a=html.xpath('//body//a[@href="image1.html"]/..')
a=html.xpath('//body//a[1]/..')  # positions start at 1

# Alternatively, use the parent axis
a=html.xpath('//body//a[1]/parent::*')  # the parent, whatever its tag
a=html.xpath('//body//a[1]/parent::div')  # the parent, only if it is a <div>
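A short sketch (assuming lxml) comparing `..`, `parent::div`, a parent test that fails, and lxml's own `getparent()` method, which reaches the same node without XPath:

```python
from lxml import etree

root = etree.HTML("<div id='images'><a href='image1.html'>x</a></div>")
a = root.xpath('//a[@href="image1.html"]')[0]

via_dots = a.xpath('..')[0].get('id')          # parent of any tag
via_axis = a.xpath('parent::div')[0].get('id')  # parent, only if it is a <div>
not_span = a.xpath('parent::span')              # []: the parent is not a <span>
via_api = a.getparent().tag                     # lxml element method

print(via_dots, via_axis, not_span, via_api)  # images images [] div
```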

5 Attribute matching

a=html.xpath('//a[@href="image1.html"]')
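Besides matching a value, a predicate can also test whether an attribute exists at all. A sketch assuming lxml; `not()` is a standard XPath 1.0 function, used here as an extra illustration beyond the article.

```python
from lxml import etree

root = etree.HTML(
    "<div><a href='image1.html' id='id_a'>1</a><a href='image2.html'>2</a></div>"
)

by_value = [e.text for e in root.xpath('//a[@href="image1.html"]')]  # value equals
has_id = [e.text for e in root.xpath('//a[@id]')]                     # attribute present
no_id = [e.text for e in root.xpath('//a[not(@id)]')]                 # attribute absent

print(by_value, has_id, no_id)  # ['1'] ['1'] ['2']
```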

6 Getting text (important)

a=html.xpath('//body//a[@href="image1.html"]/text()')
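`/text()` returns only the text nodes that are direct children of the element, so text inside nested tags such as `<span>` is skipped. A sketch assuming lxml, contrasting it with `//text()` and the XPath `string()` function on a made-up snippet:

```python
from lxml import etree

root = etree.HTML("<a href='x'>Name <br/><span>inner</span> tail</a>")

direct = root.xpath('//a/text()')   # direct text children of <a> only
nested = root.xpath('//a//text()')  # text nodes at any depth
joined = root.xpath('string(//a)')  # all descendant text joined into one string

print(direct)  # ['Name ', ' tail']
print(nested)  # ['Name ', 'inner', ' tail']
print(joined)  # Name inner tail
```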

7 Getting attributes (important)

a=html.xpath('//body//a/@href') # the href attribute of every <a>

# Note: positions start at 1 (not 0)
a=html.xpath('//body//a[1]/@href')
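A sketch (assuming lxml) showing that `@href` yields plain strings, and that selecting the elements and calling lxml's `.get()` gives the same values:

```python
from lxml import etree

root = etree.HTML("<div><a href='image1.html'>1</a><a href='image2.html'>2</a></div>")

hrefs = root.xpath('//a/@href')     # one string per <a>
first = root.xpath('//a[1]/@href')  # XPath positions start at 1
via_get = [a.get('href') for a in root.xpath('//a')]  # same values, read per element

print(hrefs, first, via_get)
```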

8 Matching multi-valued attributes

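# When a tag has several classes, an exact match on @class no longer works; use contains()
a=html.xpath('//body//a[@class="li"]') # only matches class="li" exactly, so the <a> with class="li li-item" is missed
a=html.xpath('//body//a[contains(@class,"li")]')
a=html.xpath('//body//a[contains(@class,"li")]/text()')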
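Note that `contains()` is a plain substring test, so it also matches classes like `li-item` or `slim`. The sketch below (assuming lxml, with made-up class names) compares the exact match, the substring match, and a common XPath 1.0 idiom that pads the class list with spaces to match `li` as a whole word:

```python
from lxml import etree

root = etree.HTML(
    "<div>"
    "<a class='li'>A</a><a class='li li-item'>B</a>"
    "<a class='li-item'>C</a><a class='slim'>D</a>"
    "</div>"
)

exact = root.xpath('//a[@class="li"]/text()')
substring = root.xpath('//a[contains(@class, "li")]/text()')
# Whole-word class test: " li " must appear in " <normalized class list> "
token = root.xpath(
    '//a[contains(concat(" ", normalize-space(@class), " "), " li ")]/text()'
)

print(exact)      # ['A']
print(substring)  # ['A', 'B', 'C', 'D']
print(token)      # ['A', 'B']
```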

9 Matching multiple attributes

a=html.xpath('//body//a[contains(@class,"li") or @name="items"]')  # either condition holds
a=html.xpath('//body//a[contains(@class,"li") and @name="items"]/text()')  # both conditions hold
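A sketch (assuming lxml) using three tags modeled on images 4 to 6 of the sample data, to show which ones `or` and `and` keep:

```python
from lxml import etree

root = etree.HTML(
    "<div>"
    "<a class='li'>4</a>"
    "<a class='li li-item' name='items'>5</a>"
    "<a name='items'>6</a>"
    "</div>"
)

either = root.xpath('//a[contains(@class, "li") or @name="items"]/text()')
both = root.xpath('//a[contains(@class, "li") and @name="items"]/text()')

print(either)  # ['4', '5', '6']
print(both)    # ['5']
```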

10 Selecting by position

a=html.xpath('//a[2]/text()')
a=html.xpath('//a[2]/@href')
# The last one
a=html.xpath('//a[last()]/@href')
# Positions less than 3
a=html.xpath('//a[position()<3]/@href')
# Third from last
a=html.xpath('//a[last()-2]/@href')
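In the sample data all `<a>` tags share one `<div>`, which hides a subtlety: in `//a[n]` the position is counted per parent. With two parents, as in the made-up snippet below (assuming lxml), `//a[1]` returns the first `<a>` of each `<div>`, while wrapping the path in parentheses indexes one document-wide list.

```python
from lxml import etree

root = etree.HTML(
    "<div><a href='1'></a><a href='2'></a><a href='3'></a></div>"
    "<div><a href='4'></a><a href='5'></a></div>"
)

per_parent = root.xpath('//a[1]/@href')             # first <a> of EACH <div>
global_first = root.xpath('(//a)[1]/@href')         # first <a> in the document
last_each = root.xpath('//a[last()]/@href')         # last <a> of each <div>
first_two = root.xpath('(//a)[position()<3]/@href')
third_from_last = root.xpath('(//a)[last()-2]/@href')

print(per_parent, global_first, last_each, first_two, third_from_last)
```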

11 Axis selection

# ancestor: ancestor nodes

# * selects every ancestor node
a=html.xpath('//a/ancestor::*')

# Only the <html> element among the ancestors
a=html.xpath('//a/ancestor::html')

# attribute: attribute values
a=html.xpath('//a[1]/attribute::*')
a=html.xpath('//a[1]/attribute::id')

# child: direct children
a=html.xpath('//a[1]/child::*')
a=html.xpath('//a[1]/child::img')

# descendant: all descendants
a=html.xpath('//a[6]/descendant::*')

# following: every element after the current node in document order (its own descendants excluded)
a=html.xpath('//a[1]/following::*')
a=html.xpath('//a[1]/following::*[1]/@href')

# following-sibling: siblings that come after the current node
a=html.xpath('//a[1]/following-sibling::*')
a=html.xpath('//a[1]/following-sibling::a')
a=html.xpath('//a[1]/following-sibling::*[2]')
a=html.xpath('//a[1]/following-sibling::*[2]/@href')

print(a)
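The axes above can be compared on one small document modeled on the sample data. This is a sketch assuming lxml; lxml returns node-sets in document order, so ancestors come out from the root down.

```python
from lxml import etree

root = etree.HTML(
    "<div id='images'>"
    "<a href='image1.html' id='id_a'>1<br/><img src='t1.jpg'/></a>"
    "<a href='image2.html'>2</a>"
    "<a href='image3.html'>3</a>"
    "</div>"
)
first = root.xpath('//a[1]')[0]
div = root.xpath('//div')[0]

ancestors = [e.tag for e in first.xpath('ancestor::*')]    # html, body, div
attrs = first.xpath('attribute::*')                        # all attribute values
children = [e.tag for e in first.xpath('child::*')]        # br, img
descendants = [e.tag for e in div.xpath('descendant::*')]  # everything under <div>
following = [e.tag for e in first.xpath('following::*')]   # skips first's own br/img
second_sibling = first.xpath('following-sibling::*[2]/@href')

print(ancestors, attrs, children)
print(descendants, following, second_sibling)
```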

From: https://www.cnblogs.com/wellplayed/p/18027368
