Using XPath

Time: 2023-07-11 18:24:29

### Using XPath

```python
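# Two general ways to locate elements (tags) in a page:
#     - CSS selectors
#     - XPath: the XML Path Language, a language for addressing parts of an XML document

# XPath syntax
#     div   selects div elements
#     /     selects from the root node (between steps: a direct child)
#     //    selects matching nodes at any depth below the current node
#     .     selects the current node
#     ..    selects the parent of the current node
#     @     selects an attribute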

doc = '''
<html>
 <head>
  <base href='http://example.com/' />
  <title>Example website</title>
 </head>
 <body>
  <div id='images'>
   <a href='image1.html' id='lqz'>Name: My image 1 <br /><img src='image1_thumb.jpg' /></a>
   <a href='image2.html'>Name: My image 2 <br /><img src='image2_thumb.jpg' /></a>
   <a href='image3.html'>Name: My image 3 <br /><img src='image3_thumb.jpg' /></a>
   <a href='image4.html'>Name: My image 4 <br /><img src='image4_thumb.jpg' /></a>
   <a href='image5.html' class='li li-item' name='items'>Name: My image 5 <br /><img src='image5_thumb.jpg' /></a>
   <a href='image6.html' name='items'><span><h5>test</h5></span>Name: My image 6 <br /><img src='image6_thumb.jpg' /></a>
  </div>
 </body>
</html>
'''
from lxml import etree

html = etree.HTML(doc)
# html=etree.parse('search.html',etree.HTMLParser())
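
# Sketch (not part of the original notes): the basic steps from the syntax
# table above, on a tiny document. `basics_doc`, `basics_root` and `box` are
# names made up for this illustration so they do not clash with `doc`/`html`.
from lxml import etree

basics_doc = "<html><body><div id='box'><p class='x'>hi</p></div><p>bye</p></body></html>"
basics_root = etree.HTML(basics_doc)

# `/` walks one level at a time, so only the <p> directly under <body> matches
print(basics_root.xpath('/html/body/p/text()'))       # ['bye']
# `//` matches at any depth
print(basics_root.xpath('//p/text()'))                # ['hi', 'bye']
# `..` steps up to the parent, `@` reads an attribute
print(basics_root.xpath('//p[@class="x"]/../@id'))    # ['box']

# `.` matters when querying from an element: './/p' stays inside that element,
# while '//p' still searches the whole document.
box = basics_root.xpath('//div[@id="box"]')[0]
print(len(box.xpath('.//p')), len(box.xpath('//p')))  # 1 2
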
# 1 All nodes
# a = html.xpath('//*')

# 2 A specific node
# a=html.xpath('//head')
# 3 Children and descendants
# a=html.xpath('//div/a')
# a=html.xpath('//body/a') # empty: the <a> tags are children of <div>, not of <body>
# a=html.xpath('//body//a')
# 4 Parent node
# a=html.xpath('//body//a[@href="image1.html"]/..')
# a=html.xpath('//body//a[1]/..')
# Equivalent, using the parent axis
# a=html.xpath('//body//a[1]/parent::*')
# a=html.xpath('//body//a[1]/parent::div')
# 5 Matching on an attribute
# a=html.xpath('//body//a[@href="image1.html"]')

# 6 Getting text     /text()
# a=html.xpath('//body//a[@href="image1.html"]/text()')

# 7 Getting attribute values     @attr_name
# a=html.xpath('//body//a/@href')
# Note: positions start at 1, not 0
# a=html.xpath('//body//a[1]/@href')

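# 8 Matching an attribute that holds several values
# When an <a> tag has several classes, an exact match on one of them fails; use contains()
# a=html.xpath('//body//a[@class="li"]') # empty: the attribute value is "li li-item"
# a=html.xpath('//body//a[contains(@class,"li")]')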
# a=html.xpath('//body//a[contains(@class,"li")]/text()')
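
# Sketch (not part of the original notes): matching one class token exactly.
# contains(@class, "li") is a substring test, so it would also match
# class="list". A common XPath 1.0 idiom pads the attribute with spaces and
# looks for " li ". `class_doc`, `class_root`, `loose` and `strict` are names
# made up for this illustration.
from lxml import etree

class_doc = """
<div>
  <a class="li li-item" href="1.html">one</a>
  <a class="list" href="2.html">two</a>
  <a class="li" href="3.html">three</a>
</div>
"""
class_root = etree.HTML(class_doc)

# Substring match: picks up all three anchors, including class="list"
loose = class_root.xpath('//a[contains(@class, "li")]/@href')

# Token match: only anchors whose class list contains the token "li"
strict = class_root.xpath(
    '//a[contains(concat(" ", normalize-space(@class), " "), " li ")]/@href'
)
print(loose)   # ['1.html', '2.html', '3.html']
print(strict)  # ['1.html', '3.html']
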
# 9 Matching on several attributes (combine conditions with `or` / `and`)
# a=html.xpath('//body//a[contains(@class,"li") or @name="items"]')
# a=html.xpath('//body//a[contains(@class,"li") and @name="items"]/text()')
# a=html.xpath('//body//a[contains(@class,"li")]/text()')
# 10 Selecting by position
# a=html.xpath('//a[2]/text()')
# a=html.xpath('//a[2]/@href')
# The last one
# a=html.xpath('//a[last()]/@href')
# a=html.xpath('//a[last()-1]/@href') # second to last
# Positions less than 3 (the first two)
# a = html.xpath('//a[position()<3]/@href')

# Third from last
# a=html.xpath('//a[last()-2]/@href')
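
# Sketch (not part of the original notes): a positional predicate counts
# within each parent. In `doc` above all <a> tags share one <div>, so the
# difference is invisible; with two parents it shows. `pos_doc` and
# `pos_root` are names made up for this illustration.
from lxml import etree

pos_doc = """
<div><a href="a1.html">x</a><a href="a2.html">y</a></div>
<div><a href="b1.html">z</a></div>
"""
pos_root = etree.HTML(pos_doc)

# //a[1]: the first <a> of every parent
print(pos_root.xpath('//a[1]/@href'))         # ['a1.html', 'b1.html']
# (//a)[1]: the first <a> of the whole result set
print(pos_root.xpath('(//a)[1]/@href'))       # ['a1.html']
# The same applies to last()
print(pos_root.xpath('//a[last()]/@href'))    # ['a2.html', 'b1.html']
print(pos_root.xpath('(//a)[last()]/@href'))  # ['b1.html']
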
# 11 Selecting along an axis
# ancestor: ancestor nodes
# * selects every ancestor
# a=html.xpath('//a/ancestor::*')
# Only the div among the ancestors
# a=html.xpath('//a/ancestor::div')
# attribute: attribute values
# a=html.xpath('//a[1]/attribute::*')
# a=html.xpath('//a[1]/attribute::href')

# child: direct children
# a=html.xpath('//a[1]/child::*')
# descendant: all descendants
# a=html.xpath('//a[6]/descendant::*')
# following: all nodes after the current node in document order
# a=html.xpath('//a[1]/following::*')
# a=html.xpath('//a[1]/following::*[1]/@href')
# following-sibling: later siblings of the current node
# a=html.xpath('//a[1]/following-sibling::*')
# a=html.xpath('//a[1]/following-sibling::a')
# a=html.xpath('//a[1]/following-sibling::*[2]')
# a=html.xpath('//a[1]/following-sibling::*[2]/@href')
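
# Sketch (not part of the original notes): how `following` differs from
# `following-sibling`. `axis_doc` and `axis_root` are names made up for
# this illustration.
from lxml import etree

axis_doc = """
<ul><li id="a">A</li><li id="b">B<span id="s">s</span></li><li id="c">C</li></ul>
<p id="after">tail</p>
"""
axis_root = etree.HTML(axis_doc)

# following-sibling: later nodes that share the same parent (<ul>)
print(axis_root.xpath('//li[@id="a"]/following-sibling::*/@id'))  # ['b', 'c']
# following: every later element in document order, at any nesting level
print(axis_root.xpath('//li[@id="a"]/following::*/@id'))  # ['b', 's', 'c', 'after']
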

# print(a)


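'''
Summary
/
//
.
..
Get text            /text()
Get an attribute    /@attr_name
Filter by attribute [@attr_name="value"]
class is a special case, since it may hold several values:
[contains(@class,"li")]
'''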

# The ultimate trick
```

From: https://www.cnblogs.com/liyuanxiangls/p/17545595.html
