Demo: scraping NBA player stats from the Hupu Sports site
If you are new to this, you first need two modules, requests and lxml (install them with `pip install requests lxml`). requests is the crawler module that sends HTTP requests to the web server; lxml lets you use XPath expressions to quickly locate specific elements in an HTML/XML document and extract node information.
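Before hitting the real site, it helps to see XPath in isolation. A minimal sketch on a hand-written HTML snippet (toy data, not the Hupu page) showing how lxml parses markup and how an XPath expression pulls out text nodes:

```python
from lxml import etree

# A tiny stand-in for the real players table (assumed structure).
html = """
<table class="players_table">
  <tr><td>1</td><td><a>LeBron James</a></td></tr>
  <tr><td>2</td><td><a>Stephen Curry</a></td></tr>
</table>
"""

e = etree.HTML(html)  # parse the HTML string into an element tree
# Select the text of every <a> inside a <td> of the players table.
names = e.xpath('//table[@class="players_table"]//tr/td/a/text()')
print(names)  # ['LeBron James', 'Stephen Curry']
```

The same `//table[@class="players_table"]` pattern is what the real script below uses against the live page.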
```python
import requests
from lxml import etree

url = 'https://nba.hupu.com/stats/players'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43'
}
resp = requests.get(url, headers=headers)  # send the request
e = etree.HTML(resp.text)                  # parse the response HTML
names = e.xpath('//table[@class="players_table"]//tr/td/a/text()')
print(names)
```
Right-click and run the script to see the result.
Parsing the response data
Extract the rank, name, team, and points columns as four parallel lists, then zip them together row by row:

```python
nos = e.xpath('//table[@class="players_table"]//tr/td[1]/text()')
names = e.xpath('//table[@class="players_table"]//tr/td[2]/a/text()')
teams = e.xpath('//table[@class="players_table"]//tr/td[3]/a/text()')
scores = e.xpath('//table[@class="players_table"]//tr/td[4]/text()')

for no, name, team, score in zip(nos, names, teams, scores):
    print(f'Rank: {no}  Name: {name}  Team: {team}  Points: {score}')
```

Run again to see the formatted output.
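The `zip()` call is what turns the four column lists back into rows: it pairs up corresponding items and stops at the shortest list. A minimal illustration with made-up sample data (not fetched from Hupu):

```python
# zip() pairs corresponding items from parallel lists -- exactly how the
# four XPath result lists are recombined into one row per player.
nos = ['1', '2']
names = ['Player A', 'Player B']   # placeholder names
teams = ['Team X', 'Team Y']       # placeholder teams
scores = ['30.5', '28.9']          # placeholder per-game points

rows = list(zip(nos, names, teams, scores))
print(rows[0])  # ('1', 'Player A', 'Team X', '30.5')
print(len(rows))  # 2
```

Because `zip()` stops at the shortest input, a header row captured in one list but not the others would silently drop a row; printing the lengths of the four lists first is a quick sanity check.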
From: https://blog.51cto.com/u_15947611/6523078