
Web scraping: BeautifulSoup (6) -- select

Published: 2022-11-28 10:04:48
Tags: -- BeautifulSoup soup html url print id select


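select in Beautiful Soup

select in Beautiful Soup is another kind of filter. In my opinion it is a bit easier to use than find_all().

find_all() returns its matches as a list, and select() does the same. Using a site's homepage as the example, let's explore select.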

# coding=utf-8
from bs4 import BeautifulSoup
import requests

url = 'https://www.cs.net/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/61.0',
    'Referer': 'https://www.cs.net/'
}
# Pass headers by keyword: the second positional argument of requests.get() is params
html = requests.get(url, headers=headers)
soup = BeautifulSoup(html.text, features='html.parser')
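
To compare the two filters without depending on a live page, here is a minimal sketch on an inline HTML string. The markup and the names `sample` and `doc` are made up for illustration and are not taken from the site above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only to compare find_all() and select()
sample = '<div><p class="intro">one</p><p>two</p></div>'
doc = BeautifulSoup(sample, features='html.parser')

# find_all() filters by tag name and keyword arguments
by_find_all = doc.find_all('p', class_='intro')
# select() expresses the same filter as a single CSS selector
by_select = doc.select('p.intro')

print(by_find_all)
print(by_select)
```

Both calls return a list-like result holding the same Tag object, so the choice between them is mostly about which syntax reads better for the query at hand.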

1. Query by tag name

tag = soup.select('title')
print(tag)

# Output
# [<title>专业IT技术社区</title>]
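
Because select() always returns a list, a tag that occurs once still has to be indexed. A small sketch of the alternative, select_one(), which returns the first match or None. The HTML string here is an assumption for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page with a single <title> and no <h1>
page = '<html><head><title>Example page</title></head><body></body></html>'
doc = BeautifulSoup(page, features='html.parser')

title_list = doc.select('title')      # a list, even for a single match
title_tag = doc.select_one('title')   # the first matching Tag
missing = doc.select_one('h1')        # None when nothing matches

print(title_list[0].get_text())
print(title_tag.get_text())
print(missing)
```

Checking for None before calling get_text() avoids an AttributeError when the selector matches nothing.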

2. Query by class name: prefix the class name with a dot

class_ = soup.select('.carousel-caption')
print(class_)

# Output
# [<div class="carousel-caption">前端工程师凭什么这么值钱?</div>,
# <div class="carousel-caption">让面试官颤抖的Tomcat系统架构!</div>,
# <div class="carousel-caption">上班时间“划水”、下班时间“加班”。钱和命,孰轻孰重?</div>,
# <div class="carousel-caption"> 面试定心丸:AI知识点备忘录(包括ML、DL、Python、Pandas等)</div>,
# <div class="carousel-caption">Google发布“多巴胺”开源强化学习框架,三大特性全满足</div>]
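
The output above contains whole tags, and one caption carries a leading space. The following sketch, on made-up markup, shows one way to pull out just the text and to require two classes at once. The class name `active` and the sample strings are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the carousel captions
sample = '''
<div class="carousel-caption">First headline</div>
<div class="carousel-caption active"> Second headline </div>
<div class="other">Not a caption</div>
'''
doc = BeautifulSoup(sample, features='html.parser')

# .carousel-caption matches any element whose class list contains that name
captions = [div.get_text(strip=True) for div in doc.select('.carousel-caption')]
# Chaining class selectors without a space requires both classes on one element
active = doc.select('.carousel-caption.active')

print(captions)
print(len(active))
```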

3. Query by id: prefix the id with #

html2 = '''<body>
<p class=""><b>The Dormouse's story</b></p>
<p class="story">
<a href="" id="link1">link1</a>
<a href="" id="link2">link2</a>
<a href="" id="link3">link3</a>
</p>
</body>'''
soup = BeautifulSoup(html2, features='html.parser')
id = soup.select('#link1')
print(id)

# Output
#[<a href="" id="link1">link1</a>]
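
An id should be unique within a page, so select_one() is often a natural fit here. A minimal sketch under that assumption, reading the text and an attribute from the matched tag. The href values are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the html2 example, with made-up href values
sample = '''<p class="story">
<a href="/first" id="link1">link1</a>
<a href="/second" id="link2">link2</a>
</p>'''
doc = BeautifulSoup(sample, features='html.parser')

link = doc.select_one('#link2')       # a single Tag rather than a list
tagged = doc.select('a#link1')        # tag name and id combined in one selector

print(link.get_text(), link.get('href'))
print(tagged[0]['id'])
```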

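4. Combined query: separate ancestor and descendant selectors with a space

# soup must be the parsed homepage from the first snippet (section 3 rebinds soup to html2)
rep = soup.select(".clearfix .list_con .title h2 a")
for link in rep:
    print(link.text, link.get('href'))

# Output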
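
A space in a selector matches descendants at any depth, not only direct children. For direct children, CSS uses `>`. The sketch below illustrates the difference on invented markup. The class names follow the selector above, but the structure and links are assumptions, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical list entry; the real page structure may differ
sample = '''
<div class="list_con">
  <div class="title"><h2><a href="/post/1">Post one</a></h2></div>
  <div class="summary"><a href="/more/1">Read more</a></div>
</div>
'''
doc = BeautifulSoup(sample, features='html.parser')

descendants = doc.select('.list_con a')        # any <a> nested inside .list_con
children = doc.select('.list_con > a')         # only <a> directly under .list_con
titles = doc.select('.list_con .title h2 a')   # the chained form used above

for a in titles:
    print(a.get_text(), a.get('href'))
print(len(descendants), len(children))
```

Narrowing the chain step by step, as in `titles`, keeps unrelated links such as the "Read more" anchor out of the result.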


From: https://blog.51cto.com/u_15879559/5890579
