我正在为 Etsy 编写一个抓取器,当我抓取评论的范围时,我得到了正确的输出。然而,当我用价格来获取跨度时,它只给我 None 值,我不明白为什么。如果有人可以提供帮助,那就太好了!
#html parsing
page_soup = soup(page_html, "html.parser")
#grabs each listing card
divs = page_soup.find_all("div", {"class": "v2-listing-card__shop"})
for i in divs:
shop = i.p.text
reviews = i.find("span", {"class" : "text-body-smaller text-gray-lighter display-inline-block vertical-align-middle icon-b-1"})
prices = i.find("span", {"class" : "currency-value"})
print shop
print reviews.text
print prices
以下是网站上的两个 span 元素:
<div class="v2-listing-card__info">
<p class="text-gray text-truncate mb-xs-0 text-body">
Blush Watercolor Flowers & Leaves with Different Shades Clipart Separate Elements Hand Painted Commercial Use | S15 Fairy Tale
</p>
<div class="v2-listing-card__shop">
<p class="text-gray-lighter text-body-smaller display-inline-block mr-xs-1">PatishopArt</p>
<div class="v2-listing-card__rating icon-t-2">
<div class="stars-svg stars-smaller ">
<input name="initial-rating" type="hidden" value="5"/>
<input name="rating" type="hidden" value="5"/>
<span class="screen-reader-only">5 out of 5 stars</span>
<div aria-hidden="true" class="rating lit rating-first icon-b-2" data-rating="1">
<span class="etsy-icon stars-svg-star" title="Disappointed"><svg aria-hidden="true" focusable="false" viewbox="3 3 18 18" xmlns="http://www.w3.org/2000/svg"><path d="M19.985,10.36a0.5,0.5,0,0,0-.477-0.352H14.157L12.488,4.366a0.5,0.5,0,0,0-.962,0l-1.67,5.642H4.5a0.5,0.5,0,0,0-.279.911L8.53,13.991l-1.5,5.328a0.5,0.5,0,0,0,.741.6l4.231-2.935,4.215,2.935a0.5,0.5,0,0,0,.743-0.6l-1.484-5.328,4.306-3.074A0.5,0.5,0,0,0,19.985,10.36Z"></path></svg></span>
<div class="rating lit" data-rating="2">
<span class="etsy-icon stars-svg-star" title="Not a fan"><svg aria-hidden="true" focusable="false" viewbox="3 3 18 18" xmlns="http://www.w3.org/2000/svg"><path d="M19.985,10.36a0.5,0.5,0,0,0-.477-0.352H14.157L12.488,4.366a0.5,0.5,0,0,0-.962,0l-1.67,5.642H4.5a0.5,0.5,0,0,0-.279.911L8.53,13.991l-1.5,5.328a0.5,0.5,0,0,0,.741.6l4.231-2.935,4.215,2.935a0.5,0.5,0,0,0,.743-0.6l-1.484-5.328,4.306-3.074A0.5,0.5,0,0,0,19.985,10.36Z"></path></svg></span>
<div class="rating lit" data-rating="3">
<span class="etsy-icon stars-svg-star" title="It's okay"><svg aria-hidden="true" focusable="false" viewbox="3 3 18 18" xmlns="http://www.w3.org/2000/svg"><path d="M19.985,10.36a0.5,0.5,0,0,0-.477-0.352H14.157L12.488,4.366a0.5,0.5,0,0,0-.962,0l-1.67,5.642H4.5a0.5,0.5,0,0,0-.279.911L8.53,13.991l-1.5,5.328a0.5,0.5,0,0,0,.741.6l4.231-2.935,4.215,2.935a0.5,0.5,0,0,0,.743-0.6l-1.484-5.328,4.306-3.074A0.5,0.5,0,0,0,19.985,10.36Z"></path></svg></span>
<div class="rating lit" data-rating="4">
<span class="etsy-icon stars-svg-star" title="Like it"><svg aria-hidden="true" focusable="false" viewbox="3 3 18 18" xmlns="http://www.w3.org/2000/svg"><path d="M19.985,10.36a0.5,0.5,0,0,0-.477-0.352H14.157L12.488,4.366a0.5,0.5,0,0,0-.962,0l-1.67,5.642H4.5a0.5,0.5,0,0,0-.279.911L8.53,13.991l-1.5,5.328a0.5,0.5,0,0,0,.741.6l4.231-2.935,4.215,2.935a0.5,0.5,0,0,0,.743-0.6l-1.484-5.328,4.306-3.074A0.5,0.5,0,0,0,19.985,10.36Z"></path></svg></span>
<div class="rating lit" data-rating="5">
<span class="etsy-icon stars-svg-star" title="Love it"><svg aria-hidden="true" focusable="false" viewbox="3 3 18 18" xmlns="http://www.w3.org/2000/svg"><path d="M19.985,10.36a0.5,0.5,0,0,0-.477-0.352H14.157L12.488,4.366a0.5,0.5,0,0,0-.962,0l-1.67,5.642H4.5a0.5,0.5,0,0,0-.279.911L8.53,13.991l-1.5,5.328a0.5,0.5,0,0,0,.741.6l4.231-2.935,4.215,2.935a0.5,0.5,0,0,0,.743-0.6l-1.484-5.328,4.306-3.074A0.5,0.5,0,0,0,19.985,10.36Z"></path></svg></span>
</div>
</div>
</div>
</div>
</div>
</div>
<span class="text-body-smaller text-gray-lighter display-inline-block vertical-align-middle icon-b-1">(110)</span>
</div>
</div>
<p class="n-listing-card__price text-gray strong mt-xs-0">
<span class="currency-symbol">$</span><span class="currency-value">6.60</span>
</p>
<!-- This shows Free shipping on its own line , we only show it if it wasn't shown above -->
</div>
问题在于,尝试搜索的“价格”跨度位于用于搜索的“divs”元素之外。因此,当迭代“divs”时,价格跨度不会包含在内,并且
find
返回
None
。
以下是如何修改代码以获取价格的方法:
import requests
from bs4 import BeautifulSoup
# 获取页面内容
url = 'https://www.etsy.com/search?q=watercolor+flowers'
response = requests.get(url)
page_html = response.text
# 解析 HTML
page_soup = BeautifulSoup(page_html, "html.parser")
# 获取每个商品卡片
listing_cards = page_soup.find_all("div", {"class": "v2-listing-card"})
# 遍历商品卡片
for card in listing_cards:
# 获取店铺名称
shop = card.find("p", {"class": "text-gray-lighter text-body-smaller"}).text.strip()
# 获取评论数
reviews = card.find("span", {"class": "text-body-smaller text-gray-lighter display-inline-block vertical-align-middle icon-b-1"})
reviews = reviews.text.strip() if reviews else "No Reviews"
# 获取价格
prices = card.find("span", {"class": "currency-value"})
prices = prices.text.strip() if prices else "No Price Found"
# 打印信息
print(f"Shop: {shop}")
print(f"Reviews: {reviews}")
print(f"Price: {prices}")
print("-" * 20)
在这个修改后的代码中:
-
我们查找所有具有 "v2-listing-card" 类的
div
元素,其中包含店铺信息、评论和价格。 -
然后,我们在每个
card
中查找相应的元素,以获取店铺名称、评论数和价格。 -
我们使用
.strip()
方法来删除文本开头和结尾的空白字符。 - 如果没有找到评论或价格,我们会打印 "No Reviews" 或 "No Price Found"。
这应该可以解决遇到的问题,并允许抓取 Etsy 商品列表的价格。
标签:python,web-scraping,beautifulsoup From: 47461142