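I was following the code in a book (《python数据分析入门 从数据获取到可视化》 by 沈祥壮) to start learning web scraping, but got stuck on the error in the title. I tried many things: pip install lxml / pip uninstall lxml, downloading the matching lxml version from the official site and installing it from an absolute path, and so on, but none of them fixed it.
pip printed a lot of messages along the way, including this one: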
Requirement already satisfied: lxml in c:\users\许逍遥\appdata\local\programs\python\python37\lib\site-packages (4.4.1)
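The message is clear: lxml is already installed for that Python, so the problem lies in the PyCharm configuration, most likely because the project is using a different interpreter from the one pip installed into. For the fix, see the article below (mainly this part:
“
Note! Pay attention!
In PyCharm, go to File - Settings - Project Interpreter:
”
):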
After changing that setting, the code ran normally!
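A quick way to confirm this kind of mismatch is to ask the running interpreter where it lives and whether it can see lxml. The sketch below is only an illustration: `describe_environment` is a hypothetical helper name, not something from the book or from PyCharm. Run it inside PyCharm and compare the printed interpreter path with the `site-packages` path in the pip message above.

```python
import importlib.util
import sys


def describe_environment(module_name="lxml"):
    """Report which interpreter is running and whether it can find the module."""
    # find_spec returns None when the top-level module is not importable
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    info = describe_environment()
    print("interpreter:", info["executable"])
    print("lxml found: ", info["module_found"])
    print("lxml path:  ", info["module_location"])
```

If the interpreter path printed here is not the Python that pip reported, the project interpreter setting is the thing to change. Installing with `python -m pip install lxml`, using that same interpreter, is another way to make sure the package lands in the right place.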
Appendix (test code):
import requests
from bs4 import BeautifulSoup
url = 'https://book.douban.com/latest'
data = requests.get(url)
#data = requests.get(url)
#print(data.text)
soup = BeautifulSoup(data.text,'lxml')
books_left = soup.find('ul',{ 'class':'cover-col-4 clearfix' })
books_left = books_left.find_all('li')
books_right = soup.find('ul',{ 'class':'cover-col-4 pl20 clearfix' })
books_right = books_right.find_all('li')
books = list(books_left) + list(books_right)
#print(soup)
img_urls = []
titles = []
ratings = []
authors = []
details = []
for book in books:
    # Cover image URL
    img_url = book.find_all('a')[0].find('img').get('src')
    img_urls.append(img_url)
    # Book title
    title = book.find_all('a')[1].get_text()
    titles.append(title)
    # Star rating
    rating = book.find('p', {'class': 'rating'}).get_text()
    rating = rating.replace('\n', '').replace(' ', '')
    ratings.append(rating)
    # Author and publication info
    author = book.find('p', {'class': 'color-gray'}).get_text()
    author = author.replace('\n', '').replace(' ', '')
    authors.append(author)
    # Book summary
    detail = book.find_all('p')[2].get_text()
    detail = detail.replace('\n', '').replace(' ', '')
    details.append(detail)
print("img_urls: ", img_urls)
print("titles: ", titles)
print("ratings: ", ratings)
print("authors: ", authors)
print("details: ", details)
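If the script has to run on interpreters where lxml may be missing, one option is to choose the parser at run time. This is a minimal sketch, not part of the book's code: `pick_parser` is a hypothetical helper, and it falls back to `html.parser`, the parser BeautifulSoup can use from Python's standard library. The two parsers can build slightly different trees from malformed HTML, so the selectors above may need rechecking after a fallback.

```python
import importlib.util


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return the preferred parser name if its package is importable, else the fallback."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


# Usage with the code above (requires bs4):
# soup = BeautifulSoup(data.text, pick_parser())
```

This avoids the error but does not fix the underlying interpreter mismatch, so it is better suited as a safety net than as a replacement for the PyCharm setting.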
Tags: lxml, img, get, python, book, books, error, find From: https://blog.51cto.com/u_15849465/5801380