Techniques used:
1. requests
2. BeautifulSoup
P.S. The program can be extended, for example:
1. Paginated (page-by-page) downloads
2. Selenium for "pull down to load more" pages
3. Multithreading or async coroutines to speed up downloads
4. Automatically selecting a category and continuing the download...
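Point 3 above can be sketched with `concurrent.futures` from the standard library. This is a minimal sketch, not the post's actual code: `fetch` is a hypothetical stand-in for the real per-image work (a `requests.get` call plus writing the bytes to disk), and the URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real per-image download
# (requests.get + writing the response bytes to a file).
def fetch(url):
    return f'downloaded {url}'

# Placeholder detail-page URLs; in the real script these come
# from the <a> tags scraped off the listing page.
urls = [f'https://www.umei.cc/pic/{i}.htm' for i in range(5)]

# Run up to 4 downloads concurrently; pool.map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, urls))

print(results)
```

Because the work is I/O-bound (waiting on HTTP responses), threads give a real speedup here despite Python's GIL.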
From: https://blog.51cto.com/mooreyxia/6064570
import os

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Use a random User-Agent so requests look less like a bot
ua = UserAgent()
headers = {
    'user-agent': ua.random
}

domain = 'https://www.umei.cc'
resp = requests.get(f'{domain}/bizhitupian/diannaobizhi/', headers=headers)
resp.encoding = 'utf-8'

# Parse the listing page and collect the detail-page links
main_page = BeautifulSoup(resp.text, 'html.parser')
a_list = main_page.find('ul', attrs={'class': 'pic-list after'}).find_all('a', class_=None)

os.makedirs('../FileForDemo/Umei', exist_ok=True)  # make sure the target folder exists

for url in a_list:
    # The href is relative, so prepend the domain
    sub_page_url = domain + url.get('href')
    resp1 = requests.get(sub_page_url, headers=headers)
    resp1.encoding = 'utf-8'
    child_page = BeautifulSoup(resp1.text, 'html.parser')

    # Image URL and title on the detail page
    pic_src = child_page.find('section', attrs={'class': 'img-content'}).find('img').get('src')
    pic_name = child_page.find('div', class_='main-bt').find('h1').text

    # Download the image bytes and write them to disk
    pic_resp = requests.get(pic_src, headers=headers)
    with open(f'../FileForDemo/Umei/{pic_name}.jpg', mode='wb') as file:
        file.write(pic_resp.content)
    pic_resp.close()
    resp1.close()
    print(f'{pic_name} downloaded')

print('All links on the current index page have been downloaded')
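The first extension point (paginated downloads) mostly comes down to generating the listing-page URLs. A minimal sketch, assuming the site follows the common `index_{n}.htm` suffix pattern for pages after the first (an assumption to verify against the site's actual pagination links):

```python
domain = 'https://www.umei.cc'

def page_urls(last_page):
    """Build listing-page URLs up to last_page.

    Page 1 has no suffix; later pages are assumed to use
    index_{n}.htm (check the site's real pagination before relying on this).
    """
    base = f'{domain}/bizhitupian/diannaobizhi/'
    urls = [base]
    urls += [f'{base}index_{n}.htm' for n in range(2, last_page + 1)]
    return urls

# Each URL would then be fetched and parsed exactly like the single
# listing page in the script above.
for page in page_urls(3):
    print(page)
```

The scraping loop itself stays unchanged; only the outer iteration over listing pages is new.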