Python web scraping: crawling and downloading images with Bs4

Date: 2022-10-28 19:45:44

Tags: src img get python resp scraping Bs4 href print

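# 1. Fetch the main page's source and extract the child-page links (href).
# 2. Follow each href to fetch the child page, then find the image download address there: img -> src.
# 3. Download the image.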
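import os
import requests
from bs4 import BeautifulSoup
import time
import urllib3

urllib3.disable_warnings()  # Suppress the warnings that verify=False would otherwise trigger
os.makedirs("img", exist_ok=True)  # Make sure the output directory exists before any file is written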

url = "https://www.umei.cc/bizhitupian/weimeibizhi/"
domain = "https://www.umei.cc/"

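resp = requests.get(url, verify=False)  # verify=False skips TLS certificate verification
resp.encoding = 'utf-8'  # Set the character encoding
# print(resp.text)

main_page = BeautifulSoup(resp.text, "html.parser")  # Use the built-in HTML parser
alist = main_page.find("div", class_="item_list infinite_scroll").find_all("a")  # Collect every <a> inside the listing container
# print(alist)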

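for a in alist:
    # print(a.get("href"))  # get() returns the value of an attribute directly
    href = domain + a.get('href').strip("/")  # Build the absolute child-page URL
    # print(href)
    # Fetch the child page's source
    child_resp = requests.get(href, verify=False)  # Certificate verification disabled
    child_resp.encoding = 'utf-8'  # Set the character encoding
    # Extract the download path
    child_page = BeautifulSoup(child_resp.text, "html.parser")  # Use the built-in HTML parser
    p = child_page.find("div", class_="big-pic")  # Locate the container that holds the full-size picture
    img = p.find("img")  # The <img> tag inside that container
    # print(img.get("src"))
    src = img.get("src")  # Value of the src attribute, i.e. the image URL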

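    # Download the image
    img_resp = requests.get(src, verify=False)  # Certificate verification disabled
    # img_resp.content  # The raw bytes of the response
    img_name = src.split("/")[-1]  # Everything after the last "/" in the URL becomes the file name
    with open("img/"+img_name, mode="wb") as f:  # Open the target file in binary write mode
        f.write(img_resp.content)  # Write the image bytes to disk

    print("over!!", img_name)  # Report that this image is done
    time.sleep(1)  # Pause for one second between requests

print("all over!!")  # Report that the whole crawl is finished
resp.close()  # Close the main-page response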
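
The script above builds child-page URLs by string concatenation and names files by splitting the image URL on "/". As a hedged alternative, the sketch below does both steps with the standard library's `urllib.parse`. The helper names `build_child_url` and `image_filename`, the fallback name `image.jpg`, and the sample URLs are assumptions made for illustration; they are not part of the original script.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def build_child_url(domain: str, href: str) -> str:
    """Resolve an href (relative or absolute) against the site root."""
    # urljoin handles leading slashes and leaves absolute URLs untouched.
    return urljoin(domain, href)


def image_filename(src: str, default: str = "image.jpg") -> str:
    """Return the last path segment of an image URL, ignoring any query string."""
    # urlsplit separates the path from "?query" and "#fragment" parts,
    # so they never end up in the file name.
    name = posixpath.basename(urlsplit(src).path)
    # Fall back to a hypothetical default name if the URL ends with "/".
    return name or default


if __name__ == "__main__":
    # Hypothetical sample inputs, used only to show the calls.
    print(build_child_url("https://www.umei.cc/", "/bizhitupian/weimeibizhi/1.htm"))
    print(image_filename("https://example.com/d/file/abc.jpg?v=2"))
```

In the loop, `href = build_child_url(domain, a.get("href"))` and `img_name = image_filename(src)` could replace the concatenation and the `split("/")` call. Whether the site's links actually need this extra handling is not something the original post states.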

From: https://www.cnblogs.com/slowlydance2me/p/16837209.html
