直接上代码:
"""Simple web image scraper.

Requires third-party packages (not in the standard library):
    pip install requests   # HTTP client
    pip install bs4        # BeautifulSoup: extracts data from HTML/XML
    pip install lxml       # optional faster parser backend for bs4
"""
import os

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://pic.netbian.com"
# Original code used the literal prefix './partice05'; kept (typo and all)
# as the download directory so existing runs stay recognizable.
SAVE_DIR = "./partice05"


def get_htmls(pages=tuple(range(2, 5))):
    """Fetch the HTML of each listing page to scrape.

    Args:
        pages: iterable of 1-based page numbers (default: pages 2-4).
            Default is a tuple, not a list, to avoid the mutable-default
            pitfall in the original.

    Returns:
        list[str]: decoded HTML text, one entry per page.

    Raises:
        requests.HTTPError: if a page responds with an error status.
    """
    pages_list = []
    for page in pages:
        url = f"{BASE_URL}/4kfengjing/index_{page}.html"
        # timeout prevents the script from hanging forever on a dead host
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # fail loudly instead of scraping an error page
        response.encoding = 'gbk'  # the site serves GBK-encoded pages
        pages_list.append(response.text)
    return pages_list


def get_picturs(htmls):
    """Parse each listing page and download every thumbnail image.

    Args:
        htmls: iterable of HTML strings as returned by get_htmls().

    Side effects:
        Creates SAVE_DIR if needed and writes one .jpg file per image.
    """
    # Bug fix: the original never created a directory and concatenated
    # './partice05' + name with no separator, dumping prefixed files in CWD.
    os.makedirs(SAVE_DIR, exist_ok=True)
    for html in htmls:
        soup = BeautifulSoup(html, 'html.parser')
        pic_ul = (soup.find('div', id='main')
                      .find('div', class_='slist')
                      .find('ul', class_='clearfix'))
        for img in pic_ul.find_all('img'):
            pic_name = os.path.join(SAVE_DIR,
                                    img['alt'].replace(' ', '_') + '.jpg')
            # Bug fix: img['src'] starts with '/', so strip it to avoid a
            # double slash in the final URL.
            src = f"{BASE_URL}/{img['src'].lstrip('/')}"
            response = requests.get(src, timeout=10)
            response.raise_for_status()
            with open(pic_name, 'wb') as f:
                f.write(response.content)
            print(f"picture downloaded to: {pic_name}")


if __name__ == "__main__":
    # Guarded so importing this module does not trigger network I/O.
    htmls = get_htmls(pages=range(2, 3))  # fetch listing page 2 only
    get_picturs(htmls)
标签:网页,get,PYTHON,pic,list,爬虫,htmls,print,pages From: https://www.cnblogs.com/seven1314pp/p/17684027.html