Spider代码
class BizhizolSpider(scrapy.Spider):
    """Spider for game wallpapers on desk.zol.com.cn.

    Crawls the listing page, follows each wallpaper's detail link, and
    yields items of the form ``{"img_src": <big image URL>}``.
    """

    name = "bizhizol"
    allowed_domains = ["zol.com.cn"]
    start_urls = ["https://desk.zol.com.cn/youxi/"]

    def parse(self, response, **kwargs):
        """Parse the listing page and request every wallpaper detail page."""
        res_list_li = response.xpath('//*[@class="pic-list2 clearfix"]/li')
        for res_list in res_list_li:
            img_url = res_list.xpath('./a/@href').extract_first()
            # Guard: extract_first() returns None when the <li> has no
            # matching link; calling .endswith() on None would crash.
            # Also skip ad entries that link to .exe downloads.
            if not img_url or img_url.endswith(".exe"):
                continue
            # response.urljoin resolves relative hrefs against response.url
            # (it wraps urllib.parse.urljoin under the hood).
            child_url = response.urljoin(img_url)
            # The detail page must be fetched separately to reach the image.
            # GET is Scrapy's default method, so it need not be spelled out.
            yield Request(
                url=child_url,
                callback=self.suibianqimignzi,
            )

    def suibianqimignzi(self, response, **kwargs):
        """Extract the full-size image URL from a wallpaper detail page."""
        img_src = response.xpath("//*[@id='bigImg']/@src").extract_first()
        yield {
            "img_src": img_src,
        }
Pipeline代码
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
# ImagesPipeline 图片专用的管道
import scrapy
from scrapy.pipelines.images import ImagesPipeline
class BizhiPipeline:
    """Default project pipeline: forwards every item unchanged."""

    def process_item(self, item, spider):
        # No transformation needed; pass the item along to the next
        # pipeline in ITEM_PIPELINES.
        return item
class MyTuPipeline(ImagesPipeline):
    """Image-download pipeline.

    Downloads each item's ``img_src`` and stores it under
    ``IMAGES_STORE/youxi/<original file name>``.
    """

    # 1. Issue the download request for the image.
    def get_media_requests(self, item, info):
        url = item['img_src']
        # zol.com.cn blocks hot-linking: a browser-like User-Agent and a
        # Referer from the site are required for the image request.
        headers = {
            'sec-ch-ua': '"Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"',
            'Referer': 'https://desk.zol.com.cn/showpic/1920x1080_100899_144.html',
            'sec-ch-ua-mobile': '?0',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
            'sec-ch-ua-platform': '"Windows"',
        }
        # Stash the URL in meta so file_path() can recover it reliably
        # (response.url is not dependably available in file_path).
        yield scrapy.Request(url=url, headers=headers, meta={"sss": url})

    # 2. Decide the storage path.
    # Full path = IMAGES_STORE + return value of file_path();
    # Scrapy creates missing folders automatically.
    def file_path(self, request, response=None, info=None, *, item=None):
        # Derive the file name from the URL saved in request.meta by
        # get_media_requests. (The same value is available via
        # item['img_src']; meta is used because it travels with the request.)
        file_name = request.meta['sss'].split("/")[-1]
        # Return a path RELATIVE to IMAGES_STORE; the original code
        # produced "/youxi//<name>" with a leading and a doubled slash.
        return "youxi/" + file_name

    # 3. Optionally post-process the item after all downloads finish.
    def item_completed(self, results, item, info):
        # results is a list of (success, file_info_or_failure) tuples.
        # Must return the item so downstream pipelines still receive it.
        return item
效果展示