- Create a Scrapy project
- Write the main spider code
- Configure the settings file
- Write the pipelines code
Target site: Desktop wallpapers hd, free desktop backgrounds (wallpaperscraft.com)
(Site preview screenshot)
1. Create a Scrapy project

```shell
scrapy startproject wallpapers
```

Move into the project directory:

```shell
cd <project path>
```

Create a CrawlSpider with `scrapy genspider -t crawl <spider name> <allowed domain>`:

```shell
scrapy genspider -t crawl wallpaper wallpaperscraft.com
```

Open the project in an editor window (e.g. VS Code):

```shell
code .
```
2. Write the wallpaper.py file
```python
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class Wallpaper3Spider(CrawlSpider):
    name = "wallpaper"
    allowed_domains = ["wallpaperscraft.com"]
    start_urls = ["https://wallpaperscraft.com"]

    # Put the "next page" link's XPath in rules so every page of wallpapers is crawled
    rules = (
        Rule(
            LinkExtractor(
                restrict_xpaths="//ul[@class='pager__list']"
                                "/li[@class='pager__item'][2]"
                                "/a[@class='pager__link']"
            ),
            callback="parse_item",
            follow=True,
        ),
    )

    def parse_item(self, response):
        image_url_300_168 = response.xpath(
            "//div[@class='wallpapers wallpapers_zoom wallpapers_main']"
            "/ul/li/a/span/img/@src"
        ).extract()
        # Swap the thumbnail size in each URL for the full wallpaper size
        image_url_1280_720 = []
        for url in image_url_300_168:
            new_url = url.replace('300x168', '1280x720')
            image_url_1280_720.append(new_url)
        name = response.xpath("//span[@class='wallpapers__info'][2]/text()").extract()
        yield {
            "images_url": image_url_1280_720,
            "names": name,
        }
```
The wallpapers on the homepage default to 300x168 thumbnails; to get the 1280x720 version, replace 300x168 with 1280x720 in the src URL.
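That size swap can be tried on its own; the URL below is a made-up example that just follows the site's size-suffix naming pattern:

```python
# Hypothetical thumbnail URL following the size-suffix naming pattern
thumb = "https://images.wallpaperscraft.com/image/single/forest_fog_300x168.jpg"

# Swapping the size token in the path points at the larger rendition
full = thumb.replace("300x168", "1280x720")
print(full)
```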
3. Configure the settings.py file
Add a User-Agent, comment out the robots.txt rule, wait 2 seconds between pages, enable the pipeline, and set the image save path.
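A minimal settings.py sketch covering those changes; the User-Agent string and save directory are placeholders you should set for your own environment:

```python
# settings.py (relevant entries only)

# Identify the crawler with a browser User-Agent (placeholder string)
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# The tutorial ignores robots.txt; the generated default is True
ROBOTSTXT_OBEY = False

# Wait 2 seconds between page downloads
DOWNLOAD_DELAY = 2

# Enable the custom image pipeline defined in pipelines.py
ITEM_PIPELINES = {
    "wallpapers.pipelines.MyImagePipeline": 300,
}

# Directory where the images pipeline saves files (placeholder path)
IMAGES_STORE = "./images"
```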
4. Write the pipelines.py file
```python
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
import re
from scrapy.http.request import Request
from scrapy.pipelines.images import ImagesPipeline


class Wallpapers2Pipeline:
    def process_item(self, item, spider):
        return item


class MyImagePipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        for url in item["images_url"]:
            yield Request(url)

    def file_path(self, request, response=None, info=None, *, item=None):
        # Find this request URL's index position in the item's URL list
        index = item["images_url"].index(request.url)
        # Use the index to pair the image with its matching name
        name = item["names"][index]
        # Replace commas in the name before using it as a filename
        return f"{re.sub(',', '_', name)}.jpg"
```
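The index-to-name lookup in file_path can be exercised in isolation; the item dict and URLs below are hypothetical stand-ins for what parse_item yields:

```python
import re

# Hypothetical item in the shape parse_item yields
item = {
    "images_url": [
        "https://example.com/a_1280x720.jpg",
        "https://example.com/b_1280x720.jpg",
    ],
    "names": ["forest, fog", "ocean, waves"],
}

request_url = "https://example.com/b_1280x720.jpg"
index = item["images_url"].index(request_url)  # position of this URL in the list
name = item["names"][index]                    # the matching wallpaper title
filename = f"{re.sub(',', '_', name)}.jpg"     # commas swapped out of the filename
print(filename)  # ocean_ waves.jpg
```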
5. Run the spider
Create a new begin.py file and run the code below:

```python
from scrapy.cmdline import execute

# The spider's name attribute is "wallpaper", so that is what we crawl
execute("scrapy crawl wallpaper".split())
```
The wallpapers are scraped successfully.
From: https://blog.csdn.net/qq_39000057/article/details/140319872