
Scrapy practice: Dangdang (当当网)

Date: 2023-10-04 16:22:06
Tags: src, name, price, practice, Dangdang, item, scrapy, extract, first

    def parse(self, response):
        print('Dangdang')
        li = response.xpath('//ul[@id="component_59"]/li')
        # src, name, and price share a common parent <li>. The first <li> has no
        # data-original attribute, so for each <li> check whether it is None
        # and fall back to @src when it is.
        for item in li:
            srcFirst = item.xpath('./a/img/@src')
            src = item.xpath('./a/img/@data-original')
            name = item.xpath('./a/img/@alt')
            # Extract the values
            price = item.xpath(
                './p[@class="price"]/span[@class="search_now_price"]/text()')
            if src.extract_first():
                resSrc = 'http:' + src.extract_first()
            else:
                resSrc = 'http:' + srcFirst.extract_first()
            resName = name.extract_first()
            resPrice = price.extract_first()
            print(resSrc, resName, resPrice)
            book = ScrapyproItem(src=resSrc, name=resName, price=resPrice)
            # Hand the item to the pipeline
            yield book
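The fallback from `data-original` to `src` can be isolated into a small helper. This is only a minimal sketch: `pick_image_url` is a hypothetical name that does not appear in the original project, and it assumes the attribute values are protocol-relative (`//host/path`), as the `'http:' +` prefix in the spider implies.

```python
from typing import Optional


def pick_image_url(data_original: Optional[str], src: Optional[str]) -> Optional[str]:
    """Prefer the lazy-load attribute; fall back to @src when it is missing."""
    raw = data_original or src
    if raw is None:
        return None
    # Only prepend a scheme when the value is protocol-relative.
    return 'http:' + raw if raw.startswith('//') else raw
```

Inside the loop it could be called as `pick_image_url(src.extract_first(), srcFirst.extract_first())`.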

  settings.py

ITEM_PIPELINES = {
   'scrapyPro.pipelines.ScrapyproPipeline': 300,
}

  items.py

import scrapy


class ScrapyproItem(scrapy.Item):
    # Fields collected for each book
    src = scrapy.Field()    # cover image URL
    name = scrapy.Field()   # book title
    price = scrapy.Field()  # current price text

  pipelines.py

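class ScrapyproPipeline:
    def process_item(self, item, spider):
        # Append each item's string form to book.json.
        # Note: str(item) is a Python repr, not strict JSON.
        with open('book.json', 'a', encoding='utf-8') as fp:
            fp.write(str(item))
        return item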
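The pipeline above reopens the file for every item and writes `str(item)`, which is not parseable JSON. As a possible alternative, the sketch below opens the file once and writes one JSON object per line. The class name `JsonLinesBookPipeline` and the file name `book.jsonl` are assumptions for illustration, not part of the original project; `open_spider` and `close_spider` are the hooks Scrapy calls on a pipeline when the spider starts and finishes.

```python
import json


class JsonLinesBookPipeline:
    """Hypothetical variant: one JSON object per line, file opened once."""

    path = 'book.jsonl'  # assumed output name, not from the original post

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.fp = open(self.path, 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # dict(item) works for scrapy.Item objects and for plain dicts.
        self.fp.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.fp.close()
```

To try it, the class would need its own entry in `ITEM_PIPELINES`, just like the other pipelines.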

  Define a new pipeline to download the images:

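import os
import urllib.request


class DangDownloadPicture:

    def process_item(self, item, spider):
        url = item.get('src')
        # Make sure the target directory exists before downloading
        os.makedirs('./books', exist_ok=True)
        name = './books/' + item.get('name') + '.jpg'
        urllib.request.urlretrieve(url=url, filename=name)

        return item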
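The file name is built directly from the book title, which may contain characters that are not valid in file names (for example `/` or `:`). A minimal sketch of one way to guard against that follows; `safe_filename` is a hypothetical helper, not part of the original project.

```python
import re


def safe_filename(name: str, ext: str = '.jpg') -> str:
    """Replace characters that are invalid in file names on common systems."""
    # Collapse each run of reserved characters or whitespace into one underscore.
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', name).strip('_')
    return (cleaned or 'untitled') + ext
```

In the pipeline this could be used as `'./books/' + safe_filename(item.get('name'))`.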

  settings.py: 301 is the priority; the lower the number, the higher the priority

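ITEM_PIPELINES = {
   # Keep the JSON pipeline enabled so both pipelines run
   'scrapyPro.pipelines.ScrapyproPipeline': 300,
   'scrapyPro.pipelines.DangDownloadPicture': 301,
}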
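To illustrate the ordering rule, the sketch below sorts pipeline entries by their priority value. It assumes both pipelines from this post are enabled together, and it only mimics the ordering; it is not Scrapy's own loading code.

```python
ITEM_PIPELINES = {
    'scrapyPro.pipelines.DangDownloadPicture': 301,
    'scrapyPro.pipelines.ScrapyproPipeline': 300,
}

# Sort by priority value: lower values run first.
run_order = [path for path, _ in sorted(ITEM_PIPELINES.items(), key=lambda kv: kv[1])]
print(run_order)
```

With these values, `ScrapyproPipeline` (300) processes each item before `DangDownloadPicture` (301).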

  Download the images and JSON data for 100 pages:

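import scrapy

# Assumes the project package is named scrapyPro, as in settings.py
from scrapyPro.items import ScrapyproItem


class DangSpider(scrapy.Spider):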
    name = 'dang'
    allowed_domains = ['category.dangdang.com']
    start_urls = ['http://category.dangdang.com/cp01.01.02.00.00.00.html']
    # Page 2, for example: http://category.dangdang.com/pg2-cp01.01.02.00.00.00.html
    base_url = 'http://category.dangdang.com/pg'
    page = 1
    def parse(self, response):
        print('Dangdang')
        li = response.xpath('//ul[@id="component_59"]/li')
        for item in li:
            srcFirst = item.xpath('./a/img/@src')
            src = item.xpath('./a/img/@data-original')
            name = item.xpath('./a/img/@alt')
            price = item.xpath(
                './p[@class="price"]/span[@class="search_now_price"]/text()')
            if(src.extract_first()):
                resSrc = 'http:' + src.extract_first()
            else:
                resSrc = 'http:' + srcFirst.extract_first()

            resName = name.extract_first()
            resPrice = price.extract_first()
            print(resSrc,resName,resPrice)
            book = ScrapyproItem(src=resSrc,name=resName,price=resPrice)
            # Hand the item to the pipeline
            yield book

        if self.page < 100:
            self.page = self.page + 1
            url = self.base_url + str(self.page) + '-cp01.01.02.00.00.00.html'
            yield scrapy.Request(url=url,callback=self.parse)
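The pagination step relies on the URL pattern `pg<N>-cp01.01.02.00.00.00.html`. As a small sketch, the URL construction can be checked on its own, outside the spider; `page_url` and the two constants are hypothetical names introduced here for illustration.

```python
BASE_URL = 'http://category.dangdang.com/pg'
CATEGORY_SUFFIX = '-cp01.01.02.00.00.00.html'


def page_url(page: int) -> str:
    """Build the listing URL for a page number >= 2 (page 1 is the start URL)."""
    return f'{BASE_URL}{page}{CATEGORY_SUFFIX}'


# The spider starts on page 1 and follows pages 2 through 100.
urls = [page_url(p) for p in range(2, 101)]
```

Together with the start URL, these 99 URLs make up the 100 pages the spider visits.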

  

From: https://www.cnblogs.com/sgj191024/p/17742406.html
