
CrawlSpider Exercise: Dushu.com (读书网)

Date: 2023-10-05 16:45:36
Tags: www, name, url, CrawlSpider, self, exercise, scrapy, Dushu.com, page

1. Create the project: scrapy startproject dushuproject
2. Change to the spiders directory: cd dushuproject\dushuproject\spiders
3. Generate the spider: scrapy genspider read www.dushu.com

  

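import scrapy
# The import path must match your own project and item class names. This listing
# was written for a project named readPro, not the dushuproject created above.
# ReadproItem is expected to declare the fields src and name.
from readPro.items import ReadproItem


class ReadnetSpider(scrapy.Spider):
    name = 'readNet'
    allowed_domains = ['www.dushu.com']
    start_urls = ['https://www.dushu.com/book/1179_1.html']
    # Listing pages follow the pattern <base_url><page>.html
    base_url = 'https://www.dushu.com/book/1179_'
    page = 1

    def parse(self, response):
        print("读书网")
        # Select every book cover <img> inside the book list
        img = response.xpath('//div[@class="bookslist"]//li//img')
        for item in img:
            # The image URL is read from data-original, the title from alt
            src = item.xpath('./@data-original').extract_first()
            name = item.xpath('./@alt').extract_first()
            print(src, name)
            book = ReadproItem(src=src, name=name)
            # Hand the item over to the item pipelines
            yield book

        # Keep requesting the next listing page; the last one requested is page 101
        if self.page < 101:
            self.page = self.page + 1
            url = self.base_url + str(self.page) + '.html'
            yield scrapy.Request(url=url, callback=self.parse)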
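
The pagination logic in parse can be checked offline, without running the crawl. The following is a minimal sketch and not part of the original spider: build_page_urls is a hypothetical helper that lists the URLs the spider would request, assuming the base_url + page + '.html' pattern and the page < 101 condition shown above.

```python
BASE_URL = 'https://www.dushu.com/book/1179_'  # same prefix as the spider's base_url


def build_page_urls(base_url=BASE_URL, first_page=1, last_page=101):
    """Hypothetical helper: list every listing-page URL the spider visits.

    The spider starts on page 1 and requests the next page while
    self.page < 101, so the last page it requests is page 101.
    """
    return [f'{base_url}{page}.html' for page in range(first_page, last_page + 1)]


if __name__ == '__main__':
    urls = build_page_urls()
    print(len(urls))   # number of listing pages requested
    print(urls[0])     # first page, identical to start_urls[0]
    print(urls[-1])    # last page reached by the page < 101 check
```

Because the counter is incremented before the request is built, the condition page < 101 queues pages 2 through 101, for 101 pages in total including the start URL. If exactly 100 pages are wanted, the condition would need to be page < 100.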

  

From: https://www.cnblogs.com/sgj191024/p/17743510.html
