代码仅供学习使用,请勿用于其他用途。
一、效果图
二、代码编写
1、items.py
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html
import scrapy


class Scrapy77DianshiItem(scrapy.Item):
    """Container for one movie entry scraped from 77dianshi.com.

    All fields hold plain strings extracted by MovieSpider; any field may
    be absent if the corresponding node was missing on the listing page.
    """
    # movie title
    title = scrapy.Field()
    # cover image URL
    pic = scrapy.Field()
    # short description text
    desc = scrapy.Field()
    # genre / remarks label
    remarks = scrapy.Field()
    # absolute URL of the detail page
    link = scrapy.Field()
2、movie.py
# -*- coding: utf-8 -*-
import scrapy

from ..items import Scrapy77DianshiItem


class MovieSpider(scrapy.Spider):
    """Crawl the movie listing of 77dianshi.com page by page.

    Starts at page 1 and follows the pagination link until the
    "next page" («»») arrow is no longer present.
    """
    name = 'movie'
    allowed_domains = ['77dianshi.com']
    # current page counter, advanced in parse()
    page = 1
    host = "http://77dianshi.com"
    url = host + "/iTe5kdy/page_{0}.html"
    start_urls = [url.format(str(page))]

    def parse(self, response):
        """Yield one item per listed movie, then request the next page.

        Uses Selector.get() instead of .extract()[0]: a missing node
        yields None for that field rather than raising IndexError and
        aborting the whole page.
        """
        print("当前采集第{0}页".format(self.page))
        # Iterate the movie cards in the listing.
        for each in response.xpath("//ul[@class='fed-list-info fed-part-rows']//li"):
            item = Scrapy77DianshiItem()
            item['title'] = each.xpath('./a[2]//text()').get()
            # Guard the concatenation: .get() may return None.
            href = each.xpath('./a[1]/@href').get()
            item['link'] = self.host + href if href else None
            item['desc'] = each.xpath('./span[1]//text()').get()
            item['remarks'] = each.xpath('./a[1]//span[3]//text()').get()
            item['pic'] = each.xpath('./a[1]/@data-original').get()
            yield item
        # The pager's last anchor is "»" on every page except the final one.
        last_page = response.xpath("//div[@class='pages text-center']//a[last()]//text()").get()
        if last_page == "»":
            # Not the last page: queue the next one through the same callback.
            self.page += 1
            yield scrapy.Request(self.url.format(self.page), callback=self.parse)
        else:
            print('结束采集,最后一页:' + str(self.page))
3、pipelines.py
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import json


class Scrapy77DianshiPipeline(object):
    """Write each scraped item to movie.json as one JSON object per line."""

    def __init__(self):
        # Explicit UTF-8 is required: ensure_ascii=False below emits raw
        # Chinese characters, which would raise UnicodeEncodeError on
        # platforms whose default file encoding is not UTF-8.
        self.file = open('movie.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        """Serialize one item as a JSON line and pass it on unchanged."""
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        """Release the output file when the spider finishes."""
        self.file.close()
4、start.py启动文件
import os

from scrapy import cmdline

if __name__ == "__main__":
    # Run the crawler from the script's own directory so that Scrapy
    # finds scrapy.cfg regardless of where the process was launched.
    project_dir = os.path.dirname(os.path.abspath(__file__))
    os.chdir(project_dir)
    # Launch the 'movie' spider exactly as `scrapy crawl movie` would.
    cmdline.execute(["scrapy", "crawl", "movie"])
标签:__,视频,xpath,案例,self,scrapy,item,Scrapy,page From: https://www.cnblogs.com/yang-2018/p/16774382.html