Python Web Scraping Exercise 2

Date: 2022-10-26 20:32:09


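Target website

http://quotes.toscrape.com/tag/humor/

Libraries used

Scrapy 1.4

Environment

Python 3.6.1, 64-bit

Goal

Scrape each quote and its author, following the pagination links.

Create a new file named quotes_spider.py and enter the following code: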

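import scrapy


class QuotesSpider(scrapy.Spider):
    # Name used to identify the spider.
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/tag/humor/']

    def parse(self, response):
        # Each quote on the page sits in its own <div class="quote">.
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.xpath('span/small/text()').extract_first(),
            }

        # Follow the "next" link, if present, and parse that page the same way.
        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)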
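
The href taken from `li.next a` is a relative link, so it has to be resolved against the URL of the page being parsed before it can be requested. Scrapy's `response.follow` and `response.urljoin` do this for you. The sketch below only illustrates the resolution step with the standard library. The helper name `resolve_next_page` and the example href `/tag/humor/page/2/` are assumptions made for illustration, not values taken from the original post.

```python
from urllib.parse import urljoin

# URL of the page currently being parsed (the spider's start URL).
current_url = "http://quotes.toscrape.com/tag/humor/"

# Assumed example of the href that the 'li.next a' selector might return.
next_href = "/tag/humor/page/2/"


def resolve_next_page(base_url, href):
    """Hypothetical helper: return the absolute URL of the next page,
    or None when the page has no "next" link."""
    if href is None:
        return None
    return urljoin(base_url, href)


next_url = resolve_next_page(current_url, next_href)
print(next_url)
```

When the selector finds no "next" link, `extract_first()` returns `None`, which is why both the spider and this helper check for `None` before going further.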

Open a command prompt, change to the directory that contains the file, and run:

scrapy runspider quotes_spider.py -o quotes.json

The run produces output like this:

[Screenshot: output of the scrapy runspider command]



Open the directory that contains the code file and look at the output file quotes.json:

[
{"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen"},
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d", "author": "Garrison Keillor"},
{"text": "\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d", "author": "Jim Henson"},
{"text": "\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d", "author": "Charles M. Schulz"},
{"text": "\u201cRemember, we're madly in love, so it's all right to kiss me anytime you feel like it.\u201d", "author": "Suzanne Collins"},
{"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"},
{"text": "\u201cThe trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.\u201d", "author": "Terry Pratchett"},
{"text": "\u201cThink left and think right and think low and think high. Oh, the thinks you can think up if only you try!\u201d", "author": "Dr. Seuss"},
{"text": "\u201cThe reason I talk to myself is because I\u2019m the only one whose answers I accept.\u201d", "author": "George Carlin"},
{"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"},
{"text": "\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d", "author": "Jane Austen"}
]
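
The `\u201c` and `\u201d` sequences in the file are JSON escapes for the curly quotation marks around each quote; they are not corrupted text. A minimal sketch, using two entries copied from the output above, shows that Python's `json` module decodes them back into the real characters. The variable names are illustrative only.

```python
import json

# Two entries copied from quotes.json, kept as raw text so the
# \uXXXX escapes stay exactly as they appear in the file.
raw = r'''[
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"}
]'''

# json.loads turns each \uXXXX escape into the character it stands for.
quotes = json.loads(raw)
for q in quotes:
    print(q["author"], "-", q["text"])

# Serialise again, this time keeping non-ASCII characters unescaped.
readable = json.dumps(quotes, ensure_ascii=False, indent=2)
print(readable)
```

To read the real file, replace `json.loads(raw)` with `json.load(f)` on a file opened with `encoding="utf-8"`. Scrapy also has a `FEED_EXPORT_ENCODING` setting; setting it to `utf-8` should write the characters unescaped in the first place.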


From: https://blog.51cto.com/u_15847885/5798389
