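Background: Scrapy is one way to write a web crawler in Python. First you need a Python runtime environment; here we use the Anaconda distribution.
After installing it, open Anaconda Navigator, launch CMD.exe Prompt, and run the following in the command-line window: pip install scrapy. If the command finishes without errors, Scrapy is installed. Then create a new file named myspider.py in the current folder with the following code: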
import scrapy


class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://www.zyte.com/blog/']

    def parse(self, response):
        for title in response.css('.oxy-post-title'):
            yield {'title': title.css('::text').get()}

        for next_page in response.css('a.next'):
            yield response.follow(next_page, self.parse)
In the command-line window, run: scrapy runspider myspider.py
References:
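To see what the parse method in myspider.py is doing without touching the network, here is a rough standard-library sketch of the same selection logic. It uses html.parser instead of Scrapy's CSS selectors, so it only approximates '.oxy-post-title' with '::text' and 'a.next'. The SAMPLE_HTML fragment and the names TitleParser and extract are made up for illustration; the real page's markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical markup for illustration only; the real blog page may differ.
SAMPLE_HTML = """
<a class="oxy-post-title" href="/blog/a">First post</a>
<a class="oxy-post-title" href="/blog/b">Second post</a>
<a class="next page-numbers" href="/blog/page/2/">Next</a>
"""


class TitleParser(HTMLParser):
    """Collects post titles and 'next page' links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.titles = []       # text of elements with class oxy-post-title
        self.next_pages = []   # absolute URLs of <a class="next"> links
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "oxy-post-title" in classes:
            self._in_title = True
        if tag == "a" and "next" in classes and attrs.get("href"):
            # Resolve relative links against the page URL, much as
            # response.follow does for the spider.
            self.next_pages.append(urljoin(self.base_url, attrs["href"]))

    def handle_data(self, data):
        # Keep only the first non-blank text inside a title element.
        if self._in_title and data.strip():
            self.titles.append(data.strip())
            self._in_title = False

    def handle_endtag(self, tag):
        # Simplification: assumes title elements are not nested.
        self._in_title = False


def extract(html, base_url):
    """Return (items, next_pages) for one page of HTML."""
    parser = TitleParser(base_url)
    parser.feed(html)
    items = [{"title": t} for t in parser.titles]
    return items, parser.next_pages


if __name__ == "__main__":
    items, next_pages = extract(SAMPLE_HTML, "https://www.zyte.com/blog/")
    print(items)
    print(next_pages)
```

In the real spider, each dictionary is yielded as a scraped item and each next-page link is scheduled as a new request that is handled by parse again; this sketch only shows the extraction step for a single page.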
https://scrapy.org/
From: https://www.cnblogs.com/jamstack/p/17534673.html