Preface
Scrapy is an application framework for crawling web sites and extracting structured data, which can then be used for a wide range of applications such as data mining, information processing, or historical archival.
Even though Scrapy was originally designed for web scraping, it can also be used to extract data using APIs (such as Amazon Associates Web Services) or as a general purpose web crawler.
Architecture
Components
Scrapy Engine
The engine is responsible for controlling the data flow between all components of the system, and triggering events when certain actions occur. It controls the entire process.
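To make the data flow concrete, here is a toy model in plain Python. It is not Scrapy's implementation and uses none of its classes: `Request`, `FAKE_SITE`, `download`, `parse`, and `run_engine` are all hypothetical names invented for this sketch, and the "pages" are canned dictionaries so that it runs without network access. It only illustrates how an engine loop might pass requests, responses, and items between a scheduler, a downloader, a spider callback, and a pipeline.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass(frozen=True)
class Request:
    """A URL to fetch (toy stand-in, not scrapy.Request)."""
    url: str


# Hypothetical canned pages so the sketch runs offline.
FAKE_SITE = {
    "page1": {"titles": ["Book A", "Book B"], "next": "page2"},
    "page2": {"titles": ["Book C"], "next": None},
}


def download(request: Request) -> dict:
    """Downloader: turn a request into a response (here, a dict lookup)."""
    return FAKE_SITE[request.url]


def parse(response: dict) -> Iterator[Union[dict, Request]]:
    """Spider callback: yield extracted items and follow-up requests."""
    for title in response["titles"]:
        yield {"title": title}
    if response["next"]:
        yield Request(response["next"])


def run_engine(start_url: str) -> list:
    """Engine: move data between scheduler, downloader, spider and pipeline."""
    scheduler = deque([Request(start_url)])  # queue of pending requests
    seen = {start_url}                       # simple duplicate filter
    stored = []                              # stands in for the item pipeline
    while scheduler:
        request = scheduler.popleft()
        response = download(request)
        for result in parse(response):
            if isinstance(result, Request):
                # New requests go back to the scheduler.
                if result.url not in seen:
                    seen.add(result.url)
                    scheduler.append(result)
            else:
                # Items go on to the pipeline.
                stored.append(result)
    return stored


if __name__ == "__main__":
    print(run_engine("page1"))
```

The real engine is asynchronous and event-driven, so this sequential loop should be read only as a picture of who hands what to whom.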
Scheduler
Downloader
Spiders
Spiders are custom classes written by Scrapy users to parse responses and extract either items or additional requests to follow. For more information, see the Spiders section of the Scrapy documentation.
Item Pipeline
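In Scrapy, an item pipeline component is an ordinary Python class that implements `process_item`, optionally with `open_spider` and `close_spider` hooks. The sketch below shows what a MongoDB-backed pipeline could look like for the example later in this post. It is an illustration under assumptions, not the code from the original project: the class name `MongoBookPipeline`, the connection URI, the database and collection names, and the `client_factory` parameter (added only so the class can be exercised without a running database) are all hypothetical. The method signatures follow the commonly documented interface and may differ between Scrapy versions.

```python
from dataclasses import asdict, is_dataclass


class MongoBookPipeline:
    """Store each scraped item as one MongoDB document (illustrative sketch)."""

    def __init__(self, uri="mongodb://localhost:27017", db_name="douban",
                 collection_name="books", client_factory=None):
        # All default values are placeholders, not taken from the original post.
        self.uri = uri
        self.db_name = db_name
        self.collection_name = collection_name
        # client_factory lets a test inject a fake client instead of pymongo.
        self.client_factory = client_factory
        self.client = None
        self.collection = None

    def open_spider(self, spider):
        """Called when the spider starts: open the database connection."""
        factory = self.client_factory
        if factory is None:
            import pymongo  # imported lazily; requires `pip install pymongo`
            factory = pymongo.MongoClient
        self.client = factory(self.uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        """Called when the spider finishes: release the connection."""
        self.client.close()

    def process_item(self, item, spider):
        """Convert the item to a plain dict and insert it."""
        doc = asdict(item) if is_dataclass(item) else dict(item)
        self.collection.insert_one(doc)
        return item  # pass the item on to any later pipeline
```

To activate a pipeline, it is listed in the `ITEM_PIPELINES` setting of the project, for example under a module path such as `myproject.pipelines.MongoBookPipeline` (a hypothetical path) with an integer priority.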
Example
Requirements
Crawl the book information from Douban Books Top 250 (豆瓣读书 Top250) and store it in a MongoDB database.
Steps
Specify the content we want to crawl
We need the title, author, year, score, and a brief introduction.
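These five fields can be written down as an item definition. The sketch below uses a standard-library dataclass, which recent Scrapy versions accept as an item type alongside dicts and `scrapy.Item` subclasses. The class name `BookItem`, the field names, and the chosen types and defaults are assumptions made for illustration; the original post does not show its item class.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BookItem:
    """One book entry; field names are illustrative, not from the original."""
    title: str
    author: str
    year: Optional[int] = None      # publication year, if it can be parsed
    score: Optional[float] = None   # rating shown on the page
    intro: str = ""                 # brief introduction


if __name__ == "__main__":
    # Placeholder values only; no real book data is implied.
    book = BookItem(title="Example Title", author="Example Author",
                    year=2000, score=9.0, intro="A short summary.")
    print(asdict(book))
```

Keeping `year` and `score` optional is a design choice for this sketch: list pages do not always expose every field in a cleanly parseable form, and a missing value is easier to handle downstream than a failed crawl.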
Tags: web, information, spider, Scrapy, Spiders, data. From: https://www.cnblogs.com/memokeerbisi/p/18446026