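When sending requests, we may run into anti-scraping checks based on the User-Agent (UA) header. One way around this is to keep a list of UA strings and pick a different one for each request.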
```python
import random


class RandomUADownloaderMiddleware:
    def process_request(self, request, spider):
        # Pool of browser User-Agent strings to rotate through
        ua_list = [
            "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36",
            "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/537.75.14",
            "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)",
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36",
        ]
        # Overwrite the User-Agent header with a randomly chosen entry
        request.headers["User-Agent"] = random.choice(ua_list)
        # Returning None lets Scrapy continue processing the request
        return None
```
Define this custom downloader middleware in the project's middlewares.py.
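To check the rotation logic without starting a crawl, the middleware's process_request can be called directly on a stand-in object. This is a minimal sketch, assuming a hypothetical `FakeRequest` class that only carries a plain `headers` dict (a real scrapy.Request uses Scrapy's own Headers object). The middleware here is a variant with the list hoisted to a class attribute and shortened to two entries.

```python
import random


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request; it only carries a headers dict."""

    def __init__(self):
        self.headers = {}


class RandomUADownloaderMiddleware:
    # Variant of the middleware above: the list is a class attribute, so it is
    # built once rather than on every request (shortened to two entries here).
    UA_LIST = [
        "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0",
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)",
    ]

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.UA_LIST)
        return None  # None lets Scrapy continue handling the request


mw = RandomUADownloaderMiddleware()
seen = set()
for _ in range(200):
    req = FakeRequest()
    result = mw.process_request(req, spider=None)
    seen.add(req.headers["User-Agent"])

print(sorted(seen))
```

Hoisting the list is only a small efficiency choice; behavior is the same as building it inside process_request.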
Then enable it in settings.py by registering it under DOWNLOADER_MIDDLEWARES:
```python
DOWNLOADER_MIDDLEWARES = {
    # "scrapy_proxy.middlewares.ScrapyProxyDownloaderMiddleware": 543,
    "scrapy_proxy.middlewares.RandomUADownloaderMiddleware": 2,
}
```
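The number 2 is the middleware's order: Scrapy calls each downloader middleware's process_request in increasing order of this value, so 2 puts the custom middleware ahead of Scrapy's built-in UserAgentMiddleware (default order 500). The sketch below is a pure-Python simulation of that ordering, not Scrapy code; `random_ua`, `default_ua`, `MIDDLEWARES` and `run_process_request` are hypothetical names, and "Scrapy" is a placeholder for the real default UA string. It assumes the built-in middleware only fills in User-Agent when the header is missing (via setdefault), which is why the randomly chosen value survives.

```python
import random

UA_LIST = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)",
]


def random_ua(headers):
    # Mimics RandomUADownloaderMiddleware.process_request: always overwrites
    headers["User-Agent"] = random.choice(UA_LIST)


def default_ua(headers):
    # Mimics Scrapy's UserAgentMiddleware: only fills in a missing header
    headers.setdefault("User-Agent", "Scrapy")  # placeholder default value


# name -> (order value, simulated process_request)
MIDDLEWARES = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": (500, default_ua),
    "scrapy_proxy.middlewares.RandomUADownloaderMiddleware": (2, random_ua),
}


def run_process_request(headers):
    # process_request hooks run in increasing order of their value
    chain = sorted(MIDDLEWARES.items(), key=lambda item: item[1][0])
    order = [name for name, _ in chain]
    for _, (_, handler) in chain:
        handler(headers)
    return order, headers


order, headers = run_process_request({})
print(order[0])                          # scrapy_proxy.middlewares.RandomUADownloaderMiddleware
print(headers["User-Agent"] in UA_LIST)  # True
```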
From: https://www.cnblogs.com/zhangpd/p/17697376.html