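I made a few changes to the original author's code. The encoding conversion in the original version kept failing for me; this version is tested and works.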
# -*- coding: utf-8 -*-
import requests
import parsel

url = ""  # URL of the novel's index page
response = requests.get(url)
response.encoding = "utf-8"  # set this to the encoding the site actually uses
responses = response.text
selector = parsel.Selector(responses)
novel_name = selector.css('#info h1::text').get()  # novel title
href = selector.css('#list dd a::attr(href)').getall()  # chapter links
for link in href:
    link_url = '' + link  # prepend the site's base URL
    response_1 = requests.get(link_url)
    response_1.encoding = 'gbk'  # set this to the encoding the site actually uses
    responses_1 = response_1.text
    selector_1 = parsel.Selector(responses_1)
    title_name = selector_1.css('.bookname h1::text').get()  # chapter title
    content_list = selector_1.css('#content::text').getall()  # chapter text
    content = '\n'.join(content_list)
    # Save: append each chapter to <novel_name>.txt
    with open(novel_name + '.txt', mode='a', encoding='utf-8') as f:
        f.write(title_name)
        f.write('\n')
        f.write(content)
        f.write('\n')
    print(title_name)
print(novel_name)
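The chapter links taken from the index page may be relative or site-absolute paths, so plain string concatenation with the site prefix can produce a wrong URL. A minimal sketch of one alternative, using the standard library's urljoin; the base URL and chapter paths here are made-up placeholders, not the real site:

```python
from urllib.parse import urljoin

# Hypothetical index page URL, used only for illustration.
base_url = "https://example.com/book/123/"

# Chapter links as they might appear in the href attributes.
relative_link = "456.html"
absolute_path_link = "/book/123/457.html"

# urljoin resolves each link against the page it was found on.
chapter_url_1 = urljoin(base_url, relative_link)
chapter_url_2 = urljoin(base_url, absolute_path_link)

print(chapter_url_1)
print(chapter_url_2)
```

In the script above, this would mean passing the index page URL and each link to urljoin instead of concatenating them.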
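requests chooses response.encoding from the Content-Type response header:
1 If the header contains charset=xxx, encoding is set to that charset value.
2 If the header is only text/html with no charset, encoding falls back to ISO-8859-1. This is why Chinese pages often come out garbled unless response.encoding is set by hand.
3 If there is no Content-Type header at all, encoding is left unset and response.text detects the encoding from the page content, which is usually accurate.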
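The three rules can be sketched in plain Python. This is only an illustration of the rules as described here, not the code requests itself runs; pick_encoding is a hypothetical helper name, and the sample text is made up:

```python
def pick_encoding(content_type):
    """Hypothetical helper mirroring the three rules (not the requests API)."""
    if not content_type:
        return None  # rule 3: no header, the caller should detect from the body
    media_type, *params = [part.strip() for part in content_type.split(";")]
    for param in params:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset" and value:
            return value.strip("'\" ")  # rule 1: use the declared charset
    if media_type.lower().startswith("text/"):
        return "ISO-8859-1"  # rule 2: text type without a charset
    return None


# A GBK-encoded page served as plain "text/html" gets decoded as ISO-8859-1,
# which produces garbled text; decoding with the real encoding fixes it.
raw = "第一章".encode("gbk")
garbled = raw.decode(pick_encoding("text/html"))
fixed = raw.decode("gbk")

print(pick_encoding("text/html; charset=gbk"))
print(pick_encoding("text/html"))
print(pick_encoding(None))
print(garbled != "第一章", fixed == "第一章")
```

Rule 2 is the case the script works around by assigning response.encoding before reading response.text.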
From: https://www.cnblogs.com/python-xiaopang/p/16793955.html