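In the page's header information I noticed: Accept-Encoding: gzip, deflate
Yet the Chinese characters in the scraped content came out garbled.
Check which encoding was assigned to the response: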
import re
import requests
from bs4 import BeautifulSoup

headers = {
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',
    'User-Agent': '',  # fill in your own User-Agent string
    'Referer': 'http://jnga.jinan.gov.cn/col/col22173/index.html',
}
url = "http://jnga.jinan.gov.cn/col/col22173/index.html"
response = requests.get(url, headers=headers)
print(response.encoding)  # ISO-8859-1
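Why does ISO-8859-1 produce garbled text? A likely explanation, assuming the server sends a text Content-Type without a charset, is that requests falls back to ISO-8859-1 and `response.text` then decodes the page's bytes with the wrong codec. The sketch below reproduces that effect offline; the sample string and variable names are made up for illustration and do not come from the site.

```python
# Hypothetical sample text standing in for the page's Chinese content.
original = "济南公安"

# Assume the server actually sent the page as UTF-8 bytes.
raw = original.encode("utf-8")

# Decoding those bytes as ISO-8859-1 never fails, but yields mojibake.
garbled = raw.decode("iso-8859-1")

# ISO-8859-1 maps every byte to one character, so the damage is reversible:
# re-encode to get the original bytes back, then decode with the right codec.
recovered = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(recovered)
```

Because the round trip is lossless, the garbling is a decoding choice rather than data corruption, which is why changing the encoding on the response is enough to fix it.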
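My workaround:
After requesting the page and receiving the response, set the response's encoding to utf-8 or gbk before reading response.text:
response.encoding = 'gbk'  # or 'utf-8'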
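If it is unclear whether a page is utf-8 or gbk, one option is to try both on the raw bytes instead of guessing. The following is only a minimal sketch: `decode_html` and its `candidates` parameter are hypothetical names introduced here, not part of requests.

```python
from typing import Sequence, Tuple


def decode_html(raw: bytes, candidates: Sequence[str] = ("utf-8", "gbk")) -> Tuple[str, str]:
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
    # Nothing decoded cleanly: fall back to the first candidate and
    # substitute U+FFFD for any undecodable bytes.
    return raw.decode(candidates[0], errors="replace"), candidates[0]


# Offline demonstration with a made-up sample string.
sample = "济南公安"
print(decode_html(sample.encode("gbk")))
print(decode_html(sample.encode("utf-8")))
```

With the response from the request above, this could be called as `text, enc = decode_html(response.content)`. utf-8 is tried first because gbk bytes rarely happen to be valid utf-8, while the reverse mistake is more common. requests also exposes `response.apparent_encoding`, which guesses the encoding from the body and can be assigned to `response.encoding` as an alternative.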
From: https://www.cnblogs.com/wwei12/p/17154139.html