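Fetching a page with the \(\tt requests\) library
import requests  # import the library
url = "......"  # address to fetch
kv = {"User-Agent": "Mozilla/5.0"}  # browser-like User-Agent header; optional, but improves the success rate
r = requests.get(url, headers=kv)
print(r.text)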
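The snippet above does not handle failures. As a minimal sketch, the same request can be wrapped with a timeout, a status check, and an encoding guess. The function name `fetch_html` and the 10-second default timeout are assumptions made for illustration and do not come from the original post.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page text, or None if the request fails."""
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()  # raise HTTPError on 4xx/5xx status codes
        # Use the encoding guessed from the body; when the response headers
        # declare no charset, the default can garble non-Latin pages.
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, bad URLs, and HTTP errors.
        return None
```

Calling `fetch_html(url)` with the `url` defined above would then return either the page text or `None`, so the caller can check the result before printing it.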
Querying a search engine with the \(\tt requests\) library
Sogou search URL: www.sogou.com/web?query=......
import requests
keyword = "......"  # search keyword
kv = {"User-Agent": "Mozilla/5.0"}
query = {"query": keyword}  # key-value pair for the query string
r = requests.get("https://www.sogou.com/web", headers=kv, params=query)
print(r.text)
r.close()
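To see how the keyword ends up in the URL pattern shown above, the query string can be built by hand with the standard library. This is only a sketch of the encoding step, not the internals of `requests`; the helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(keyword, base="https://www.sogou.com/web"):
    """Return the search URL with the keyword percent-encoded."""
    # urlencode turns the dict into "query=..." and escapes spaces
    # and non-ASCII characters.
    return base + "?" + urlencode({"query": keyword})


print(build_search_url("python requests"))
# prints: https://www.sogou.com/web?query=python+requests
```

With a real response object, `r.url` shows the URL that was actually requested, which is a quick way to check that the parameters were attached as expected.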
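Downloading an image with the \(\tt requests\) library
This example also uses the \(\tt os\) library to save the file locally.
import requests
import os
url = "......"  # address of the image
root = "......"  # local directory to save into
# Use the last URL segment as the file name; ".jpg" is appended on the
# assumption that the segment has no extension of its own.
path = os.path.join(root, url.split('/')[-1] + '.jpg')
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        with open(path, 'wb') as f:  # the with block closes the file
            f.write(r.content)
        print("File saved")
    else:
        print("File already exists")
except Exception:
    print("Download failed")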
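Taking the file name from the last segment of the URL is fragile when the URL carries a query string or already ends in an extension. One possible way to make that step more robust, using only the standard library, is sketched below. The helper `local_path_for`, the fallback name `download`, and the default `.jpg` extension are all assumptions made for illustration.

```python
import os
import posixpath
from urllib.parse import urlparse


def local_path_for(url, root, default_ext=".jpg"):
    """Build a local file path from the last path segment of a URL."""
    # urlparse separates the path from any "?query" or "#fragment" part.
    name = posixpath.basename(urlparse(url).path)
    if not name:
        name = "download"  # hypothetical fallback for URLs ending in "/"
    if not posixpath.splitext(name)[1]:
        name += default_ext  # only add an extension when none is present
    return os.path.join(root, name)
```

For example, `local_path_for("https://example.com/img/cat.jpg?size=large", "pics")` yields `cat.jpg` inside the `pics` directory, with the query string dropped and no second extension added.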
From: https://www.cnblogs.com/WIDA/p/17113827.html