Python Web Crawling and Data Mining: Review Notes

Date: 2023-02-12 15:13:38

Tags: Python, tt, crawler, print, url, data mining, path, requests, os

Fetching a page with the \(\tt requests\) library
Querying a search engine with the \(\tt requests\) library
Downloading an image with the \(\tt requests\) library

Fetching a page with the \(\tt requests\) library

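import requests  # import the library
url = "......"  # address to fetch
kv = {"User-Agent": "Mozilla/5.0"}  # browser-like User-Agent; optional, but some sites reject the default one
r = requests.get(url, headers=kv, timeout=10)  # the timeout keeps the call from hanging indefinitely
r.raise_for_status()  # raise an exception on 4xx/5xx responses
r.encoding = r.apparent_encoding  # guess the encoding from the body so non-ASCII text decodes correctly
print(r.text)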
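The snippet above can be wrapped in a small reusable function. This is a minimal sketch, not a definitive implementation: the name `fetch_text`, the 10-second default timeout, and the choice to return `None` on failure are all assumptions made for illustration.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails.

    `fetch_text` is a hypothetical helper name, not part of requests.
    """
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()              # turn 4xx/5xx into an exception
        r.encoding = r.apparent_encoding  # guess the encoding from the body
        return r.text
    except requests.RequestException:
        # Covers malformed URLs, connection errors, timeouts and HTTP errors.
        return None
```

Calling `fetch_text(url)` with a real address returns the page text. Catching `requests.RequestException` rather than using a bare `except` keeps programming errors such as a `NameError` visible.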

Querying a search engine with the \(\tt requests\) library

Sogou search URL: www.sogou.com/web?query=......

import requests
keyword = "......"  # search keyword
kv = {"User-Agent": "Mozilla/5.0"}
query = {"query": keyword}  # query-string key-value pair
r = requests.get("https://www.sogou.com/web", headers=kv, params=query, timeout=10)  # requests appends "?query=..." itself
print(r.text)
r.close()
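To see what the `params` argument does to the URL without sending anything, a request can be prepared and inspected offline. This is a sketch for illustration only: the keyword below is a made-up example, not one from the original notes.

```python
import requests

keyword = "Python 爬虫"  # hypothetical keyword, for illustration only

req = requests.Request(
    "GET",
    "https://www.sogou.com/web",
    headers={"User-Agent": "Mozilla/5.0"},
    params={"query": keyword},
)
prepared = req.prepare()  # builds the final URL; no network traffic happens here

# The keyword is percent-encoded and appended after "?query=".
print(prepared.url)
```

Because requests handles the encoding, the keyword can contain spaces or non-ASCII characters without manual escaping.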

Downloading an image with the \(\tt requests\) library

This example also uses the \(\tt os\) library to store the file locally.

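import os
import requests
url = "......"  # address of the image
root = "......"  # local directory to save into
path = os.path.join(root, url.split('/')[-1])  # name the file after the last URL segment, keeping its extension
try:
    os.makedirs(root, exist_ok=True)  # create the directory if it does not exist yet
    if not os.path.exists(path):
        r = requests.get(url, timeout=10)
        r.raise_for_status()  # do not save an error page as an image
        with open(path, 'wb') as f:  # the with block closes the file automatically
            f.write(r.content)
        print("File saved")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Download failed")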
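Taking the file name from `url.split('/')[-1]` keeps any query string in the name and yields an empty name when the URL ends in a slash. A slightly more defensive variant, using only the standard library, might look like the sketch below. The helper name `local_path_for`, the fallback name `download`, and the example.com URLs are assumptions for illustration.

```python
import os
from urllib.parse import urlsplit


def local_path_for(url, root):
    """Build the local save path for `url` inside the directory `root`.

    Hypothetical helper: only the path component of the URL is used,
    so a query string such as "?size=large" never ends up in the name.
    """
    name = os.path.basename(urlsplit(url).path)
    if not name:
        name = "download"  # assumed fallback when the URL has no file name
    return os.path.join(root, name)


print(local_path_for("https://example.com/img/cat.jpg?size=large", "pics"))
```

The returned value can be used in place of `path` in the download code above.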

From: https://www.cnblogs.com/WIDA/p/17113827.html
