[Crawler] Projects: Downloading New Oriental CET-6 Listening Audio

Date: 2024-04-05 23:46:14

Tags: CET-6, random, req, crawler, headers, 1650027981031, New Oriental

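import random
import time

import requests
from fake_useragent import UserAgent

# Read the audio URLs (whitespace-separated) from a local text file.
with open(r'E:\01pycharm project\网络爬虫技术\sjj1.txt', encoding='utf-8') as f:
    urls = f.read().split()

# Load the local user-agent database once instead of on every iteration.
# Note: the `path` argument is accepted by older fake_useragent releases (e.g. 0.1.11).
ua = UserAgent(path=r'./fake_useragent.json')

for i, url in enumerate(urls, start=1):
    headers = {
        #'User-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36 Edg/100.0.1185.39',
        'user-agent': ua.chrome,  # random Chrome user-agent string
        'Referer': 'https://dogwood.xdfsjj.com/',
        'cookie': '_yttoken_=yvpoqsot2il2geyv; _ytuserid_=113498738; zg_did={"did": "17f64e7dcddd63-02f3d4617972ef-56171d58-144000-17f64e7dcdef3f"}; zg_37d62e79d2fa4b8aa0dcfdd95a665ced={"sid": 1650027981031,"updated": 1650027981031,"info": 1650015954242,"superProperty": "{}","platform": "{}","utm": "{}","referrerDomain": "mail.qq.com","zs": 0,"sc": 0,"firstScreen": 1650027981031}',
    }
    req = requests.get(url, headers=headers)
    print(req.status_code)
    # Only save the body when the request succeeded, so an error page
    # is never written to disk as an .mp3 file.
    if req.status_code == 200:
        with open(r'E:\六级正序\list%d.mp3' % i, 'wb') as file:
            file.write(req.content)
        print(i, "downloaded successfully")
    else:
        print(i, "download failed")
    req.close()
    # Wait 1 to 3 seconds between requests to avoid hammering the server.
    time.sleep(random.randint(1, 3))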
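
The loop above loads each audio file fully into memory through `req.content`. For larger files, a streaming variant may be preferable. Below is a minimal sketch, not part of the original post: `save_response` and `download` are hypothetical helper names, and the chunk size and timeout are arbitrary assumptions. It relies only on `requests.get(..., stream=True)`, `Response.raise_for_status()` and `Response.iter_content()`.

```python
def save_response(resp, path, chunk_size=8192):
    """Write a streamed response to `path` chunk by chunk; return bytes written.

    `resp` is expected to behave like a requests.Response opened with
    stream=True (it needs raise_for_status() and iter_content()).
    """
    # Raise on 4xx/5xx before the output file is even created.
    resp.raise_for_status()
    written = 0
    with open(path, "wb") as fh:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                fh.write(chunk)
                written += len(chunk)
    return written


def download(url, path, headers=None, timeout=30):
    """Hypothetical wrapper: fetch `url` with requests and stream it to `path`."""
    import requests  # imported here; only needed for real network calls

    with requests.get(url, headers=headers, stream=True, timeout=timeout) as resp:
        return save_response(resp, path)
```

Inside the loop, `download(url, r'E:\六级正序\list%d.mp3' % i, headers=headers)` could then stand in for the `requests.get` and `file.write` steps, wrapped in a `try`/`except requests.HTTPError` if failed URLs should be skipped rather than abort the run.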

From: https://www.cnblogs.com/Gimm/p/18116998

