Scraping Qidian Novel Information into Excel

Date: 2022-10-21 12:55:05; views: 58

Tags: xpath, parse, img, worksheet, excel, save, scrape, print, class


import urllib.request
from lxml import etree
import xlwt
# Request URL
url = 'https://www.qidian.com/all/action1-page1'
# User-Agent header, so the request looks like it comes from a browser
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 SLBrowser/8.0.0.2242 SLBChan/10'
}
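# Build a request object that carries the custom headers
request = urllib.request.Request(url=url, headers=headers)
# print(request)
# Send the request to the server; urlopen returns a response object
response = urllib.request.urlopen(request)
# Decode the response body to get the page source
content = response.read().decode('utf-8')
# print(content)
# Parse the HTML returned by the server
parse_html = etree.HTML(content)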

# Write XPath expressions for the data we want; xpath() returns a list
# Novel URL:
bookurl = parse_html.xpath('//ul[@class="all-img-list cf"]/li/div/a/@href')
# Cover image URL:
bookps = parse_html.xpath('//ul[@class="all-img-list cf"]/li/div/a/img/@src')
# Title:
bookname = parse_html.xpath('//ul[@class="all-img-list cf"]/li/div/h2/a/text()')
# Author:
bookauthor = parse_html.xpath('//ul[@class="all-img-list cf"]/li/div/p/a[1]/text()')
# Main category:
booktype = parse_html.xpath('//ul[@class="all-img-list cf"]/li/div/p/a[2]/text()')
# Sub-category:
bookmintype = parse_html.xpath('//ul[@class="all-img-list cf"]/li/div/p/a[@class="go-sub-type"]/text()')
# Completion status:
bookend = parse_html.xpath('//ul[@class="all-img-list cf"]/li/div/p[1]/span[1]/text()')
# Synopsis:
bookcoll = parse_html.xpath('//ul[@class="all-img-list cf"]/li/div/p[@class="intro"]/text()')

# print(len(bookurl))
# print(bookps)
# print(bookname)
# print(bookauthor)
# print(booktype)
# print(bookmintype)
# print(bookend)
# print(type(bookcoll))

# datalist=[bookurl,bookps,bookname,bookauthor,booktype,bookmintype,bookend,bookcoll]
# print(len(datalist))

workbook = xlwt.Workbook(encoding="utf-8")  # Create a Workbook object
worksheet = workbook.add_sheet("Qidian")  # Create a worksheet
col = ('Novel URL', 'Cover image URL', 'Title', 'Author', 'Main category', 'Sub-category', 'Status', 'Synopsis')
# Header row
for i, header in enumerate(col):
    worksheet.write(0, i, header)

# One list per column, in the same order as the headers
columns = (bookurl, bookps, bookname, bookauthor, booktype, bookmintype, bookend, bookcoll)
# Write only as many rows as the shortest list holds, instead of assuming 20 books per page
for row in range(min(len(values) for values in columns)):
    for j, values in enumerate(columns):
        worksheet.write(row + 1, j, values[row])
# Save the workbook, overwriting any existing file at this path
workbook.save(r"C:\Users\Administrator\Desktop\book.xls")
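
The script writes the eight lists column by column, which assumes they line up item by item and that every cell is already clean. Below is a minimal sketch, using only the standard library, of how the extracted columns could be turned into per-book rows before they are written. The helper names `absolutize` and `build_rows` are hypothetical, the sample data is made up, and the idea that the extracted links may be protocol-relative (starting with `//`) is an assumption that the original post does not confirm.

```python
from urllib.parse import urljoin

# Page URL taken from the script; used as the base for relative links
BASE_URL = 'https://www.qidian.com/all/action1-page1'


def absolutize(link, base=BASE_URL):
    """Turn a relative or protocol-relative link into an absolute URL."""
    return urljoin(base, link.strip())


def build_rows(columns, url_columns=(0, 1)):
    """Zip per-column lists into per-book rows.

    Stops at the shortest column, strips surrounding whitespace from
    every cell, and makes the cells listed in url_columns absolute.
    """
    rows = []
    for cells in zip(*columns):
        row = [cell.strip() for cell in cells]
        for idx in url_columns:
            row[idx] = absolutize(row[idx])
        rows.append(row)
    return rows


# Made-up sample data standing in for the XPath results (assumption)
sample = [
    ['//book.qidian.com/info/1/', '//book.qidian.com/info/2/'],
    ['//img.example.com/1.jpg', '//img.example.com/2.jpg'],
    ['Book A', 'Book B', 'Book C'],      # extra entry is dropped by zip
    ['\r\n    Intro A  ', 'Intro B'],
]
rows = build_rows(sample)
print(rows)
```

Each resulting row could then be written with `worksheet.write(r + 1, c, value)` in a nested loop. Note that stopping at the shortest column avoids an `IndexError` but does not repair misalignment when one book is missing a field; looping over each `li` element and querying it with relative XPath expressions would be the more robust approach.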

Source: https://www.cnblogs.com/lzp110119/p/16813095.html
