Scraping a game's skin images with Python (for learning purposes)

Date: 2022-11-29 23:36:15

Tags: xpath, game, python, resp, scraping, headers, url, print, div

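This example uses the requests, XPath (lxml), and re modules for fetching and parsing, with time and os as helpers. Parsing with XPath is not what-you-see-is-what-you-get: an expression that matches the page as shown in the browser does not always match the HTML that requests returns, so for some fields I found re easier to use.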

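1. The requests module:

import requests

url = "https://pvp.qq.com/web201605/herolist.shtml"
# Send a browser-like User-Agent with the request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/86.0.4240.198 Safari/537.36"
}
resp = requests.get(url=url, headers=headers)
# Let requests guess the encoding from the response body so the text decodes correctly
resp.encoding = resp.apparent_encoding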

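2. The XPath module (lxml):

from lxml import etree

# Parse the hero list page fetched in step 1
e = etree.HTML(resp.text)
# Relative link to each hero's detail page
href = e.xpath("//div[@class='herolist-box']/div/ul/li/a/@href")
# Hero names, taken from the alt text of each thumbnail
names = e.xpath("//div[@class='herolist-box']/div/ul/li/a/img/@alt")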
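
To see what the two expressions return, here is a minimal sketch that runs them against a small inline HTML fragment instead of the live page. The fragment, its link targets, and the hero names are made up for illustration; only the XPath expressions come from the code above.

```python
from lxml import etree

# Hypothetical fragment shaped like the hero list markup the XPath expects
html = """
<div class="herolist-box">
  <div>
    <ul>
      <li><a href="herodetail/1.shtml"><img alt="Hero A"></a></li>
      <li><a href="herodetail/2.shtml"><img alt="Hero B"></a></li>
    </ul>
  </div>
</div>
"""

e = etree.HTML(html)
# Attribute selectors return plain lists of strings, in document order
hrefs = e.xpath("//div[@class='herolist-box']/div/ul/li/a/@href")
names = e.xpath("//div[@class='herolist-box']/div/ul/li/a/img/@alt")

for link, name in zip(hrefs, names):
    print(name, link)
```

If either list comes back empty on the real page, the HTML that requests received probably differs from what the browser shows, and printing resp.text is the quickest way to check.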

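3. The re module:

import re

# resp_1 is the response for one hero's detail page (see the complete example)
# Match the inline style that carries the wallpaper URL: background:url('...')
reg = r'background:url\(\'.*?\'\)'
src = re.findall(reg, resp_1.text, re.S)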
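
The pattern above returns the whole background:url('...') text, which then has to be cut apart by hand. As a sketch of an alternative, a capture group can return only the URL. The sample HTML string and its example.com address are assumptions for illustration, not real page content.

```python
import re

# Hypothetical fragment imitating the inline style on a hero detail page
sample_html = (
    '<div class="zk-con1 zk-con" '
    "style=\"background:url('//example.com/skins/105/105-bigskin-1.jpg') center 0\"></div>"
)

# Same pattern as above, with a capture group around the URL itself
pattern = r"background:url\('(.*?)'\)"
urls = re.findall(pattern, sample_html, re.S)
print(urls)

# Drop the leading '//' and the trailing "1.jpg" to get the part shared by every skin image
base = urls[0].lstrip('/')[:-len("1.jpg")]
print(base)
```

With the capture group, the slice only has to remove the image index and extension, not the closing quote and parenthesis as well.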

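4. The os and time modules:

import time
import os

# Create the output directory on the first run
if not os.path.exists('03-herodetail'):
    os.makedirs('03-herodetail')

# Pause between detail-page requests to avoid hammering the server
time.sleep(2)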

5. Saving the image file:

# i is the skin name; resp is the response for that skin's image
with open(f'03-herodetail/{i}.jpg', "wb") as f:
    f.write(resp.content)
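
The skin name goes straight into the file name here, so a name containing a character such as / or ? could make open() fail (on Windows) or point at an unexpected path. Below is a minimal sketch of one way to guard against that, assuming a hypothetical helper named safe_filename that is not part of the original script.

```python
import re


def safe_filename(name: str, replacement: str = "_") -> str:
    """Replace characters that Windows does not allow in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, name).strip()
    # Fall back to a fixed name if nothing usable is left
    return cleaned or "unnamed"


print(safe_filename("Skin: A/B?"))
print(safe_filename("plain"))
```

In the script, the call would wrap the skin name when building the path, for example f'03-herodetail/{safe_filename(i)}.jpg'.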

6. Complete example:

import os
import re
import time

import requests
from lxml import etree

# Create the output directory on the first run
if not os.path.exists('03-herodetail'):
    os.makedirs('03-herodetail')

url = "https://pvp.qq.com/web201605/herolist.shtml"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/86.0.4240.198 Safari/537.36"
}
resp = requests.get(url=url, headers=headers)
resp.encoding = resp.apparent_encoding
e = etree.HTML(resp.text)
# Relative link to each hero's detail page
href = e.xpath("//div[@class='herolist-box']/div/ul/li/a/@href")
# Hero names (not used below, handy for debugging)
names = e.xpath("//div[@class='herolist-box']/div/ul/li/a/img/@alt")
# Build the absolute detail-page URLs
lst_link = ["https://pvp.qq.com/web201605/" + link for link in href]

for item in lst_link:
    print(item)
    resp_1 = requests.get(item, headers=headers)
    resp_1.encoding = resp_1.apparent_encoding
    e_1 = etree.HTML(resp_1.text)
    # Skin names are stored '|'-separated in the data-imgname attribute
    data_title = e_1.xpath("//div[@class='zk-con1 zk-con']/div/div/div/ul/@data-imgname")
    # The wallpaper URL sits in an inline style: background:url('//...')
    reg = r'background:url\(\'.*?\'\)'
    src = re.findall(reg, resp_1.text, re.S)
    if not data_title or not src:
        # Skip pages whose markup does not match instead of raising IndexError
        continue
    # Take the text after '//' and drop the last 7 characters,
    # assuming the match ends with a one-digit index such as 1.jpg')
    name = src[0].split('//')[1][:-7]
    skin_names = data_title[0].split('|')
    # Skin images are numbered from 1, in the same order as the names
    for count, skin in enumerate(skin_names, start=1):
        img_url = "http://" + name + str(count) + ".jpg"
        img_resp = requests.get(url=img_url, headers=headers)
        with open(f'03-herodetail/{skin}.jpg', "wb") as f:
            f.write(img_resp.content)
    # Pause between heroes to avoid hammering the server
    time.sleep(2)
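
The script builds absolute URLs by string concatenation, which works as long as every link has the expected relative form. As an alternative sketch, urllib.parse.urljoin from the standard library resolves relative and scheme-relative links against the page URL. The two example link values below are assumptions, not taken from the live site.

```python
from urllib.parse import urljoin

page_url = "https://pvp.qq.com/web201605/herolist.shtml"

# Hypothetical relative link, shaped like the href values XPath returns
detail_url = urljoin(page_url, "herodetail/105.shtml")
print(detail_url)

# Hypothetical scheme-relative image link: the scheme is taken from page_url
image_url = urljoin(page_url, "//example.com/skins/105-bigskin-1.jpg")
print(image_url)
```

This avoids hard-coding the base path and the "http://" prefix, at the cost of one extra import.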

7. Result screenshot:

(screenshot not preserved)

From: https://blog.51cto.com/u_14012524/5897223
