首页 > 编程语言 >win10下python3.9的代理报错问题解决(附web3的polygon爬虫源码)

win10下python3.9的代理报错问题解决(附web3的polygon爬虫源码)

时间:2023-01-19 14:11:40浏览次数:87  
标签:polygon res self list list2 源码 报错 print proxies

背景

因为工作中经常需要代理访问,而开了代理,request就会报错SSLError,如下:

requests.exceptions.SSLError: HTTPSConnectionPool(host='test-admin.xxx.cn', port=443): Max retries exceeded with url: /xxx/xxx/xxx (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1123)')))

具体原因是python自身的 urllib 库没有正确配置 HTTPS 代理,参考链接(我本地环境是win10,python3.9)

如何解决

当然最简单的方式是手动关闭代理,如何开着代理也ok呢,如下所示

proxies = {"http": None, "https": 'http://127.0.0.1:7890'}  # 设置本地代理
res = requests.get(url=self.url, params=self.params, proxies=proxies)

附:polygon爬虫源码

import math
import requests
from bs4 import BeautifulSoup


class Polygon():
    url = 'https://polygonscan.com/txs'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
    }
    var_trans_list = []

    def __init__(self):
        global mian_address, proxies
        mian_address = '0x4C59739D4B43cc9b8C81d959F93e4cfD69736244' # input("请输入账号地址:")
        proxies = {"http": None, "https": 'http://127.0.0.1:7890'}  # 设置本地代理
        self.params = {
            'a': mian_address
        }
        self.res = ''
        page = self.is_paging()
        if page>1:
            print(f'共有{page}页数据')
            for p in range(1, page+1):
                self.params = {
                    'a': mian_address,
                    'p': p
                }
                print(f'正在获取第{p}页。。。')
                self.find_data()
        else:
            self.find_data()
        print(f'全部数据共{len(self.var_trans_list)}条,为{self.var_trans_list}')

    # 调用接口获取源数据
    def catch_polygon(self):
        self.res = requests.get(url=self.url, params=self.params, proxies=proxies)
        if self.res.status_code == 200:
            res_utf8 = self.res.content.decode("utf-8")
            self.res_html = BeautifulSoup(res_utf8, 'html.parser')
        else:
            print('抓取页面失败')

    # 是否存在分页
    def is_paging(self):
        self.catch_polygon()
        new_reslut = self.res_html.find(class_='mb-2 mb-md-0')
        if new_reslut:
            list = new_reslut.text.split(' ')
        if int(list[3]) >= 0:
            print(f"一共查询到{list[3]}条数据")
            return math.ceil((int(list[3])+1)/50)

    # 过滤匹配数据
    def find_data(self):
        list2 = []
        self.catch_polygon()
        new_reslut = self.res_html.find(class_='table table-hover').find_all('tr')
        for i in new_reslut:
            list = []
            for j in i:
                if j.text:
                    list.append(j.text)
            list2.append(list)
        if list2:
            list2 = list2[1:]  # 去除第一行无效数据
            print(f"当页共{len(list2)}条数据")
            self.var_trans_list.extend(list2)
        else:
            print('没有查询到数据')

if __name__ == '__main__':
    Polygon()

  

标签:polygon,res,self,list,list2,源码,报错,print,proxies
From: https://www.cnblogs.com/qgc1995/p/16792362.html

相关文章