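Occasional exceptions on the production server left some data missing, so the affected requests have to be replayed from the Nginx access log to backfill it. Python can be used to pull the URLs of the failed requests out of the log and send them again.
The script is as follows: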
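import time
import requests

# Path and query prefix that identifies the requests to replay
sub_string = "/report/app?platform=test"
# Set of URLs already handled, so that each one is requested only once
processed_urls = set()

with open('hua.push.com.access.log', 'r') as f:
    for line in f:
        # Skip lines that are not GET requests mentioning the target path
        # (without the 'GET' check, split('GET')[1] raises IndexError)
        if sub_string not in line or 'GET' not in line:
            continue
        # Take the text between "GET" and " HTTP/1.1": the request path and query string
        target_str = line.split('GET')[1].split(" HTTP/1.1")[0].strip()
        # The prefix may have matched elsewhere in the line, so check the path itself
        if sub_string not in target_str:
            continue
        full_url = 'https://hua.push.com' + target_str
        # Only replay URLs that have not been processed yet
        if full_url not in processed_urls:
            processed_urls.add(full_url)  # mark as processed
            print(target_str)
            # Send the GET request again; the timeout keeps a stalled request from hanging the script
            requests.get(full_url, timeout=10)
            # Pause between requests
            time.sleep(1)

print('Number of unique URLs:')
print(len(processed_urls))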
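
The `split('GET')` extraction above depends on the exact text of each line. As an alternative, here is a minimal sketch that pulls the request path out with a regular expression. It assumes the request field looks like `"GET /path HTTP/1.1"`, as in Nginx's default `combined` log format; the original post does not say which format is configured. The function name `extract_paths` and the sample lines are hypothetical and only serve as illustration.

```python
import re

# Matches the quoted request field of a log line, e.g. "GET /path?x=1 HTTP/1.1".
# Assumption: the request is logged as "<METHOD> <path> HTTP/<version>" in double quotes.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def extract_paths(lines, prefix, method="GET"):
    """Return the unique request paths starting with `prefix`, in first-seen order."""
    seen = set()
    paths = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # line has no recognizable request field
        if match["method"] != method:
            continue  # e.g. skip POST when only GET requests are replayed
        path = match["path"]
        if path.startswith(prefix) and path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


# Hypothetical sample lines, not taken from a real log
sample = [
    '1.2.3.4 - - [12/Aug/2024:10:00:00 +0800] "GET /report/app?platform=test&id=1 HTTP/1.1" 200 15 "-" "curl/8.0"',
    '1.2.3.4 - - [12/Aug/2024:10:00:01 +0800] "GET /report/app?platform=test&id=1 HTTP/1.1" 200 15 "-" "curl/8.0"',
    '1.2.3.4 - - [12/Aug/2024:10:00:02 +0800] "POST /report/app?platform=test&id=2 HTTP/1.1" 200 15 "-" "curl/8.0"',
    '1.2.3.4 - - [12/Aug/2024:10:00:03 +0800] "GET /index.html HTTP/1.1" 200 15 "https://hua.push.com/report/app?platform=test" "curl/8.0"',
]
print(extract_paths(sample, "/report/app?platform=test"))
```

Each returned path can then be appended to the host and passed to `requests.get`, as in the script above. Matching on the request field also means a line that only mentions the target path in its referer is ignored.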
From: https://www.cnblogs.com/air-liyan/p/18352872