Parsing nginx logs with Python

Date: 2023-08-09 09:56:27

Tags: count, sheet, python, request, write, nginx, time, log, row

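In a performance test, the request mix across interfaces should follow the proportions observed in production. The script below extracts those statistics from an nginx access log: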

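import re

import pandas as pd
import xlwt

# Matches one access-log line. As the group names suggest, three fields sit
# between the request and the referer (request time, status, bytes sent), so
# this targets a custom log_format rather than nginx's default "combined"
# format. Adjust the pattern to match your own log_format.
obj = re.compile(
    r'(?P<ip>.*?)- - \[(?P<time>.*?)\] "(?P<request>.*?)" (?P<request_time>.*?) (?P<status>.*?) (?P<bytes>.*?) "(?P<referer>.*?)" "(?P<ua>.*?)"')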


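def load_log(path):
    """Parse every line of the log; return (parsed records, unparseable lines)."""
    lst = []
    error_lst = []
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the whole run.
    with open(path, mode="r", encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            dic = parse(line)
            if dic:
                lst.append(dic)
            else:
                error_lst.append(line)

    return lst, error_lst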

def NumIn(s):
    """Return True if the string contains at least one digit."""
    return any(char.isdigit() for char in s)

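def parse(line):
    """Return the time buckets and normalized request of one log line, or None."""
    result = obj.match(line)
    if result is None:
        return None

    dic = {}
    # Drop the UTC offset, e.g. "09/Aug/2023:09:56:27 +0800" -> "09/Aug/2023:09:56:27".
    time = result.group("time").split(" ")[0]
    dic['time'] = time              # per second
    dic['time_min'] = time[:17]     # e.g. "09/Aug/2023:09:56"
    dic['time_10min'] = time[:16]   # e.g. "09/Aug/2023:09:5" (minutes 50 to 59)
    dic['time_hour'] = time[:14]    # e.g. "09/Aug/2023:09"

    fields = result.group("request").split()
    if len(fields) < 2:
        # Malformed request line: no method/URL pair to count.
        return None
    method = fields[0]
    segments = fields[1].split("?")[0].split("/")
    # Keep the first four path segments, but drop the fourth when it contains
    # a digit, so that URLs differing only by an ID are counted together.
    # Shorter paths are kept whole instead of being discarded as parse errors.
    if len(segments) > 4 and NumIn(segments[4]):
        segments = segments[:4]
    dic['request'] = method + " " + "/".join(segments[:5])

    return dic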

def write_counts(wb, sheet_name, header, counts, label=lambda key: key):
    """Add a two-column sheet (label, count) from a value_counts() Series."""
    sheet = wb.add_sheet(sheet_name)
    sheet.write(0, 0, header)
    sheet.write(0, 1, "count")
    for row, (key, count) in enumerate(counts.items(), start=1):
        sheet.write(row, 0, label(key))
        sheet.write(row, 1, int(count))  # make sure xlwt gets a plain Python int


def analyse(lst, project):
    df = pd.DataFrame(lst)
    # regex=False: match the project prefix literally, not as a regular expression.
    df = df[df['request'].str.contains(project, regex=False)]
    total = df.shape[0]

    wb = xlwt.Workbook()

    # Request count and share of traffic per (method, URL) pair.
    sheet = wb.add_sheet("URL request counts and share")
    titles = ["request_url", "request_type", "count", "percentage", "total requests"]
    for col, title in enumerate(titles):
        sheet.write(0, col, title)
    sheet.write(1, 4, total)
    for row, (request, count) in enumerate(df['request'].value_counts().items(), start=1):
        method, url = request.split(" ", 1)
        sheet.write(row, 0, url)
        sheet.write(row, 1, method)
        sheet.write(row, 2, int(count))
        sheet.write(row, 3, "%.2f%%" % (count / total * 100))

    # value_counts() sorts by count in descending order, so head(n) is the top n.
    write_counts(wb, "Per-second requests top 100", "time",
                 df['time'].value_counts().head(100))
    write_counts(wb, "Per-minute requests top 100", "time_min",
                 df['time_min'].value_counts().head(100),
                 label=lambda t: t + ':00' + "-" + t + ':59')
    write_counts(wb, "10-minute requests top 100", "time10",
                 df['time_10min'].value_counts().head(100),
                 label=lambda t: t + '0:00' + "-" + t + '9:59')
    write_counts(wb, "Hourly requests", "timehour",
                 df['time_hour'].value_counts().head(24),
                 label=lambda t: t + ':00:00' + "-" + t + ':59:59')

    wb.save("nginx_log.xls")

if __name__ == '__main__':
    # Raw string so the backslashes in the Windows path are not read as escapes.
    lst, error_lst = load_log(path=r"D:\Desktop\****imc.log")
    analyse(lst, project='/SVC***/')
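
To see what the regular expression and the URL grouping do to a single line, the following minimal sketch may help. The log line in `SAMPLE` is made up for illustration, and `LOG_PATTERN` and `normalize` are hypothetical names that are not part of the script; `normalize` only mirrors the rule of keeping four path segments and dropping the fourth when it contains a digit.

```python
import re

# Same pattern as the script, split over two lines for readability.
LOG_PATTERN = re.compile(
    r'(?P<ip>.*?)- - \[(?P<time>.*?)\] "(?P<request>.*?)" (?P<request_time>.*?) '
    r'(?P<status>.*?) (?P<bytes>.*?) "(?P<referer>.*?)" "(?P<ua>.*?)"')

# Hypothetical log line in the format the pattern expects (not real data).
SAMPLE = ('10.0.0.1 - - [09/Aug/2023:09:56:27 +0800] '
          '"GET /svc/api/orders/12345?detail=1 HTTP/1.1" 0.012 200 512 "-" "curl/8.0"')


def normalize(request):
    """Reduce 'METHOD /a/b/c/d/... HTTP/x' to 'METHOD /a/b/c[/d]' (illustrative helper)."""
    method, target = request.split()[:2]
    segments = target.split("?")[0].split("/")
    # Drop the fourth path segment when it looks like an ID (contains a digit).
    keep = 4 if len(segments) > 4 and any(ch.isdigit() for ch in segments[4]) else 5
    return method + " " + "/".join(segments[:keep])


fields = LOG_PATTERN.match(SAMPLE).groupdict()
endpoint = normalize(fields["request"])
second = fields["time"].split(" ")[0]

print(endpoint)     # GET /svc/api/orders
print(second[:17])  # minute bucket: 09/Aug/2023:09:56
print(second[:16])  # 10-minute bucket: 09/Aug/2023:09:5
print(second[:14])  # hour bucket: 09/Aug/2023:09
```

Because the order ID is stripped, every request to `/svc/api/orders/<id>` is counted as one endpoint, which is what makes the per-endpoint proportions meaningful.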

The resulting statistics are as follows: [screenshot of nginx_log.xls not preserved]
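
Per-endpoint counts of this kind can then be turned into request weights for the load test. The sketch below is one possible way to do that using only the standard library; `request_weights`, its `total_weight` parameter, and the sample data are assumptions made for illustration and are not part of the original script.

```python
from collections import Counter


def request_weights(requests, total_weight=100):
    """Scale each endpoint's share of traffic to an integer weight (at least 1)."""
    counts = Counter(requests)
    total = sum(counts.values())
    return {
        endpoint: max(1, round(count * total_weight / total))
        for endpoint, count in counts.most_common()
    }


# Hypothetical normalized requests in the "METHOD /path" form stored by parse().
sample = (["GET /svc/api/orders"] * 6
          + ["POST /svc/api/orders"] * 3
          + ["GET /svc/api/users"])

weights = request_weights(sample)
for endpoint, weight in weights.items():
    print(weight, endpoint)
```

Rounding means the weights do not always sum exactly to `total_weight`, and the floor of 1 keeps rare endpoints in the mix; whether that is desirable depends on the load-testing tool being configured.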

From: https://www.cnblogs.com/teangtang/p/17616087.html
