Fetching Alibaba Cloud Log Service logs through the API

Tags: get, url, ip, API, Alibaba, time, logs, txt

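Fetching nginx logs from Log Service through the API

On Monday my manager gave me a task: turn the nginx access logs into an API. More precisely, pull a few specific fields from the nginx logs through an API (date, time, IP, requested URL, IP source location), load them into MySQL, and finally show them on a monitoring dashboard (that last step is not done yet).

The first two steps are working so far.

Implementing the requirement

My first thought was Alibaba Cloud Log Service, because we have been shipping access logs into it ever since our classified-protection (等保) compliance work, even though they are retained for only 30 days. Sure enough, it has an API for fetching logs, and the API is called with an AccessKey (AK), which keeps it secure.

For debugging, start with Alibaba's API debugging console: https://next.api.aliyun.com/api/Sls/2020-12-30/GetLogs?spm=api-workbench.CodeSample%20Detail%20Page.0.0.6b1a5e54f0YEUM&lang=PYTHON&sdkStyle=dara

The sample code there is nearly complete, so I only had to tweak it slightly. The main parameters are the start and end timestamps and the SQL query (the query variable).

Log Service collects the complete nginx logs, so the volume is large. Loading all of it straight into the database is not realistic, and the analysis normally only needs a few URLs.

I used two scripts:
(1) Call the Alibaba API, extract the log fields, and write them to a txt file
(2) Connect to MySQL and load the txt file into a table

Points to note:

1. Query row limit

Anyone who has looked at the console debugging page knows that the query variable holds an SQL statement. By default, only 1,000 of the rows matching the query are returned. At first I assumed that leaving out limit would return every row. The limit value itself is also capped: the maximum is 1,000,000 rows (https://help.aliyun.com/document_detail/63470.html). Beyond that it gets awkward, and you have to keep narrowing the time range of the query.

2. Empty IP source location

Today I checked whether the scheduled job I set up on Friday had written anything to the MySQL table, and it had not! The cause: for some IPs extracted from the nginx logs, Alibaba Cloud's source-location function ip_to_city("real_remote_addr") returns an empty value. My guess is that the function is backed by some kind of address database and that the locations of some IPs (Hong Kong, Macau, and the like) are not in it.

I first tried to patch this in the txt file, but because the fields are positional, it turned out to be easier to handle it in script (1).

3. Do not import historical logs into the original logstore

As mentioned above, the collected logs are retained for 30 days, but my manager wants to analyze historical data, so I was asked to test whether importing older logs is supported.

First of all, it is supported. See: https://help.aliyun.com/document_detail/71414.html

To tell whether an import actually ran, check whether the local event file local_event.json has been emptied, and whether the ilogtail.LOG file in the Logtail installation directory contains the process local event entry.

At first I naively tried to import into the logstore that already collects the logs, assuming the console would place the imported entries by the time field inside the nginx log. Say you import the nginx logs from March 8: the data does not show up under March 8, but at the moment you ran the import.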

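This is a problem. The original logstore receives new data almost constantly, so the imported historical data lands alongside it and shares timestamps with the current logs. A query over a given timestamp range then returns current logs mixed with historical ones, which makes later analysis very awkward. A one-day timestamp range used to return exactly that day's data; with historical data imported on top, you have to keep narrowing the range to filter the logs, and as noted earlier a single query can return at most 1,000,000 rows.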

So in the end I decided to import the historical logs into a separate logstore and leave the original one untouched.
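Notes 1 and 3 both come down to shrinking the query window whenever a time range might match more rows than one query can return. A minimal sketch of that idea follows. `split_time_range` is a hypothetical helper, not part of the Aliyun SDK; it cuts a range of UNIX timestamps into fixed-size windows, each of which would then be queried on its own. Whether the end of each window is treated as inclusive by the API is not covered here and should be checked against the GetLogs documentation.

```python
def split_time_range(from_time, to_time, window_seconds):
    """Yield (start, end) UNIX-timestamp pairs that together cover [from_time, to_time)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    start = from_time
    while start < to_time:
        # The last window is clipped so it never runs past to_time.
        end = min(start + window_seconds, to_time)
        yield start, end
        start = end


# Example: one day split into 6-hour windows.
day_start = 1678204800  # 2023-03-08 00:00:00 +08:00
windows = list(split_time_range(day_start, day_start + 86400, 6 * 3600))
print(len(windows))  # 4
```

If a window still comes back with exactly as many rows as the limit allows, that is a hint that it was truncated and should be split further.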

Finally, the scripts

nginx log format

map $http_x_forwarded_for $real_remote_addr {
    ""                               $remote_addr;
    ~^(?P<firstAddr>[0-9\.]+),?.*$   $firstAddr;
}

log_format mylogs
    '{"@timestamp":"$time_iso8601",'
    '"host":"$hostname",'
    '"server_ip":"$server_addr",'
    '"client_ip":"$remote_addr",'
    '"xff":"$http_x_forwarded_for",'
    '"real_remote_addr":"$real_remote_addr",'
    '"domain":"$host",'
    '"url":"$uri",'
    '"referer":"$http_referer",'
    '"upstreamtime":"$upstream_response_time",'
    '"responsetime":"$request_time",'
    '"status":"$status",'
    '"size":"$body_bytes_sent",'
    '"protocol":"$server_protocol",'
    '"upstreamhost":"$upstream_addr",'
    '"file_dir":"$request_filename",'
    '"http_user_agent":"$http_user_agent"'
    '}';
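To make the map block concrete: $real_remote_addr is the first address in X-Forwarded-For when that header is present, and $remote_addr when the header is empty. Below is a rough Python equivalent, for illustration only. The function name `real_remote_addr` is made up here, and in production nginx does this matching itself.

```python
import re

# Same pattern as the nginx map entry: capture the leading run of digits and dots.
_FIRST_ADDR = re.compile(r"^(?P<firstAddr>[0-9.]+),?.*$")


def real_remote_addr(x_forwarded_for, remote_addr):
    """Mimic the nginx map: first X-Forwarded-For address, else remote_addr."""
    if x_forwarded_for == "":
        # No proxy header: the directly connected peer is the client.
        return remote_addr
    match = _FIRST_ADDR.match(x_forwarded_for)
    if match:
        return match.group("firstAddr")
    # A map without a default entry yields an empty string when nothing matches.
    return ""


print(real_remote_addr("1.2.3.4, 5.6.7.8", "10.0.0.1"))
print(real_remote_addr("", "10.0.0.1"))
```

The first call yields the client address that the proxy chain reported, the second falls back to the peer address.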

(1) Call the Alibaba API, extract the log fields, and write them to a txt file

# encoding: utf-8
import os
import time

from aliyun.log import GetLogsRequest, LogClient

# Results file; script (2) reads the same file.
OUTPUT_FILE = '存放结果.txt'


def main():
    # Log Service endpoint. Hangzhou is the example region here; fill in the
    # endpoint for your own region.
    endpoint = 'cn-xxx.log.aliyuncs.com'
    # Alibaba Cloud AccessKey
    access_key_id = 'xxx'
    access_key = 'xxx'

    # Project and Logstore names.
    project = ''
    logstore = ''

    # Create the Log Service client.
    client = LogClient(endpoint, access_key_id, access_key)

    # Query the logs with SQL. The Chinese aliases are the keys read back below:
    # 时间 = time, 访问url = requested URL, 客户端ip = client IP, 访问来源 = source location.
    query = 'select "time_iso8601" as "时间", "uri" as "访问url", "real_remote_addr" as "客户端ip", ip_to_city("real_remote_addr") as "访问来源" from log order by "时间" desc limit 1000000'

    # from_time and to_time bound the query, as UNIX timestamps.
    # Timestamp of 00:00 today (local time)
    now = int(time.time())
    to_time = now - (now - time.timezone) % 86400
    # Timestamp of 00:00 yesterday
    from_time = to_time - 86400

    print("ready to query logs from logstore %s" % logstore)

    # query is the SQL statement. The line parameter normally controls how many
    # logs are returned; it is set to 3 here, but the rows returned are
    # determined by the query.
    request = GetLogsRequest(project, logstore, from_time, to_time, '', query=query, line=3, offset=0, reverse=True)
    print('-------------Query is started.-------------')
    response = client.get_logs(request)
    print('-------------Query is finished.-------------')

    # I found the file sometimes was not written unless it was deleted first,
    # even with mode='w', so remove it explicitly.
    if os.path.exists(OUTPUT_FILE):
        os.remove(OUTPUT_FILE)
    else:
        print("The file does not exist")

    # Pull the wanted keys out of each log and save them to the local file.
    print('-------------Start writing logs to local files.-------------')

    with open(OUTPUT_FILE, mode='w', encoding='utf-8') as fileobject:
        for loglocal in response.get_logs():
            contents = loglocal.contents

            # 1. Reformat the nginx time into "date time"
            t1 = time.strptime(contents.get('时间'), "%Y-%m-%dT%H:%M:%S+08:00")
            t2 = time.strftime("%Y-%m-%d %H:%M:%S", t1)

            # 2. Fall back to a marker when the source location is empty or missing
            address = contents.get('访问来源') or 'unknown'

            # 3. Keep only the URLs of interest (placeholders)
            url = contents.get('访问url')
            if url not in ['筛选url', '筛选url-2', '筛选url-3', '....']:
                continue

            # 4. Tag specific whitelisted IPs with the caller they belong to.
            #    Tags must not contain spaces, because script (2) splits on whitespace.
            ip = contents.get('客户端ip')
            line = t2 + ' ' + ip + ' ' + address + ' ' + url
            if ip in ['xxx', 'xxx']:
                line += ' ' + 'external_platform'
            elif ip in ['xxx', 'xxx']:
                line += ' ' + 'external_system'
            fileobject.write(line + '\n')

    print('-------------Finishing writing logs to local files.-------------')


if __name__ == '__main__':
    main()
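The per-record logic in this script (time reformatting, the empty-source fallback from note 2, and the optional caller tag) can also be pulled into a pure function, which makes it testable without calling the API. The sketch below assumes each record is a plain dict keyed by the query aliases. `format_record`, its `tags` mapping and its `unknown_label` parameter are hypothetical names that do not appear in the original script.

```python
import time


def format_record(contents, tags=None, unknown_label="unknown"):
    """Turn one query result (a dict keyed by the SQL aliases) into an output line."""
    tags = tags or {}
    # "2023-03-08T09:15:30+08:00" -> "2023-03-08 09:15:30"
    parsed = time.strptime(contents["时间"], "%Y-%m-%dT%H:%M:%S+08:00")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", parsed)
    ip = contents["客户端ip"]
    # ip_to_city can come back empty, so substitute a marker to keep columns aligned.
    source = contents.get("访问来源") or unknown_label
    fields = [stamp, ip, source, contents["访问url"]]
    if ip in tags:
        fields.append(tags[ip])
    return " ".join(fields)


record = {
    "时间": "2023-03-08T09:15:30+08:00",
    "客户端ip": "203.0.113.7",
    "访问来源": "",
    "访问url": "/api/v1/ping",
}
print(format_record(record))
print(format_record(record, tags={"203.0.113.7": "partner"}))
```

The IP, URL and tag in the example are invented sample values. Keeping the marker for an empty source guarantees that every line has at least five whitespace-separated fields, which is what the loader in script (2) depends on.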


(2) Write to the MySQL table

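"""
1. Connect to the database
2. Create a cursor
3. Create the table
4. Insert, query, update, and delete table rows
"""
import pymysql

# Replace the placeholder values with your own connection details.
db = pymysql.connect(host='MYSQL_HOST',          # MySQL IP address
                     user='MYSQL_USER',          # MySQL login user
                     port=3306,                  # MySQL port (3306 is only the usual default)
                     password='MYSQL_PASSWORD',  # MySQL login password
                     database='ljy_test',
                     charset='utf8')
# Create a cursor
cursor = db.cursor()

sql4_insert_info = ('insert into ljy_mobileapi'
                    '(visit_date,visit_time,ip_address,src_area,url,visit_user) '
                    'values(%s,%s,%s,%s,%s,%s);')

count = 0
with open('存放结果.txt', 'r', encoding='utf-8') as f2:
    for line in f2:
        # Fields are whitespace-separated: date, time, IP, source, URL, [caller]
        txt = line.split()
        if len(txt) < 5:
            continue  # skip blank or malformed lines
        # The caller column is only present for tagged IPs.
        visit_user = txt[5] if len(txt) > 5 else ''
        args1 = (txt[0], txt[1], txt[2], txt[3], txt[4], visit_user)
        cursor.execute(sql4_insert_info, args=args1)
        count += 1

# Commit once after all rows are queued, then release the connection.
db.commit()
cursor.close()
db.close()
print("Data insertion finished: %d rows" % count)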
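Lines written for untagged IPs have five fields and tagged ones have six, so the loader has to cope with a missing caller column. Here is a small sketch of that parsing step in isolation. `parse_line` is a hypothetical helper, and it assumes that no field contains whitespace, which the whitespace-separated file format already relies on.

```python
def parse_line(line):
    """Split one results line into the six column values, or return None if malformed."""
    fields = line.split()
    if len(fields) < 5:
        return None
    visit_date, visit_time, ip_address, src_area, url = fields[:5]
    # The sixth field (caller tag) is optional; store an empty string when absent.
    visit_user = fields[5] if len(fields) > 5 else ""
    return (visit_date, visit_time, ip_address, src_area, url, visit_user)


sample = [
    "2023-03-08 09:15:30 203.0.113.7 unknown /api/v1/ping\n",
    "2023-03-08 09:16:02 198.51.100.9 unknown /api/v1/ping partner\n",
    "\n",
]
rows = [row for row in map(parse_line, sample) if row is not None]
print(len(rows))  # 2
```

The sample lines are invented. A list of tuples like `rows` could also be handed to the cursor's `executemany` in one call rather than executing one insert per line, which is worth considering if the daily file grows large.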


From： https://www.cnblogs.com/windysai/p/17209423.html
