I am doing log analysis and need to analyze a log file by first extracting the dates it contains. I then use those dates to define a start date and an end date. Based on the selected start and end dates, only the entries within that range should remain, effectively filtering the log content by date.
I have successfully extracted the dates using a regular expression, but filtering the log content by the start and end dates does not work as expected.
@staticmethod
def filter_log_entries(log_content, start_date, end_date):
    start_datetime = datetime.strptime(start_date, '%d/%b/%Y').replace(tzinfo=timezone.utc)
    end_datetime = datetime.strptime(end_date, '%d/%b/%Y').replace(tzinfo=timezone.utc)
    # Adjust end_datetime to include the entire end day
    end_datetime = end_datetime + timedelta(days=1) - timedelta(seconds=1)

    log_entry_pattern = re.compile(r'\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]')
    filtered_entries = []
    for line in log_content.split('\n'):
        match = log_entry_pattern.search(line)
        if match:
            entry_datetime_str = match.group(1)
            try:
                entry_datetime = datetime.strptime(entry_datetime_str, '%d/%b/%Y:%H:%M:%S %z')
                if start_datetime <= entry_datetime <= end_datetime:
                    filtered_entries.append(line)
            except ValueError:
                st.write(f"Date parsing error for line: {line}")

    filtered_log_content = "\n".join(filtered_entries)
    return filtered_log_content
Log content (excerpt shown below):
The dates in the log file are in the format [17/May/2015:10:05:03 +0000], and the file ends at [20/May/2015:10:05:03 +0000]. I want to filter the log content so that if I select the date range 17/May/2015 to 18/May/2015, only the entries within that window are selected.
83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:43 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-dashboard3.png HTTP/1.1" 200 171717 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:47 +0000] "GET /presentations/logstash-monitorama-2013/plugin/highlight/highlight.js HTTP/1.1" 200 26185 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:12 +0000] "GET /presentations/logstash-monitorama-2013/plugin/zoom-js/zoom.js HTTP/1.1" 200 7697 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:07 +0000] "GET /presentations/logstash-monitorama-2013/plugin/notes/notes.js HTTP/1.1" 200 2892 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:34 +0000] "GET /presentations/logstash-monitorama-2013/images/sad-medic.png HTTP/1.1" 200 430406 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:57 +0000] "GET /presentations/logstash-monitorama-2013/css/fonts/Roboto-Bold.ttf HTTP/1.1" 200 38720 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:50 +0000] "GET /presentations/logstash-monitorama-2013/css/fonts/Roboto-Regular.ttf HTTP/1.1" 200 41820 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:24 +0000] "GET /presentations/logstash-monitorama-2013/images/frontend-response-codes.png HTTP/1.1" 200 52878 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
83.149.9.216 - - [17/May/2015:10:05:50 +0000]
Full log file: https://github.com/linuxacademy/content-elastic-log-samples/blob/master/access.log
The problem in the code is that it is trying to compare a timezone-aware datetime object with a datetime object that carries no timezone information.
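For context, Python refuses to compare naive and aware datetimes at all; a minimal, self-contained snippet illustrating the error such a comparison raises:

from datetime import datetime

# Naive datetime (no tzinfo) vs. aware datetime (parsed with %z)
naive = datetime.strptime('17/May/2015', '%d/%b/%Y')
aware = datetime.strptime('17/May/2015:10:05:03 +0000', '%d/%b/%Y:%H:%M:%S %z')

try:
    naive <= aware
except TypeError as exc:
    # Prints: can't compare offset-naive and offset-aware datetimes
    print(exc)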
Modify the code so that it parses the timezone information from the log entries, or strip the timezone information from the start and end dates. Here is the updated code:
# Module-level imports required by the method (the method itself lives inside the questioner's class)
import re
from datetime import datetime, timezone, timedelta
import streamlit as st  # used only for reporting parse errors below

@staticmethod
def filter_log_entries(log_content, start_date, end_date):
    # Start/end dates arrive as '17/May/2015'-style strings; make them UTC-aware
    start_datetime = datetime.strptime(start_date, '%d/%b/%Y').replace(tzinfo=timezone.utc)
    end_datetime = datetime.strptime(end_date, '%d/%b/%Y').replace(tzinfo=timezone.utc)
    # Adjust end_datetime to include the entire end day
    end_datetime = end_datetime + timedelta(days=1) - timedelta(seconds=1)

    # Matches timestamps such as [17/May/2015:10:05:03 +0000]
    log_entry_pattern = re.compile(r'\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]')
    filtered_entries = []
    for line in log_content.split('\n'):
        match = log_entry_pattern.search(line)
        if match:
            entry_datetime_str = match.group(1)
            try:
                # Parse the timezone information from the log entry (%z -> aware datetime)
                entry_datetime = datetime.strptime(entry_datetime_str, '%d/%b/%Y:%H:%M:%S %z')
                # Both sides of the comparison are now timezone-aware
                if start_datetime <= entry_datetime <= end_datetime:
                    filtered_entries.append(line)
            except ValueError:
                st.write(f"Date parsing error for line: {line}")

    filtered_log_content = "\n".join(filtered_entries)
    return filtered_log_content
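As a sanity check, here is a minimal usage sketch. LogFilter is a hypothetical stand-in for the questioner's class; the 17/May/2015 to 18/May/2015 range comes from the question, and the sample lines are shortened for illustration only:

# LogFilter is a placeholder name for the class that holds filter_log_entries
sample_log = "\n".join([
    '83.149.9.216 - - [17/May/2015:10:05:03 +0000] "GET /a.png HTTP/1.1" 200 203023 "-" "UA"',
    '83.149.9.216 - - [19/May/2015:10:05:03 +0000] "GET /b.png HTTP/1.1" 200 171717 "-" "UA"',
])

# Only the 17/May entry should survive the 17/May..18/May filter
filtered = LogFilter.filter_log_entries(sample_log, '17/May/2015', '18/May/2015')
print(filtered)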
Explanation of the changes:
- The %z format code now parses the timezone information from each log entry, producing a timezone-aware datetime object (a short demonstration follows this list).
- The start and end dates are kept as timezone-aware datetime objects for consistency.
With these changes, your code should now correctly filter the log entries based on the provided start and end dates.
Tags: python, linux, ubuntu, azure-log-analytics
From: 78757328