Overall pipeline
filebeat ----> kafka ----> logstash ----> ELK
filebeat
Collects the service logs and slow-query logs on each host and pushes them to Kafka.
filebeat.inputs:
- type: log
  paths:
    - "/databases/deploy/log/tidb_slow_query.log"
  fields:
    service: tidb_slow
    ip: yourip
    cluster: clustername   # used to tell multiple clusters apart
  exclude_lines: ['COMMIT', 'commitTxn', 'mysql.', 'commit;']
  multiline.pattern: "# Time:"
  multiline.negate: true
  multiline.match: after
  tail_files: true
  fields_under_root: true
  scan_frequency: 1s
- type: log
  paths:
    - "/databases/deploy/log/pd.log"
    - "/databases/deploy/log/tidb.log"
    - "/databases/deploy/log/tikv.log"
  fields:
    service: tidb_serverlog
    ip: yourip
    cluster: com
  exclude_lines: ['pushgateway', 'connection closed', 'new connection']
  tail_files: true
  fields_under_root: true
  scan_frequency: 1s
- type: log
  paths:
    - "/data/log/mongodb/*.log"
    - "/data/mongodb*/log/*.log"
  fields:
    service: mongodb_slow
    ip: yourip
  include_lines: ['ms']
  exclude_lines: ['Successfully', 'killcursors', 'oplog.rs', 'admin.', 'config.']
  multiline.pattern: "^20"
  multiline.negate: true
  multiline.match: after
  tail_files: true
  fields_under_root: true
  scan_frequency: 1s
processors:
- drop_fields:
    fields: ["beat", "offset", "input", "prospector"]

output.kafka:
  hosts: ["kafka1.zzq.com:9092", "kafka2.zzq.com:9092", "kafka3.zzq.com:9092"]
  topic: "beat-%{[service]}"
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: none
  max_message_bytes: 10000000
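The multiline settings above (pattern "# Time:", negate: true, match: after) mean that every line which does not contain the "# Time:" header is appended to the preceding event, so each Kafka message carries one complete slow-query entry. A minimal Python sketch of that grouping logic (an illustration, not Filebeat's actual implementation):

```python
import re

def group_multiline(lines, pattern=r"# Time:"):
    """Group lines into events: a line matching `pattern` starts a new
    event; non-matching lines are attached after the previous match,
    mirroring multiline.negate=true + multiline.match=after."""
    events, current = [], []
    for line in lines:
        if re.search(pattern, line):      # new slow-query entry begins
            if current:
                events.append("\n".join(current))
            current = [line]
        elif current:
            current.append(line)          # continuation of the current entry
    if current:
        events.append("\n".join(current))
    return events

sample = [
    "# Time: 2021-01-01T00:00:00.000+08:00",
    "# Query_time: 1.2",
    "select 1;",
    "# Time: 2021-01-01T00:00:01.000+08:00",
    "# Query_time: 0.8",
    "select 2;",
]
print(len(group_multiline(sample)))  # → 2
```

Each of the two resulting events starts at its "# Time:" line and ends with the SQL text, which is exactly the shape the grok patterns below expect.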
logstash
Logstash mainly applies grok filters to parse the log lines into structured fields.
The two patterns live in a single grok filter so that the first match wins; two separate grok filters would each tag every event the other one handles with _grokparsefailure. The more specific pattern (full Prewrite/Commit detail) is listed first, because the generic one with GREEDYDATA would otherwise shadow it.
grok {
  match => {
    "message" => [
      "# Time: %{TIMESTAMP_ISO8601:generate_time}\n#\s+Txn_start_ts:\s+%{WORD}\n#\s+User:\s+%{WORD:user}\@%{IP:client_ip}\n#\s+Conn_ID:\s+%{WORD}\n#\s+Query_time:\s+%{NUMBER:query_time:float}\n#\s+Parse_time:\s+%{NUMBER}\n#\s+Compile_time:\s+%{NUMBER}\n#\s+Prewrite_time:\s+%{NUMBER}\s+Commit_time:\s+%{NUMBER}\s+Get_commit_ts_time:\s+%{NUMBER}\s+Write_keys:\s+%{NUMBER}\s+Write_size:\s+%{NUMBER}\s+Prewrite_region:\s+%{NUMBER}\n#\s+DB:\s+%{WORD:DB}\n#\s+Is_internal:\s+%{WORD}\n#\s+Digest:\s+%{WORD}\n#\s+Num_cop_tasks:\s+%{NUMBER}\n#\s+Prepared:\s+%{WORD}\n#\s+Has_more_results:\s+%{WORD}\n#\s+Succ:\s+%{WORD}\n%{GREEDYDATA:sql}",
      "# Time: %{TIMESTAMP_ISO8601:generate_time}\n#\s+Txn_start_ts:\s+%{WORD}\n#\s+User:\s+%{WORD:user}\@%{IP:client_ip}\n#\s+Conn_ID:\s+%{WORD}\n#\s+Query_time:\s+%{NUMBER:query_time:float}\n#\s+Parse_time:\s+%{NUMBER}\n#\s+Compile_time:\s+%{NUMBER}\n#\s%{GREEDYDATA}\s+Prewrite_region:\s+%{NUMBER}\n#\s+DB:\s+%{WORD:DB}\n#\s+Is_internal:\s+%{WORD}\n#\s+Digest:\s+%{WORD}\n#\s+Num_cop_tasks:\s+%{NUMBER}\n#\s+Prepared:\s+%{WORD}\n#\s+Has_more_results:\s+%{WORD}\n#\s+Succ:\s+%{WORD}\n%{GREEDYDATA:sql}"
    ]
  }
  remove_field => [ "message" ]
}
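To sanity-check the field extraction outside Logstash, the same idea can be approximated with a plain regex. The snippet below is a simplified Python stand-in (not the grok engine) that pulls out only a few of the fields from a made-up slow-log fragment:

```python
import re

# Simplified stand-in for the grok pattern above; the sample entry is
# hypothetical and covers only Time, User, Query_time and the SQL text.
SLOW_RE = re.compile(
    r"# Time: (?P<generate_time>\S+)\n"
    r"# User: (?P<user>\w+)@(?P<client_ip>[\d.]+)\n"
    r"# Query_time: (?P<query_time>[\d.]+)\n"
    r"(?P<sql>.*)",
    re.S,  # let the SQL capture span multiple lines
)

sample = (
    "# Time: 2021-01-01T00:00:00.000+08:00\n"
    "# User: app@10.0.0.1\n"
    "# Query_time: 1.234\n"
    "select * from t;"
)
m = SLOW_RE.match(sample)
print(m.group("query_time"))  # → 1.234
```

If such a regex fails to match a real entry, the grok pattern will fail on it too, which makes this a cheap way to debug _grokparsefailure events.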
ELK view
Pick the target cluster to browse its logs; during an incident, sort strictly by process_keys and query_time to locate the slow SQL quickly.
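In Elasticsearch terms, that view boils down to a filter on the cluster field plus a descending sort. A hypothetical query body (the process_keys and query_time field names are assumptions based on this setup, not a fixed schema):

```python
# Hypothetical Elasticsearch query body for the incident view described
# above: restrict to one cluster, then surface the heaviest queries first.
incident_view = {
    "query": {"bool": {"filter": [{"term": {"cluster": "clustername"}}]}},
    "sort": [
        {"process_keys": {"order": "desc"}},  # most keys scanned first
        {"query_time": {"order": "desc"}},    # then slowest wall time
    ],
    "size": 50,  # the top 50 entries are usually enough to spot the culprit
}
print(incident_view["sort"][0])
```

The same sort can be configured interactively in Kibana's Discover view by clicking the two column headers.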
FAQs
- TiDB 3.0 has no built-in slow-log system, so you have to fill that gap yourself; newer versions ship a mature slow-log framework.
- In TiDB 3.0 the server component is deployed across multiple nodes, so the slow_query log on every node must be collected, otherwise entries are missed. Newer versions provide the cluster slow query concept, which is more convenient.