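Monitoring Java applications and vCenter with Grafana
Background
I set up vCenter monitoring a while back,
but found that much of that setup no longer fits.
Today I looked at monitoring Java applications through JMX, and monitored vCenter along the way.
These are brief notes for future reference.
One caveat: this approach is not ideal, because it relies on an agent, which may have a negative impact on the product.
JMX monitoring
Approach:
Use a javaagent to expose a port.
Have Prometheus scrape the data exposed on that port.
Display the data in Grafana.
The main files to download are: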
jmx_prometheus_javaagent-0.17.2.jar
prometheus-2.39.0-rc.0.linux-amd64.tar.gz
grafana-enterprise-9.1.6-1.x86_64.rpm
JMX monitoring preparation
- Add a configuration file (the heredoc delimiter is quoted so that the shell does not expand $1 inside the rules):
cat >simple-config.yml <<'EOF'
lowercaseOutputLabelNames: true
lowercaseOutputName: true
whitelistObjectNames: ["java.lang:type=OperatingSystem"]
blacklistObjectNames: []
rules:
  - pattern: 'java.lang<type=OperatingSystem><>(committed_virtual_memory|free_physical_memory|free_swap_space|total_physical_memory|total_swap_space)_size:'
    name: os_$1_bytes
    type: GAUGE
    attrNameSnakeCase: true
  - pattern: 'java.lang<type=OperatingSystem><>((?!process_cpu_time)\w+):'
    name: os_$1
    type: GAUGE
    attrNameSnakeCase: true
EOF
Modifying the startup script
Add the following option to the java command line in the startup script.
-javaagent:./jmx_prometheus_javaagent-0.17.2.jar=8080:simple-config.yml
Then start the service.
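Before wiring up Prometheus, it can be worth checking that the agent really exposes data. The sketch below assumes the agent listens on port 8080 as in the -javaagent option above and serves its metrics under /metrics; the helper name filter_os_metrics is made up for this note.

```shell
#!/bin/sh
# Hypothetical helper: keep only the sample lines produced by the os_* rules
# in simple-config.yml. Comment lines ("# HELP", "# TYPE") and other metric
# families are dropped because they do not start with "os_".
filter_os_metrics() {
    grep -E '^os_[a-z0-9_]+'
}

# Usage (assumes the service is already running with the agent on port 8080):
#   curl -s http://localhost:8080/metrics | filter_os_metrics
```

If nothing comes back, the first things to check are probably the port in the -javaagent option and whether the config file path is relative to the directory the service starts from.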
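Configuring Prometheus
Install Prometheus, then configure it.
Installing from the tarball is very simple; I suggest unpacking it directly under /prometheus.
Then edit prometheus.yml and add the corresponding scrape targets.
My configuration is as follows:
Note 1: remember the job_name; it is needed when setting things up in Grafana.
Note 2: the instance label can be used to tell the targets apart in Grafana.
scrape_configs:
  - job_name: jmx
    static_configs:
      - targets: ['localhost:8080']
        labels:
          instance: xxx-server
      - targets: ['10.110.83.113:8080']
        labels:
          instance: xxx-dm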
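A quick way to confirm that the configuration is valid and that both targets are being scraped might look like the sketch below. promtool ships in the same tarball as prometheus; the port 9090 is the Prometheus default and is an assumption here, and count_up_targets is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: count the targets reported as healthy in the JSON
# returned by the Prometheus /api/v1/targets endpoint (read from stdin).
count_up_targets() {
    grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Usage, run from the directory where the tarball was unpacked
# (adjust the port if Prometheus was started on a different one):
#   ./promtool check config prometheus.yml
#   ./prometheus --config.file=prometheus.yml &
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```

With the two targets from the configuration above, the expected count once both services are reachable would be 2.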
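Installing Grafana
Install Grafana from the rpm package.
After a successful installation:
systemctl enable grafana-server.service
systemctl restart grafana-server.service
Note that the password needs to be set.
The default user name and password are admin/admin.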
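Configuring Grafana
Step 1: add a data source
Click the gear-shaped configuration button at the bottom left.
Choose "Add data source",
then select Prometheus.
Note that the local IP address can be put on the whitelist:
add the local IP address there.
Step 2: import a dashboard
Click the four-squares button at the top left.
Choose Browse, open the New drop-down list, and select Import.
Enter the ID: 8563
This imports the JVM dashboard.
Result (screenshot)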
Displaying vCenter
As in my earlier write-up, this uses InfluxDB and Telegraf.
The main files needed are:
influxdb-1.8.9.x86_64.rpm
telegraf-1.19.2-1.x86_64.rpm
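Installing the rpm packages locally is enough.
A couple of complaints about openEuler:
First: with the "ESXi 6.7 and later" compatibility setting, the installation media could not be read, so the installation failed.
Second: installing influxdb does not include a step that generates the influxdb configuration file.
InfluxDB setup
I am still using 1.x here,
on the principle that good enough is good enough.
After installation, typing influx on the command line logs in to the database.
Note that the service must be started first:
systemctl restart influxdb
Commands to add a user and set its password: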
CREATE USER "influxdb" WITH PASSWORD 'Test@xxxxxxx'
GRANT ALL PRIVILEGES TO influxdb
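The same two statements can also be run without opening the interactive shell, since the 1.x influx client accepts a statement through -execute. The sketch below only generates the statements for a given user; make_user_statements is a made-up helper name, and it assumes the password contains no single quote.

```shell
#!/bin/sh
# Hypothetical helper: print the two InfluxQL statements used above.
#   $1 = user name, $2 = password (must not contain a single quote)
make_user_statements() {
    printf "CREATE USER \"%s\" WITH PASSWORD '%s'\n" "$1" "$2"
    printf 'GRANT ALL PRIVILEGES TO "%s"\n' "$1"
}

# Usage (assumes influxdb is running locally on the default port):
#   make_user_statements influxdb 'Test@xxxxxxx' | while read -r stmt; do
#       influx -execute "$stmt"
#   done
#   influx -execute 'SHOW USERS'
```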
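Telegraf setup
Note that once the output is configured, Telegraf creates the database automatically.
Also note that monitoring vCenter may require running two Telegraf instances,
but both can write into a single InfluxDB database.
Example configuration files:
/etc/telegraf/{telegraf01.conf,telegraf02.conf}
The only difference between the two files is likely to be the vCenter user and password.
There are two main places to configure: the input and the output.
The template is as follows:
Telegraf template configuration file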
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
# Change the settings below.
urls = ["http://127.0.0.1:8086"]
database = "vm187"
timeout = "0s"
username = "zhaobsh"
password = "Test@xxxx"
[[inputs.vsphere]]
# Set the vCenter user name and password here
vcenters = [ "https://10.110.xx.xx/sdk" ]
username = "[email protected]"
password = "somepassword"
vm_metric_include = [
"cpu.demand.average",
"cpu.idle.summation",
"cpu.latency.average",
"cpu.readiness.average",
"cpu.ready.summation",
"cpu.run.summation",
"cpu.usagemhz.average",
"cpu.used.summation",
"cpu.wait.summation",
"mem.active.average",
"mem.granted.average",
"mem.latency.average",
"mem.swapin.average",
"mem.swapinRate.average",
"mem.swapout.average",
"mem.swapoutRate.average",
"mem.usage.average",
"mem.vmmemctl.average",
"net.bytesRx.average",
"net.bytesTx.average",
"net.droppedRx.summation",
"net.droppedTx.summation",
"net.usage.average",
"power.power.average",
"virtualDisk.numberReadAveraged.average",
"virtualDisk.numberWriteAveraged.average",
"virtualDisk.read.average",
"virtualDisk.readOIO.latest",
"virtualDisk.throughput.usage.average",
"virtualDisk.totalReadLatency.average",
"virtualDisk.totalWriteLatency.average",
"virtualDisk.write.average",
"virtualDisk.writeOIO.latest",
"sys.uptime.latest",
]
host_metric_include = [
"cpu.coreUtilization.average",
"cpu.costop.summation",
"cpu.demand.average",
"cpu.idle.summation",
"cpu.latency.average",
"cpu.readiness.average",
"cpu.ready.summation",
"cpu.swapwait.summation",
"cpu.usage.average",
"cpu.usagemhz.average",
"cpu.used.summation",
"cpu.utilization.average",
"cpu.wait.summation",
"disk.deviceReadLatency.average",
"disk.deviceWriteLatency.average",
"disk.kernelReadLatency.average",
"disk.kernelWriteLatency.average",
"disk.numberReadAveraged.average",
"disk.numberWriteAveraged.average",
"disk.read.average",
"disk.totalReadLatency.average",
"disk.totalWriteLatency.average",
"disk.write.average",
"mem.active.average",
"mem.latency.average",
"mem.state.latest",
"mem.swapin.average",
"mem.swapinRate.average",
"mem.swapout.average",
"mem.swapoutRate.average",
"mem.totalCapacity.average",
"mem.usage.average",
"mem.vmmemctl.average",
"net.bytesRx.average",
"net.bytesTx.average",
"net.droppedRx.summation",
"net.droppedTx.summation",
"net.errorsRx.summation",
"net.errorsTx.summation",
"net.usage.average",
"power.power.average",
"storageAdapter.numberReadAveraged.average",
"storageAdapter.numberWriteAveraged.average",
"storageAdapter.read.average",
"storageAdapter.write.average",
"sys.uptime.latest",
]
cluster_metric_include = []
datastore_metric_include = []
datacenter_metric_include = []
datacenter_metric_exclude = [ "*" ]
insecure_skip_verify = true
Starting Telegraf manually
nohup telegraf -config /etc/telegraf/telegraf01.conf >/dev/null 2>1.txt &
nohup telegraf -config /etc/telegraf/telegraf02.conf >/dev/null 2>2.txt &
Note: starting it manually is enough, but it has to be started again after the machine reboots.
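If restarting by hand after every reboot gets tedious, one option is a small systemd unit per instance. This is only a sketch, not what I actually ran: the binary path /usr/bin/telegraf, the unit names, and the helper write_telegraf_unit are all assumptions to adapt.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd unit for one Telegraf instance.
#   $1 = config name without extension (e.g. telegraf01)
#   $2 = target directory (normally /etc/systemd/system)
write_telegraf_unit() {
    cat > "$2/$1.service" <<EOF
[Unit]
Description=Telegraf instance $1
After=network-online.target

[Service]
# Assumed binary path; check it with: command -v telegraf
ExecStart=/usr/bin/telegraf --config /etc/telegraf/$1.conf
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   write_telegraf_unit telegraf01 /etc/systemd/system
#   write_telegraf_unit telegraf02 /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable --now telegraf01 telegraf02
```

Each instance then gets its own unit and log stream in the journal, instead of the 1.txt and 2.txt files used by the nohup commands.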
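Monitoring vCenter with Grafana
Step 1: add a data source:
Select InfluxDB as the type.
For the URL, enter the IP-based address, for example mine:
http://10.110.136.70:8086
For the database, pick the one defined in the Telegraf configuration.
For the user and password, enter the user created earlier that has the privileges.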
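Before importing dashboards, it may help to confirm that Telegraf has actually written vSphere data into the database. The sketch below assumes the database name vm187 from the Telegraf template and the user created earlier, and that the vsphere input writes measurements whose names start with vsphere_; list_vsphere_measurements is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: from the output of SHOW MEASUREMENTS (read from stdin),
# keep only the measurement names that start with "vsphere_".
list_vsphere_measurements() {
    grep -E '^vsphere_'
}

# Usage (database, user and password are the ones configured earlier):
#   influx -username influxdb -password 'Test@xxxxxxx' -database vm187 \
#       -execute 'SHOW MEASUREMENTS' | list_vsphere_measurements
```

An empty result usually means Telegraf has not completed a collection cycle yet or cannot log in to vCenter; the Telegraf error output is the place to look.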
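Monitoring vCenter with Grafana: importing dashboards
The main dashboards to import are the following three;
the others seem to require a newer version.
Name                     ID
vSphere - Overview       12786
vSphere - Host details   12852
vSphere - VM details     12874
Brief results of monitoring vCenter with Grafana
- Overview
- Host information
- Virtual machine information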