首页 > 数据库 >Grafana监控OracleDB的完整过程

Grafana监控OracleDB的完整过程

时间:2023-10-20 11:08:25浏览次数:32  
标签:context name metric bytes value Grafana OracleDB tablespace 监控

背景

两年前曾经写过一个进行Oracle 监控的简单blog
但是周天晚上尝试进行处理时发现很不完整了.
很多数据获取不到.
晚上又熬夜了好久进行处理. 感觉还是需要总结一下,不然就忘记了

获取镜像

还是使用 docker 的方式来暴露Oracle的服务
使用的镜像为:
docker pull ghcr.io/iamseth/oracledb_exporter:0.4.3

需要注意这个镜像比较大, 压缩完还 270MB 左右

注意事项-1

镜像内的监控事项比较少. 可能只是非常简单的. 
我这边想监控的是一些性能相关的内容, 仅看一些简单内容, 感觉没有任何意义.
然后继续从github上面找相关的资料: 
https://github.com/saitiger008/oracledb_exporter_metrics/blob/main/default-metrics.toml

这个哥们自己加了一些监控事项挺好的. 
我这边将他的监控与默认的default进行合并. 内容放到最下面. 
然后通过-v 的方式在容器内加载默认的metrics的监控事项. 
可以通过docker logs --tail=100 containername的方式进行查看具体内容.

注意事项-2

正常启用之后发现还是无法进行相关的监控信息.
主要是无法获取一些系统的监控信息. 

日志里面很多这样的提示信息:
 msg="Error scraping for" context=resource metricsDesc="unsupported value type" s
 msg="Error scraping for" context=sysmetric metricsDesc="unsupported value type" 
 msg="Error scraping for" context=wait_time metricsDesc="unsupported value type" 
 msg="Error scraping for" context=rman_info metricsDesc="unsupported value type" 

提示无法获取对应的监控信息.

注意事项-2

这个地方耗时很久进行处理. 发现总是比较难易解决问题.
主要问题是 只有 sys 账户会查询 select * from v$sysmetirc的视图.
并且必须要sysdba的账户进行登录, 如果不是sysdba 查询此视图都是空的.

所以问题的核心是用Docker的语法实现 sysdba 角色的登录. 
我这边尝试 / as sysdba 发现是没有效果的.
查看官方文档是发现有一个解决方法为:
在 DATA_SOURCE_NAME 的和面添加 ?as=sysdba 的方式来指定角色.

# 清理之前的容器
docker rm -f oracledb_exporter
# 创建对应的容器进行监听  
docker run -d --name oracledb_exporter  --restart=always --net=host \
-e DATA_SOURCE_NAME='sys/[email protected]/gscdb?as=sysdba' \
-v /oracledb/default-metrics.toml:/default-metrics.toml ghcr.io/iamseth/oracledb_exporter:0.4.3
# 查看日志信息.
docker logs -f  oracledb_exporter

注意事项-3

使用 桥接网络 也可以 但是需要注意写法为:

docker rm -f oracledb_exporter2 
docker run -d --name oracledb_exporter2  --restart=always -p 9162:9161 --privileged \
-e DATA_SOURCE_NAME='sys/[email protected]/ora19c?as=sysdba' \
-v /oracledb/default-metrics.toml:/default-metrics.toml ghcr.io/iamseth/oracledb_exporter:0.4.3 
docker logs -f  oracledb_exporter2

注意事项-4

Prometheus的设置为:
scrape_configs:
  - job_name: '116Oracle_ora19c' # job_name默认写入timeseries的labels中,可以用于查询使用
    static_configs: # 静态配置
      - targets: ['10.110.136.xxx:9162']

注意 需要重启Prometheus使之生效.
另外 yaml 格式一定注意空格和格式.

注意事项-5

Grafana的设置
注意 这次选用的是国人开发的一个配置:
编号为: 17136

具体地址为: 
https://grafana.com/grafana/dashboards/17136-oracledb-new-202210/

效果为

Grafana监控OracleDB的完整过程_docker


监控配置脚本

[[metric]]
context = "sessions"
labels = [ "status", "type" ]
metricsdesc = { value= "Gauge metric with count of sessions by status and type." }
request = "SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status, type"

[[metric]]
context = "resource"
labels = [ "resource_name" ]
metricsdesc = { current_utilizatinotallow= "Generic counter metric from v$resource_limit view in Oracle (current value).", limit_value="Generic counter metric from v$resource_limit view in Oracle (UNLIMITED: -1)." }
request="SELECT resource_name,current_utilization,CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value FROM v$resource_limit"

[[metric]]
context = "asm_diskgroup"
labels = [ "name" ]
metricsdesc = { total = "Total size of ASM disk group.", free = "Free space available on ASM disk group." }
request = "SELECT name,total_mb*1024*1024 as total,free_mb*1024*1024 as free FROM v$asm_diskgroup_stat where exists (select 1 from v$datafile where name like '+%')"
ignorezeroresult = true

[[metric]]
context = "activity"
metricsdesc = { value="Generic counter metric from v$sysstat view in Oracle." }
fieldtoappend = "name"
request = "SELECT name, value FROM v$sysstat WHERE name IN ('parse count (total)', 'execute count', 'user commits', 'user rollbacks')"

[[metric]]
context = "process"
metricsdesc = { count="Gauge metric with count of processes." }
request = "SELECT COUNT(*) as count FROM v$process"

[[metric]]
context = "wait_time"
metricsdesc = { value="Generic counter metric from v$waitclassmetric view in Oracle." }
fieldtoappend= "wait_class"
request = '''
SELECT
  n.wait_class as WAIT_CLASS,
  round(m.time_waited/m.INTSIZE_CSEC,3) as VALUE
FROM
  v$waitclassmetric  m, v$system_wait_class n
WHERE
  m.wait_class_id=n.wait_class_id AND n.wait_class != 'Idle'
'''

[[metric]]
context = "tablespace"
labels = [ "tablespace", "type" ]
metricsdesc = { bytes = "Generic counter metric of tablespaces bytes in Oracle.", max_bytes = "Generic counter metric of tablespaces max bytes in Oracle.", free = "Generic counter metric of tablespaces free bytes in Oracle.", used_percent = "Gauge metric showing as a percentage of how much of the tablespace has been used." }
request = '''
SELECT
    dt.tablespace_name as tablespace,
    dt.contents as type,
    dt.block_size * dtum.used_space as bytes,
    dt.block_size * dtum.tablespace_size as max_bytes,
    dt.block_size * (dtum.tablespace_size - dtum.used_space) as free,
    dtum.used_percent
FROM  dba_tablespace_usage_metrics dtum, dba_tablespaces dt
WHERE dtum.tablespace_name = dt.tablespace_name
ORDER by tablespace
'''

[[metric]]
context = "parameter"
labels = [ "name","param" ]
metricsdesc = { value ="Gauge metric with database parameter" }
request = "select name,value as param,1 as value from v$parameter"

[[metric]]
context = "sessions"
labels = [ "status", "type" ]
metricsdesc = { value= "Gauge metric with count of sessions by status and type." }
request = "SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status, type"


[[metric]]
context = "parameter_sessions"
metricsdesc = { value ="Gauge metric with sessions parameter" }
request = "select value  from v$parameter t where t.name = 'sessions'"


[[metric]]
context = "volumns"
metricsdesc = { value= "database total volumes,size G" }
request = "SELECT sum(bytes/1024/1024/1024) as value FROM dba_segments"


[[metric]]
context = "env"
labels = [ "host_name" ]
metricsdesc = { is_rac = "if 1 then rac." }
request = '''
select host_name,
decode((select value from v$option where parameter = 'Real Application Clusters'), 'TRUE',1,0) as is_rac
from v$instance
'''

[[metric]]
context = "env"
labels = [ "version" ]
metricsdesc = { running_time = "get version and running_time" }
request = '''
select version,trunc(sysdate - startup_time) as running_time from v$instance
'''

[[metric]]
context = "env"
labels = [ "platform_name","log_mode","force_logging","flashback_on","database_role","open_mode","nls_characterset" ]
metricsdesc = { value = "get other env" }
request = '''
select platform_name,log_mode,force_logging,flashback_on,database_role,open_mode,
(select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET') as nls_characterset, 1 as value  from v$database
'''

[[metric]]
context = "env"
labels = [ "name" ]
metricsdesc = { redo_value = "get redo logs info" }
request = '''
with temp as(
select count(group#) as group_num,min(members) as member_num,min(bytes/1024/1024) as log_size from v$log
)
select  'group_num' as name ,group_num as redo_value from temp
union all
select  'member_num' as name ,member_num as redo_value from temp
union all
select  'log_size' as name ,log_size as redo_value from temp
'''

[[metric]]
context = "env"
metricsdesc = { redo_switch_value = "Gauge metric with sum of segments data." }
request = '''
select max(sequence#) as redo_switch_value from v$log
'''


[[metric]]
context = "env"
metricsdesc = { vols = "Gauge metric with sum of segments data." }
request = '''
select to_char(sum(bytes)) as vols from dba_segments
'''


[[metric]]
context = "events"
labels = [ "event_name"]
metricsdesc = { value = "get events from v$session_wait" }
request = '''
select event as event_name, count(*) as value
   from v$session_wait
      where wait_time >= -1 group by event order by 2 desc
'''

[[metric]]
context = "db_time"
metricsdesc = { value="Generic counter metric from v$sys_time_model view in Oracle." }
request = '''
select value from v$sys_time_model t where t.STAT_NAME = 'DB time'
'''

[[metric]]
context = "sysmetric"
metricsdesc = { value="Get sysmetrics from v$sysmetric view in Oracle." }
fieldtoappend = "metric_name"
request = '''
select replace(metric_name,'%','') as metric_name ,value from v$sysmetric  where group_id = 2 and metric_name != 'Average Synchronous Single-Block Read Latency'
'''

[[metric]]
context = "sysstat"
metricsdesc = { value="Compute Oracle IOPS and MBPS from v$sysstat view in Oracle." }
fieldtoappend = "name"
request = '''
select name , trunc(sum(value), 2) as value
  from (select decode(name,
                      'physical read total IO requests',
                      'Oracle IOPS',
                      'physical write total IO requests',
                      'Oracle IOPS',
                      'redo writes',
                      'Oracle IOPS',
                      'physical read total bytes',
                      'Oracle MBPS',
                      'physical write total bytes',
                      'Oracle MBPS',
                      'redo size',
                      'Oracle MBPS') as name,
               decode(name,
                      'physical read total IO requests',
                      value,
                      'physical write total IO requests',
                      value,
                      'redo writes',
                      value,
                      'physical read total bytes',
                      value / 1024 / 1024,
                      'physical write total bytes',
                      value / 1024 / 1024,
                      'redo size',
                      value / 1024 / 1024) as value
          from v$sysstat
         where name in (
                        'physical read total bytes',
                        'physical write total bytes',
                        'physical read total IO requests',
                        'physical write total IO requests',
                        'redo writes',
                        'redo size'))
 where name is not null
 group by name
union all
SELECT a.name, a.value
  FROM v$sysstat a, v$statname b
 where a.statistic# = b.statistic#
   and b.name in (
                  'consistent gets',
                  'execute count',
                  'parse count (hard)',
                  'parse count (total)',
                  'redo size',
                  'redo writes',
                  'sorts (disk)',
                  'sorts (memory)',
                  'sorts (rows)',
                  'user commits',
                  'user rollbacks',
                  'logons cumulative',
                  'bytes sent via SQL*Net to client',
                  'bytes received via SQL*Net from client')
'''


[[metric]]
context = "tablespace"
labels = [ "tablespace", "type" ]
metricsdesc = { bytes = "Generic counter metric of tablespaces bytes in Oracle.", max_bytes = "Generic counter metric of tablespaces max bytes in Oracle.", free = "Generic counter metric of tablespaces free bytes in Oracle." }
request = '''
SELECT
  df.tablespace_name       as tablespace,
  df.type                  as type,
  nvl(sum(df.bytes),0)     as bytes,
  nvl(sum(df.max_bytes),0) as max_bytes,
  nvl(sum(f.free),0)       as free
FROM
  (
    SELECT
      ddf.file_id,
      dt.contents as type,
      ddf.file_name,
      ddf.tablespace_name,
      TRUNC(ddf.bytes) as bytes,
      TRUNC(GREATEST(ddf.bytes,ddf.maxbytes)) as max_bytes
    FROM
      dba_data_files ddf,
      dba_tablespaces dt
    WHERE ddf.tablespace_name = dt.tablespace_name
  ) df,
  (
    SELECT
      TRUNC(SUM(bytes)) AS free,
      file_id
    FROM dba_free_space
    GROUP BY file_id
  ) f
WHERE df.file_id = f.file_id (+)
GROUP BY df.tablespace_name, df.type
UNION ALL
SELECT
  Y.name                   as tablespace_name,
  Y.type                   as type,
  SUM(Y.bytes)             as bytes,
  SUM(Y.max_bytes)         as max_bytes,
  MAX(nvl(Y.free_bytes,0)) as free
FROM
  (
    SELECT
      dtf.tablespace_name as name,
      dt.contents as type,
      dtf.status as status,
      dtf.bytes as bytes,
      (
        SELECT
          ((f.total_blocks - s.tot_used_blocks)*vp.value)
        FROM
          (SELECT tablespace_name, sum(used_blocks) tot_used_blocks FROM gv$sort_segment WHERE  tablespace_name!='DUMMY' GROUP BY tablespace_name) s,
          (SELECT tablespace_name, sum(blocks) total_blocks FROM dba_temp_files where tablespace_name !='DUMMY' GROUP BY tablespace_name) f,
          (SELECT value FROM v$parameter WHERE name = 'db_block_size') vp
        WHERE f.tablespace_name=s.tablespace_name AND f.tablespace_name = dtf.tablespace_name
      ) as free_bytes,
      CASE
        WHEN dtf.maxbytes = 0 THEN dtf.bytes
        ELSE dtf.maxbytes
      END as max_bytes
    FROM
      sys.dba_temp_files dtf,
      sys.dba_tablespaces dt
    WHERE dtf.tablespace_name = dt.tablespace_name
  ) Y
GROUP BY Y.name, Y.type
ORDER BY tablespace
'''

[[metric]]
context = "db_recovery_file_dest"
metricsdesc = { value="sum(PERCENT_SPACE_USED) of V$RECOVERY_AREA_USAG" }
request = '''
select sum(t.PERCENT_SPACE_USED) as value from V$RECOVERY_AREA_USAGE t
'''


[[metric]]
context = "db_recovery_file_dest"
labels = [ "file_type"]
metricsdesc = { percent_space_used = "percent_space_used of V$RECOVERY_AREA_USAGE ." }
request = '''
select t.file_type,t.percent_space_used from V$RECOVERY_AREA_USAGE t
'''

[[metric]]
context = "rman_status"
metricsdesc = { value= "0 indicates rman error or no config rman job, 1 indicates rman is ok" }
request = '''
select decode(max(t.COMPLETION_TIME),
              '',
              0,
              case when sysdate-max(t.COMPLETION_TIME) > 1 then 0 else 1 end) as value
  from v$backup_piece t
 where t.STATUS = 'A'
'''

[[metric]]
context = "rman_info"
labels = [ "recid","lv_type","controlfile_included","status","device_type","start_time","completion_time","elapsed_seconds","bak_size","compressed","tag" ]
metricsdesc = { value ="list rman information" }
request = '''
select a.recid as recid,
               decode(b.incremental_level,
                      '',
                      decode(backup_type, 'L', 'Archivelog', 'ControlFile'),
                      1,
                      'Incr-1',
                      0,
                      'Incr-0',
                      b.incremental_level) as lv_type,
               b.controlfile_included as controlfile_included,
               decode(a.status,
                      'A',
                      'AVAILABLE',
                      'D',
                      'DELETED',
                      'X',
                      'EXPIRED',
                      'ERROR') as status,
               a.device_type as device_type,
               a.start_time as start_time,
               a.completion_time as completion_time,
               a.elapsed_seconds as elapsed_seconds,
               round(a.bytes / 1024 / 1024 / 1024,3) as bak_size,
               a.compressed as compressed,
               a.tag as tag,
               1 as value
          from v$backup_piece a, v$backup_set b
         where a.set_stamp = b.set_stamp
           and a.deleted = 'NO'
           and a.set_count = b.set_count
           and a.start_time > sysdate-8
         order by a.completion_time desc
'''


[[metric]]
context = "dataguard_exist"
metricsdesc = { value ="1 indicates the presence of dg, 0 indicates no dg." }
request = '''
select decode(db_unique_name, 'NONE', 0, 1) as value
  from v$archive_dest_status
 where dest_name = 'LOG_ARCHIVE_DEST_2'
'''


[[metric]]
context = "dataguard_status"
metricsdesc = { value ="0 indicates no dg, 1 indicates dg has errors, 1 indicates dg is ok;" }
request = '''
select case
         when db_unique_name = 'NONE' then
          0
         when status != 'VALID' then
          1
         else
          decode(recovery_mode,
                 'IDLE',
                 1,
                 'MANAGED',
                 2,
                 'MANAGED REAL TIME APPLY',
                 2,
                 'LOGICAL REAL TIME APPLY',
                 2,
                 'LOGICAL APPLY',
                 2,
                 1)
       end as value
  from v$archive_dest_status
 where dest_name = 'LOG_ARCHIVE_DEST_2'
'''


标签:context,name,metric,bytes,value,Grafana,OracleDB,tablespace,监控
From: https://blog.51cto.com/u_64214/7947820

相关文章

  • 为什么将LoRa用于智能环境监控解决方案
    环境监测是指测量影响环境质量的因素的代表值,以确定环境污染程度及其变化趋势。环境监测的主要目的是管理和尽量减少任何人类活动对环境的影响。在确保活动符合相关法律法规的前提下,我们努力改变人们的行为方式,以保护健康,减少对自然环境的风险和危险。本文向我们介绍了为什么物联网......
  • AI智慧安防智能监控平台如何做到健身房智能视频监控?
    随着大家对健身的重视,健身房也开始遍地开花,健身房的兴起是必然的,但是健身房的管理不容疏忽,通过EasyCVR智能视频监控系统,则可以解决监管不足的问题。1、安全摄像头布局根据健身房的大小和布局,合理规划安装摄像头的位置。确保所有重要区域都能被完全覆盖,包括入口、出口、器械区......
  • TSINGSEE烟火识别算法的技术原理是什么?如何应用在视频监控中?
    AI烟火识别算法是基于深度学习技术的一种视觉识别算法,主要用于在视频监控场景中自动检测和识别烟雾、火焰的行为。该技术基于深度学习神经网络技术,可以动态识别烟雾和火焰从有到无、从小到大、从大到小、从小烟到浓烟的状态转换过程。1、技术原理1)数据采集与准备:首先需要采集大量带......
  • 安防监控国标GB28181平台LiteCVR修改录像计划的等待时间较长,该如何解决?
    我国在智能视频安防监控领域相较国外起步较晚,但随着近些年互联网等技术的发展,我国在该领域迅猛发展,取得了不错的成果。有用户反馈,GB28181视频监控平台LiteCVR修改录像计划的等待时间较长。今天我们来针对这个案例做一个分析和讲解。根据反馈我们立即进行排查,发现其实修改单个通......
  • TSINGSEE烟火识别算法的技术原理是什么?如何应用在视频监控中?
    AI烟火识别算法是基于深度学习技术的一种视觉识别算法,主要用于在视频监控场景中自动检测和识别烟雾、火焰的行为。该技术基于深度学习神经网络技术,可以动态识别烟雾和火焰从有到无、从小到大、从大到小、从小烟到浓烟的状态转换过程。1、技术原理1)数据采集与准备:首先需要采集大......
  • 浅谈AI人体姿态识别技术的先进性及安防视频监控应用场景
    随着计算机视觉技术和安防监控技术的不断发展,基于AI算法的人体姿态识别技术也得到了广泛的应用。然而,传统的安防监控系统通常只局限于简单的视频监控等功能,无法准确地识别人体的姿态,使得一些安防监控存在着一定的漏洞和不足之处。基于AI算法的人体姿态识别技术是基于人工智能和计......
  • 视频监控/安防监控平台EasyCVR安防视频监控平台应用场景
    EasyCVR是一款专门用于安防视频管理的综合性软件产品。它是一款完全自主可控的软件,旨在为用户提供全方位的视频监控管理解决方案。EasyCVR具备多种功能和特点。首先,它与EasyNVR服务器无缝集成,实现了对摄像机和监控设备的统一管理。通过EasyCVR,用户可以方便地查看和管理所有连......
  • jemeter使用jp@gc - PerfMon Metrics Collector性能监控startAgent2.2.1版本崩溃记录
    jemeter进行性能测试时,一开启startAgent就退出,以下是正常情况:原因:JDK版本与startAgent版本不对应解决方式:之前使用的是jdk1.8.0_321,更换为jdk1.8.0_141后就正常了 ......
  • Zabbix监控 (字体乱码)
    #图形中的中文字体不显示#首选确定字体存放路径和名字grep'ZBX_FONT'/usr/share/zabbix/include/defines.inc.phpdefine('ZBX_FONTPATH',realpath('assets/fonts'));//wheretosearchforfont(GD>2.0.18)define('ZBX_FON......
  • 开源项目 | 美团开源监控告警服务,Java 开发的实时应用监控平台,能够帮助开发者快速定位
     一、项目概述        CAT是基于Java开发的实时应用监控平台,为美团点评提供了全面的实时监控告警服务。        CAT作为服务端项目基础组件,提供了Java,C/C++,Node.js,Python,Go等多语言客户端,已经在美团点评的基础架构中间件框架(MVC框架,RPC框架,数据......