
Setting Up HWI and Monitoring Alerts and Logs


Setting Up HWI

1. Download the Hive 2.0.0 source package from https://archive.apache.org/dist/hive/hive-2.0.0/
[hadoop@master ~]$ wget https://archive.apache.org/dist/hive/hive-2.0.0/apache-hive-2.0.0-src.tar.gz
[hadoop@master opt]$ ls
apache-hive-2.0.0-src.tar.gz  hadoop-2.7.1.tar.gz  jdk-8u152-linux-x64.tar.gz
[root@master opt]# ls
apache-hive-2.0.0-src.tar.gz  hadoop-2.7.1.tar.gz  jdk-8u152-linux-x64.tar.gz
[root@master opt]# tar xf apache-hive-2.0.0-src.tar.gz 
[root@master opt]# ls
apache-hive-2.0.0-src         hadoop-2.7.1.tar.gz
apache-hive-2.0.0-src.tar.gz  jdk-8u152-linux-x64.tar.gz
[root@master opt]# cd apache-hive-2.0.0-src
[root@master apache-hive-2.0.0-src]# ls
accumulo-handler  conf           hcatalog  llap-client  orc                service
ant               contrib        hplsql    llap-common  packaging          shims
beeline           data           hwi       llap-server  pom.xml            spark-client
bin               dev-support    itests    llap-tez     ql                 storage-api
checkstyle        docs           jdbc      metastore    README.txt         testutils
cli               findbugs       lib       NOTICE       RELEASE_NOTES.txt
common            hbase-handler  LICENSE   odbc         serde
[root@master apache-hive-2.0.0-src]# cd hwi
[root@master hwi]# pwd
/opt/apache-hive-2.0.0-src/hwi
[root@master hwi]# ls
pom.xml  src  web
[root@master hwi]# jar -Mcf hive-hwi-2.0.0.war -C web .
[root@master hwi]# ls
hive-hwi-2.0.0.war  pom.xml  src  web
[root@master hwi]# file hive-hwi-2.0.0.war 
hive-hwi-2.0.0.war: Zip archive data, at least v1.0 to extract
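As an optional sanity check, the archive contents can be listed to confirm the files from web/ were packed:

jar -tf hive-hwi-2.0.0.war | head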
[root@master hwi]# chown -R hadoop.hadoop /usr/local/src/
[root@master hwi]# su - hadoop
Last login: Fri May  5 23:19:49 EDT 2023 on pts/0
[hadoop@master ~]$ cd /usr/local/src
[hadoop@master src]$ ll
total 8
drwxr-xr-x.  7 hadoop hadoop  178 Apr 18 22:22 flume
drwxr-xr-x. 13 hadoop hadoop  196 Mar 16 05:32 hadoop
drwxr-xr-x.  9 hadoop hadoop  183 Apr  6 05:29 hbase
drwxr-xr-x.  9 hadoop hadoop  170 Mar 22 23:29 hive
drwxr-xr-x.  8 hadoop hadoop  255 Sep 14  2017 jdk
drwxr-xr-x.  9 hadoop hadoop 4096 Dec 18  2017 sqoop
drwxr-xr-x. 12 hadoop hadoop 4096 Mar 28 21:34 zookeeper
[hadoop@master src]$ cd hive
[hadoop@master hive]$ cd conf/
[hadoop@master conf]$ ll
total 432
-rw-r--r--. 1 hadoop hadoop   1596 Jan 21  2016 beeline-log4j2.properties.template
-rw-r--r--. 1 hadoop hadoop 207525 Feb  9  2016 hive-default.xml.template
-rw-r--r--. 1 hadoop hadoop   2378 Apr 22  2015 hive-env.sh.template
-rw-r--r--. 1 hadoop hadoop   2287 Jan 21  2016 hive-exec-log4j2.properties.template
-rw-r--r--. 1 hadoop hadoop   2758 Jan 21  2016 hive-log4j2.properties.template
-rw-r--r--. 1 hadoop hadoop 207462 Mar 22 23:27 hive-site.xml
-rw-r--r--. 1 hadoop hadoop   2049 Jan 21  2016 ivysettings.xml
-rw-r--r--. 1 hadoop hadoop   3885 Jan 21  2016 llap-daemon-log4j2.properties.template
[hadoop@master conf]$ vi hive-site.xml 
# Modify:
  <property>
    <name>hive.hwi.war.file</name>
    <value>lib/hive-hwi-2.0.0.war</value>
    <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}.</description>
  </property>
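The war file itself must also be placed where this property points, i.e. under ${HIVE_HOME}/lib (the hive.log excerpt later in this article confirms it is read from /usr/local/src/hive/lib/hive-hwi-2.0.0.war):

cp /opt/apache-hive-2.0.0-src/hwi/hive-hwi-2.0.0.war /usr/local/src/hive/lib/

If the defaults need changing, the HWI bind address and port can be set in the same file; the values shown below are the usual Hive 2.0 defaults, and these entries are optional:

  <property>
    <name>hive.hwi.listen.host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
  </property>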

2. Download the Ant binary package from https://archive.apache.org/dist/ant/binaries/

[hadoop@master opt]$ ls
apache-ant-1.9.1-bin.tar.gz  apache-hive-2.0.0-src.tar.gz  jdk-8u152-linux-x64.tar.gz
apache-hive-2.0.0-src        hadoop-2.7.1.tar.gz
[hadoop@master opt]$ tar xf apache-ant-1.9.1-bin.tar.gz -C /usr/local/src
[hadoop@master opt]$ cd /usr/local/src
[hadoop@master src]$ ls
apache-ant-1.9.1  flume  hadoop  hbase  hive  jdk  sqoop  zookeeper
[hadoop@master src]$ ll
total 8
drwxr-xr-x.  6 hadoop hadoop  174 May 15  2013 apache-ant-1.9.1
drwxr-xr-x.  7 hadoop hadoop  178 Apr 18 22:22 flume
drwxr-xr-x. 13 hadoop hadoop  196 Mar 16 05:32 hadoop
drwxr-xr-x.  9 hadoop hadoop  183 Apr  6 05:29 hbase
drwxr-xr-x.  9 hadoop hadoop  170 Mar 22 23:29 hive
drwxr-xr-x.  8 hadoop hadoop  255 Sep 14  2017 jdk
drwxr-xr-x.  9 hadoop hadoop 4096 Dec 18  2017 sqoop
drwxr-xr-x. 12 hadoop hadoop 4096 Mar 28 21:34 zookeeper
[hadoop@master src]$ mv apache-ant-1.9.1 ant
[hadoop@master src]$ ls
ant  flume  hadoop  hbase  hive  jdk  sqoop  zookeeper
[hadoop@master src]$ 

3. Configure environment variables

[root@master hwi]# vi /etc/profile.d/ant.sh
# Add
export ANT_HOME=/usr/local/src/ant
export PATH=${ANT_HOME}/bin:$PATH
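To make the variables take effect in the current shell without logging out (a fresh login, as below, also works):

source /etc/profile.d/ant.sh
which ant    # should resolve to /usr/local/src/ant/bin/ant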

[root@master hwi]# su - hadoop
Last login: Tue May  9 04:39:37 EDT 2023 on pts/0
[hadoop@master ~]$ ant -version
Apache Ant(TM) version 1.9.1 compiled on May 15 2013
[hadoop@master ~]$ 

4. Copy files

[hadoop@master ~]$ cp /usr/local/src/jdk/lib/tools.jar /usr/local/src/hive/lib/
[hadoop@master ~]$ cp /usr/local/src/ant/lib/ant.jar /usr/local/src/hive/lib/
[hadoop@master ~]$ ll /usr/local/src/hive/lib/ant.jar 
-rw-r--r--. 1 hadoop hadoop 1997485 May  9 05:16 /usr/local/src/hive/lib/ant.jar
[hadoop@master ~]$ ll /usr/local/src/hive/lib/tools.jar 
-rw-r--r--. 1 hadoop hadoop 18290333 May  9 05:16 /usr/local/src/hive/lib/tools.jar
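With the war in place and the jars copied, start the HWI service as the hadoop user; it serves on port 9999 by default and runs in the foreground (append & to keep the shell usable):

[hadoop@master ~]$ hive --service hwi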

5. Visit http://master:9999/hwi/

[Screenshot: the HWI web interface at http://master:9999/hwi/]

Experiment 1: Viewing Big Data Platform Log Information

1. Task 1: View the big data platform host logs

The Linux operating system itself and most server programs keep their log files under the /var/log/ directory by default. Some programs share one log file, some use a file of their own, and some large server programs produce more than one log file and therefore create a subdirectory under /var/log/ to hold them, which keeps the directory structure clear and makes log files quick to locate. Quite a few log files are readable only by the root user, which keeps the logged information secure.

Log in to the Linux host as the hadoop user, change to the /var/log directory, and run the ll command to list all the log files in that directory.

[hadoop@master ~]$ cd /var/log
[hadoop@master log]$ ll
total 3516
drwxr-xr-x. 2 root   root       204 Mar 15 06:11 anaconda
drwx------. 2 root   root        23 Mar 15 06:12 audit
-rw-------. 1 root   root         0 May  9 05:15 boot.log
-rw-------. 1 root   root     78764 Mar 22 09:09 boot.log-20230322
-rw-------. 1 root   root     44295 Apr 12 05:06 boot.log-20230412
-rw-------. 1 root   root     61033 May  9 05:15 boot.log-20230509
-rw-------. 1 root   utmp         0 May  9 05:15 btmp
-rw-------. 1 root   utmp       384 May  9 04:04 btmp-20230509
drwxr-xr-x. 2 chrony chrony       6 Apr 12  2018 chrony
-rw-------. 1 root   root       298 May  9 05:15 cron
-rw-------. 1 root   root      9807 Apr 12 05:06 cron-20230412
-rw-------. 1 root   root      3109 May  9 05:15 cron-20230509
-rw-r--r--. 1 root   root    123696 May  9 04:04 dmesg
-rw-r--r--. 1 root   root    123578 May  5 23:32 dmesg.old
-rw-r--r--. 1 root   root         0 Mar 15 06:12 firewalld
-rw-r--r--. 1 root   root       193 Mar 15 06:08 grubby_prune_debug
-rw-r--r--. 1 root   root    292292 May  9 05:13 lastlog
-rw-------. 1 root   root         0 May  9 05:15 maillog
-rw-------. 1 root   root     11704 Mar 22 11:02 maillog-20230412
-rw-------. 1 root   root         0 Apr 12 05:06 maillog-20230509
-rw-------. 1 root   root       234 May  9 05:15 messages
-rw-------. 1 root   root   1912585 Apr 12 05:01 messages-20230412
-rw-------. 1 root   root    934185 May  9 05:13 messages-20230509
-rw-r--r--. 1 mysql  mysql    57275 May  9 04:04 mysqld.log
drwxr-xr-x. 2 root   root         6 Mar 15 06:11 rhsm
-rw-------. 1 root   root         0 May  9 05:15 secure
-rw-------. 1 root   root     47284 Apr 12 04:48 secure-20230412
-rw-------. 1 root   root     13842 May  9 05:13 secure-20230509
-rw-------. 1 root   root         0 May  9 05:15 spooler
-rw-------. 1 root   root         0 Mar 15 06:09 spooler-20230412
-rw-------. 1 root   root         0 Apr 12 05:06 spooler-20230509
-rw-------. 1 root   root         0 Mar 15 06:08 tallylog
drwxr-xr-x. 2 root   root        23 Mar 15 06:12 tuned
-rw-r--r--. 1 root   root     28261 May  9 04:04 vmware-vgauthsvc.log.0
-rw-r--r--. 1 root   root     32585 May  9 04:03 vmware-vmsvc.log
-rw-rw-r--. 1 root   utmp     51840 May  9 05:13 wtmp
-rw-------. 1 root   root      1852 Mar 22 10:27 yum.log

As the listing shows, the directory contains log files covering many different functions; the following steps examine their contents one by one.

1.1. Step 1: View the kernel and system message log (/var/log/messages).

The kernel and system message log aggregates the logs of many processes. Switch to the root user and view the file with the cat or tail command.

[hadoop@master log]$ exit
logout
[root@master hwi]# cd /var/log
[root@master log]# 
[root@master log]# cat messages
May  9 05:15:01 master rsyslogd: [origin software="rsyslogd" swVersion="8.24.0" x-pid="916" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
May  9 05:15:19 master chronyd[659]: Source 193.182.111.143 replaced with 143.107.229.211

Besides entries like the rsyslogd and chronyd messages above, this log can also record user switches on the master host and service state changes, such as the firewall records "Started firewalld - dynamic firewall daemon" and "Stopped firewalld - dynamic firewall daemon."

1.2. Step 2: View the scheduled task log /var/log/cron.

This file records the creation and execution of crontab scheduled jobs. Run the cat cron command; the output is as follows:

[root@master log]# cat cron
May  9 05:15:01 master run-parts(/etc/cron.daily)[1521]: finished logrotate
May  9 05:15:01 master run-parts(/etc/cron.daily)[1509]: starting man-db.cron
May  9 05:15:01 master run-parts(/etc/cron.daily)[1532]: finished man-db.cron
May  9 05:15:01 master anacron[1277]: Job `cron.daily' terminated
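To trace one scheduled job instead of reading the whole file, grep the log for it, for example the daily jobs shown above:

grep cron.daily /var/log/cron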

1.3. Step 3: View the system boot log /var/log/dmesg.

This file records hardware device information and is plain text; the dmesg command shows the same information. The file is long, so only an excerpt is shown below:

[root@master log]# dmesg
.....
[    5.195834] XFS (sda1): Mounting V5 Filesystem
[    5.545742] XFS (sda1): Starting recovery (logdev: internal)
[    5.548948] XFS (sda1): Ending recovery (logdev: internal)
[    5.686390] type=1305 audit(1683619454.113:4): audit_pid=623 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[    6.232387] NET: Registered protocol family 40
[    6.726856] IPv6: ADDRCONF(NETDEV_UP): ens33: link is not ready
[    6.729804] e1000: ens33 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[    6.731762] IPv6: ADDRCONF(NETDEV_UP): ens33: link is not ready
[    6.734428] IPv6: ADDRCONF(NETDEV_UP): ens33: link is not ready
[    6.735223] IPv6: ADDRCONF(NETDEV_CHANGE): ens33: link becomes ready
[    7.893332] floppy0: no floppy controllers found
[    7.893364] work still pending

The excerpt above shows the e1000 network device (ens33) coming up and acquiring an IPv6 address.
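To pick out a single device's messages from the long output, filter dmesg with grep:

dmesg | grep -i ens33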

1.4. Step 4: View the mail system log /var/log/maillog.

This log records the activity of every email sent to or from the system. It can be used to see which tool a user sent mail with or which system data was sent to. View mail activity with cat /var/log/maillog or tail -f /var/log/maillog.

1.5. Step 5: View the user login logs.

These logs record users logging in to and out of the Linux system, including the user name, the login terminal, the login time, the source host, and the processes in use. The following files hold this login and logout information:

1) /var/log/lastlog: the most recent login event for each user

2) /var/log/wtmp: user logins and logouts, plus system boot and shutdown events

3) /var/run/utmp: details of every user currently logged in

4) /var/log/secure: security events related to user authentication

(1) lastlog lists every user's most recent login. lastlog reads the information in /var/log/lastlog, including the login name, port, and last login time.
[root@master log]# lastlog
Username         Port     From             Latest
root             pts/1    192.168.100.1    Tue May  9 05:13:21 -0400 2023
bin                                        **Never logged in**
daemon                                     **Never logged in**
adm                                        **Never logged in**
lp                                         **Never logged in**
sync                                       **Never logged in**
shutdown                                   **Never logged in**
halt                                       **Never logged in**
mail                                       **Never logged in**
operator                                   **Never logged in**
games                                      **Never logged in**
ftp                                        **Never logged in**
nobody                                     **Never logged in**
systemd-network                            **Never logged in**
dbus                                       **Never logged in**
polkitd                                    **Never logged in**
sshd                                       **Never logged in**
postfix                                    **Never logged in**
chrony                                     **Never logged in**
hadoop           pts/0                     Tue May  9 05:11:30 -0400 2023
mysql                                      **Never logged in**

(2) last lists the users who are or have been logged in to the system. By default it reads /var/log/wtmp. Its output includes the user name, terminal, login source, start time, end time, and duration; note that the final line of output shows when the wtmp file's records begin. A different file can be read with last -f, such as /var/log/btmp or /var/run/utmp. Run the last command:
[root@master log]# last
root     pts/1        192.168.100.1    Tue May  9 05:13   still logged in   
root     pts/0        192.168.100.1    Tue May  9 04:21   still logged in   
root     tty1                          Tue May  9 04:04   still logged in   
reboot   system boot  3.10.0-862.el7.x Tue May  9 04:04 - 05:36  (01:31)    
reboot   system boot  3.10.0-862.el7.x Fri May  5 23:32 - 05:36 (3+06:03)   
root     pts/0        192.168.100.1    Fri May  5 23:19 - crash  (00:13)    
reboot   system boot  3.10.0-862.el7.x Fri May  5 23:18 - 05:36 (3+06:17)   
root     pts/0        192.168.100.1    Tue Apr 18 22:30 - crash (17+00:47)  
reboot   system boot  3.10.0-862.el7.x Tue Apr 18 22:30 - 05:36 (20+07:05)  
root     pts/1        192.168.100.1    Tue Apr 18 22:27 - crash  (00:02)    
root     pts/0        192.168.100.1    Tue Apr 18 21:36 - crash  (00:53)    
root     tty1                          Tue Apr 18 21:33 - crash  (00:56)    
reboot   system boot  3.10.0-862.el7.x Tue Apr 18 21:33 - 05:36 (20+08:02)  
root     pts/1        192.168.100.1    Wed Apr 12 08:24 - crash (6+13:08)   
root     pts/0        192.168.100.1    Wed Apr 12 08:00 - crash (6+13:33)   
reboot   system boot  3.10.0-862.el7.x Wed Apr 12 07:59 - 05:36 (26+21:36)  
root     pts/1        192.168.100.1    Wed Apr 12 06:19 - crash  (01:40)    
root     pts/0        192.168.100.1    Wed Apr 12 05:26 - crash  (02:33)    
reboot   system boot  3.10.0-862.el7.x Wed Apr 12 05:24 - 05:36 (27+00:12)  
root     pts/0        192.168.100.1    Wed Apr 12 04:45 - crash  (00:38)    
root     tty1                          Wed Apr 12 04:43 - crash  (00:40)    
reboot   system boot  3.10.0-862.el7.x Wed Apr 12 04:42 - 05:36 (27+00:53)  
root     pts/0        192.168.100.1    Tue Apr 11 22:40 - crash  (06:02)    
root     tty1                          Tue Apr 11 22:38 - crash  (06:03)    
reboot   system boot  3.10.0-862.el7.x Tue Apr 11 22:37 - 05:36 (27+06:58)  
root     pts/0        192.168.100.1    Thu Apr  6 05:15 - crash (5+17:21)   
reboot   system boot  3.10.0-862.el7.x Thu Apr  6 05:15 - 05:36 (33+00:20)  
root     pts/0        192.168.100.1    Tue Mar 28 21:21 - crash (8+07:54)   
root     tty1                          Tue Mar 28 21:19 - crash (8+07:55)   
reboot   system boot  3.10.0-862.el7.x Tue Mar 28 21:19 - 05:36 (41+08:16)  
root     pts/1        192.168.100.1    Wed Mar 22 23:58 - crash (5+21:21)   
root     tty1                          Wed Mar 22 22:53 - crash (5+22:25)   
root     pts/0        192.168.100.1    Wed Mar 22 22:53 - crash (5+22:25)   
reboot   system boot  3.10.0-862.el7.x Wed Mar 22 22:51 - 05:36 (47+06:44)  
root     pts/0        192.168.100.1    Wed Mar 22 09:36 - 11:01  (01:24)    
root     pts/0        192.168.100.1    Wed Mar 22 08:56 - 09:08  (00:12)    
root     tty1                          Wed Mar 22 08:54 - crash  (13:57)    
reboot   system boot  3.10.0-862.el7.x Wed Mar 22 08:47 - 05:36 (47+20:48)  
root     tty1                          Tue Mar 21 21:57 - crash  (10:50)    
reboot   system boot  3.10.0-862.el7.x Tue Mar 21 21:56 - 05:36 (48+07:39)  
hadoop   pts/2        master           Fri Mar 17 01:41 - 01:41  (00:00)    
hadoop   pts/2        master           Fri Mar 17 01:40 - 01:40  (00:00)    
hadoop   pts/1        master           Fri Mar 17 01:38 - crash (4+20:18)   
root     pts/1        192.168.100.1    Fri Mar 17 01:35 - 01:35  (00:00)    
root     pts/1        192.168.100.1    Fri Mar 17 01:22 - 01:23  (00:01)    
root     pts/0        192.168.100.1    Fri Mar 17 01:13 - crash (4+20:43)   
root     pts/0        192.168.100.1    Fri Mar 17 00:59 - 01:12  (00:13)    
reboot   system boot  3.10.0-862.el7.x Fri Mar 17 00:58 - 05:36 (53+04:37)  
root     pts/0        192.168.100.1    Thu Mar 16 05:15 - crash  (19:42)    
reboot   system boot  3.10.0-862.el7.x Thu Mar 16 05:15 - 05:36 (54+00:20)  
root     pts/0        192.168.100.1    Thu Mar 16 05:10 - down   (00:04)    
reboot   system boot  3.10.0-862.el7.x Thu Mar 16 05:10 - 05:15  (00:04)    
root     pts/0        192.168.100.1    Thu Mar 16 03:34 - down   (01:35)    
root     tty1                          Thu Mar 16 03:33 - 05:09  (01:36)    
reboot   system boot  3.10.0-862.el7.x Thu Mar 16 03:33 - 05:09  (01:36)    
root     pts/0        192.168.100.1    Wed Mar 15 00:32 - crash (1+03:01)   
reboot   system boot  3.10.0-862.el7.x Wed Mar 15 00:32 - 05:09 (1+04:37)   
root     pts/0        192.168.100.1    Tue Mar 14 22:18 - crash  (02:13)    
reboot   system boot  3.10.0-862.el7.x Tue Mar 14 22:18 - 05:09 (1+06:51)   
root     tty1                          Wed Mar 15 06:12 - crash  (-7:-54)   
reboot   system boot  3.10.0-862.el7.x Wed Mar 15 06:12 - 05:09  (22:57)    

wtmp begins Wed Mar 15 06:12:14 2023
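last also takes filters; for example, to show only system boot records, or only the most recent entries:

last reboot
last -n 10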

Use the command last -f /var/run/utmp to view the utmp file:
[root@master log]# last -f /var/run/utmp
root     pts/1        192.168.100.1    Tue May  9 05:13   still logged in   
root     pts/0        192.168.100.1    Tue May  9 04:21   still logged in   
root     tty1                          Tue May  9 04:04   still logged in   
reboot   system boot  3.10.0-862.el7.x Tue May  9 04:04 - 05:36  (01:32)    

utmp begins Tue May  9 04:04:12 2023

(3) lastb lists failed login attempts. lastb works exactly like last except that it reads /var/log/btmp by default.
[root@master log]# lastb

btmp begins Tue May  9 05:15:01 2023

Here the output lists no entries because btmp was rotated earlier the same day (note btmp-20230509 in the directory listing above); any failed login attempts after the rotation would appear here.

(4) SSH login behavior can be examined through the Linux security log /var/log/secure; reading this file requires root privileges.

Switch to the root user and run cat /var/log/secure to inspect login activity on the server:

[root@master log]# cat /var/log/secure
May  9 05:29:00 master su: pam_unix(su-l:session): session closed for user hadoop
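A common check on this file is to search for failed SSH password attempts (none appear in the short excerpt above, but the pattern is standard):

grep 'Failed password' /var/log/secure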

2. Task 2: View log information in Hadoop MapReduce jobs

First, we need to record information from within the MapReduce job, for example using the standard library, log4j, or writes to the standard streams with System.out.println() or System.err.println(). Hadoop then provides a user interface for viewing the logs.

Every Mapper and Reducer in Hadoop has the following three types of log:

(1) stdout - output from System.out.println() is directed to this file.

(2) stderr - output from System.err.println() is directed to this file.

(3) syslog - log4j output is directed to this file. Stack traces of any exceptions raised and left unhandled during job execution also appear in syslog.

Enter http://master:19888/jobhistory in the browser address bar to display summary information about jobs.

Note: the jobhistory process must be started first.
To start it, run the following as the hadoop user:
[hadoop@master ~]$ cd /usr/local/src/hadoop/sbin
[hadoop@master sbin]$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/src/hadoop/logs/mapred-hadoop-historyserver-master.out
[hadoop@master sbin]$ 
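To confirm the process is up, jps (shipped with the JDK) should now list a JobHistoryServer entry:

jps | grep JobHistoryServer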

[Screenshot: the JobHistory server's job summary page at http://master:19888/jobhistory]

Click the Job ID link job_1597629049628_0012 and the following page appears:

[Screenshot: the job overview page for job_1597629049628_0012]

It shows additional information about the job, including its execution status, start and stop times, and basic details such as the queue it ran in. We can see how many Mappers and Reducers were used to execute the job. Note the average execution time of each of the Map, Shuffle, Sort, and Reduce phases. From here we can follow the links to view the Mapper and Reducer details.

Click the "1" link in the Maps column to view the detailed Mapper logs:

[Screenshot: the Mapper task list for the job]

We can now view the logs of one specific Mapper instance. Clicking Logs shows the information in the figure below.

[Screenshot: the stdout, stderr, and syslog output for the Mapper attempt]

The figure shows the three kinds of logs:

(1) stdout, standard output;

(2) stderr, standard error;

(3) syslog, the system log.

If log aggregation is not enabled, a message saying the logs are unavailable is displayed instead.

Enable log aggregation by adding the following configuration to yarn-site.xml:
[hadoop@master sbin]$ cd /usr/local/src/hadoop/etc/hadoop
[hadoop@master hadoop]$ vi yarn-site.xml 
# Add
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
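YARN must be restarted for the change to take effect. Once aggregation is on, the aggregated logs of a finished application can also be fetched from the command line with the standard yarn CLI (substitute a real application ID, such as one of those appearing in the ResourceManager log later in this article):

yarn logs -applicationId <application_id>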

3. Task 3: View Hadoop logs through the user interface

By default, logs are accessed at the URL http://master:19888. As noted earlier, whether logs are aggregated is transparent to the user: if they are aggregated, the Job History Manager retrieves them from HDFS; if not, they are fetched by sending a request to the node manager on each individual node.

While a job is running, the logs viewable through the Application Master web interface can also be viewed through the node manager web interface. The Application Master web interface, in turn, is reached via the links to "RUNNING" jobs on the left-hand side of the resource manager web interface. By default the resource manager web interface is available at http://master:8088

[Screenshot: the ResourceManager web UI at http://master:8088 with no running applications]

As the figure shows, no job is currently running, so the list of running jobs is empty.

Click "FINISHED" in the left-hand menu to display jobs that have finished running:

[Screenshot: the FINISHED applications list in the ResourceManager UI]

We can also view log information through Hadoop's own user interface: browse to http://master:50070 and click Utilities --> Logs.

[Screenshot: the Hadoop log file directory listing at http://master:50070]

Here we can see the list of log files in Hadoop, including those of the NameNode, SecondaryNameNode, HistoryServer, NodeManager, ResourceManager, and so on. Click a log file to view its contents, for example hadoop-hadoop-namenode-master.out.2:

[Screenshot: the contents of hadoop-hadoop-namenode-master.out.2]

4. Task 4: View Hadoop logs from the command line

The list of Hadoop log files can also be obtained interactively from the command line.

When a log reaches a certain size, it is rolled over into a new file. Rolled files are named in the form "XXX.log.<number>", and the larger the number, the older the log. By default only the 20 most recent log files are kept. Log in as the hadoop user, change to the /usr/local/src/hadoop/logs directory, and run the ll command to list the logs.
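The size threshold and retention count come from Hadoop's log4j configuration; in the stock Hadoop 2.7 etc/hadoop/log4j.properties they are controlled by variables along these lines (usual defaults shown; check your own file):

hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20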

[hadoop@master ~]$ cd /usr/local/src/hadoop/logs
[hadoop@master logs]$ ll
total 3580
-rw-rw-r--. 1 hadoop hadoop  231622 Mar 22 10:03 hadoop-hadoop-datanode-master.log.COMPLETED
-rw-rw-r--. 1 hadoop hadoop     716 Mar 16 05:11 hadoop-hadoop-datanode-master.out.1.COMPLETED
-rw-rw-r--. 1 hadoop hadoop       0 Mar 15 00:33 hadoop-hadoop-datanode-master.out.2.COMPLETED
-rw-rw-r--. 1 hadoop hadoop     716 Mar 15 00:32 hadoop-hadoop-datanode-master.out.3.COMPLETED
-rw-rw-r--. 1 hadoop hadoop     716 Mar 14 23:32 hadoop-hadoop-datanode-master.out.4.COMPLETED
-rw-rw-r--. 1 hadoop hadoop     716 Mar 22 09:40 hadoop-hadoop-datanode-master.out.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 2143513 May  9 06:08 hadoop-hadoop-namenode-master.log
-rw-rw-r--. 1 hadoop hadoop     716 May  9 05:48 hadoop-hadoop-namenode-master.out
-rw-rw-r--. 1 hadoop hadoop     716 May  5 23:20 hadoop-hadoop-namenode-master.out.1
-rw-rw-r--. 1 hadoop hadoop     716 Apr 18 22:31 hadoop-hadoop-namenode-master.out.2
-rw-rw-r--. 1 hadoop hadoop     716 Apr 18 22:28 hadoop-hadoop-namenode-master.out.3
-rw-rw-r--. 1 hadoop hadoop     716 Apr 12 08:01 hadoop-hadoop-namenode-master.out.4
-rw-rw-r--. 1 hadoop hadoop     716 Apr 12 05:28 hadoop-hadoop-namenode-master.out.5
-rw-rw-r--. 1 hadoop hadoop  516399 May  9 06:08 hadoop-hadoop-secondarynamenode-master.log
-rw-rw-r--. 1 hadoop hadoop   55880 May  9 06:08 hadoop-hadoop-secondarynamenode-master.out
-rw-rw-r--. 1 hadoop hadoop     716 May  5 23:20 hadoop-hadoop-secondarynamenode-master.out.1
-rw-rw-r--. 1 hadoop hadoop     716 Apr 18 22:31 hadoop-hadoop-secondarynamenode-master.out.2
-rw-rw-r--. 1 hadoop hadoop       0 Apr 18 22:28 hadoop-hadoop-secondarynamenode-master.out.3
-rw-rw-r--. 1 hadoop hadoop     716 Apr 12 08:02 hadoop-hadoop-secondarynamenode-master.out.4
-rw-rw-r--. 1 hadoop hadoop     716 Apr 12 05:28 hadoop-hadoop-secondarynamenode-master.out.5
-rw-r--r--. 1 hadoop hadoop   28619 Mar 15 00:34 hadoop-root-datanode-master.log.COMPLETED
-rw-r--r--. 1 hadoop hadoop     714 Mar 15 00:32 hadoop-root-datanode-master.out.COMPLETED
-rw-rw-r--. 1 hadoop hadoop   61443 May  9 06:08 mapred-hadoop-historyserver-master.log
-rw-rw-r--. 1 hadoop hadoop       0 May  9 05:51 mapred-hadoop-historyserver-master.out
-rw-rw-r--. 1 hadoop hadoop       0 May  9 05:41 mapred-hadoop-historyserver-master.out.1
-rw-rw-r--. 1 hadoop hadoop       0 Apr 18 22:28 SecurityAuth-hadoop.audit
-rw-rw-r--. 1 hadoop hadoop       0 Mar 14 23:31 SecurityAuth-hadoop.audit.COMPLETED
-rw-r--r--. 1 hadoop hadoop       0 Mar 15 00:32 SecurityAuth-root.audit.COMPLETED
-rw-rw-r--. 1 hadoop hadoop  497484 May  9 05:59 yarn-hadoop-resourcemanager-master.log
-rw-rw-r--. 1 hadoop hadoop    2078 May  9 05:56 yarn-hadoop-resourcemanager-master.out
-rw-rw-r--. 1 hadoop hadoop     700 May  5 23:20 yarn-hadoop-resourcemanager-master.out.1
-rw-rw-r--. 1 hadoop hadoop     700 Apr 18 22:31 yarn-hadoop-resourcemanager-master.out.2
-rw-rw-r--. 1 hadoop hadoop     700 Apr 12 08:02 yarn-hadoop-resourcemanager-master.out.3
-rw-rw-r--. 1 hadoop hadoop     700 Apr 12 05:28 yarn-hadoop-resourcemanager-master.out.4
-rw-rw-r--. 1 hadoop hadoop     700 Apr 12 04:48 yarn-hadoop-resourcemanager-master.out.5
-rw-rw-r--. 1 hadoop hadoop    2078 Mar 17 01:52 yarn-hadoop-resourcemanager-master.out.5.COMPLETED

From this listing we can read off each log file's size and the Hadoop component it belongs to. The yarn-hadoop-resourcemanager-master.out file has been rolled into five files, and the larger the trailing number, the older the file, consistent with Hadoop's log rotation rule.

5. Task 5: View HBase logs

HBase provides a web user interface for viewing its log files. Browse to http://master:60010 to display HBase's main web page, shown below.

[Screenshot: the HBase Master web UI at http://master:60010]

Click the "Local Logs" menu to open HBase's log list:

[Screenshot: the HBase Local Logs listing]

Click one of the links to view the corresponding log, for example hbase-hadoop-master-master.log:

[Screenshot: the contents of hbase-hadoop-master-master.log]

6. Task 6: View Hive logs

Hive's logs are stored under /tmp/hadoop. At the command line, switch to that directory and run the ll command to list Hive's logs, as shown below.
[root@master ~]# cd /tmp/hadoop
[root@master hadoop]# ll
total 200
-rw-rw-r--. 1 hadoop hadoop 200414 May  9 05:23 hive.log
-rw-rw-r--. 1 hadoop hadoop   2038 May  9 05:19 stderr
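This location is not hard-coded: in Hive 2.x it comes from conf/hive-log4j2.properties, where the log directory defaults to the JVM temp directory plus the user name, which for the hadoop user resolves to /tmp/hadoop (default shown; check your own file):

property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}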

Use the cat command to view the hive.log file, shown below:

[root@master hadoop]# cat hive.log 
2023-05-09T05:13:20,615 INFO  [main]: hwi.HWIServer (HWIServer.java:main(131)) - HWI is starting up
2023-05-09T05:13:23,275 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - Logging to org.apache.logging.slf4j.Log4jLogger@738dc9b via org.mortbay.log.Slf4jLog
2023-05-09T05:13:23,384 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26
2023-05-09T05:13:23,558 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - Extract /usr/local/src/hive/lib/hive-hwi-2.0.0.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if/webapp
2023-05-09T05:13:25,120 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - Started [[email protected]:9999, null]
2023-05-09T05:19:24,145 INFO  [main]: hwi.HWIServer (HWIServer.java:main(131)) - HWI is starting up
2023-05-09T05:19:24,801 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - Logging to org.apache.logging.slf4j.Log4jLogger@4145bad8 via org.mortbay.log.Slf4jLog
2023-05-09T05:19:24,983 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26
2023-05-09T05:19:25,176 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - Extract /usr/local/src/hive/lib/hive-hwi-2.0.0.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.2.0.0.war__hwi__p3f5if/webapp
2023-05-09T05:19:26,310 INFO  [main]: mortbay.log (Slf4jLog.java:info(67)) - Started [[email protected]:9999, null]
2023-05-09T05:21:24,720 ERROR [867988177@qtp-1094523823-0]: mortbay.log (Slf4jLog.java:warn(87)) - /hwi/
org.apache.tools.ant.BuildException: The following error occurred while executing this line:
jar:file:/usr/local/src/hive/lib/ant-1.9.1.jar!/org/apache/tools/ant/antlib.xml:37: Could not create task or type of type: componentdef.

Ant could not find the task or a class this task relies upon.

This is common and has a number of causes; the usual 
solutions are to read the manual pages then download and
install needed JAR files, or fix the build file: 
 - You have misspelt 'componentdef'.
   Fix: check your spelling.
 - The task needs an external JAR file to execute
     and this is not found at the right place in the classpath.
   Fix: check the documentation for dependencies.
   Fix: declare the task.
 - The task is an Ant optional task and the JAR file and/or libraries
     implementing the functionality were not found at the time you
     yourself built your installation of Ant from the Ant sources.
   Fix: Look in the ANT_HOME/lib for the 'ant-' JAR corresponding to the
     task and make sure it contains more than merely a META-INF/MANIFEST.MF.
     If all it contains is the manifest, then rebuild Ant with the needed
     libraries present in ${ant.home}/lib/optional/ , or alternatively,
     download a pre-built release version from apache.org
 - The build file was written for a later version of Ant
   Fix: upgrade to at least the latest release version of Ant
 - The task is not an Ant core or optional task 
     and needs to be declared using <taskdef>.
 - You are attempting to use a task defined using 
    <presetdef> or <macrodef> but have spelt wrong or not 
   defined it at the point of use

Remember that for JAR files to be visible to Ant tasks implemented
in ANT_HOME/lib, the files must be in the same directory or on the
classpath
.....
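The BuildException above implicates the ant jars on Hive's classpath. A reasonable first diagnostic step (a suggestion, not a guaranteed fix) is to list those jars and compare their versions against the Ant installation:

ls -l /usr/local/src/hive/lib/ant*.jar /usr/local/src/ant/lib/ant.jar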

Experiment 2: Viewing Big Data Platform Alert Information

1. Task 1: View host alert information on the big data platform

Hosts are essential infrastructure of a big data platform, comprising hardware resources (CPU, memory, storage, and so on) and the operating system (Linux). The Linux operating system manages the hardware and schedules CPU, memory, and storage on demand; by inspecting the alert information in the relevant logs, we can learn the state of the hardware resources, which helps operators locate and resolve problems quickly.

Linux log files are stored in the /var/log directory. We can use the journalctl log management tool to view alert information on a Linux host. journalctl is the query tool of the systemd journal used on CentOS 7; the messages it reads largely overlap with what rsyslog writes to /var/log/messages.

Switch to the /var/log directory and run journalctl -p err..alert to query the system's error-level alert information; the output is as follows:

[root@master hadoop]# cd /var/log
[root@master log]# journalctl -p err..alert
-- Logs begin at Tue 2023-05-09 04:04:10 EDT, end at Tue 2023-05-09 06:15:29 EDT. --
May 09 04:04:10 localhost.localdomain kernel: Detected CPU family 6 model 141 stepping 1
May 09 04:04:10 localhost.localdomain kernel: Warning: Intel Processor - this hardware has
May 09 04:04:10 localhost.localdomain kernel: sd 2:0:0:0: [sda] Assuming drive cache: writ
May 09 04:04:13 master kernel: piix4_smbus 0000:00:07.3: SMBus Host Controller not enabled
May 09 04:04:16 master systemd[1]: Failed to start Postfix Mail Transport Agent.
May 09 05:00:35 master sshd[1240]: pam_systemd(sshd:session): Failed to release 
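The same priority query can be narrowed by time or by boot, which is handy on long-running hosts:

journalctl -p err -b              # errors since the current boot
journalctl -p err --since today   # errors since midnight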

By viewing and analyzing a Linux host's alert information this way, we can troubleshoot the problems of individual services in a targeted manner.

We can also use journalctl to query alert information by a service's process ID. For example, to query error-level alerts for an sshd process with PID 13067 (note that the excerpt above actually shows sshd running as PID 1240; 13067 is simply the PID queried here), run journalctl _PID=13067 -p err. The result is empty because no error records exist for that PID:


[root@master log]# journalctl _PID=13067 -p err
-- No entries --
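Filtering by systemd unit rather than by a single PID is usually more useful, since it covers every process the service has spawned:

journalctl -u sshd -p err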

2. Task 2: View Hadoop alert information

Hadoop's logs live mainly in the /usr/local/src/hadoop/logs directory, and these log files contain the status and alert information of each Hadoop component. Switch to /usr/local/src/hadoop/logs; the file list is as follows:

[root@master log]# cd /usr/local/src/hadoop/logs
[root@master logs]# ll
total 3908
-rw-rw-r--. 1 hadoop hadoop  231622 Mar 22 10:03 hadoop-hadoop-datanode-master.log.COMPLETED
-rw-rw-r--. 1 hadoop hadoop     716 Mar 16 05:11 hadoop-hadoop-datanode-master.out.1.COMPLETED
-rw-rw-r--. 1 hadoop hadoop       0 Mar 15 00:33 hadoop-hadoop-datanode-master.out.2.COMPLETED
-rw-rw-r--. 1 hadoop hadoop     716 Mar 15 00:32 hadoop-hadoop-datanode-master.out.3.COMPLETED
-rw-rw-r--. 1 hadoop hadoop     716 Mar 14 23:32 hadoop-hadoop-datanode-master.out.4.COMPLETED
-rw-rw-r--. 1 hadoop hadoop     716 Mar 22 09:40 hadoop-hadoop-datanode-master.out.COMPLETED
-rw-rw-r--. 1 hadoop hadoop 2190848 May  9 06:19 hadoop-hadoop-namenode-master.log
-rw-rw-r--. 1 hadoop hadoop     716 May  9 05:48 hadoop-hadoop-namenode-master.out
-rw-rw-r--. 1 hadoop hadoop     716 May  5 23:20 hadoop-hadoop-namenode-master.out.1
-rw-rw-r--. 1 hadoop hadoop     716 Apr 18 22:31 hadoop-hadoop-namenode-master.out.2
-rw-rw-r--. 1 hadoop hadoop     716 Apr 18 22:28 hadoop-hadoop-namenode-master.out.3
-rw-rw-r--. 1 hadoop hadoop     716 Apr 12 08:01 hadoop-hadoop-namenode-master.out.4
-rw-rw-r--. 1 hadoop hadoop     716 Apr 12 05:28 hadoop-hadoop-namenode-master.out.5
-rw-rw-r--. 1 hadoop hadoop  549245 May  9 06:19 hadoop-hadoop-secondarynamenode-master.log
-rw-rw-r--. 1 hadoop hadoop   87472 May  9 06:19 hadoop-hadoop-secondarynamenode-master.out
-rw-rw-r--. 1 hadoop hadoop     716 May  5 23:20 hadoop-hadoop-secondarynamenode-master.out.1
-rw-rw-r--. 1 hadoop hadoop     716 Apr 18 22:31 hadoop-hadoop-secondarynamenode-master.out.2
-rw-rw-r--. 1 hadoop hadoop       0 Apr 18 22:28 hadoop-hadoop-secondarynamenode-master.out.3
-rw-rw-r--. 1 hadoop hadoop     716 Apr 12 08:02 hadoop-hadoop-secondarynamenode-master.out.4
-rw-rw-r--. 1 hadoop hadoop     716 Apr 12 05:28 hadoop-hadoop-secondarynamenode-master.out.5
-rw-r--r--. 1 hadoop hadoop   28619 Mar 15 00:34 hadoop-root-datanode-master.log.COMPLETED
-rw-r--r--. 1 hadoop hadoop     714 Mar 15 00:32 hadoop-root-datanode-master.out.COMPLETED
-rw-rw-r--. 1 hadoop hadoop   73762 May  9 06:19 mapred-hadoop-historyserver-master.log
-rw-rw-r--. 1 hadoop hadoop       0 May  9 05:51 mapred-hadoop-historyserver-master.out
-rw-rw-r--. 1 hadoop hadoop       0 May  9 05:41 mapred-hadoop-historyserver-master.out.1
-rw-rw-r--. 1 hadoop hadoop       0 Apr 18 22:28 SecurityAuth-hadoop.audit
-rw-rw-r--. 1 hadoop hadoop       0 Mar 14 23:31 SecurityAuth-hadoop.audit.COMPLETED
-rw-r--r--. 1 hadoop hadoop       0 Mar 15 00:32 SecurityAuth-root.audit.COMPLETED
-rw-rw-r--. 1 hadoop hadoop  497484 May  9 05:59 yarn-hadoop-resourcemanager-master.log
-rw-rw-r--. 1 hadoop hadoop    2078 May  9 05:56 yarn-hadoop-resourcemanager-master.out
-rw-rw-r--. 1 hadoop hadoop     700 May  5 23:20 yarn-hadoop-resourcemanager-master.out.1
-rw-rw-r--. 1 hadoop hadoop     700 Apr 18 22:31 yarn-hadoop-resourcemanager-master.out.2
-rw-rw-r--. 1 hadoop hadoop     700 Apr 12 08:02 yarn-hadoop-resourcemanager-master.out.3
-rw-rw-r--. 1 hadoop hadoop     700 Apr 12 05:28 yarn-hadoop-resourcemanager-master.out.4
-rw-rw-r--. 1 hadoop hadoop     700 Apr 12 04:48 yarn-hadoop-resourcemanager-master.out.5
-rw-rw-r--. 1 hadoop hadoop    2078 Mar 17 01:52 yarn-hadoop-resourcemanager-master.out.5.COMPLETED

We can display only the lines of a log file that contain alert information. For example, to search the latest 1000 lines of the ResourceManager log for the keyword "info", run tail -1000f yarn-hadoop-resourcemanager-master.log | grep info; the result is shown below.

[root@master logs]# cd /usr/local/src/hadoop/logs
[root@master logs]#  tail -1000f yarn-hadoop-resourcemanager-master.log |  grep info
2023-04-12 05:36:59,076 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1681291712687_0001
2023-04-12 05:37:17,137 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1681291712687_0001
2023-04-12 06:31:28,677 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1681291712687_0002
2023-04-12 06:32:11,797 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1681291712687_0002
2023-04-12 08:12:40,355 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1681300935029_0001
2023-04-12 08:12:49,968 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1681300935029_0001
2023-04-12 08:24:48,988 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1681300935029_0002
2023-04-12 08:24:58,572 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1681300935029_0002
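grep is case-sensitive, so "info" above only matches the lowercase word inside those INFO lines; to surface genuine alerts it is usually more telling to filter on the WARN and ERROR levels:

grep -E 'WARN|ERROR' yarn-hadoop-resourcemanager-master.log | tail -20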

3. Task 3: View HBase alert information

1. Step 1: Change the log alert level

HBase's web user interface provides functions for querying and setting log alert levels. Browse to the page http://master:60010/logLevel

[Screenshot: the HBase Log Level page at http://master:60010/logLevel]

To query a log's alert level, enter the log name and click the "Get Log Level" button; the log's alert level is displayed. For example, query the level of the hbase-hadoop-master-master.log file:

[Screenshot: Get Log Level result showing level INFO]

The result shows that the alert level of hbase-hadoop-master-master.log is INFO. To change it to WARN, enter Log: hbase-hadoop-master-master.log and Level: WARN in the second form and click the "Set Log Level" button:

[Screenshot: Set Log Level result showing level WARN]

The result shows that the hbase-hadoop-master-master.log alert level has been changed to WARN.
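The same logLevel servlet can in principle be driven from the command line with Hadoop's daemonlog tool; note that levels are really keyed by logger name (a class or package) rather than by file name. A sketch, assuming the servlet on port 60010 is reachable from the client:

hadoop daemonlog -getlevel master:60010 org.apache.hadoop.hbase
hadoop daemonlog -setlevel master:60010 org.apache.hadoop.hbase WARN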

2. Step 2: Query log alert information

HBase's log files are stored in the /usr/local/src/hbase/logs directory. Switch to that directory and view the "INFO" alert entries of the hbase-hadoop-master-master.log file by running tail -100f hbase-hadoop-master-master.log | grep INFO.

To view the "WARN"-level alert information in hbase-hadoop-master-master.log, run tail -100f hbase-hadoop-master-master.log | grep WARN; the result is as follows:

[root@master logs]# tail -100f hbase-hadoop-master-master.log |grep WARN
2023-04-06 05:29:33,995 WARN  [master:16000.activeMasterManager] wal.WALProcedureStore: Log directory not found: File hdfs://master:9000/hbase/MasterProcWALs does not exist.
2023-05-05 23:21:59,615 WARN  [master/master/192.168.100.10:16000-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:16,378 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:16,481 WARN  [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:16,623 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:17,727 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:17,829 WARN  [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:17,971 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:19,077 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:19,178 WARN  [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:19,292 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:20,394 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:20,496 WARN  [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:20,599 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:21,701 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:21,802 WARN  [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:21,904 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:23,007 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:23,108 WARN  [main-SendThread(master:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:23,219 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:24,323 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:24,557 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:25,928 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:26,314 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:27,729 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:28,775 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:30,773 WARN  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
2023-05-09 06:12:32,060 WARN  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
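The repeated ClientCnxn reconnect warnings above typically mean HBase cannot reach its ZooKeeper quorum. A quick check is to ask each quorum node for its status with the ZooKeeper control script (path per this cluster's layout):

/usr/local/src/zookeeper/bin/zkServer.sh status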

4. Task 4: View Hive alert information

Hive's log files are stored in the /tmp/hadoop directory. Switch to that directory and filter hive.log for "INFO" entries, as shown below.

[root@master hadoop]# cd /tmp/hadoop
[root@master hadoop]# tail -1000f hive.log |grep INFO
2023-05-09T11:05:27,968 INFO [org.apache.hadoop.hive.common.JvmPauseMonitor$Monitor@37d3e140]: common.JvmPauseMonitor (JvmPauseMonitor.java:run(194)) - Detected pause in JVM or host machine (eg GC): pause of approximately 4923ms
2023-05-09T15:25:31,520 INFO [org.apache.hadoop.hive.common.JvmPauseMonitor$Monitor@37d3e140]: common.JvmPauseMonitor (JvmPauseMonitor.java:run(194)) - Detected pause in JVM or host machine (eg GC): pause of approximately 4439ms

stderr (standard error) is the standard I/O stream referenced through the predefined file pointer stderr; the file it refers to is the same one referenced by the file descriptor STDERR_FILENO.

[root@master hadoop]# tail -1000f stderr |grep ERROR
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
