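After deploying a Storm cluster on Kubernetes, I noticed that the Supervisor pods kept restarting while running, which was puzzling.
So I opened another terminal and followed the Supervisor log, supervisor.log. It turned out that every time Storm rebalanced, the Supervisor had to kill the worker processes on its own node. In the storm:1.2.2 image, however, kill exists only as a shell builtin and there is no standalone kill executable. The Supervisor launches the command directly from Java rather than through a shell, so it could not find kill; it exited with an exception and the pod restarted as a result.
The relevant log output: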
2023-07-19 02:14:48.370 o.a.s.d.s.Container SLOT_6700 [INFO] Killing df3bf58e-6387-4c7a-867b-11e5ba2c9a85:8bf1dd6a-42c1-4a29-90ae-86e46e6ece54
2023-07-19 02:14:48.370 o.a.s.d.s.Container SLOT_6701 [INFO] Killing df3bf58e-6387-4c7a-867b-11e5ba2c9a85:5cbe5261-5e54-4333-9898-823126e35193
2023-07-19 02:14:48.378 o.a.s.u.Utils SLOT_6701 [INFO] IOException Error when trying to kill 160.
2023-07-19 02:14:48.378 o.a.s.u.Utils SLOT_6700 [INFO] IOException Error when trying to kill 163.
2023-07-19 02:14:48.378 o.a.s.d.s.Slot SLOT_6701 [ERROR] Error when processing event
java.io.IOException: Cannot run program "kill" (in directory "."): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[?:1.8.0_222]
at java.lang.Runtime.exec(Runtime.java:620) ~[?:1.8.0_222]
at org.apache.storm.shade.org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.shade.org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.shade.org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.shade.org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.shade.org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.utils.Utils.execCommand(Utils.java:1904) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.utils.Utils.sendSignalToProcess(Utils.java:1936) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.utils.Utils.killProcessWithSigTerm(Utils.java:1951) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.daemon.supervisor.Container.kill(Container.java:166) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.daemon.supervisor.Container.kill(Container.java:184) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.daemon.supervisor.Slot.killContainerForChangedAssignment(Slot.java:311) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.daemon.supervisor.Slot.handleRunning(Slot.java:527) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.daemon.supervisor.Slot.stateMachineStep(Slot.java:265) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:752) [storm-core-1.2.2.jar:1.2.2]
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method) ~[?:1.8.0_222]
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247) ~[?:1.8.0_222]
at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[?:1.8.0_222]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[?:1.8.0_222]
... 15 more
command terminated with exit code 137
Checking the kill command inside the storm:1.2.2 image:
root@ubuntu:# docker run -it storm:1.2.2 /bin/bash
root@f5eacdd1fb1b:/apache-storm-1.2.2# type kill
kill is a shell builtin
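The check above only says how the shell itself resolves kill. What matters to the Supervisor is whether an executable file named kill exists on PATH, because a Java process cannot use a shell builtin. A minimal sketch of that check follows; the function name has_external_binary is hypothetical and not part of Storm or the image.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command name resolves to a real
# executable file on PATH, ignoring shell builtins, functions, and aliases.
has_external_binary() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            # Print the full path of the first match.
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage:
#   has_external_binary kill || echo "kill is available only as a builtin"
```

Run inside the original image, this would be expected to fail for kill, which matches the "No such file or directory" error in the stack trace.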
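Solution
Build a new image on top of the current storm:1.2.2 image and add the procps package, which provides the kill command.
Dockerfile
My network goes through a proxy, so the Dockerfile also sets the proxy environment variables.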
# Start from the original storm image
FROM storm:1.2.2
# Configure the proxy
ENV http_proxy http://proxy.example.com:8080
ENV https_proxy http://proxy.example.com:8080
ENV no_proxy localhost,127.0.0.1
# Refresh the package index and install the procps package
RUN apt-get update && apt-get install -y procps
Build the new image
root@ubuntu:# docker build -t storm-fix:1.2.2 .
Sending build context to Docker daemon 43.01kB
Step 1/5 : FROM storm:1.2.2
---> 03a242e39169
Step 2/5 : ENV http_proxy http://proxy.example.com:8080
---> Running in d4c2677998f4
Removing intermediate container d4c2677998f4
---> d7116525b99d
Step 3/5 : ENV https_proxy http://proxy.example.com:8080
---> Running in ece93acfc2b7
Removing intermediate container ece93acfc2b7
---> 636b49a5f653
Step 4/5 : ENV no_proxy localhost,127.0.0.1
---> Running in f117ce7433dd
Removing intermediate container f117ce7433dd
---> c8b2f42d2eba
Step 5/5 : RUN apt-get update && apt-get install -y procps
---> Running in d01ab77eb7bb
Get:1 http://security.debian.org/debian-security buster/updates InRelease [34.8 kB]
Get:2 http://deb.debian.org/debian buster InRelease [122 kB]
Get:3 http://deb.debian.org/debian buster-updates InRelease [56.6 kB]
Get:4 http://security.debian.org/debian-security buster/updates/main amd64 Packages [541 kB]
Get:5 http://deb.debian.org/debian buster/main amd64 Packages [7909 kB]
Get:6 http://deb.debian.org/debian buster-updates/main amd64 Packages [8788 B]
Fetched 8672 kB in 10s (849 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
libgpm2 libncurses6 libncursesw6 libprocps7 libtinfo6 psmisc
Suggested packages:
gpm
The following NEW packages will be installed:
libgpm2 libncurses6 libprocps7 procps psmisc
The following packages will be upgraded:
libncursesw6 libtinfo6
2 upgraded, 5 newly installed, 0 to remove and 60 not upgraded.
Need to get 1041 kB of archives.
After this operation, 1931 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian buster/main amd64 libprocps7 amd64 2:3.3.15-2 [61.7 kB]
Get:2 http://security.debian.org/debian-security buster/updates/main amd64 libtinfo6 amd64 6.1+20181013-2+deb10u3 [326 kB]
Get:3 http://deb.debian.org/debian buster/main amd64 procps amd64 2:3.3.15-2 [259 kB]
Get:4 http://deb.debian.org/debian buster/main amd64 libgpm2 amd64 1.20.7-5 [35.1 kB]
Get:5 http://security.debian.org/debian-security buster/updates/main amd64 libncursesw6 amd64 6.1+20181013-2+deb10u3 [132 kB]
Get:6 http://deb.debian.org/debian buster/main amd64 psmisc amd64 23.2-1+deb10u1 [126 kB]
Get:7 http://security.debian.org/debian-security buster/updates/main amd64 libncurses6 amd64 6.1+20181013-2+deb10u3 [102 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 1041 kB in 7s (156 kB/s)
(Reading database ... 8106 files and directories currently installed.)
Preparing to unpack .../libtinfo6_6.1+20181013-2+deb10u3_amd64.deb ...
Unpacking libtinfo6:amd64 (6.1+20181013-2+deb10u3) over (6.1+20181013-2) ...
Setting up libtinfo6:amd64 (6.1+20181013-2+deb10u3) ...
(Reading database ... 8106 files and directories currently installed.)
Preparing to unpack .../libncursesw6_6.1+20181013-2+deb10u3_amd64.deb ...
Unpacking libncursesw6:amd64 (6.1+20181013-2+deb10u3) over (6.1+20181013-2) ...
Setting up libncursesw6:amd64 (6.1+20181013-2+deb10u3) ...
Selecting previously unselected package libncurses6:amd64.
(Reading database ... 8106 files and directories currently installed.)
Preparing to unpack .../libncurses6_6.1+20181013-2+deb10u3_amd64.deb ...
Unpacking libncurses6:amd64 (6.1+20181013-2+deb10u3) ...
Selecting previously unselected package libprocps7:amd64.
Preparing to unpack .../libprocps7_2%3a3.3.15-2_amd64.deb ...
Unpacking libprocps7:amd64 (2:3.3.15-2) ...
Selecting previously unselected package procps.
Preparing to unpack .../procps_2%3a3.3.15-2_amd64.deb ...
Unpacking procps (2:3.3.15-2) ...
Selecting previously unselected package libgpm2:amd64.
Preparing to unpack .../libgpm2_1.20.7-5_amd64.deb ...
Unpacking libgpm2:amd64 (1.20.7-5) ...
Selecting previously unselected package psmisc.
Preparing to unpack .../psmisc_23.2-1+deb10u1_amd64.deb ...
Unpacking psmisc (23.2-1+deb10u1) ...
Setting up libgpm2:amd64 (1.20.7-5) ...
Setting up psmisc (23.2-1+deb10u1) ...
Setting up libprocps7:amd64 (2:3.3.15-2) ...
Setting up libncurses6:amd64 (6.1+20181013-2+deb10u3) ...
Setting up procps (2:3.3.15-2) ...
update-alternatives: using /usr/bin/w.procps to provide /usr/bin/w (w) in auto mode
update-alternatives: warning: skip creation of /usr/share/man/man1/w.1.gz because associated file /usr/share/man/man1/w.procps.1.gz (of link group w) doesn't exist
Processing triggers for libc-bin (2.28-10) ...
Removing intermediate container d01ab77eb7bb
---> dacf56c95507
Successfully built dacf56c95507
Successfully tagged storm-fix:1.2.2
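Before rolling the image out, it may be worth confirming that it now ships a real kill executable. The sketch below assumes a local Docker daemon and assumes that the which utility is present in the image (it normally is on Debian-based images); the function name image_has_kill is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given image contains a kill
# executable on PATH.
image_has_kill() {
    image=$1
    # `which` is run directly, not from an interactive shell, so a shell
    # builtin alone does not satisfy it; only a file on PATH does.
    docker run --rm "$image" which kill >/dev/null 2>&1
}

# Usage (requires a local Docker daemon):
#   image_has_kill storm-fix:1.2.2 && echo "kill binary present"
```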
Finally, change the image in the storm.yaml deployment manifest to storm-fix:1.2.2, and the problem is resolved.
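The manifest change can also be scripted. This is a rough sketch that assumes storm.yaml references the image on lines of the unquoted form `image: storm:1.2.2` and that kubectl is already configured for the target cluster; the function name use_fixed_image is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a manifest in which every unquoted
# `image: storm:1.2.2` line points at the rebuilt image instead.
# The input file itself is left untouched.
use_fixed_image() {
    sed 's|image:[[:space:]]*storm:1\.2\.2[[:space:]]*$|image: storm-fix:1.2.2|' "$1"
}

# Usage (assumes kubectl is configured for the target cluster):
#   use_fixed_image storm.yaml | kubectl apply -f -
```

Writing to standard output instead of editing in place keeps the original manifest available for comparison or rollback.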