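1. Introduction
At 17:16 on October 11, 2023, just before the end of the workday, a developer in the office let out a piercing shriek. Moments later a WeCom (企业微信) message came in: the production project suddenly could not be released. As soon as I received it, I started troubleshooting.
2. Troubleshooting and locating the problem
Judging from the error message, the build failed while pulling the project with git: Git reported that it could not write a new configuration file, config.lock.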
stderr: error: failed to write new configuration file /home/jenkins/agent/workspace/TRTD-tcm-web-patient/.git/config.lock
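What is the first thing that comes to mind when you see this message? My first guess was a file-lock problem, where another process is using or holding the file. But why would a perfectly good file suddenly be locked? I logged in to the server to check whether the directory had a permission problem. The error points at the Jenkins working directory, and I persist the Jenkins workspace through a mapped volume (the agent path /home/jenkins/agent/workspace corresponds to /opt/workspace/workspace on the host), so I searched the host for config.lock. It was not there.
Why was it missing? Did the directory lack permissions, or was the file held by another process? I first checked whether the file was in use. It was not, so that possibility was ruled out: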
lsof /opt/workspace/workspace/TRTD-tcm-web-patient/.git/config.lock
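Next I checked permissions, and they all looked normal. Had there been a permission problem, granting 777 directly would have been a quick fix: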
#chmod 777 /opt/workspace/workspace/TRTD-tcm-web-patient/.git/config.lock
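The lsof and permission checks above only inspect metadata. A complementary check is to attempt a real write in the affected directory and let the operating system report why it fails. The sketch below is a minimal illustration, not part of the original investigation; the function name `probe_write` is hypothetical. It should be run as the same user the Jenkins agent uses, because on filesystems that reserve blocks for root (ext4 does by default) a write as root can still succeed when ordinary users are already out of space.

```shell
#!/bin/sh
# probe_write DIR: try to create and write a small file in DIR.
# Prints "write ok: DIR" on success; on failure the shell's own error
# message (e.g. permission denied, no space left) is left visible on stderr.
probe_write() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 2
    fi
    probe="$dir/.write-probe.$$"
    # Write real data rather than only creating an empty file,
    # so the filesystem has to accept an actual write.
    if printf 'probe\n' > "$probe"; then
        rm -f "$probe"
        echo "write ok: $dir"
    else
        rm -f "$probe"
        echo "write FAILED in $dir (see the error above)" >&2
        return 1
    fi
}

# Example (host path from this article):
#   probe_write /opt/workspace/workspace/TRTD-tcm-web-patient/.git
```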
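Clearly it was neither a permission problem nor a file-lock problem, so why this error? I then deleted the project directory and triggered the build in Jenkins again.
The rebuild got further this time but failed with a new error. Reading from the "Also:" line, it looks like an error caused by a remote call over the JNLP4 connection. Your first reaction may be "network problem", but don't be misled. Most of this output is not the key information, so keep reading: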
Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from 192.168.0.101/192.168.0.101:53248
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1784)
at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
at hudson.remoting.Channel.call(Channel.java:1000)
at hudson.FilePath.act(FilePath.java:1194)
at hudson.FilePath.act(FilePath.java:1183)
at hudson.FilePath.mkdirs(FilePath.java:1374)
at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.setupControlDir(FileMonitoringTask.java:311)
at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.<init>(FileMonitoringTask.java:295)
at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:280)
at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:269)
at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:139)
at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:132)
at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:324)
at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:322)
at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:196)
at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:124)
at jdk.internal.reflect.GeneratedMethodAccessor10718.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1034)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:41)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:180)
at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:161)
at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:178)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:182)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:90)
at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
at jdk.internal.reflect.GeneratedMethodAccessor206.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:107)
at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
at jdk.internal.reflect.GeneratedMethodAccessor206.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
at com.cloudbees.groovy.cps.impl.CastBlock$ContinuationImpl.cast(CastBlock.java:44)
at jdk.internal.reflect.GeneratedMethodAccessor402.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.dispatch(CollectionLiteralBlock.java:55)
at com.cloudbees.groovy.cps.impl.CollectionLiteralBlock$ContinuationImpl.item(CollectionLiteralBlock.java:45)
at jdk.internal.reflect.GeneratedMethodAccessor209.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
at com.cloudbees.groovy.cps.Next.step(Next.java:83)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:177)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:166)
at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:166)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:420)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:95)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:330)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:294)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
java.nio.file.FileSystemException: /home/jenkins/agent/workspace/TRTD-tcm-web-patient@tmp/durable-6420d17d: No space left on device
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
at java.nio.file.Files.createDirectory(Files.java:674)
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
at java.nio.file.Files.createDirectories(Files.java:767)
at hudson.FilePath.mkdirs(FilePath.java:3609)
at hudson.FilePath.access$1100(FilePath.java:212)
at hudson.FilePath$Mkdirs.invoke(FilePath.java:1384)
at hudson.FilePath$Mkdirs.invoke(FilePath.java:1379)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3487)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
at java.lang.Thread.run(Thread.java:748)
Finished: FAILURE
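Here is the key point, and the fox's tail finally shows: java.nio.file.FileSystemException: /home/jenkins/agent/workspace/TRTD-tcm-web-patient@tmp/durable-6420d17d: No space left on device. The disk is out of space!
I went back to the server and checked disk usage. Sure enough, the / partition was completely full, and the persistent Jenkins workspace directory happens to live on the / partition, so Jenkins could not publish. This also explains the earlier error, failed to write new configuration file: Git could not write the file because there was no space left.
Jenkins was rather sneaky here. With the / partition full, it did not surface the disk problem directly and instead reported a generic error. The disk problem only showed up explicitly after I deleted the Jenkins working directory, presumably because that freed just enough space for the checkout to get past the Git step before the disk filled up again. On top of that, every command I ran on the host worked normally, so disk space never crossed my mind and I let the error message lead my thinking.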
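The disk check described above can be sketched as two small helpers built on the standard `df` and `du` utilities. The function names `usage_percent` and `largest_entries` are hypothetical and the paths in the example are the host paths from this article; treat it as an illustration of the check, not the exact commands that were run.

```shell
#!/bin/sh
# usage_percent PATH: print the used-space percentage (integer, no % sign)
# of the filesystem that holds PATH, using POSIX-format df output.
usage_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# largest_entries DIR N: list the N largest entries directly under DIR,
# biggest first, with sizes in KB.
largest_entries() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "$2"
}

# Example:
#   usage_percent /
#   df -i /        # inode exhaustion also produces "No space left on device"
#   largest_entries /opt/workspace/workspace 10
```

Checking inodes with `df -i` is worth the extra second, since a filesystem can report free blocks and still refuse new files when it has run out of inodes.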
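One way to avoid being misled by a generic error next time is to have the build fail early with an explicit message when the workspace filesystem is nearly full. The following is a minimal sketch under assumptions: the function name `require_free_kb` and the 1 GB threshold in the example are hypothetical, and how it would be called from the pipeline (for instance from an `sh` step) depends on the setup.

```shell
#!/bin/sh
# require_free_kb PATH MIN_KB: succeed only if the filesystem holding PATH
# has at least MIN_KB kilobytes available; otherwise print a clear error.
require_free_kb() {
    avail=$(df -Pk "$1" | awk 'NR==2 { print $4 }')
    case "$avail" in
        ''|*[!0-9]*)
            echo "ERROR: cannot determine free space for $1" >&2
            return 2
            ;;
    esac
    if [ "$avail" -lt "$2" ]; then
        echo "ERROR: only ${avail} KB free on the filesystem of $1 (need $2 KB)" >&2
        return 1
    fi
}

# Example: abort unless about 1 GB is free in the agent workspace.
#   require_free_kb /home/jenkins/agent/workspace 1048576 || exit 1
```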
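3. Fix and summary
With the cause located, the first step was a quick temporary fix so that production releases could continue: switch to the persistent Jenkins workspace directory and remove project directories that are no longer needed, which frees some space for now. Our Huawei Cloud CCE servers have a remote shared storage volume, so going forward we mount the working directory referenced in the Jenkinsfile onto that shared storage:
#mkdir /nas/opt/workspace #create a new Jenkins working directory under the NAS mount
Re-grant permissions on the Jenkins working directory: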
#chmod 777 -R /nas/opt/workspace/
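Then point the persistent directory in the Jenkinsfile at the new location.
That is the whole troubleshooting process. If you are interested in our Jenkins persistent release setup, see https://blog.51cto.com/u_11880730/7246083 《分享生产项目DevOps CICD流水线解决方案》 (a shared DevOps CI/CD pipeline solution for production projects).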
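For the temporary cleanup step mentioned above, it helps to list candidate directories before deleting anything. The sketch below only prints first-level workspace directories whose modification time is older than a given number of days; the function name `stale_dirs` and the 30-day value are hypothetical. A directory's modification time changes only when entries directly inside it are added or removed, so treat the result as a rough hint and review it before removing anything.

```shell
#!/bin/sh
# stale_dirs DIR DAYS: print first-level subdirectories of DIR whose
# modification time is more than DAYS days old. Nothing is deleted.
stale_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" -print
}

# Example (host path from this article):
#   stale_dirs /opt/workspace/workspace 30
```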