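Automatically stopping and starting ECS instances and their services

Over the last two days, plus overtime today, I finished a task that was a fairly big piece of work for me: an end-to-end routine that stops the services, stops the ECS instance, starts the ECS instance (with services configured to start on boot), checks the services, and checks that the web pages are reachable.

It isn't hard, just tedious~~ = =, because I wanted the code to be easy to modify and reuse, so that the scripts can be deployed to the real production servers (4 in total) with only minor changes.

Results first (so readers who don't need this can skip the rest, hahaha), then the code (debugging it nearly killed me = =).

The test environment, roughly: 2 servers in total. Server A runs nginx, tomcat, a jar package and mongo (in production these are spread over 4 servers). Server B is the control host, used to run all of these operations against A.

I. Results

Before the test:

1. Before touching A, a bunch of services are running

At 9:41 in the evening, check the services:

ps aux |egrep "nginx|mongo|java|tomcat" |grep -v grep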
2. Set up the scheduled jobs on control host B
Passwordless (key-based) SSH login from B to A is already configured.
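The post does not show how B invokes the script on A, so the following is only a minimal sketch of one way a scheduled job on B could do it over the key-based SSH login. The function names, the `root` user and the example path are assumptions for illustration, not taken from the author's setup.

```python
import shlex
import subprocess


def build_ssh_command(host, remote_script, user="root"):
    """Build the argv list for running one script on a remote host over ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which suits unattended runs that rely on key-based login.
    remote = "sh " + shlex.quote(remote_script)
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote]


def run_remote(host, remote_script, user="root", timeout=120):
    """Run the script on the remote host and return the CompletedProcess."""
    cmd = build_ssh_command(host, remote_script, user)
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
```

A scheduled job could then call something like `run_remote("192.0.2.10", "/root/scripts/stop_allservices.sh")` (hypothetical address and path) and inspect `returncode` and `stdout`.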
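3. The results, in order:
(1) Services stopped
Script: stop_allservices.sh. Log in to A and verify by eye; pay attention to the time printed by date.
(2) ECS instance stopped and started
Scripts: power_off.py and power_on.py
(3) Services checked after boot
Script: check_services.sh
"First check" in the notification means that a service configured to start on boot really did come up. If it did not, the script restarts it once and only raises an alert if it is still down.
(4) Home page availability check
Script: check_page.sh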
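The flow in step (3), check, restart once, check again, then alert, can be sketched as follows. This is an illustrative Python version rather than the author's shell implementation; `is_up`, `restart` and `notify` are hypothetical callables supplied by the caller.

```python
import time


def ensure_running(name, is_up, restart, notify, wait_seconds=5, sleep=time.sleep):
    """Check a service, restart it once if it is down, and report the outcome."""
    if is_up():
        notify(f"Success: {name} is running (first check)")
        return True
    # The service did not come up on boot: try one restart, then re-check
    restart()
    sleep(wait_seconds)
    if is_up():
        notify(f"Success: {name} is running (second check, restarted once)")
        return True
    notify(f"Fail: {name} did not start, please check")
    return False
```

Passing `sleep` in as a parameter keeps the function easy to test without real waiting.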
II. Scripts
1. Service shutdown script: stop_allservices.sh
This script lives on server A; machine B runs it remotely on A over ssh.
#!/bin/bash

#######################################################
# $Name: stop_allservices
# $Fun:  Stop all services on this host; the Aliyun API is then called to power off
# Replace every {placeholder} with a real value before use

#######################################################

### Run as root

# Check that a service is really stopped; send an alert if it is not
# $1: process keyword, $2: listening port
check_service() {
    local port=$2
    local listenport n
    listenport=$(/usr/bin/lsof -i:"${port}" | grep -v COMMAND | awk '{print $2}')
    n=$(ps aux | grep "$1" | grep -v grep | wc -l)
    # Still running if any matching process is left or the port is still in use
    if [ "$n" -gt 0 ] || [ -n "$listenport" ]; then
        msg="Fail:$1\tservice is still running, please stop it manually!"
    else
        msg="Success:\t$1\tservice is stopped!"
    fi
    curl_ding "$msg"
}

# Send a text message to the DingTalk robot
curl_ding() {
    local msg=$1
    echo "$msg"
    TOKEN="DingTalk robot webhook URL"
    PHONE="my phone number"
    DATE=$(date +%F_%T)
    # Variables are double-quoted so a message containing spaces stays in one piece
    curl -H "Content-Type:application/json" -X POST \
        --data '{"msgtype":"text","text":{"content":"Current time: '"$DATE"'\n'"$msg"'"}, "at": {"atMobiles": ["'"$PHONE"'"], "isAtAll": false}}' \
        "$TOKEN" > /dev/null 2>&1
}


## 1. Stop nginx
# xargs -r (GNU) skips kill when no PID was found
ps aux | grep nginx | grep master | awk '{print $2}' | xargs -r kill -9
pkill nginx
#sleep 4
check_service nginx 80

## 2. Backend jar
/usr/bin/lsof -i:{jar_port} | grep -v COMMAND | awk '{print $2}' | xargs -r kill -9
sleep 5
check_service {jar_keyword} {jar_port}

## 3. tomcat
/home/{system_user}/app/{tomcat_project}/bin/catalina.sh stop
#sleep 5
/usr/bin/lsof -i:{tomcat_port} | grep -v COMMAND | awk '{print $2}' | xargs -r kill -9

check_service tomcat {tomcat_port}


## 4. mongo
mongopid=$(ps aux | grep mongo | grep -v grep | awk '{print $2}')
# Only call kill when a mongo process was found
[ -n "$mongopid" ] && sudo kill -9 ${mongopid}

check_service mongo {mongo_port}
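The script above decides whether a service is gone by looking at `lsof` and `ps` output. As a complementary sketch (not part of the author's scripts), a port can also be probed by attempting a TCP connection. The function name and the default of `127.0.0.1` are assumptions; a service bound only to another interface would need a different `host`.

```python
import socket


def is_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0
```

After a shutdown, `is_port_open(80)` returning False is a second signal, alongside the process list, that nginx is really stopped.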
2. Stopping and starting the ECS instance: power_off.py and power_on.py
(1) power_off.py
#!/usr/bin/env python
#coding=utf-8

## Stops an Aliyun ECS instance

import datetime
import json
import requests

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.auth.credentials import AccessKeyCredential

from aliyunsdkecs.request.v20140526.StopInstancesRequest import StopInstancesRequest

credentials = AccessKeyCredential('AccessKey ID', 'AccessKey Secret')
client = AcsClient(region_id='cn-<region>', credential=credentials)

request = StopInstancesRequest()
request.set_accept_format('json')

request.set_DryRun(False)

# StoppedMode only applies to pay-as-you-go instances; subscription instances don't need it
request.set_StoppedMode("KeepCharging")

request.set_BatchOptimization("SuccessFirst")
request.set_InstanceIds(["instance ID"])

# Raises ClientException / ServerException on failure,
# so the notification below is only sent when the request was accepted
response = client.do_action_with_exception(request)

# python2: print(response)
print(str(response, encoding='utf-8'))

## 2. DingTalk notification
def send_msg(text):
    start = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    headers = {'Content-Type': 'application/json;charset=utf-8'}
    api_url = "DingTalk robot webhook URL"
    json_text = {
        "msgtype": "text",
        "text": {
            "content": 'xxx production environment'
                       + '\nCurrent time: ' + str(start)
                       + '\n' + text
        },
        "at": {
            "atMobiles": [
                "my phone number"
            ],
            "isAtAll": False
        }
    }
    result = requests.post(api_url, data=json.dumps(json_text), headers=headers).content
    print(result)


msg = 'Server powered off: xxx platform'
send_msg(msg)
(2) power_on.py
#!/usr/bin/env python
#coding=utf-8

## Starts an Aliyun ECS instance

import datetime
import json
import requests

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.auth.credentials import AccessKeyCredential

from aliyunsdkecs.request.v20140526.StartInstanceRequest import StartInstanceRequest


credentials = AccessKeyCredential('AccessKey ID', 'AccessKey Secret')
client = AcsClient(region_id='cn-shenzhen', credential=credentials)

request = StartInstanceRequest()
request.set_accept_format('json')

request.set_InstanceId("ECS instance ID")
request.set_SourceRegionId("cn-shenzhen")

response = client.do_action_with_exception(request)

# python2: print(response)
print(str(response, encoding='utf-8'))


## 2. DingTalk notification
def send_msg(text):
    start = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    headers = {'Content-Type': 'application/json;charset=utf-8'}
    # Keep the real access_token out of published code
    api_url = "https://oapi.dingtalk.com/robot/send?access_token=<your robot token>"
    json_text = {
        "msgtype": "text",
        "text": {
            "content": 'xxx production environment'
                       + '\nCurrent time: ' + str(start)
                       + '\n' + text
        },
        "at": {
            "atMobiles": [
                "my phone number"
            ],
            "isAtAll": False
        }
    }
    result = requests.post(api_url, data=json.dumps(json_text), headers=headers).content
    print(result)


msg = 'Server powered on: xxx platform'

send_msg(msg)
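Starting an instance is asynchronous: the API call returns before the machine has finished booting. The post relies on scheduling the service check later; an alternative, sketched here under stated assumptions, is to poll until the instance reports `Running`. `get_status` is a hypothetical callable (for example, one that wraps an ECS status query); the Aliyun call itself is deliberately left out.

```python
import time


def wait_for_status(get_status, target="Running", timeout=300, interval=5,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns target or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if get_status() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The function returns False on timeout instead of raising, so the caller can decide whether to send an alert.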
3. Checking services after boot: check_services.sh
#!/bin/bash

#######################################################
# $Name: check_services.sh
# $Fun:
#   (1) The ECS instance is started first through the Aliyun API
#   (2) Then all services on the server are checked and, if needed, started
# Replace every {placeholder} with a real value before use

#######################################################

### Run as root (JAVA_HOME is exported in case tomcat cannot read the Java environment)
export JAVA_HOME=/usr/local/java/jdk1.8
# Check whether each service is up; alert if it is not

# 1. First check: notify if the service is up, stay silent (return 1) if it is not
check_service() {
    local port=$2
    local listenport n
    listenport=$(/usr/bin/lsof -i:"${port}" | grep -v COMMAND | awk '{print $2}')
    n=$(ps aux | grep "$1" | grep -v grep | wc -l)
    ## The caller decides what to do next based on the return value
    if [ "$n" -gt 1 ] || [ -n "$listenport" ]; then
        msg="Success:\t$1\tservice is running (first check)"
        curl_ding "$msg"
        return 0
    else
        return 1
    fi
}

# 2. Second check: alert if the service is still not up
check_service2() {
    local port=$2
    local listenport n
    listenport=$(/usr/bin/lsof -i:"${port}" | grep -v COMMAND | awk '{print $2}')
    n=$(ps aux | grep "$1" | grep -v grep | wc -l)

    if [ "$n" -gt 1 ] || [ -n "$listenport" ]; then
        msg="Success:\t$1\tservice is running (second check, restarted once)"
    else
        msg="Fail:\t$1\tservice did not start! Please check"
    fi
    curl_ding "$msg"
}

# Send a text message to the DingTalk robot
curl_ding() {
    local msg=$1
    #echo "$msg"
    TOKEN="DingTalk robot webhook URL"
    PHONE="my phone number"
    DATE=$(date +%F_%T)
    # Variables are double-quoted so a message containing spaces stays in one piece
    curl -H "Content-Type:application/json" -X POST \
        --data '{"msgtype":"text","text":{"content":"Current time: '"$DATE"'\n'"$msg"' !"}, "at": {"atMobiles": ["'"$PHONE"'"], "isAtAll": false}}' \
        "$TOKEN" > /dev/null 2>&1
}


## 1. nginx
# It should already have started once on boot
# 1.1 First check
check_service nginx {nginx_port}
Nginx_Status=$?
if [ "$Nginx_Status" -ne 0 ]; then
    # 1.2 Try a restart
    /home/{system_user}/nginx/sbin/nginx &
    sleep 5
    # 1.3 Second check; alert if it is still down
    check_service2 nginx {nginx_port}
fi

## 2. Backend: jar
# 2.1 First check
check_service {jar_keyword} {jar_port}
Jar_Status=$?
if [ "$Jar_Status" -ne 0 ]; then
    # 2.2 Try a restart
    sh /home/{system_user}/scripts/{backend_start_script}.sh {jar_keyword}.jar
    sleep 10

    # 2.3 Second check
    check_service2 {jar_keyword} {jar_port}
fi

## 3. Backend: tomcat
# 3.1 First check
check_service tomcat {tomcat_port}
Tomcat_Status=$?
if [ "$Tomcat_Status" -ne 0 ]; then
    /home/{system_user}/app/{tomcat_project}/bin/catalina.sh start
    sleep 5
    check_service2 tomcat {tomcat_port}
fi

## 4. Middleware: mongo
check_service mongo {mongo_port}
Mongo_Status=$?
if [ "$Mongo_Status" -ne 0 ]; then
    sudo /home/{system_user}/mongodb/bin/mongod -f /home/{system_user}/mongodb/conf/mongodb.conf --logappend
    sleep 3
    check_service2 mongo {mongo_port}
fi
4. Website availability script: check_page.sh
#!/bin/bash

#######################################################
# $Name: check_page.sh
# $Version: v1.0
# $Function: Check the access URLs of the xx system

#######################################################


# $1: URL to check, $2: short system code (ABC / CDE / anything else)
check_url() {
    local checkurl=$1
    local status keyword
    # HEAD request, 10 s timeout; prints only the HTTP status code (000 on failure)
    status=$(curl -I -m 10 -o /dev/null -s -w '%{http_code}' "${checkurl}")

    if [ "$2" = 'ABC' ]; then
        keyword="xxx platform"
    elif [ "$2" = 'CDE' ]; then
        keyword="xxx system"
    else
        keyword="xxx backend"
    fi

    # Anything other than 200 counts as abnormal
    if [ "$status" -ne 200 ]; then
        msg="$keyword: not reachable (non-200 status code)"
    else
        msg="$keyword: reachable"
    fi
    # Send the notification
    curl_ding "$msg"
}

# Send a text message to the DingTalk robot
curl_ding() {
    local msg=$1
    #echo "$msg"
    TOKEN="DingTalk robot webhook URL"
    PHONE="my phone number"
    DATE=$(date +%F_%T)
    # Variables are double-quoted so a message containing spaces stays in one piece
    curl -H "Content-Type:application/json" -X POST \
        --data '{"msgtype":"text","text":{"content":"Current time: '"$DATE"'\n'"$msg"'"}, "at": {"atMobiles": ["'"$PHONE"'"], "isAtAll": false}}' \
        "$TOKEN" > /dev/null 2>&1
}


url_ABC='XXXXX'
url_CDE='XXXXX'
url_EFG='XXXXX'

check_url "$url_ABC" ABC
check_url "$url_CDE" CDE
check_url "$url_EFG" EFG
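For comparison, the same availability check can be sketched in Python with only the standard library. This is an illustration, not a port the author wrote: the function names are made up, and unlike `curl -I` without `-L`, `urlopen` follows redirects, so a page that redirects to a healthy target is reported with the target's status.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Send a HEAD request and return the HTTP status code, or 0 on failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry a status code
        return err.code
    except OSError:
        # Covers URLError, DNS failures, refused connections and timeouts
        return 0


def page_message(keyword, status):
    """Build the notification text for one checked page."""
    if status == 200:
        return f"{keyword}: reachable"
    return f"{keyword}: not reachable (status {status}, expected 200)"
```

A caller would combine the two, for example `page_message("xxx platform", fetch_status(url))`, and hand the result to the notification function.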
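III. Things to note
Finally, one annoying pitfall. The script check_services.sh contains this very ugly line:
msg="Success:\t$1\tservice is running (second check, restarted once)"
To make the DingTalk output look nicer I wanted to separate the fields with spaces. But when $msg is passed unquoted (curl_ding $msg), the shell splits it at every space into several arguments, the function only sees the first one as $1, and the message arrives truncated with everything after the first space missing. That is why the fields are separated with the \t escape instead. Quoting the variable at the call site (curl_ding "$msg") and inside the curl payload passes the whole string as a single argument, which avoids the truncation.
Without spaces it looks ugly, but it will do, hahaha.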
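When the notification is sent from Python, as power_off.py and power_on.py already do, the splitting problem does not arise, because the message is a single string that `json.dumps` escapes as a whole. A minimal sketch of building the payload follows; the function name and the `now` parameter are assumptions added to make it testable.

```python
import json
from datetime import datetime


def build_payload(message, phone, now=None):
    """Return the JSON body for a DingTalk text message as a string."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H:%M:%S")
    body = {
        "msgtype": "text",
        "text": {"content": f"Current time: {stamp}\n{message}"},
        "at": {"atMobiles": [phone], "isAtAll": False},
    }
    # json.dumps escapes quotes, tabs and newlines, so spaces and
    # punctuation in the message cannot break the payload
    return json.dumps(body, ensure_ascii=False)
```

The returned string can be posted as the request body with the same `Content-Type: application/json` header the scripts already use.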
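Now that the post is written, I am shutting down and releasing test machine A; otherwise the pay-as-you-go instance keeps costing roughly 0.8 yuan per hour.
Ada, my beloved machine~~~ Time to wash up and go to sleep~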
From: https://www.cnblogs.com/windysai/p/16721136.html