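In Orchestrator, automatic failure recovery can be switched on or off per MySQL cluster, and there is also a single global switch (global recovery disable).
This article describes how the global switch (global recovery disable) is implemented.
The implementation is described below, one layer at a time.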
1. DB layer
The DB layer defines a table, global_recovery_disable, that stores the state of the global switch.
Table definition:
CREATE TABLE IF NOT EXISTS global_recovery_disable (
	disable_recovery tinyint unsigned NOT NULL COMMENT 'Insert 1 to disable recovery globally',
	PRIMARY KEY (disable_recovery)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
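Correspondingly, there are functions that operate on this table: query the state, insert the row (recovery is disabled globally), and delete the row (recovery is enabled again).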
func IsRecoveryDisabled() (disabled bool, err error) {}
func DisableRecovery() error {}
func EnableRecovery() error {}
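The table acts as a presence flag: recovery is disabled globally exactly when the row exists. The sketch below models that behavior with an in-memory stand-in for the table. The type flagTable and its constructor are hypothetical names used only for illustration; the real functions run SQL against global_recovery_disable, and their bodies are not shown in this article.

```go
package main

import (
	"fmt"
	"sync"
)

// flagTable is a hypothetical in-memory stand-in for the
// global_recovery_disable table. It holds at most one row.
type flagTable struct {
	mu   sync.Mutex
	rows map[uint8]struct{} // keyed by disable_recovery, the primary key
}

func newFlagTable() *flagTable {
	return &flagTable{rows: make(map[uint8]struct{})}
}

// IsRecoveryDisabled reports whether the row exists.
func (f *flagTable) IsRecoveryDisabled() (bool, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	_, ok := f.rows[1]
	return ok, nil
}

// DisableRecovery writes the row with value 1. Writing the same key twice
// still leaves a single row, mirroring the primary-key constraint.
func (f *flagTable) DisableRecovery() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.rows[1] = struct{}{}
	return nil
}

// EnableRecovery deletes the row. Deleting a missing row is harmless.
func (f *flagTable) EnableRecovery() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.rows, 1)
	return nil
}

func main() {
	tbl := newFlagTable()
	_ = tbl.DisableRecovery()
	_ = tbl.DisableRecovery() // repeated call changes nothing
	disabled, _ := tbl.IsRecoveryDisabled()
	fmt.Println("after disable:", disabled)

	_ = tbl.EnableRecovery()
	disabled, _ = tbl.IsRecoveryDisabled()
	fmt.Println("after enable:", disabled)
}
```

Because the state is "row present" or "row absent", both operations are idempotent, which is convenient when the same command may be applied more than once.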
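2. Raft synchronization layer
To keep the global switch state in sync across the nodes of an Orchestrator cluster, the raft command applier defines two methods: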
func (applier *CommandApplier) disableGlobalRecoveries(value []byte) interface{} {}
func (applier *CommandApplier) enableGlobalRecoveries(value []byte) interface{} {}
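To illustrate how such applier methods are typically reached, here is a minimal sketch of a command dispatcher that every node runs when a replicated command is committed. The types applier and recoveryFlag, the method applyCommand, and the op names are assumptions made for this example; the article does not show how Orchestrator dispatches these commands.

```go
package main

import "fmt"

// recoveryFlag is a hypothetical stand-in for a node's local backend table.
type recoveryFlag struct{ disabled bool }

// applier is a simplified, hypothetical counterpart of CommandApplier.
type applier struct{ flag *recoveryFlag }

// applyCommand is called on each node for every committed command.
// The op names used here are assumptions for this sketch.
func (a *applier) applyCommand(op string, value []byte) interface{} {
	switch op {
	case "disable-global-recoveries":
		return a.disableGlobalRecoveries(value)
	case "enable-global-recoveries":
		return a.enableGlobalRecoveries(value)
	}
	return fmt.Errorf("unknown command op: %s", op)
}

// disableGlobalRecoveries ignores the payload; the op itself carries the intent.
func (a *applier) disableGlobalRecoveries(value []byte) interface{} {
	a.flag.disabled = true
	return nil
}

func (a *applier) enableGlobalRecoveries(value []byte) interface{} {
	a.flag.disabled = false
	return nil
}

func main() {
	// Three nodes, each with its own local copy of the flag.
	nodes := []*applier{
		{flag: &recoveryFlag{}},
		{flag: &recoveryFlag{}},
		{flag: &recoveryFlag{}},
	}
	// Applying the same committed command on every node keeps them in sync.
	for _, n := range nodes {
		n.applyCommand("disable-global-recoveries", nil)
	}
	for i, n := range nodes {
		fmt.Printf("node %d disabled: %v\n", i, n.flag.disabled)
	}
}
```

The point of the sketch is that the switch is never changed on one node only: the change travels as a command, and each node applies it to its own table.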
3. API layer
The HTTP layer exposes endpoints for external use.
Query the switch state
// CheckGlobalRecoveries checks whether recoveries are globally disabled
func (this *HttpAPI) CheckGlobalRecoveries(params martini.Params, r render.Render, req *http.Request) {}
Disable recoveries globally
// DisableGlobalRecoveries globally disables recoveries
func (this *HttpAPI) DisableGlobalRecoveries(params martini.Params, r render.Render, req *http.Request, user auth.User) {}
Enable recoveries globally
// EnableGlobalRecoveries globally enables recoveries
func (this *HttpAPI) EnableGlobalRecoveries(params martini.Params, r render.Render, req *http.Request, user auth.User) {}
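The three handlers above form a small check/disable/enable API. The sketch below reproduces that shape with the standard net/http package instead of martini so that it runs on its own. The URL paths, the plain-text response bodies, and the switchState type are assumptions for illustration, and the sketch leaves out the auth.User check that the real disable and enable handlers receive.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// switchState is a hypothetical in-memory stand-in for the backend table.
type switchState struct {
	mu       sync.Mutex
	disabled bool
}

func (s *switchState) set(disabled bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.disabled = disabled
}

func (s *switchState) get() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.disabled
}

// newMux registers the three endpoints. Paths and response bodies are
// assumptions made for this sketch.
func newMux(s *switchState) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/check-global-recoveries", func(w http.ResponseWriter, r *http.Request) {
		if s.get() {
			fmt.Fprint(w, "disabled")
			return
		}
		fmt.Fprint(w, "enabled")
	})
	mux.HandleFunc("/api/disable-global-recoveries", func(w http.ResponseWriter, r *http.Request) {
		s.set(true)
		fmt.Fprint(w, "OK")
	})
	mux.HandleFunc("/api/enable-global-recoveries", func(w http.ResponseWriter, r *http.Request) {
		s.set(false)
		fmt.Fprint(w, "OK")
	})
	return mux
}

// call sends one in-process GET request and returns the response body.
func call(h http.Handler, path string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	mux := newMux(&switchState{})
	fmt.Println(call(mux, "/api/check-global-recoveries"))
	call(mux, "/api/disable-global-recoveries")
	fmt.Println(call(mux, "/api/check-global-recoveries"))
	call(mux, "/api/enable-global-recoveries")
	fmt.Println(call(mux, "/api/check-global-recoveries"))
}
```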
4. Snapshot layer
Read the record from the table and write it into the snapshot:
func CreateSnapshotData() *SnapshotData {
	snapshotData := NewSnapshotData()
	... ...
	snapshotData.RecoveryDisabled, _ = IsRecoveryDisabled()
	... ...
}
Restore the state from the snapshot back into the table:
func (this *SnapshotDataCreatorApplier) Restore(rc io.ReadCloser) error {
	snapshotData := NewSnapshotData()
	... ...
	// recovery disable
	{
		SetRecoveryDisabled(snapshotData.RecoveryDisabled)
	}
	... ...
}
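The two snapshot functions form a round trip: the flag is captured when a snapshot is created and written back when a snapshot is restored. A minimal sketch of that round trip follows. The lower-case snapshotData type, the helper names, and the use of JSON as the encoding are assumptions chosen to keep the example self-contained; the article does not show how the snapshot is serialized.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// snapshotData is a trimmed, hypothetical version of SnapshotData that
// carries only the global switch.
type snapshotData struct {
	RecoveryDisabled bool
}

// createSnapshot captures the current switch state.
func createSnapshot(disabled bool) ([]byte, error) {
	return json.Marshal(snapshotData{RecoveryDisabled: disabled})
}

// restoreSnapshot decodes a snapshot and passes the flag to set, which
// plays the role of SetRecoveryDisabled in this sketch.
func restoreSnapshot(rc io.ReadCloser, set func(bool) error) error {
	defer rc.Close()
	var data snapshotData
	if err := json.NewDecoder(rc).Decode(&data); err != nil {
		return err
	}
	return set(data.RecoveryDisabled)
}

// roundTrip snapshots the given state and restores it into a fresh variable.
func roundTrip(disabled bool) (bool, error) {
	raw, err := createSnapshot(disabled)
	if err != nil {
		return false, err
	}
	var restored bool
	err = restoreSnapshot(io.NopCloser(bytes.NewReader(raw)), func(v bool) error {
		restored = v
		return nil
	})
	return restored, err
}

func main() {
	for _, state := range []bool{true, false} {
		restored, err := roundTrip(state)
		fmt.Printf("snapshot of %v restored as %v (err: %v)\n", state, restored, err)
	}
}
```

Including the flag in the snapshot means a node that catches up from a snapshot, rather than by replaying every command, still ends up with the same switch state as its peers.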
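5. Automatic failure recovery
Before an automatic recovery is started, the global switch is checked. If recovery is disabled globally, the function returns immediately and goes no further, unless the recovery is forced (forceInstanceRecovery):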
func executeCheckAndRecoverFunction(analysisEntry inst.ReplicationAnalysis, candidateInstanceKey *inst.InstanceKey, forceInstanceRecovery bool, skipProcesses bool) (recoveryAttempted bool, topologyRecovery *TopologyRecovery, err error) {
	... ...
	// Check for recovery being disabled globally
	if recoveryDisabledGlobally, err := IsRecoveryDisabled(); err != nil {
		// Unexpected. Shouldn't get this
		log.Errorf("Unable to determine if recovery is disabled globally: %v", err)
	} else if recoveryDisabledGlobally {
		if !forceInstanceRecovery {
			log.Infof("CheckAndRecover: Analysis: %+v, InstanceKey: %+v, candidateInstanceKey: %+v, "+
				"skipProcesses: %v: NOT Recovering host (disabled globally)",
				analysisEntry.Analysis, analysisEntry.AnalyzedInstanceKey, candidateInstanceKey, skipProcesses)
			return false, nil, err
		}
		log.Infof("CheckAndRecover: Analysis: %+v, InstanceKey: %+v, candidateInstanceKey: %+v, "+
			"skipProcesses: %v: recoveries disabled globally but forcing this recovery",
			analysisEntry.Analysis, analysisEntry.AnalyzedInstanceKey, candidateInstanceKey, skipProcesses)
	}
	... ...
}
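The branch structure of the check above can be condensed into a small decision function, which makes the three outcomes easier to see. The names passesGlobalGate and errLookup are hypothetical and exist only in this sketch. A true result means only that this particular gate is passed; the elided parts of the original function may still stop the recovery.

```go
package main

import (
	"errors"
	"fmt"
)

// errLookup is a hypothetical error standing in for a failed table lookup.
var errLookup = errors.New("cannot read global_recovery_disable")

// passesGlobalGate mirrors the branches of the check shown above.
func passesGlobalGate(disabledGlobally bool, lookupErr error, force bool) bool {
	if lookupErr != nil {
		// The original only logs the error and carries on.
		return true
	}
	if disabledGlobally && !force {
		// Disabled globally and not forced: do not recover.
		return false
	}
	// Either not disabled, or disabled but explicitly forced.
	return true
}

func main() {
	fmt.Println("enabled:           ", passesGlobalGate(false, nil, false))
	fmt.Println("disabled:          ", passesGlobalGate(true, nil, false))
	fmt.Println("disabled, forced:  ", passesGlobalGate(true, nil, true))
	fmt.Println("lookup failed:     ", passesGlobalGate(false, errLookup, false))
}
```

Note the first branch: when the switch state cannot be determined, the original code logs the problem and continues, so a lookup failure does not block recovery.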
6. Dashboard page
On the Dashboard, the switch is viewed and toggled by calling the HTTP endpoints.
As shown in the figure below: