Part 2A: Leader election

为了防止所有节点同时发起选举形成活锁，节点的超时时间需要带有随机性，论文中推荐150~300ms：

We recommend using a conservative election timeout such as 150–300ms; such timeouts are unlikely to cause unnecessary leader changes and will still provide good availability.

优化思考：投了拒绝票的server，需要reset election timer吗？如果不重置，可能出现刚选举出一个新leader，投了拒绝票的follower又发起一次新的选举。

Part 2B: Log replication

在没有新请求进来的情况下，拥有较大term的leader无法提交较小term的entry，导致最后一条entry始终无法提交。解决方法定期向日志中append一个特殊的entry，这条entry的command要是int类型，否则某些用例过不了。

在leader当选时需要触发一个空的entry，来迫使拥有更小term并且未提交的entry被提交。这一点在2021年视频课程的lecture 11 1:18:33中也讲到了。但实验中某些用例会通不过，例如TestRPCBytes2B，只能通过定期写入特殊日志的方式来解决这个问题。

Part 2C: Persistence

TestUnreliableChurn2C遇到过一个请求乱序引起的问题，出错概率在万分之一。

Leader复制日志下标为[2, 3, 4]的entries到follower
Follower提交了这几个entries
Follower收到了一个过期的AppendEntries请求，请求中的entries为[2, 3]
Follower进行一致性校验时，把4给截掉了
下一次leader election，这个follower投了错误的选票，导致没有最新entry的server当选为leader

解决方法：在执行一致性检查的时候，已经commit的entry需要跳过，不能被截断。

Part 2D: Log compaction

死锁问题

由于测试框架是循环从applyCh中取出消息，在for循环内部调用Snapshot接口进行日志压缩，因此向applyCh发消息不能在锁中完成，否则会形成下面的死锁：

线程1	线程2
mu.Lock()	for m := range applyCh
applyCh <- msg ---- 等待	Snapshot接口中的mu.Lock() ---- 等待

线程1等待线程2从channel中取走消息，线程2等待线程1释放mutex。解决方法是创建一个带长度的临时channel，apply时只是将ApplyMsg放入临时channel，然后开一个独立的协程无锁的提交这些日志：

func (rf *Raft) applier() {
	for msg := range rf.tmpApplyCh {
		rf.applyCh <- msg	// 无锁写入channel
	}
}

日志压缩

由于快照是由每个节点自由控制完成的，没有一个统一的快照或下标，因此，日志压缩后，保留下来的entry的下标是不能轻易变动的，否则日志复制会产生错乱。例如节点A将X复制B，并告知节点B这条日志的下标为1，但对于节点B来说，由于可能还未执行日志压缩，X应该存储在下标为10的位置而不是1的位置。因此，正是由于压缩行为是各节点自己控制的，所以2D部分的核心就是如何维护各自的日志下标。

这里的实现采用了逻辑下标与物理下标相结合的方式。逻辑下标表示如果没有发生日志压缩，日志应该所处的位置，物理下标就是日志当前真正所处的位置，通过物理下标能实际获得该entry。逻辑下标和物理下标之间可以相互转换，规则为：

逻辑下标 - lastIncludedIndex = 物理下标

节点之间传播的是逻辑下标，节点内部和下标相关的状态，例如commitIndex、lastApplied等也都使用逻辑下标，这样带来的好处是原先的RPC相关代码不用改动。只有真正要读写日志时，才将逻辑下标映射到物理下标。例如要获取日志中最后一个entry的逻辑下标和term：

func (rf *Raft) lastEntryInfo() (index int, term int) {
	lastLogIndex := rf.lastEntryLogicIndex()
	lastLogTerm := rf.getEntry(lastLogIndex).Term
	if lastLogTerm == 0 {
		// 最后一条entry已经被放进snapshot了
		lastLogTerm = rf.lastIncludedTerm
	}
	return lastLogIndex, lastLogTerm
}

// 计算得到日志的逻辑长度
func (rf *Raft) logLogicSize() int {
	return len(rf.log) + rf.lastIncludedIndex
}

// 计算得到最后一个entry的逻辑下标
func (rf *Raft) lastEntryLogicIndex() int {
	return rf.logLogicSize() - 1
}

持久化问题

Lab说明中有一个hint：

Hint: Even when the log is trimmed, your implemention still needs to properly send the term and index of the entry prior to new entries in AppendEntries RPCs; this may require saving and referencing the latest snapshot's lastIncludedTerm/lastIncludedIndex (consider whether this should be persisted).

这是关于lastIncludedTerm/lastIncludedIndex持久化问题，有两种实现方式：

持久化到state中
持久化到snapshot中

论文中是将这两者保存到了snapshot中：

但是这里有个问题，就是snapshot是由上层service来构造的，上层不保证会把这两个字段包含进来。例如测试框架就没有把lastIncludedTerm包含进来：

w := new(bytes.Buffer)
e := labgob.NewEncoder(w)
e.Encode(m.CommandIndex)	// 这个lastIncludedIndex用于InstallSnapshot时候的校验
var xlog []interface{}
for j := 0; j <= m.CommandIndex; j++ {
	xlog = append(xlog, cfg.logs[i][j])
}
e.Encode(xlog)
rf.Snapshot(m.CommandIndex, w.Bytes())

因此，我采用了第一种实现方式，将它们持久化到state中，这样即使上层service构建snapshot时不包含lastIncludedTerm/lastIncludedIndex，raft层自身也能从持久化中恢复出来。

总结：Raft层不感知snapshot具体格式，由service层解析。

标签：6.824,下标,Lab,rf,snapshot,entry,日志,节点
From： https://www.cnblogs.com/oyld/p/17124941.html

6.824 Lab 2