# Amazon Aurora -- On Avoiding Distributed Consensus

## Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

### 0x00 Introduction

We describe the following contributions:

(1) How Aurora performs writes using asynchronous flows, establishes local consistency points, uses consistency points for commit processing, and re-establishes them upon crash recovery. (Section 2)

(2) How Aurora avoids quorum reads and how reads are scaled across replicas. (Section 3)

(3) How Aurora uses quorum sets and epochs to make non-blocking reversible membership changes to process failures, grow storage, and reduce costs. (Section 4)


### 0x02 Efficient Writes

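SCLs alone are not enough to guarantee durability, because an SCL applies to a single segment and says nothing about the data of the volume as a whole. This is what the VCL is for: the storage layer guarantees that every log record with an LSN no greater than the VCL is durable. A transaction may produce more than one log record, and a group of log records that must be applied atomically is called a mini-transaction (MTR). The System Commit Number (SCN) is the largest LSN among a transaction's log records. Once that record is durable, all of the transaction's log records are durable, so a commit can be acknowledged only when its SCN does not exceed the VCL. None of this differs much from paper [2]. The asynchronous commit path below is also what last year's paper described: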

  When a commit is received, the worker thread writes the commit record, puts the transaction on a commit queue, and returns to a common task queue to find the next request to be processed. When a driver thread advances VCL, it wakes up a dedicated commit thread that scans the commit queue for SCNs below the new VCL and sends acknowledgements to the clients waiting for commit. There is no induced latency from group commits and no idle time for worker threads.
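The commit flow quoted above can be sketched as a small single-threaded model. This is only an illustration, not Aurora's implementation: the class `CommitQueue` and its method names are hypothetical, the worker, driver, and commit threads are collapsed into plain method calls, and a commit is treated as acknowledgeable when its SCN is at or below the VCL.

```python
import heapq


class CommitQueue:
    """Toy model of asynchronous commit acknowledgement (hypothetical names)."""

    def __init__(self):
        self.vcl = 0        # highest LSN known to be durable for the volume
        self._pending = []  # min-heap of (scn, client_id) awaiting durability

    def enqueue_commit(self, scn, client_id):
        # Worker thread: queue the commit and return at once,
        # without waiting for the commit record to become durable.
        heapq.heappush(self._pending, (scn, client_id))

    def advance_vcl(self, new_vcl):
        # Driver thread: the VCL only ever moves forward.
        if new_vcl <= self.vcl:
            return []
        self.vcl = new_vcl
        # Commit thread: acknowledge every queued commit now covered by the VCL.
        # Assumption: SCN <= VCL is the acknowledgement condition.
        acked = []
        while self._pending and self._pending[0][0] <= self.vcl:
            _scn, client_id = heapq.heappop(self._pending)
            acked.append(client_id)
        return acked


queue = CommitQueue()
queue.enqueue_commit(120, "t1")
queue.enqueue_commit(105, "t2")
queue.enqueue_commit(140, "t3")
first_batch = queue.advance_vcl(125)   # t2 and t1 are now durable
stale_update = queue.advance_vcl(110)  # ignored, the VCL never goes backwards
second_batch = queue.advance_vcl(150)  # t3 is now durable
```

Because the worker never blocks on durability, its cost per commit is one queue insertion. Acknowledgement latency depends only on how quickly the VCL advances.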


#### Crash Recovery in Aurora

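The first problem to handle here is that some state kept in the database instance is not persisted, such as the PGCL and VCL mentioned earlier. After a crash, these values have to be rebuilt from the SCL information held by the storage nodes: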

 The database snips off the ragged edge of the log by recording a truncation range that annuls any log records beyond the newly computed VCL (Figure 4). This ensures that, even if in-flight asynchronous operations complete during the process of crash recovery, they are ignored. New redo records after crash recovery are allocated LSNs above the truncation range.
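As a rough sketch of the truncation step quoted above, the model below records an annulled LSN range and allocates new LSNs above it. The names `LogAllocator`, `recovered_vcl`, and `highest_possible_lsn` are hypothetical. The VCL is taken as an input rather than computed from SCLs, and the upper bound on LSNs that could have been in flight is assumed to be known.

```python
class LogAllocator:
    """Toy model of LSN allocation with truncation ranges (hypothetical names)."""

    def __init__(self, next_lsn=1):
        self.next_lsn = next_lsn
        self.truncations = []  # inclusive (first_annulled, last_annulled) pairs

    def allocate(self):
        lsn = self.next_lsn
        self.next_lsn += 1
        return lsn

    def recover(self, recovered_vcl, highest_possible_lsn):
        # Annul every LSN in (recovered_vcl, highest_possible_lsn]: those
        # records may still arrive from in-flight writes and must be ignored.
        if highest_possible_lsn > recovered_vcl:
            self.truncations.append((recovered_vcl + 1, highest_possible_lsn))
        # New redo records are allocated above the truncation range.
        self.next_lsn = max(recovered_vcl, highest_possible_lsn) + 1

    def is_annulled(self, lsn):
        return any(low <= lsn <= high for low, high in self.truncations)


log = LogAllocator()
log.recover(recovered_vcl=100, highest_possible_lsn=130)
late_write_ignored = log.is_annulled(115)  # arrived during recovery
durable_record_kept = not log.is_annulled(100)
first_new_lsn = log.allocate()
```

Allocating above the range means a late-arriving pre-crash record can never collide with an LSN issued after recovery.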


### 0x03 Efficient Reads

Aurora does not do quorum reads. Through its bookkeeping of writes and consistency points, the database instance knows which segments have the last durable version of a data block and can request it directly from any of those segments.
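A minimal sketch of reading without a quorum, under stated assumptions: the database instance keeps a per-segment SCL map, a segment can serve the read if its SCL has reached the LSN the read requires, and the instance prefers the segment with the lowest observed latency. The function `pick_read_segment` and its parameters are hypothetical, and the latency-based choice is an assumption of this sketch.

```python
def pick_read_segment(scl_by_segment, latency_ms, required_lsn):
    """Return one segment known to hold the block version at required_lsn."""
    # Only segments whose SCL has reached the required LSN are candidates.
    candidates = [
        segment
        for segment, scl in scl_by_segment.items()
        if scl >= required_lsn
    ]
    if not candidates:
        raise LookupError("no segment is known to be complete up to the required LSN")
    # Assumption: prefer the lowest latency, breaking ties by segment name.
    return min(candidates, key=lambda segment: (latency_ms[segment], segment))


scl_by_segment = {"A": 100, "B": 90, "C": 100, "D": 100, "E": 80, "F": 100}
latency_ms = {"A": 5, "B": 1, "C": 3, "D": 4, "E": 2, "F": 6}
chosen = pick_read_segment(scl_by_segment, latency_ms, required_lsn=95)
```

Segments B and E respond fastest here but are skipped because their SCLs lag the required LSN. A single request to C is enough, with no need to contact a read quorum.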


The use of monotonically increasing consistency points – SCLs, PGCLs, PGMRPLs, VCLs, and VDLs – ensures the representation of consistency points is compact and comparable. These may seem like complex concepts but are just the extension of familiar database notions of LSNs and SCNs. The key invariant is that the log only ever marches forward. This also simplifies the process of coordinating multiple request processors, as shown here for replicas operating against common storage.


### 0x04 Failures and Quorum Membership

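This section covers Aurora's membership-change mechanism, which is epoch-based. Whenever a database instance or a peer storage node sends a request to a storage node, the request carries the epoch value. (Aurora has two notions of epoch, the volume epoch and the quorum membership epoch; the one meant here is the latter.)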

Let us now consider what happens if E also fails while we are replacing F with G, and we wish to replace it with H. In this case, we would move to a write quorum set of ((4/6 of ABCDEF AND 4/6 of ABCDEG) AND (4/6 of ABCDFH AND 4/6 of ABCDGH)). As with a single failure, I/Os can proceed, the operation is reversible, and the membership change can occur with an epoch increment. Note that, both with a single failure and with multiple failures, simply writing to the four members ABCD meets quorum.
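The claim that writing to ABCD satisfies both quorum sets can be checked mechanically. The sketch below models a quorum set as a boolean predicate over the set of members that acknowledged a write. The helper names `k_of` and `all_of` are hypothetical.

```python
def k_of(k, members):
    """Predicate: at least k of `members` acknowledged the write."""
    member_set = frozenset(members)
    return lambda acked: len(member_set & set(acked)) >= k


def all_of(*predicates):
    """Predicate: every nested quorum is satisfied (logical AND)."""
    return lambda acked: all(predicate(acked) for predicate in predicates)


# Single failure: F is being replaced by G.
single_failure = all_of(k_of(4, "ABCDEF"), k_of(4, "ABCDEG"))

# Double failure: E also fails and is being replaced by H.
double_failure = all_of(
    all_of(k_of(4, "ABCDEF"), k_of(4, "ABCDEG")),
    all_of(k_of(4, "ABCDFH"), k_of(4, "ABCDGH")),
)

abcd_meets_single = single_failure("ABCD")
abcd_meets_double = double_failure("ABCD")
abcg_meets_double = double_failure("ABCG")  # only 3 of ABCDEF acknowledged
```

ABCD belongs to all four candidate memberships, so four acknowledgements from those members satisfy every conjunct. A write acknowledged by ABCG fails the first conjunct, because G is not a member of ABCDEF.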


• Log Sequence Number (LSN), a concept that does not originate with Aurora
• Segment Complete LSN (SCL)
• Protection Group Complete LSN (PGCL)
• Volume Complete LSN (VCL)
• System Commit Number (SCN)
• Volume Durable LSN (VDL)
• mini-transactions (MTRs)
• Protection Group Minimum Read Point LSN (PGMRPL)
• Consistency Point LSNs

