# Flexible Paxos and Paxos Quorum Leases

## Flexible Paxos: Quorum Intersection Revisited

### 0x00 引言

Phase 1 - Prepare & Promise
* i. A proposer selects a unique proposal number p and sends prepare(p) to the acceptors.
* ii. Each acceptor receives prepare(p). If p is the highest proposal number promised, then p is written to persistent storage and the acceptor replies with promise(p’,v’). (p’,v’) is the last accepted proposal (if present) where p’ is the proposal number and v’ is the corresponding proposed value.
* iii. Once the proposer receives promise from the majority of acceptors, it proceeds to phase two. Otherwise, it may try again with higher proposal number.

Phase 2 - Propose & Accept
* i. The proposer must now select a value v. If more than one proposal was returned in phase 1 then it must choose the value associated with the highest proposal number. If no proposals were returned, then the proposer can choose its own value for v. The proposer then sends propose(p,v) to the acceptors.
* ii. Each acceptor receives a propose(p,v). If p is equal to or greater than the highest promised proposal number, then the promised proposal number and accepted proposal is written to persistent storage and the acceptor replies with accept(p).
* iii. Once the proposer receives accept(p) from the majority of acceptors, it learns that the value v is decided. Otherwise, it may try phase 1 again with a higher proposal number.


Paxos算法的Quorum策略在上面的两个阶段都是多数(称之为Majority Quorum)，即Prepare和Propose都是得到多数的回应。而FPaxos基于一个这样的观察，这里只要保证两个阶段有交集即能保证算法的正确性(需要在加上算法其它的一些变化，Paper中只将了Quorum策略，不过这样改的话应该还是需要算法的其它操作改动)，提出了两种新的Quorum的策略。Paper中称两个阶段的Quorum分为为Q1、Q2。

### 0x02 两种Quorum策略

#### Simple Quorums

Simple Quorums的方式让人联想到Dynamo中使用的Quorum的策略。在这种方式中，FPaxos要求就是 Q1 + Q2 > N。考虑到实际的应用，这里一般是Q2 < N/2，这样就要Q2 < Q1。这几个值至少满足Q1 = N − Q2 + 1。这样是不能够容忍的副本故障的数量为Q2 − 1。而在Q2到N − Q2副本故障的时候，系统还可以进行复制的操作。

... As has been previously observed, we do not need to send prepare and propose messages to all acceptors, only to at least |Q1| or |Q2| acceptors. If any of these acceptors do not reply, then the leader can send the messages to more acceptors. This reduces the number of messages from 4 × N to (2 × |Q1|) + (2 × |Q2|).


#### Grid Quorums

Grid使用了一种“巧妙”的方法，可以使得Q1 + Q2 < N也可以满足Q1和Q2会有交集的。这里有N1 × N2 = N(N1、N2为行、列)。如果Q1为这个Grid的一个一行，则如果是Q2是一列的话，就可以保证两者存在交集。这样可以容忍故障的数量为MIN(N1, N2)到(N1 − 1) × (N2 − 1)。后者的情况就是下面的白色的格子都故障系统都可以运行。而且这里是N1为1 or N2为1是可以得到两种特殊的情况，

... let us consider setting N1 = N and N2 = 1 when using grid quorums or equivalently setting |Q1| = 1 and |Q2| = N with simple quorums. This would require every acceptor to participate in Q2 but only a single acceptor is needed for Q1. If any acceptors are still up, then we can complete Q1 and learn past decisions.


#### 安全性

FPaxos的安全性基于两点，

Theorem 1. If value v is decided with proposal number p and v′ is decided with proposal number p′ then v = v′

Theorem 2. If value v is decided with proposal number p then for any message propose(p’,v’) where p′ > p then v = v′


### 0x03 WPaxos: Ruling the Archipelago with Fast Consensus

WPaxos的基本思路也是来自Flexible Paxos。WPaxos将FPaxos中Grid Quorum的概念拓展到跨数据中心的复制上面。在Wpaxos的Quorum中，一列可靠看着是一个分区or一个数据中心内的副本。利用这样的Qourm思路可以优化在使用Paxos跨数据中心达成一致的时候可以优化性能。相比FPaxos的另外而言，WPaxos的论文对算法有着更加详细的描述。

## Paxos Quorum Leases: Fast Reads Without Sacrificing Writes

### 0x10 引言

... Quorum leases are a compromise between the two previous schemes. Over 80% of reads at every site are performed locally. Over 70% of updates have the minimum latency achievable by a geo-distributed Multi-Paxos system, matching that provided by the single leader lease, and 2x to 3x faster (i.e. 100 to 200 milliseconds lower latency) than writes with the Megastore approach.


### 0x11 基本设计

Paxos Quorum Leases的设计可以应用到各种的Paxos的变体，Paper中讨论的是在Multi-Paxos中应用的情况。它在Paxos的消息之外，加入了Lease管理消息。为了优化性能，这些消息可以和Paxos通常的消息组合在一起发送。Lease管理主要包含了两种类型的消息，一种是关于租约配置的改变，即与那些副本属于一个Quorum以及那部分的对象与这个Quorum关联，另外一种就是刷新租约的操作。Lease的配置和授权的逻辑是分开的。

#### Lease配置

Paxos Quorum Leases中一个配置是增量地在有前面的配置变化而来。Lease配置的变化就可以通过Paxos本身来完成。如上面的图所示，一个配置必须超过半数的副本达成共识之后才算有效(这样才能保证对一个对象更新的时候能通知Lease失效or必须等到Lease过期之后才能更新)，而一个Lease的Quorum可以是任意的大小。至于具体怎么去修改这个配置，有几种不同的策略，

(a) the lease is granted to a simple majority of nodes;
(b) the number of locally-satisfied read operations is maximized;
(c) the total number of leases being managed is modest.


... The leader counts the number of forwarded read requests for each object-replica pair. If the number of reads from a given replica is larger than the number of reads from another replica previously included as a lease holder for the object, the leader will include the new replica as a lease holder in the next lease configuration update.


#### 恢复

... the grantor will contact the Multi-Paxos leader requesting a special lease configuration update that specifies that the replica suspected of failure should be excluded from all quorums it was part of. Replicas that switch to this new configuration no longer need to synchronously notify the possibly-failed replica of updates, and the system can safely resume using leases.


## 参考

1. Flexible Paxos: Quorum intersection revisited, arXiv.
2. WPaxos: Ruling the Archipelago with Fast Consensus, arXiv.
3. Paxos Quorum Leases: Fast Reads Without Sacrificing Writes, SoCC’14.