# Monkey -- Optimal Navigable Key-Value Store

## Monkey: Optimal Navigable Key-Value Store

### 0x00 引言

Monkey解决的一个问题就是如何寻找LSM-Tree中最佳的Bloom Filter的设置的问题。在LSM-Tree中，查找的时候会先使用Bloom Filter(or其它类似的数据结构)来过滤不必要的查询，减少在磁盘的数据上面查找的次数。在一般的基于LSM-Tree的系统上面，这个Bloom Filter的false positive rates在每一次层都是固定的，比如均为0.82%。在实际的系统，每一层的数据访问的量和访问的成本是不一样的，这里一个显然的思路就是将访问次数过多的、访问成本更加高层的Bloom Filter的false positive rates更加低，能获得更好的性能。下面的图就基本展示它的基本的原理，

### 0x02 基本思路

#### lookup cost & update cost

$$\\ O(T \cdot \log_{T}\frac{NE}{M_{buffer}} \cdot e^{-\frac{M_{filters}}{N}}), O(\log_{T}\frac{NE}{M_{buffer}} \cdot e^{-\frac{M_{filters}}{N}})$$ 同理，得到点更新的复杂度为(这里就可以看出来前者是为更新优化的)， $$\\ O(\frac{1}{B} \cdot \log_{T}\frac{NE}{M_{buffer}}), O(\frac{T}{B} \cdot \log_{T}\frac{NE}{M_{buffer}})$$

### 0x02.8 Design Space

• the merge policy (tiering vs. leveling),

• the size ratio T between levels,

• the allocation of main memory among the buffer $M_{buffer}$ and the Bloom filters,

• the allocation of $M_{filters}$ among each of the different Bloom filters,

• 不同得M_{filters}之间内存分配的冲突，

• M-buffer和和$M_{filters}$之间内存分配的冲突;

• size ratio和merge policy的各种的取舍:

This is complicated because workloads consist of different proportions of (1) updates, (2) zero-result lookups, (3) non-zero-result lookups, and (4) range lookups of different selectivities. Decreasing the size ratio under tiering and increasing the size ratio under leveling improves lookup cost and degrades update cost, but the impact and rate of change on the costs of different operation types is different.


## 参考

1. Monkey: Optimal Navigable Key-Value Store, SIGMOD 2017.