# Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory

## Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory

### 0x00 引言

To cost-efficiently resize this hash table, level hashing leverages an in- place resizing scheme that only needs to rehash 1/3 of buckets instead of the entire table, thus significantly reducing the number of rehashed buckets and improving the resizing performance. Experimental results demon- strate that level hashing achieves 1.4×−3.0× speedup for insertions, 1.2×−2.1× speedup for updates, and over 4.3× speedup for resizing, while maintaining high search and deletion performance, compared with state-of-the-art hashing schemes.


### 0x01 基本思路

##### Sharing-based Two-level Structure

Bottom Level里面的一个bucket是被top level里面的2个bucket共用的，这里的思想和Path Hashing里面的是一样的。不够这里的Bottom Level是上次rehash之前的Top Level。

##### At Most One Movement for Each Successful Insertion

1. 利用两个hash函数计算出的2个位置，检测使用优空的slot；
2. 有，则选择；没有，则检查是否通过移动一个元素就能解决；
3. 能，就移动，然后选择空出来的位置；不能，则用同样的方法查看Bottom Level；
4. 都不能，进行resize操作；

Lt 1 = hash1 (K)%N, Lt 2 = hash2 (K)%N (1)
Lb 1 = hash1 (K)%(N/2), Lb 2 = hash2 (K)%(N/2) (2)


### 0x02 Cost-efficient In-place Resizing

Resize这里还有2个优化:

1. 为了解决2个Level数据发布不均导致的影响负载因子的问题，这里使用了一种叫做bottom-to-top movement (B2T) 的方法。具体方法是在插入操作的时候，如果所有的buckets都是满的，那么就尝试将Bottom Level两个候选的位置里面的项尝试移动到Top Level，只有在这些都不能移动的时候才resize。

By performing the B2T scheme, the items between top and bottom levels are redistributed, thus improving the maximum load factor. The red line in Figure 4 shows the load factors when the resizings occur via using the B2T scheme.

2. 在resize之后，由于现在的大部分数据在Bottom Level，而查找的时候总是后查找Bottom Level，这样就导致了性能的降低。这里解决方法是dynamic search scheme，通过Top Level和Bottom Level里面的项的数量来决定先查哪一个。

Thus after resizing, the items in the bottom level are more than those in the top level and hence we first probe the bottom level, thus improving the search performance.


To reduce the overhead of guaranteeing consistency in level hashing, we propose log-free consistency guarantee schemes for deletion, insertion, and resizing operations, and an opportunistic log-free guarantee scheme for update operation, by leveraging the tokens to be performed in the atomic-write manner.

##### Log-free Deletion

 since the item becomes valid until the token is set to ‘1’. If a system failure occurs during writing the item, this item may be partially written but invalid since the current token is ‘0’ and this slot is still available. Hence, the hash table is in a consistent state when system failures occur.

##### Log-free Insertion

 If a system failure occurs after changing the token of slot-alt before changing the token of slot-cur, the hash table contains two duplicate key-value items, which however does not impact on the data consistency. It is because when searching this key-value item, the returned value is always correct whichever one of the two items is queried.

##### Log-free Resizing

we first copy the key-value item of slotold into slotnew, and then modifies the token of slotnew from ‘0’ to ‘1’ and finally modifies the token of slotold from ‘1’ to ‘0’. The ordering of the three steps is ensured via MFENCEs.


##### Opportunistic Log-free Update

1. 一个bucket里面是否存在可使用的空slot，就使用类似先插入，然后同时更改标志位即可，因为同一个bucket里面的标志位在一起，可以一起修改；
2. 如果没有，就使用log。
When updating an existing key-value item, if the updated item has two copies in the hash table, we first delete one and then update the other.


## 参考

1. Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory, OSDI 2018.
2. A Write-Friendly and Cache-Optimized Hashing Scheme for Non-Volatile Memory Systems，TPDS 2018.