# Flash-based SSDs

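### 0x00 Introduction

SSDs are gradually becoming mainstream in personal computers. At many companies, database machines also get special treatment and are usually equipped with high-performance SSDs. This post is mainly based on [1], a book that is freely available.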

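### 0x01 Basic Structure and Operations

An SSD is organized in two levels: blocks, commonly 128 KB or 256 KB in size, and pages, most commonly 4 KB. The more fundamental storage element is the transistor (flash cell). One transistor can store 1, 2, 3, or 4 bits (SLC, MLC, TLC, and QLC respectively; 4 bits is generally the current maximum). With the same underlying technology, the more bits stored per transistor, the worse the performance and the shorter the lifetime.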

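#### Basic Operations

Flash has three basic operations:

- Read (a page): read any page by specifying its page number. Reads are fast regardless of where the page sits on the device.
- Erase (a block): unfortunately, this is the biggest and most troublesome problem of flash-based SSDs, and it strongly shapes the design of the SSD and its controller. Before a page can be programmed, the SSD must erase the entire block that contains it. This follows from the physics of how flash stores data.
- Program (a page): write data into a page that has already been erased.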
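
The erase-before-program rule can be made concrete with a small model. The following is a minimal sketch, not a description of real hardware: the class and method names are hypothetical, and the three page states (invalid, erased, valid) follow the state model used in [1].

```python
from enum import Enum


class PageState(Enum):
    INVALID = "i"   # never erased, or contents no longer trusted
    ERASED = "E"    # erased and ready to be programmed
    VALID = "V"     # programmed; contents can be read


class FlashBlock:
    """A toy model of one flash block made of fixed-size pages (hypothetical)."""

    def __init__(self, pages_per_block=4):
        self.states = [PageState.INVALID] * pages_per_block
        self.data = [None] * pages_per_block

    def erase(self):
        # Erase works on the whole block: every page loses its contents.
        self.states = [PageState.ERASED] * len(self.states)
        self.data = [None] * len(self.data)

    def program(self, page, value):
        # A page can be programmed only once after an erase.
        if self.states[page] is not PageState.ERASED:
            raise RuntimeError("page must be erased before it is programmed")
        self.data[page] = value
        self.states[page] = PageState.VALID

    def read(self, page):
        if self.states[page] is not PageState.VALID:
            raise RuntimeError("page holds no valid data")
        return self.data[page]
```

Overwriting page 0 in this model means erasing the whole block first, which also destroys pages 1 to 3. This is why an SSD cannot simply update data in place.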

### 0x02 FTL

The typical lifetime of a block is currently not well known. Manufacturers rate MLC-based blocks as having a 10,000 P/E (Program/Erase) cycle lifetime; that is, each block can be erased and programmed 10,000 times before failing. SLC-based chips, because they store only a single bit per transistor, are rated with a longer lifetime, usually 100,000 P/E cycles. However, recent research has shown that lifetimes are much longer than expected.


### 0x03 A Log-Structured FTL

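A common approach is the log-structured FTL, which borrows its idea from the log-structured file system (although the problem being solved here is different). Every write goes to the next free page, and the mapping table is then updated. One can think of the SSD controller as a special-purpose computer. The mapping table is kept in memory (the SSD's own memory), which raises a problem shared by many storage systems: what happens on power loss? The solution is also the same as in many storage systems: logging combined with checkpointing.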
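
The write path of a log-structured FTL can be sketched in a few lines. This is a toy model under simplifying assumptions: all physical pages start out erased, the mapping table is a plain in-memory dictionary, and garbage collection and crash recovery are left out. The class name and its methods are hypothetical.

```python
class LogStructuredFTL:
    """Toy page-mapped FTL: every write goes to the next free physical page."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages   # physical pages (assumed already erased)
        self.mapping = {}                 # logical page number -> physical page number
        self.next_free = 0                # head of the log

    def write(self, logical_page, value):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages left; garbage collection needed")
        physical = self.next_free
        self.flash[physical] = value
        # Point the logical page at the new copy; any older copy becomes garbage.
        self.mapping[logical_page] = physical
        self.next_free += 1
        return physical

    def read(self, logical_page):
        return self.flash[self.mapping[logical_page]]


ftl = LogStructuredFTL(num_pages=8)
ftl.write(100, "a1")   # lands in physical page 0
ftl.write(101, "a2")   # lands in physical page 1
ftl.write(100, "b1")   # overwrite: lands in physical page 2, page 0 is now garbage
```

After the overwrite, logical page 100 maps to physical page 2, while the stale copy in physical page 0 stays on flash until it is reclaimed.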

#### Mapping Table Size

The key to the hybrid mapping strategy is keeping the number of log blocks small. To keep the number of log blocks small, the FTL has to periodically examine log blocks (which have a pointer per page) and switch them into blocks that can be pointed to by only a single block pointer. This switch is accomplished by one of three main techniques, based on the contents of the block.
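
A quick calculation shows why per-page pointers are expensive and per-block pointers are cheap. The numbers below are illustrative assumptions: a 1 TiB device, 4-byte mapping entries, 4 KiB pages, and 256 KiB blocks. The function name is hypothetical.

```python
def mapping_table_bytes(capacity_bytes, unit_bytes, entry_bytes=4):
    """Memory needed when the FTL keeps one mapping entry per unit (page or block)."""
    return (capacity_bytes // unit_bytes) * entry_bytes


KIB, MIB, TIB = 2**10, 2**20, 2**40

page_level = mapping_table_bytes(1 * TIB, 4 * KIB)      # one pointer per 4 KiB page
block_level = mapping_table_bytes(1 * TIB, 256 * KIB)   # one pointer per 256 KiB block

print(page_level // MIB, "MiB")    # 1024 MiB
print(block_level // MIB, "MiB")   # 16 MiB
```

Under these assumptions a pure page-level table needs about 1 GiB of controller memory, while a block-level table needs 16 MiB. A hybrid scheme pays the per-page cost only for the small set of log blocks.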


### 0x04 Garbage Collection

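Garbage collection is a problem that a log-structured FTL has to solve. For an SSD, reclaiming garbage, relocating the live pages, and consolidating the free space all help performance. The basic idea is again similar to the log-structured file system: read the live data, write it somewhere else, and reclaim the garbage. To support this, SSDs usually reserve some extra space, which is why capacities such as 240GB appear. Garbage collection also has a significant impact on SSD performance.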

To reduce GC costs, some SSDs overprovision the device; by adding extra flash capacity, cleaning can be delayed and pushed to the background, perhaps done at a time when the device is less busy.
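
One cleaning pass over a single victim block can be sketched as follows. This is a simplified illustration rather than a real controller algorithm: the function name and its parameters are hypothetical, victim selection is left out, and updating the mapping table for the relocated pages is assumed to happen inside the `write_to_log` callback.

```python
def garbage_collect(block, live_pages, write_to_log):
    """Reclaim one block and return how many live pages had to be copied.

    block        -- list with the contents of each page in the victim block
    live_pages   -- set of page indexes still referenced by the mapping table
    write_to_log -- callback that appends one page to the log (hypothetical)
    """
    copied = 0
    for index, contents in enumerate(block):
        if index in live_pages:
            write_to_log(contents)   # relocate live data before it is destroyed
            copied += 1
    block[:] = [None] * len(block)   # erase the whole block so it can be reused
    return copied


log = []
victim = ["a1", "a2", "b1", "b2"]    # pages 0 and 1 hold stale data
copied = garbage_collect(victim, live_pages={2, 3}, write_to_log=log.append)
```

Here two live pages are copied before the block is erased. The fewer live pages a victim block holds, the cheaper it is to clean, and extra reserved capacity makes such blocks easier to find.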


### 0x05 Wear Leveling

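As mentioned earlier, the number of times an SSD block can be erased is limited, and a worn-out block is troublesome. The best approach is to spread writes evenly across all blocks, which in turn involves many considerations. For the details, see the relevant papers.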

To remedy this problem, the FTL must periodically read all the live data out of such blocks and re-write it elsewhere, thus making the block available for writing again. This process of wear leveling increases the write amplification of the SSD, and thus decreases performance as extra I/O is required to ensure that all blocks wear at roughly the same rate.
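
One simple way to decide when such a migration is worthwhile is to compare erase counts across blocks. The heuristic below is an assumption made for illustration, not the policy described in [1] or used by any particular SSD. The function name and the `threshold` parameter are hypothetical.

```python
def pick_block_to_refresh(erase_counts, threshold=100):
    """Return the index of the least-worn block if wear is too uneven, else None.

    erase_counts -- number of erases performed on each block so far
    threshold    -- hypothetical gap between the most- and least-worn block
                    that triggers a migration
    """
    coldest = min(range(len(erase_counts)), key=erase_counts.__getitem__)
    hottest_count = max(erase_counts)
    if hottest_count - erase_counts[coldest] > threshold:
        # The coldest block probably holds long-lived data; moving that data
        # elsewhere lets this lightly worn block absorb future writes.
        return coldest
    return None


uneven = pick_block_to_refresh([500, 20, 480, 510])    # block 1 is far behind
even = pick_block_to_refresh([500, 450, 480, 510])     # gap of 60 is under the threshold
```

A lower threshold evens out wear more aggressively, at the cost of more copying and therefore higher write amplification.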


## References

1. "Operating Systems: Three Easy Pieces" (Chapter: Flash-based SSDs) by Remzi Arpaci-Dusseau and Andrea Arpaci-Dusseau. Arpaci-Dusseau Books, 2014.
2. "LightNVM: The Linux Open-Channel SSD Subsystem", FAST 2017.