## f4: Facebook’s Warm BLOB Storage System

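f4 is an object storage system designed to work alongside Haystack in order to reduce storage cost. Its basic premise is that objects stored in Haystack (photos, for example) are accessed frequently right after they are uploaded, and their access rate drops sharply soon afterwards. Haystack's triple replication gives good performance, but at a high storage cost. f4 uses Reed-Solomon coding to keep reliability high while lowering the effective replication factor.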
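The cost argument can be made concrete with a little arithmetic. The sketch below compares the raw storage needed by an RS(n, k) stripe with plain triple replication. It assumes n = 10 data blocks and k = 4 parity blocks, the code reported in the f4 paper. The function names are illustrative, and the calculation ignores any further multipliers from RAID or cross-datacenter copies.

```python
def replication_factor(n_data: int, k_parity: int) -> float:
    """Physical bytes stored per logical byte for one RS(n, k) stripe."""
    if n_data <= 0 or k_parity < 0:
        raise ValueError("need n_data > 0 and k_parity >= 0")
    return (n_data + k_parity) / n_data


def savings_vs_copies(n_data: int, k_parity: int, copies: int = 3) -> float:
    """Fraction of raw storage saved relative to keeping `copies` full replicas."""
    return 1.0 - replication_factor(n_data, k_parity) / copies


if __name__ == "__main__":
    n, k = 10, 4  # assumption: RS(10, 4), as reported in the f4 paper
    print(f"RS({n}, {k}) stores {replication_factor(n, k):.2f}x the logical data")
    print(f"saving vs 3 full copies: {savings_vs_copies(n, k):.1%}")
```

An RS(n, k) stripe survives the loss of any k of its n + k blocks, so the lower replication factor does not by itself mean lower fault tolerance within a stripe.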

### Overview

The block-size for encoding is chosen to be a large value—typically 1 GB—for two reasons. First, it decreases the number of BLOBs that span multiple blocks and thus require multiple I/O operations to read. Second, it reduces the amount of per-block metadata that f4 needs to maintain.
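Both reasons can be checked with a small model. The sketch below packs BLOBs back to back into a volume, then counts how many BLOBs straddle a block boundary and how many blocks need metadata, for a small and a large block size. The workload (10,000 BLOBs of 100 KiB each) and all function names are made up for illustration.

```python
MiB = 1 << 20
GiB = 1 << 30


def blocks_spanned(offset: int, length: int, block_size: int) -> int:
    """Number of blocks touched by the byte range [offset, offset + length)."""
    if length <= 0 or block_size <= 0:
        raise ValueError("length and block_size must be positive")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


def multi_block_blobs(sizes: list, block_size: int) -> int:
    """Count BLOBs that cross a block boundary when packed contiguously."""
    offset = 0
    count = 0
    for size in sizes:
        if blocks_spanned(offset, size, block_size) > 1:
            count += 1
        offset += size
    return count


def block_count(sizes: list, block_size: int) -> int:
    """Blocks needed to hold the volume, i.e. per-block metadata entries."""
    return -(-sum(sizes) // block_size)  # ceiling division


if __name__ == "__main__":
    sizes = [100 * 1024] * 10_000  # hypothetical workload
    for name, block_size in (("1 MiB", MiB), ("1 GiB", GiB)):
        print(name,
              "multi-block BLOBs:", multi_block_blobs(sizes, block_size),
              "blocks:", block_count(sizes, block_size))
```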


### Basic Structure

##### Storage Nodes

...Index API. It stores the index—BLOB to data file, offset, and length—file on disk and loads them into custom data structures in memory. It also loads the location-map for each volume that maps offsets in data files to the physically-stored data blocks. Index files and location maps are pinned in memory to avoid disk seeks.
...
The Data API provides data access to the data and parity blocks the node stores. Normal-case reads are redirected to the appropriate storage node (R2) that then reads the BLOB directly from its enclosing data block (R3). Failure-case reads use the Data API to read companion and parity blocks needed to reconstruct the BLOB on a backoff node.
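The two lookups in a normal-case read can be sketched as follows: the index maps a BLOB to (data file, offset, length), and the location map turns that byte range into one or more ranges inside physical data blocks. This is a simplified model, not f4's actual data structures. It assumes one data file per volume, and the names `IndexEntry`, `VolumeIndex`, and `locate` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    data_file: str
    offset: int
    length: int


class VolumeIndex:
    """In-memory index plus location map for one volume (illustrative only)."""

    def __init__(self, entries, block_bounds, block_ids):
        # block_bounds[i] .. block_bounds[i + 1] is the data-file range
        # stored in block_ids[i], so there is one more bound than blocks.
        if len(block_bounds) != len(block_ids) + 1:
            raise ValueError("need len(block_bounds) == len(block_ids) + 1")
        self.entries = entries
        self.block_bounds = block_bounds
        self.block_ids = block_ids

    def locate(self, blob_id):
        """Return a list of (block_id, offset_in_block, length) pieces."""
        entry = self.entries[blob_id]
        if entry.offset + entry.length > self.block_bounds[-1]:
            raise ValueError("BLOB extends past the end of the data file")
        pieces = []
        pos, remaining = entry.offset, entry.length
        while remaining > 0:
            i = bisect_right(self.block_bounds, pos) - 1  # block holding pos
            take = min(remaining, self.block_bounds[i + 1] - pos)
            pieces.append((self.block_ids[i], pos - self.block_bounds[i], take))
            pos += take
            remaining -= take
        return pieces


if __name__ == "__main__":
    index = VolumeIndex(
        entries={"photo-1": IndexEntry("vol0.data", 10, 5),
                 "photo-2": IndexEntry("vol0.data", 90, 20)},
        block_bounds=[0, 100, 200, 300],  # tiny blocks, for readability
        block_ids=["b0", "b1", "b2"],
    )
    print(index.locate("photo-1"))  # fits inside one block
    print(index.locate("photo-2"))  # straddles the b0/b1 boundary
```

Keeping both structures in memory means that resolving a BLOB to its block ranges needs no disk access; only the final read of the data block touches disk.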

##### Backoff Nodes

Each backoff node exposes a File API that receives reads from the router tier after a normal-case read fails (R4). The read request has already been mapped to a data file, offset, and length by a primary volume-server. The backoff volume-server sends reads of that length from the equivalent offsets from all n − 1 companion blocks and k parity blocks for the unavailable block (R5). Once it receives n responses it decodes them to reconstruct the requested BLOB.
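The key point of the backoff read is that only the byte range of the requested BLOB is fetched from each companion block, not the whole block. The sketch below shows this with a deliberately simpler code: a single XOR parity block stands in for Reed-Solomon, so it tolerates one missing block rather than k. The function names are illustrative.

```python
from functools import reduce
from operator import xor


def xor_ranges(ranges):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(xor, column) for column in zip(*ranges))


def backoff_read(data_blocks, parity_block, missing, offset, length):
    """Rebuild bytes [offset, offset + length) of the unavailable block.

    Reads the same offsets from every companion block and from the parity
    block, then decodes. Here "decode" is XOR, a stand-in for RS decoding.
    """
    end = offset + length
    companions = [block[offset:end]
                  for i, block in enumerate(data_blocks) if i != missing]
    return xor_ranges(companions + [parity_block[offset:end]])


if __name__ == "__main__":
    blocks = [b"AAAAAAAA", b"BLOBDATA", b"CCCCCCCC"]
    parity = xor_ranges(blocks)             # computed when the stripe is encoded
    damaged = [blocks[0], None, blocks[2]]  # block 1 is unavailable
    print(backoff_read(damaged, parity, missing=1, offset=0, length=4))
```

With a real Reed-Solomon(n, k) code, any n responses out of the n − 1 companion blocks and k parity blocks are enough, which is why the backoff node can decode as soon as n replies arrive.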

##### Rebuilder Nodes

Rebuilder nodes are storage-less, CPU-heavy nodes that handle failure detection and background reconstruction of data blocks. Each rebuilder node detects failure through probing and reports the failure to a coordinator node. It rebuilds blocks by fetching n companion or parity blocks from the failed block’s stripe and decoding them.

##### Coordinator Nodes

A cell requires many maintenance tasks, such as scheduling block rebuilding and ensuring that the current data layout minimizes the chances of data unavailability. Coordinator nodes are storage-less, CPU-heavy nodes that handle these cell-wide tasks.


### Fault Tolerance

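A single f4 cell can tolerate disk, host, and rack failures. Beyond that, replication across datacenters tolerates the failure of an entire datacenter. In addition, f4's block placement follows the principle of spreading the blocks of related data across different racks as far as possible.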

Our current implementation initially lays out blocks making a best-effort to put each on a different rack. The placement balancer process detects and corrects any rare violations that place a stripe’s blocks on the same rack.
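The check that such a balancer has to perform can be sketched as a grouping of a stripe's blocks by rack. This is only an illustration of the invariant, not f4's placement balancer. The input format (a mapping from block ID to rack ID) and the function name are assumptions.

```python
from collections import defaultdict


def rack_violations(placement):
    """Return {rack: [block_ids]} for racks holding more than one block
    of the same stripe. An empty dict means the stripe is well placed."""
    by_rack = defaultdict(list)
    for block_id, rack in placement.items():
        by_rack[rack].append(block_id)
    return {rack: blocks for rack, blocks in by_rack.items() if len(blocks) > 1}


if __name__ == "__main__":
    # Hypothetical stripe: two data blocks and one parity block.
    stripe = {"d0": "rack-1", "d1": "rack-2", "p0": "rack-1"}
    print(rack_violations(stripe))
```

For every rack reported, the balancer would need to move all but one of the listed blocks to racks that hold no block of the stripe, so that a single rack failure costs the stripe at most one block.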


## References

1. f4: Facebook’s Warm BLOB Storage System, OSDI 2014.
2. Finding a needle in Haystack: Facebook’s photo storage, OSDI 2010.