1.**Process crash with WAL disabled** (`WriteOptions::disableWAL=1`), which loses writes since the last memtable flush.
2.**System crash with WAL enabled** (`WriteOptions::disableWAL=0`), which loses writes since the last memtable flush or WAL sync (`WriteOptions::sync=1`, `SyncWAL()`, or `FlushWAL(true /* sync */)`).
3.**Process crash with manual WAL flush** (`DBOptions::manual_wal_flush=1`), which loses writes since the last memtable flush or manual WAL flush (`FlushWAL()`).
4.**System crash with manual WAL flush** (`DBOptions::manual_wal_flush=1`), which loses writes since the last memtable flush or synced manual WAL flush (`FlushWAL(true /* sync */)`, or `FlushWAL(false /* sync */)` followed by WAL sync).
* [False detection of corruption after system crash due to race condition with WAL sync and `track_and_verify_wals_in_manifest](https://github.com/facebook/rocksdb/pull/10185)
* [Undetected hole in recovery after system crash due to race condition in WAL sync](https://github.com/facebook/rocksdb/pull/10560)
* [Recovery failure after system crash due to missing directory sync for critical metadata file](https://github.com/facebook/rocksdb/pull/10573)
Our correctness testing framework consists of a stress test program (`db_stress`) and a wrapper script (`db_crashtest.py`).
`db_crashtest.py` manages instances of `db_stress`, starting them and injecting crashes.
`db_stress` operates a DB and test oracle ("Latest values file").
At startup, `db_stress` verifies the DB using the test oracle, skipping keys that had pending writes when the last crash happened.
`db_stress` then stresses the DB with random operations, keeping the test oracle up-to-date.
As the name "Latest values file" implies, this test oracle only tracks the latest value for each key.
As a result, this setup is unable to verify recoveries involving lost buffered writes, where recovering older values is tolerated as long as there is no hole.
### Extensions for lost buffered writes
To accommodate lost buffered writes, we extended the test oracle to include two new files: "`verifiedSeqno`.state" and "`verifiedSeqno`.trace".
`verifiedSeqno` is the sequence number of the last successful verification.
"`verifiedSeqno`.state" is the expected values file at that sequence number, and "`verifiedSeqno`.trace" is the trace file of all operations that happened after that sequence number.
When buffered writes may have been lost by the previous `db_stress` instance, the current `db_stress` instance must reconstruct the latest values file before startup verification.
M is the recovery sequence number of the current `db_stress` instance and N is the recovery sequence number of the previous `db_stress` instance.
M is learned from the DB, while N is learned from the filesystem by parsing the "*.{trace,state}" filenames.
Then, the latest values file ("LATEST.state") can be reconstructed by replaying the first M-N traced operations (in "N.trace") on top of the last instance's starting point ("N.state").
When buffered writes may be lost by the current `db_stress` instance, we save the current expected values into "M.state" and begin tracing newer operations in "M.trace".
### Simulating system crash
When simulating system crash, we send file writes to a `TestFSWritableFile`, which buffers unsynced writes in process memory.
That way, the existing `db_stress` process crash mechanism will lose unsynced writes.