Sanitize min_write_buffer_number_to_merge to 1 with atomic_flush (#10773)

Summary:
With current implementation, within the same RocksDB instance, all column families with non-empty memtables will be scheduled for flush if RocksDB determines that any column family needs to be flushed, e.g. memtable full, write buffer manager, etc., if atomic flush is enabled. Not doing so can lead to data loss and inconsistency when WAL is disabled, which is a common setting when atomic flush is enabled. Therefore, setting a per-column-family knob, min_write_buffer_number_to_merge to a value greater than 1 is not compatible with atomic flush, and should be sanitized during column family creation and db open.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10773

Test Plan:
Reproduce: D39993203 has detailed steps.
Run the test with and without the fix.

Reviewed By: cbi42

Differential Revision: D40077955

Pulled By: cbi42

fbshipit-source-id: 451a9179eb531ac42eaccf40b451b9dec4085240
main
Yanqin Jin 2 years ago committed by Facebook GitHub Bot
parent eca47fb696
commit 4d82b94896
  1. 3
      HISTORY.md
  2. 15
      db/column_family.cc
  3. 5
      include/rocksdb/advanced_options.h

@ -19,6 +19,9 @@
### New Features
* Add a new option IOOptions.do_not_recurse that can be used by underlying file systems to skip recursing through sub directories and list only files in GetChildren API.
### Behavior Changes
* Sanitize min_write_buffer_number_to_merge to 1 if atomic flush is enabled to prevent unexpected data loss when WAL is disabled in a multi-column-family setting (#10773).
## 7.7.0 (09/18/2022)
### Bug Fixes
* Fixed a hang when an operation such as `GetLiveFiles` or `CreateNewBackup` is asked to trigger and wait for memtable flush on a read-only DB. Such indirect requests for memtable flush are now ignored on a read-only DB.

@ -233,6 +233,21 @@ ColumnFamilyOptions SanitizeOptions(const ImmutableDBOptions& db_options,
result.min_write_buffer_number_to_merge = 1;
}
if (db_options.atomic_flush && result.min_write_buffer_number_to_merge > 1) {
ROCKS_LOG_WARN(
db_options.logger,
"Currently, if atomic_flush is true, then triggering flush for any "
"column family internally (non-manual flush) will trigger flushing "
"all column families even if the number of memtables is smaller "
"min_write_buffer_number_to_merge. Therefore, configuring "
"min_write_buffer_number_to_merge > 1 is not compatible and should "
"be satinized to 1. Not doing so will lead to data loss and "
"inconsistent state across multiple column families when WAL is "
"disabled, which is a common setting for atomic flush");
result.min_write_buffer_number_to_merge = 1;
}
if (result.num_levels < 1) {
result.num_levels = 1;
}

@ -271,7 +271,10 @@ struct AdvancedColumnFamilyOptions {
// read amplification because a get request has to check in all of these
// files. Also, an in-memory merge may result in writing lesser
// data to storage if there are duplicate records in each of these
// individual write buffers. Default: 1
// individual write buffers.
// If atomic flush is enabled (options.atomic_flush == true), then this
// option will be sanitized to 1.
// Default: 1
int min_write_buffer_number_to_merge = 1;
// DEPRECATED

Loading…
Cancel
Save