Fix interference between max_total_wal_size and db_write_buffer_size checks

Summary:
This is a trivial fix for OOMs we saw a few days ago in logdevice.

RocksDB gets into the following state:
(1) Write throughput is too high for flushes to keep up. Compactions are out of the picture - automatic compactions are disabled, and for manual compactions we don't care that much if they fall behind. We write to many CFs, with only a few L0 sst files in each, so compactions are not needed most of the time.
(2) total_log_size_ is consistently greater than GetMaxTotalWalSize(). It doesn't get smaller since flushes are falling ever further behind.
(3) Total size of memtables is way above db_write_buffer_size and keeps growing. But write_buffer_manager_->ShouldFlush() is not checked because (2) prevents it (for no good reason, afaict; this is what this commit fixes).
(4) Every call to WriteImpl() hits the MaybeFlushColumnFamilies() path. This keeps flushing the memtables one by one in order of increasing log file number.
(5) No write stalling trigger is hit. We rely on max_write_buffer_number
Closes https://github.com/facebook/rocksdb/pull/1893

Differential Revision: D4593590

Pulled By: yiwu-arbug

fbshipit-source-id: af79c5f
Mike Kolupaev 7 years ago committed by Facebook Github Bot
parent 1560b2f5f0
commit 18eeb7b90e
 db/db_impl.cc | 3

@@ -4737,7 +4737,8 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
   if (UNLIKELY(!single_column_family_mode_ &&
                total_log_size_ > GetMaxTotalWalSize())) {
     MaybeFlushColumnFamilies();
-  } else if (UNLIKELY(write_buffer_manager_->ShouldFlush())) {
+  }
+  if (UNLIKELY(write_buffer_manager_->ShouldFlush())) {
     // Before a new memtable is added in SwitchMemtable(),
     // write_buffer_manager_->ShouldFlush() will keep returning true. If another
     // thread is writing to another DB with the same write buffer, they may also
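
The effect of turning the `else if` into an independent `if` can be illustrated with a small standalone model. This is only a sketch, not RocksDB code: `WriteState`, `ChecksBeforeFix`, and `ChecksAfterFix` are hypothetical names, and the string labels stand in for the real flush paths (`MaybeFlushColumnFamilies()` and the write buffer manager branch).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the state that WriteImpl() consults.
// These are not RocksDB's real types.
struct WriteState {
  bool single_column_family_mode;
  std::uint64_t total_log_size;      // models total_log_size_
  std::uint64_t max_total_wal_size;  // models GetMaxTotalWalSize()
  bool buffer_should_flush;  // models write_buffer_manager_->ShouldFlush()
};

// Before the fix: the write buffer check is an `else if`, so it is
// skipped whenever the WAL size check fires.
std::vector<std::string> ChecksBeforeFix(const WriteState& s) {
  std::vector<std::string> taken;
  if (!s.single_column_family_mode &&
      s.total_log_size > s.max_total_wal_size) {
    taken.push_back("wal");
  } else if (s.buffer_should_flush) {
    taken.push_back("write_buffer");
  }
  return taken;
}

// After the fix: the two checks are independent, so both can fire
// during the same write.
std::vector<std::string> ChecksAfterFix(const WriteState& s) {
  std::vector<std::string> taken;
  if (!s.single_column_family_mode &&
      s.total_log_size > s.max_total_wal_size) {
    taken.push_back("wal");
  }
  if (s.buffer_should_flush) {
    taken.push_back("write_buffer");
  }
  return taken;
}
```

With the WAL over its limit and the write buffer over its limit at the same time (the state described in (2) and (3)), `ChecksBeforeFix` only takes the WAL branch, while `ChecksAfterFix` takes both.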
