From 18eeb7b90e45af4bbac0777021711d8547f41eca Mon Sep 17 00:00:00 2001
From: Mike Kolupaev
Date: Tue, 21 Feb 2017 15:53:59 -0800
Subject: [PATCH] Fix interference between max_total_wal_size and db_write_buffer_size checks

Summary:
This is a trivial fix for OOMs we saw a few days ago in logdevice.

RocksDB got into the following state:
(1) Write throughput is too high for flushes to keep up. Compactions are out of the picture: automatic compactions are disabled, and for manual compactions we don't care that much if they fall behind. We write to many CFs, with only a few L0 sst files in each, so compactions are not needed most of the time.
(2) total_log_size_ is consistently greater than GetMaxTotalWalSize(). It doesn't get smaller, since flushes are falling ever further behind.
(3) The total size of memtables is way above db_write_buffer_size and keeps growing. But write_buffer_manager_->ShouldFlush() is never checked, because (2) prevents it (for no good reason, afaict; this is what this commit fixes).
(4) Every call to WriteImpl() takes the MaybeFlushColumnFamilies() path, which keeps flushing the memtables one by one in order of increasing log file number.
(5) No write stalling trigger is hit. We rely on max_write_buffer_number

Closes https://github.com/facebook/rocksdb/pull/1893

Differential Revision: D4593590

Pulled By: yiwu-arbug

fbshipit-source-id: af79c5f
---
 db/db_impl.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/db/db_impl.cc b/db/db_impl.cc
index 84f2831bf..c25a20ed9 100644
--- a/db/db_impl.cc
+++ b/db/db_impl.cc
@@ -4737,7 +4737,8 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
   if (UNLIKELY(!single_column_family_mode_ &&
                total_log_size_ > GetMaxTotalWalSize())) {
     MaybeFlushColumnFamilies();
-  } else if (UNLIKELY(write_buffer_manager_->ShouldFlush())) {
+  }
+  if (UNLIKELY(write_buffer_manager_->ShouldFlush())) {
     // Before a new memtable is added in SwitchMemtable(),
     // write_buffer_manager_->ShouldFlush() will keep returning true. If another
     // thread is writing to another DB with the same write buffer, they may also
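The interference fixed by this patch can be modeled in isolation. The sketch below is a hypothetical, simplified model of the two checks at the top of DBImpl::WriteImpl(), not RocksDB code: WriteState, WalTooLarge, BufferShouldFlush, ChecksBefore, ChecksAfter, StuckState and CheckStuckState are all illustrative names. It assumes ShouldFlush() can be treated as a plain "tracked memtable bytes exceed db_write_buffer_size" comparison (the real WriteBufferManager logic may differ), and the sizes in StuckState() are made-up values chosen to reproduce states (2) and (3) from the summary.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the DB state that WriteImpl() consults.
struct WriteState {
  bool single_column_family_mode = false;
  std::uint64_t total_log_size = 0;        // models total_log_size_
  std::uint64_t max_total_wal_size = 0;    // models GetMaxTotalWalSize()
  std::uint64_t memtable_bytes = 0;        // memory tracked for memtables
  std::uint64_t db_write_buffer_size = 0;  // write buffer manager budget
};

// Models the first condition in the hunk.
bool WalTooLarge(const WriteState& s) {
  return !s.single_column_family_mode &&
         s.total_log_size > s.max_total_wal_size;
}

// Assumption: ShouldFlush() reduced to a simple threshold comparison.
bool BufferShouldFlush(const WriteState& s) {
  return s.memtable_bytes > s.db_write_buffer_size;
}

// Pre-patch control flow: the two checks are mutually exclusive.
std::vector<std::string> ChecksBefore(const WriteState& s) {
  std::vector<std::string> actions;
  if (WalTooLarge(s)) {
    actions.push_back("MaybeFlushColumnFamilies");
  } else if (BufferShouldFlush(s)) {
    actions.push_back("write-buffer flush");
  }
  return actions;
}

// Post-patch control flow: each check is evaluated independently.
std::vector<std::string> ChecksAfter(const WriteState& s) {
  std::vector<std::string> actions;
  if (WalTooLarge(s)) {
    actions.push_back("MaybeFlushColumnFamilies");
  }
  if (BufferShouldFlush(s)) {
    actions.push_back("write-buffer flush");
  }
  return actions;
}

// Illustrative values: WAL over its limit AND memtables over budget.
WriteState StuckState() {
  WriteState s;
  s.total_log_size = 8ULL << 30;        // 8 GiB of live WAL...
  s.max_total_wal_size = 4ULL << 30;    // ...over a 4 GiB limit
  s.memtable_bytes = 16ULL << 30;       // 16 GiB of memtables...
  s.db_write_buffer_size = 2ULL << 30;  // ...over a 2 GiB budget
  return s;
}

// In the stuck state only the patched flow reaches the write-buffer check.
void CheckStuckState() {
  const WriteState s = StuckState();
  assert(ChecksBefore(s).size() == 1);
  assert(ChecksAfter(s).size() == 2);
}
```

With the else-if, as long as the WAL condition in (2) holds, the write-buffer check from (3) is unreachable no matter how large the memtables grow; splitting it into two independent ifs lets both fire on the same write.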