From 61152544166ecf9fd50f90ad190561e5b11ccf55 Mon Sep 17 00:00:00 2001 From: sdong Date: Wed, 29 Jun 2022 11:20:36 -0700 Subject: [PATCH] Fix A Bug Where Concurrent Compactions Cause Further Slowing Down (#10270) Summary: Currently, when installing a new super version, when stalling condition triggers, we compare estimated compaction bytes to previously, and if the new value is larger or equal to the previous one, we reduce the slowdown write rate. However, if concurrent compactions happen, the same value might be used. The result is that, although some compactions reduce estimated compaction bytes, we treat them as a signal for further slowing down. In some cases, it causes slowdown rate drops all the way to the minimum, far lower than needed. Fix the bug by not triggering a re-calculation if a new super version doesn't have Version or a memtable change. With this fix, number of compaction finishes are still undercounted in this algorithm, but it is still better than the current bug where they are negatively counted. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10270 Test Plan: Run a benchmark where the slowdown rate is dropped to minimal unnessarily and see it is back to a normal value. Reviewed By: ajkr Differential Revision: D37497327 fbshipit-source-id: 9bca961cc38fed965c3af0fa6c9ca0efaa7637c4 --- HISTORY.md | 1 + db/column_family.cc | 15 ++++++++++++--- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/HISTORY.md b/HISTORY.md index f49b38985..63a9c2095 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -14,6 +14,7 @@ ### Bug Fixes * Fix a bug in which backup/checkpoint can include a WAL deleted by RocksDB. +* Fix a bug where concurrent compactions might cause unnecessary further write stalling. In some cases, this might cause write rate to drop to minimum. ## 7.4.0 (06/19/2022) ### Bug Fixes diff --git a/db/column_family.cc b/db/column_family.cc index 952c04def..2c93eed2c 100644 --- a/db/column_family.cc +++ b/db/column_family.cc @@ -1316,9 +1316,18 @@ void ColumnFamilyData::InstallSuperVersion( super_version_ = new_superversion; ++super_version_number_; super_version_->version_number = super_version_number_; - super_version_->write_stall_condition = - RecalculateWriteStallConditions(mutable_cf_options); - + if (old_superversion == nullptr || old_superversion->current != current() || + old_superversion->mem != mem_ || + old_superversion->imm != imm_.current()) { + // Should not recalculate slow down condition if nothing has changed, since + // currently RecalculateWriteStallConditions() treats it as further slowing + // down is needed. + super_version_->write_stall_condition = + RecalculateWriteStallConditions(mutable_cf_options); + } else { + super_version_->write_stall_condition = + old_superversion->write_stall_condition; + } if (old_superversion != nullptr) { // Reset SuperVersions cached in thread local storage. // This should be done before old_superversion->Unref(). That's to ensure