Fix ConcurrentTaskLimiter token release for shutdown (#8253)

Summary:
Previously the shutdown process did not properly wait for all
`compaction_thread_limiter` tokens to be released before proceeding to
delete the DB's C++ objects. When this happened, we saw tests like
"DBCompactionTest.CompactionLimiter" flake with the following error:

```
virtual
rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl():
Assertion `outstanding_tasks_ == 0' failed.
```

There is a case where a token can still be alive even after the shutdown
process has waited for BG work to complete. In particular, this happens
because the shutdown process only waits for flush/compaction scheduled/unscheduled counters to all
reach zero. These counters are decremented in `BackgroundCallCompaction()`
functions. However, tokens are released in `BGWork*Compaction()` functions, which
actually wrap the `BackgroundCallCompaction()` function.

A simple sleep could repro the race condition:

```
$ diff --git a/db/db_impl/db_impl_compaction_flush.cc
b/db/db_impl/db_impl_compaction_flush.cc
index 806bc548a..ba59efa89 100644
 --- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -2442,6 +2442,7 @@ void DBImpl::BGWorkCompaction(void* arg) {
       static_cast<PrepickedCompaction*>(ca.prepicked_compaction);
   static_cast_with_check<DBImpl>(ca.db)->BackgroundCallCompaction(
       prepicked_compaction, Env::Priority::LOW);
+  sleep(1);
   delete prepicked_compaction;
 }

$ ./db_compaction_test --gtest_filter=DBCompactionTest.CompactionLimiter
db_compaction_test: util/concurrent_task_limiter_impl.cc:24: virtual rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl(): Assertion `outstanding_tasks_ == 0' failed.
Received signal 6 (Aborted)
#0   /usr/local/fbcode/platform007/lib/libc.so.6(gsignal+0xcf) [0x7f02673c30ff] ??      ??:0
https://github.com/facebook/rocksdb/issues/1   /usr/local/fbcode/platform007/lib/libc.so.6(abort+0x134) [0x7f02673ac934] ??       ??:0
...
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8253

Test Plan: sleeps to expose race conditions

Reviewed By: akankshamahajan15

Differential Revision: D28168064

Pulled By: ajkr

fbshipit-source-id: 9e5167c74398d323e7975980c5cc00f450631160
main
Andrew Kryczka 3 years ago committed by Facebook GitHub Bot
parent c2a3424de5
commit c70bae1b05
  1. 8
      db/db_impl/db_impl_compaction_flush.cc
  2. 9
      util/concurrent_task_limiter_impl.cc
  3. 7
      util/concurrent_task_limiter_impl.h

@ -2759,6 +2759,14 @@ void DBImpl::BackgroundCallCompaction(PrepickedCompaction* prepicked_compaction,
// See if there's more work to be done
MaybeScheduleFlushOrCompaction();
if (prepicked_compaction != nullptr &&
prepicked_compaction->task_token != nullptr) {
// Releasing task tokens affects the DB state, so must be done before we
// potentially signal the DB close process to proceed below.
prepicked_compaction->task_token->ReleaseOnce();
}
if (made_progress ||
(bg_compaction_scheduled_ == 0 &&
bg_bottom_compaction_scheduled_ == 0) ||

@ -59,9 +59,14 @@ ConcurrentTaskLimiter* NewConcurrentTaskLimiter(
return new ConcurrentTaskLimiterImpl(name, limit);
}
TaskLimiterToken::~TaskLimiterToken() {
--limiter_->outstanding_tasks_;
void TaskLimiterToken::ReleaseOnce() {
if (!released_) {
--limiter_->outstanding_tasks_;
released_ = true;
}
assert(limiter_->outstanding_tasks_ >= 0);
}
TaskLimiterToken::~TaskLimiterToken() { ReleaseOnce(); }
} // namespace ROCKSDB_NAMESPACE

@ -53,11 +53,16 @@ class ConcurrentTaskLimiterImpl : public ConcurrentTaskLimiter {
class TaskLimiterToken {
public:
explicit TaskLimiterToken(ConcurrentTaskLimiterImpl* limiter)
: limiter_(limiter) {}
: limiter_(limiter), released_(false) {}
~TaskLimiterToken();
// Releases the token from the `ConcurrentTaskLimiterImpl` if not already
// released.
// Not thread-safe.
void ReleaseOnce();
private:
ConcurrentTaskLimiterImpl* limiter_;
bool released_;
// no copying allowed
TaskLimiterToken(const TaskLimiterToken&) = delete;

Loading…
Cancel
Save