Deflake DBErrorHandlingFSTest.*WALWriteError (#10642)

Summary:
Example flake: https://app.circleci.com/pipelines/github/facebook/rocksdb/17660/workflows/7a891875-f07b-4a67-b204-eaa7ca9f9aa2/jobs/467496

The test could get stuck in out-of-space due to a callback executing `SetFilesystemActive(false /* active */)` after the test executed `SetFilesystemActive(true /* active */)`. This could happen because background info logging went through the SyncPoint callback "WritableFileWriter::Append:BeforePrepareWrite", probably unintentionally. The solution of this PR is to call `ClearAllCallBacks()` to wait for any such pending callbacks to drain before calling `SetFilesystemActive(true /* active */)`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10642

Reviewed By: cbi42

Differential Revision: D39265381

Pulled By: ajkr

fbshipit-source-id: 9a2f4916ab19726c8fb4b3a3b590b1b9ed93de1b
main
Andrew Kryczka 2 years ago committed by Facebook GitHub Bot
parent 76de3c85cc
commit 1ffadbe9fc
  1. 8
      db/error_handler_fs_test.cc

@ -1308,6 +1308,10 @@ TEST_F(DBErrorHandlingFSTest, WALWriteError) {
ASSERT_EQ(s, s.NoSpace()); ASSERT_EQ(s, s.NoSpace());
} }
SyncPoint::GetInstance()->DisableProcessing(); SyncPoint::GetInstance()->DisableProcessing();
// `ClearAllCallBacks()` is needed in addition to `DisableProcessing()` to
// drain all callbacks. Otherwise, a pending callback in the background
// could re-disable `fault_fs_` after we enable it below.
SyncPoint::GetInstance()->ClearAllCallBacks();
fault_fs_->SetFilesystemActive(true); fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true); ASSERT_EQ(listener->WaitForRecovery(5000000), true);
for (auto i = 0; i < 199; ++i) { for (auto i = 0; i < 199; ++i) {
@ -1474,6 +1478,10 @@ TEST_F(DBErrorHandlingFSTest, MultiCFWALWriteError) {
ASSERT_TRUE(s.IsNoSpace()); ASSERT_TRUE(s.IsNoSpace());
} }
SyncPoint::GetInstance()->DisableProcessing(); SyncPoint::GetInstance()->DisableProcessing();
// `ClearAllCallBacks()` is needed in addition to `DisableProcessing()` to
// drain all callbacks. Otherwise, a pending callback in the background
// could re-disable `fault_fs_` after we enable it below.
SyncPoint::GetInstance()->ClearAllCallBacks();
fault_fs_->SetFilesystemActive(true); fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true); ASSERT_EQ(listener->WaitForRecovery(5000000), true);

Loading…
Cancel
Save