CompactRange() refit level should confirm destination level is not empty (#7261)

Summary:
There is potential data race related CompactRange() with level refitting. After the compaction step and refitting step, some automatic compaction could put data to the destination level and cause the DB to be corrupted. Fix the bug by checking the target level to be empty.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7261

Test Plan: Add a unit test, which would fail with "Corruption: L1 have overlapping ranges '666F6F' seq:6, type:1 vs. '626172' seq:2, type:1", and now it succeeds.

Reviewed By: ajkr

Differential Revision: D23142269

fbshipit-source-id: 28bc14d5ac934c192260b23a4ce3f10a95e3ee91
main
sdong 4 years ago committed by Facebook GitHub Bot
parent 500eeb6fd3
commit 1760637539
  1. 1
      HISTORY.md
  2. 77
      db/db_compaction_test.cc
  3. 34
      db/db_impl/db_impl_compaction_flush.cc

@ -3,6 +3,7 @@
### Bug fixes ### Bug fixes
* Fix a performance regression introduced in 6.4 that makes a upper bound check for every Next() even if keys are within a data block that is within the upper bound. * Fix a performance regression introduced in 6.4 that makes a upper bound check for every Next() even if keys are within a data block that is within the upper bound.
* Fix a possible corruption to the LSM state (overlapping files within a level) when a `CompactRange()` for refitting levels (`CompactRangeOptions::change_level == true`) and another manual compaction are executed in parallel. * Fix a possible corruption to the LSM state (overlapping files within a level) when a `CompactRange()` for refitting levels (`CompactRangeOptions::change_level == true`) and another manual compaction are executed in parallel.
* Fix a bug where a level refitting in CompactRange() might race with an automatic compaction that puts the data to the target level of the refitting. The bug has been there for years.
### New Features ### New Features
* A new option `std::shared_ptr<FileChecksumGenFactory> file_checksum_gen_factory` is added to `BackupableDBOptions`. The default value for this option is `nullptr`. If this option is null, the default backup engine checksum function (crc32c) will be used for creating, verifying, or restoring backups. If it is not null and is set to the DB custom checksum factory, the custom checksum function used in DB will also be used for creating, verifying, or restoring backups, in addition to the default checksum function (crc32c). If it is not null and is set to a custom checksum factory different than the DB custom checksum factory (which may be null), BackupEngine will return `Status::InvalidArgument()`. * A new option `std::shared_ptr<FileChecksumGenFactory> file_checksum_gen_factory` is added to `BackupableDBOptions`. The default value for this option is `nullptr`. If this option is null, the default backup engine checksum function (crc32c) will be used for creating, verifying, or restoring backups. If it is not null and is set to the DB custom checksum factory, the custom checksum function used in DB will also be used for creating, verifying, or restoring backups, in addition to the default checksum function (crc32c). If it is not null and is set to a custom checksum factory different than the DB custom checksum factory (which may be null), BackupEngine will return `Status::InvalidArgument()`.

@ -52,6 +52,16 @@ class DBCompactionDirectIOTest : public DBCompactionTest,
DBCompactionDirectIOTest() : DBCompactionTest() {} DBCompactionDirectIOTest() : DBCompactionTest() {}
}; };
// Param = true : target level is non-empty
// Param = false: level between target level and source level
// is not empty.
class ChangeLevelConflictsWithAuto
: public DBCompactionTest,
public ::testing::WithParamInterface<bool> {
public:
ChangeLevelConflictsWithAuto() : DBCompactionTest() {}
};
namespace { namespace {
class FlushedFileCollector : public EventListener { class FlushedFileCollector : public EventListener {
@ -5442,6 +5452,73 @@ TEST_F(DBCompactionTest, UpdateUniversalSubCompactionTest) {
ASSERT_TRUE(has_compaction); ASSERT_TRUE(has_compaction);
} }
TEST_P(ChangeLevelConflictsWithAuto, TestConflict) {
// A `CompactRange()` may race with an automatic compaction, we'll need
// to make sure it doesn't corrupte the data.
Options options = CurrentOptions();
options.level0_file_num_compaction_trigger = 2;
Reopen(options);
ASSERT_OK(Put("foo", "v1"));
ASSERT_OK(Put("bar", "v1"));
ASSERT_OK(Flush());
ASSERT_OK(dbfull()->TEST_WaitForFlushMemTable());
{
CompactRangeOptions cro;
cro.change_level = true;
cro.target_level = 2;
ASSERT_OK(dbfull()->CompactRange(cro, nullptr, nullptr));
}
ASSERT_EQ("0,0,1", FilesPerLevel(0));
// Run a qury to refitting to level 1 while another thread writing to
// the same level.
SyncPoint::GetInstance()->LoadDependency({
// The first two dependencies ensure the foreground creates an L0 file
// between the background compaction's L0->L1 and its L1->L2.
{
"DBImpl::CompactRange:BeforeRefit:1",
"AutoCompactionFinished1",
},
{
"AutoCompactionFinished2",
"DBImpl::CompactRange:BeforeRefit:2",
},
});
SyncPoint::GetInstance()->EnableProcessing();
std::thread auto_comp([&] {
TEST_SYNC_POINT("AutoCompactionFinished1");
ASSERT_OK(Put("bar", "v2"));
ASSERT_OK(Put("foo", "v2"));
ASSERT_OK(Flush());
ASSERT_OK(Put("bar", "v3"));
ASSERT_OK(Put("foo", "v3"));
ASSERT_OK(Flush());
dbfull()->TEST_WaitForCompact();
TEST_SYNC_POINT("AutoCompactionFinished2");
});
{
CompactRangeOptions cro;
cro.change_level = true;
cro.target_level = GetParam() ? 1 : 0;
// This should return non-OK, but it's more important for the test to
// make sure that the DB is not corrupted.
dbfull()->CompactRange(cro, nullptr, nullptr);
}
auto_comp.join();
// Refitting didn't happen.
SyncPoint::GetInstance()->DisableProcessing();
// Write something to DB just make sure that consistency check didn't
// fail and make the DB readable.
}
INSTANTIATE_TEST_CASE_P(ChangeLevelConflictsWithAuto,
ChangeLevelConflictsWithAuto, testing::Bool());
TEST_F(DBCompactionTest, ChangeLevelCompactRangeConflictsWithManual) { TEST_F(DBCompactionTest, ChangeLevelCompactRangeConflictsWithManual) {
// A `CompactRange()` with `change_level == true` needs to execute its final // A `CompactRange()` with `change_level == true` needs to execute its final
// step, `ReFitLevel()`, in isolation. Previously there was a bug where // step, `ReFitLevel()`, in isolation. Previously there was a bug where

@ -845,6 +845,9 @@ Status DBImpl::CompactRange(const CompactRangeOptions& options,
} }
if (options.change_level) { if (options.change_level) {
TEST_SYNC_POINT("DBImpl::CompactRange:BeforeRefit:1");
TEST_SYNC_POINT("DBImpl::CompactRange:BeforeRefit:2");
ROCKS_LOG_INFO(immutable_db_options_.info_log, ROCKS_LOG_INFO(immutable_db_options_.info_log,
"[RefitLevel] waiting for background threads to stop"); "[RefitLevel] waiting for background threads to stop");
DisableManualCompaction(); DisableManualCompaction();
@ -1280,20 +1283,29 @@ Status DBImpl::ReFitLevel(ColumnFamilyData* cfd, int level, int target_level) {
} }
auto* vstorage = cfd->current()->storage_info(); auto* vstorage = cfd->current()->storage_info();
if (to_level > level) { if (to_level != level) {
if (level == 0) { if (to_level > level) {
return Status::NotSupported( if (level == 0) {
"Cannot change from level 0 to other levels.");
}
// Check levels are empty for a trivial move
for (int l = level + 1; l <= to_level; l++) {
if (vstorage->NumLevelFiles(l) > 0) {
return Status::NotSupported( return Status::NotSupported(
"Levels between source and target are not empty for a move."); "Cannot change from level 0 to other levels.");
}
// Check levels are empty for a trivial move
for (int l = level + 1; l <= to_level; l++) {
if (vstorage->NumLevelFiles(l) > 0) {
return Status::NotSupported(
"Levels between source and target are not empty for a move.");
}
}
} else {
// to_level < level
// Check levels are empty for a trivial move
for (int l = to_level; l < level; l++) {
if (vstorage->NumLevelFiles(l) > 0) {
return Status::NotSupported(
"Levels between source and target are not empty for a move.");
}
} }
} }
}
if (to_level != level) {
ROCKS_LOG_DEBUG(immutable_db_options_.info_log, ROCKS_LOG_DEBUG(immutable_db_options_.info_log,
"[%s] Before refitting:\n%s", cfd->GetName().c_str(), "[%s] Before refitting:\n%s", cfd->GetName().c_str(),
cfd->current()->DebugString().data()); cfd->current()->DebugString().data());

Loading…
Cancel
Save