Option migration tool to break down files for FIFO compaction (#10600)

Summary: Currently, when the option migration tool migrates to FIFO compaction, it compacts all the data into one single SST file and moves it to L0. Although this creates a valid LSM-tree for FIFO, as soon as any data needs to be deleted for FIFO, the giant file is deleted, which might leave the DB almost empty. There is no good solution for this, because we usually don't have enough information to reconstruct the FIFO LSM-tree. This change switches to a solution that compromises the FIFO condition; we hope it is more usable.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10600

Test Plan: Add unit tests for this change.

Reviewed By: jay-zhuang

Differential Revision: D39106424

fbshipit-source-id: bdfd852c3b343373765b8d9716fefc08fd27145c
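The failure mode described above can be modeled with a minimal C++ sketch. It assumes a simplified FIFO policy that drops whole files strictly oldest-first until the total size fits under `max_table_files_size`, ignoring TTL and any other FIFO triggers. `SurvivingBytesAfterFifoTrim` is a hypothetical helper for illustration, not a RocksDB function.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <numeric>

// Simplified model of FIFO compaction's size limit (illustration only, not
// RocksDB code): `files` holds file sizes ordered oldest-first. While the
// total exceeds max_table_files_size, the oldest file is dropped whole.
uint64_t SurvivingBytesAfterFifoTrim(std::deque<uint64_t> files,
                                     uint64_t max_table_files_size) {
  uint64_t total = std::accumulate(files.begin(), files.end(), uint64_t{0});
  while (!files.empty() && total > max_table_files_size) {
    total -= files.front();  // the oldest file goes first, whatever its size
    files.pop_front();
  }
  assert(total <= max_table_files_size);  // the limit now holds
  return total;
}
```

For instance, with an 8 MiB limit, a single 10 MiB migrated file is dropped entirely and 0 bytes survive, while the same data cut into ten 1 MiB files (that is, `max_table_files_size / 8`, the size this change chooses) keeps 8 MiB.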
parent 228f2c5bf5
commit 9509003503
4 changed files with 147 additions and 24 deletions
--- a/HISTORY.md
+++ b/HISTORY.md
@@ -15,6 +15,9 @@
 * Add CompactionPriority.RoundRobin.
 * Revert to using the default metadata charge policy when creating an LRU cache via the Java API.
 ### Behavior Change
 * Previously, when the option migration tool (OptionChangeMigration()) migrated to FIFO compaction, it compacted all the data into one single SST file and moved it to L0. This could cause a problem for some users: the giant file might soon be deleted to satisfy max_table_files_size, leaving the DB almost empty. We changed the behavior so that the files are cut to be smaller, but these files might not follow the data insertion order. As a result, after the migration, FIFO compaction might not drop the migrated data in insertion order.
 ## 7.6.0 (08/19/2022)
 ### New Features
 * Added `prepopulate_blob_cache` to ColumnFamilyOptions. If enabled, warm/hot blobs that are already in memory are prepopulated into the blob cache at flush time. On a flush, the blobs in memory (in memtables) get flushed to the device; with Direct IO, reading them back into memory would incur additional IO, which this option avoids. This further helps if the workload exhibits high temporal locality, where most reads go to recently written data, and with remote file systems, where reads involve network traffic and higher latencies.
--- a/include/rocksdb/utilities/option_change_migration.h
+++ b/include/rocksdb/utilities/option_change_migration.h
@@ -14,6 +14,10 @@ namespace ROCKSDB_NAMESPACE {
 // Multiple column families is not supported.
 // It is best-effort. No guarantee to succeed.
 // A full compaction may be executed.
 // If the target options use FIFO compaction, the FIFO condition might be
 // sacrificed: among the migrated data, data inserted later might be dropped
 // earlier. This is to guarantee that FIFO compaction won't drop all of the
 // migrated data at once to fit max_table_files_size.
 Status OptionChangeMigration(std::string dbname, const Options& old_opts,
                             const Options& new_opts);
 }  // namespace ROCKSDB_NAMESPACE
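The trade-off described in this header comment can be illustrated with a small C++ model. It assumes that a full compaction sorts the migrated data by key and cuts it into fixed-size files, and that FIFO then drops the file at the front of that key-ordered list first; the real drop order in RocksDB depends on file metadata, so this is only a sketch. `Record`, `RangePartition`, and `DroppedInsertOrders` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One key-value entry; insert_order records when it was written
// (a lower value means it was inserted earlier).
struct Record {
  std::string key;
  uint64_t insert_order;
};

// Models a full compaction that range-partitions the data: entries are sorted
// by key and cut into files of at most `records_per_file` entries, so each
// file groups a key range rather than a span of insertion time.
std::vector<std::vector<Record>> RangePartition(std::vector<Record> records,
                                                std::size_t records_per_file) {
  assert(records_per_file > 0);
  std::sort(records.begin(), records.end(),
            [](const Record& a, const Record& b) { return a.key < b.key; });
  std::vector<std::vector<Record>> files;
  for (std::size_t i = 0; i < records.size(); i += records_per_file) {
    std::size_t end = std::min(records.size(), i + records_per_file);
    files.emplace_back(records.begin() + static_cast<std::ptrdiff_t>(i),
                       records.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return files;
}

// Assumption for this sketch: FIFO treats files.front() as the oldest file.
// Returns the insert_order values of the entries lost when it is dropped.
std::vector<uint64_t> DroppedInsertOrders(
    const std::vector<std::vector<Record>>& files) {
  std::vector<uint64_t> dropped;
  if (!files.empty()) {
    for (const Record& r : files.front()) dropped.push_back(r.insert_order);
  }
  return dropped;
}
```

With records `c`, `d`, `a`, `b` inserted in that order and two records per file, the first file holds `a` and `b` (the third and fourth insertions), so the sketch drops the two most recently inserted records while the older `c` and `d` survive.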
--- a/utilities/option_change_migration/option_change_migration.cc
+++ b/utilities/option_change_migration/option_change_migration.cc
@@ -35,18 +35,26 @@ Status OpenDb(const Options& options, const std::string& dbname,
  return s;
 }
 // l0_file_size specifies the size of files on L0. Files are range partitioned
 // after a full compaction, so they are likely to qualify to be put on L0. If
 // left as 0, the data is compacted into a single file and put on L0.
 // Otherwise, the compaction tries to cut files at size l0_file_size.
 Status CompactToLevel(const Options& options, const std::string& dbname,
-                      int dest_level, bool need_reopen) {
+                      int dest_level, uint64_t l0_file_size, bool need_reopen) {
  std::unique_ptr<DB> db;
  Options no_compact_opts = GetNoCompactionOptions(options);
  if (dest_level == 0) {
    if (l0_file_size == 0) {
      // Single file.
      l0_file_size = 999999999999999;
    }
    // L0 has strict sequence ID requirements for the files placed on it.
    // It's safer to put only one compacted file there.
    // This is only used for converting to universal compaction with
    // only one level. In this case, compacting to one file is also
    // optimal.
-    no_compact_opts.target_file_size_base = 999999999999999;
+    no_compact_opts.target_file_size_base = l0_file_size;
-    no_compact_opts.max_compaction_bytes = 999999999999999;
+    no_compact_opts.max_compaction_bytes = l0_file_size;
  }
  Status s = OpenDb(no_compact_opts, dbname, &db);
  if (!s.ok()) {
@@ -98,7 +106,8 @@ Status MigrateToUniversal(std::string dbname, const Options& old_opts,
      }
    }
    if (need_compact) {
-      return CompactToLevel(old_opts, dbname, new_opts.num_levels - 1, true);
+      return CompactToLevel(old_opts, dbname, new_opts.num_levels - 1,
                            /*l0_file_size=*/0, true);
    }
    return Status::OK();
  }
@@ -119,7 +128,7 @@ Status MigrateToLevelBase(std::string dbname, const Options& old_opts,
    // multiplier from 4 to 8, with the same data, we will have fewer
    // levels. Unless we issue a full compaction, the LSM tree may get stuck
    // with more levels than needed, and it won't recover automatically.
-    return CompactToLevel(opts, dbname, 1, true);
+    return CompactToLevel(opts, dbname, 1, /*l0_file_size=*/0, true);
  } else {
    // Compact everything to the last level to guarantee it can be safely
    // opened.
@@ -127,11 +136,13 @@ Status MigrateToLevelBase(std::string dbname, const Options& old_opts,
      return Status::OK();
    } else if (new_opts.num_levels > old_opts.num_levels) {
      // Dynamic level mode requires data to be put in the last level first.
-      return CompactToLevel(new_opts, dbname, new_opts.num_levels - 1, false);
+      return CompactToLevel(new_opts, dbname, new_opts.num_levels - 1,
                            /*l0_file_size=*/0, false);
    } else {
      Options opts = old_opts;
      opts.target_file_size_base = new_opts.target_file_size_base;
-      return CompactToLevel(opts, dbname, new_opts.num_levels - 1, true);
+      return CompactToLevel(opts, dbname, new_opts.num_levels - 1,
                            /*l0_file_size=*/0, true);
    }
  }
 }
@@ -150,7 +161,14 @@ Status OptionChangeMigration(std::string dbname, const Options& old_opts,
    return MigrateToLevelBase(dbname, old_opts, new_opts);
  } else if (new_opts.compaction_style ==
             CompactionStyle::kCompactionStyleFIFO) {
-    return CompactToLevel(old_opts, dbname, 0, true);
+    uint64_t l0_file_size = 0;
    if (new_opts.compaction_options_fifo.max_table_files_size > 0) {
      // Create at least 8 files by the time max_table_files_size is reached,
      // so the DB doesn't just disappear. This violates the FIFO condition,
      // but otherwise the migrated DB is unlikely to be usable.
      l0_file_size = new_opts.compaction_options_fifo.max_table_files_size / 8;
    }
    return CompactToLevel(old_opts, dbname, 0, l0_file_size, true);
  } else {
    return Status::NotSupported(
        "Do not know how to migrate to this compaction style");
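The sizing logic added to `option_change_migration.cc` can be restated as plain C++ for a quick sanity check. This sketch mirrors the arithmetic in the FIFO branch of `OptionChangeMigration()` and in `CompactToLevel()` for `dest_level == 0`; the function names are hypothetical, and the file-count estimate assumes compaction output files come out close to the target size, which real compactions only approximate.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder size that CompactToLevel() substitutes when l0_file_size is 0
// and dest_level is 0, so that everything lands in a single file.
constexpr uint64_t kSingleFileSize = 999999999999999;

// FIFO branch of OptionChangeMigration(): target about 8 files once the data
// reaches max_table_files_size. A max_table_files_size of 0 leaves
// l0_file_size at 0, which later means "one file".
uint64_t FifoL0FileSize(uint64_t max_table_files_size) {
  return max_table_files_size > 0 ? max_table_files_size / 8 : 0;
}

// CompactToLevel() with dest_level == 0: the result is used as both
// target_file_size_base and max_compaction_bytes.
uint64_t EffectiveTargetFileSize(uint64_t l0_file_size) {
  return l0_file_size == 0 ? kSingleFileSize : l0_file_size;
}

// Rough file-count estimate (ceiling division), assuming output files come
// out close to the target size; real compactions only approximate this.
uint64_t ApproxFileCount(uint64_t db_bytes, uint64_t target_file_size) {
  assert(target_file_size > 0);
  return (db_bytes + target_file_size - 1) / target_file_size;
}
```

For a 5 MiB `max_table_files_size` (the value used in the new FIFO test cases), `l0_file_size` becomes 655360 bytes, so about 8 files cover the full size limit, whereas the old behavior always targeted a single file.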
--- a/utilities/option_change_migration/option_change_migration_test.cc
+++ b/utilities/option_change_migration/option_change_migration_test.cc
@@ -20,7 +20,7 @@ namespace ROCKSDB_NAMESPACE {
 class DBOptionChangeMigrationTests
    : public DBTestBase,
      public testing::WithParamInterface<
-          std::tuple<int, int, bool, int, int, bool>> {
+          std::tuple<int, int, bool, int, int, bool, uint64_t>> {
 public:
  DBOptionChangeMigrationTests()
      : DBTestBase("db_option_change_migration_test", /*env_do_fsync=*/true) {
@@ -31,6 +31,7 @@ class DBOptionChangeMigrationTests
    level2_ = std::get<3>(GetParam());
    compaction_style2_ = std::get<4>(GetParam());
    is_dynamic2_ = std::get<5>(GetParam());
    fifo_max_table_files_size_ = std::get<6>(GetParam());
  }
  // Required if inheriting from testing::WithParamInterface<>
@@ -44,6 +45,8 @@ class DBOptionChangeMigrationTests
  int level2_;
  int compaction_style2_;
  bool is_dynamic2_;
  uint64_t fifo_max_table_files_size_;
 };
 #ifndef ROCKSDB_LITE
@@ -97,6 +100,10 @@ TEST_P(DBOptionChangeMigrationTests, Migrate1) {
  if (new_options.compaction_style == CompactionStyle::kCompactionStyleFIFO) {
    new_options.max_open_files = -1;
  }
  if (fifo_max_table_files_size_ != 0) {
    new_options.compaction_options_fifo.max_table_files_size =
        fifo_max_table_files_size_;
  }
  new_options.target_file_size_base = 256 * 1024;
  new_options.num_levels = level2_;
  new_options.max_bytes_for_level_base = 150 * 1024;
@@ -172,6 +179,10 @@ TEST_P(DBOptionChangeMigrationTests, Migrate2) {
  if (new_options.compaction_style == CompactionStyle::kCompactionStyleFIFO) {
    new_options.max_open_files = -1;
  }
  if (fifo_max_table_files_size_ != 0) {
    new_options.compaction_options_fifo.max_table_files_size =
        fifo_max_table_files_size_;
  }
  new_options.target_file_size_base = 256 * 1024;
  new_options.num_levels = level1_;
  new_options.max_bytes_for_level_base = 150 * 1024;
@@ -251,6 +262,10 @@ TEST_P(DBOptionChangeMigrationTests, Migrate3) {
  if (new_options.compaction_style == CompactionStyle::kCompactionStyleFIFO) {
    new_options.max_open_files = -1;
  }
  if (fifo_max_table_files_size_ != 0) {
    new_options.compaction_options_fifo.max_table_files_size =
        fifo_max_table_files_size_;
  }
  new_options.target_file_size_base = 256 * 1024;
  new_options.num_levels = level2_;
  new_options.max_bytes_for_level_base = 150 * 1024;
@@ -332,6 +347,10 @@ TEST_P(DBOptionChangeMigrationTests, Migrate4) {
  if (new_options.compaction_style == CompactionStyle::kCompactionStyleFIFO) {
    new_options.max_open_files = -1;
  }
  if (fifo_max_table_files_size_ != 0) {
    new_options.compaction_options_fifo.max_table_files_size =
        fifo_max_table_files_size_;
  }
  new_options.target_file_size_base = 256 * 1024;
  new_options.num_levels = level1_;
  new_options.max_bytes_for_level_base = 150 * 1024;
@@ -357,21 +376,100 @@ TEST_P(DBOptionChangeMigrationTests, Migrate4) {
 INSTANTIATE_TEST_CASE_P(
    DBOptionChangeMigrationTests, DBOptionChangeMigrationTests,
-    ::testing::Values(std::make_tuple(3, 0, false, 4, 0, false),
+    ::testing::Values(
-                      std::make_tuple(3, 0, true, 4, 0, true),
+        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
-                      std::make_tuple(3, 0, true, 4, 0, false),
+                        false /* is dynamic leveling in old option */,
-                      std::make_tuple(3, 0, false, 4, 0, true),
+                        4 /* old num_levels */, 0 /* new compaction style */,
-                      std::make_tuple(3, 1, false, 4, 1, false),
+                        false /* is dynamic leveling in new option */,
-                      std::make_tuple(1, 1, false, 4, 1, false),
+                        0 /*fifo max_table_files_size*/),
-                      std::make_tuple(3, 0, false, 4, 1, false),
+        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
-                      std::make_tuple(3, 0, false, 1, 1, false),
+                        true /* is dynamic leveling in old option */,
-                      std::make_tuple(3, 0, true, 4, 1, false),
+                        4 /* old num_levels */, 0 /* new compaction style */,
-                      std::make_tuple(3, 0, true, 1, 1, false),
+                        true /* is dynamic leveling in new option */,
-                      std::make_tuple(1, 1, false, 4, 0, false),
+                        0 /*fifo max_table_files_size*/),
-                      std::make_tuple(4, 0, false, 1, 2, false),
+        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
-                      std::make_tuple(3, 0, true, 2, 2, false),
+                        true /* is dynamic leveling in old option */,
-                      std::make_tuple(3, 1, false, 3, 2, false),
+                        4 /* old num_levels */, 0 /* new compaction style */,
-                      std::make_tuple(1, 1, false, 4, 2, false)));
+                        false, 0 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        4 /* old num_levels */, 0 /* new compaction style */,
                        true /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 1 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        4 /* old num_levels */, 1 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(1 /* old num_levels */, 1 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        4 /* old num_levels */, 1 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        4 /* old num_levels */, 1 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        1 /* old num_levels */, 1 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
                        true /* is dynamic leveling in old option */,
                        4 /* old num_levels */, 1 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
                        true /* is dynamic leveling in old option */,
                        1 /* old num_levels */, 1 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(1 /* old num_levels */, 1 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        4 /* old num_levels */, 0 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(4 /* old num_levels */, 0 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        1 /* old num_levels */, 2 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
                        true /* is dynamic leveling in old option */,
                        2 /* old num_levels */, 2 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 1 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        3 /* old num_levels */, 2 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        0 /*fifo max_table_files_size*/),
        std::make_tuple(1 /* old num_levels */, 1 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        4 /* old num_levels */, 2 /* new compaction style */,
                        false /* is dynamic leveling in new option */, 0),
        std::make_tuple(4 /* old num_levels */, 0 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        1 /* old num_levels */, 2 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        5 * 1024 * 1024 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 0 /* old compaction style */,
                        true /* is dynamic leveling in old option */,
                        2 /* old num_levels */, 2 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        5 * 1024 * 1024 /*fifo max_table_files_size*/),
        std::make_tuple(3 /* old num_levels */, 1 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        3 /* old num_levels */, 2 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        5 * 1024 * 1024 /*fifo max_table_files_size*/),
        std::make_tuple(1 /* old num_levels */, 1 /* old compaction style */,
                        false /* is dynamic leveling in old option */,
                        4 /* old num_levels */, 2 /* new compaction style */,
                        false /* is dynamic leveling in new option */,
                        5 * 1024 * 1024 /*fifo max_table_files_size*/)));
 class DBOptionChangeMigrationTest : public DBTestBase {
 public: