Make default share_files_with_checksum=true (#8020)

Summary:
New comment for share_files_with_checksum:
// Only used if share_table_files is set to true. Setting to false is
// DEPRECATED and potentially dangerous because in that case BackupEngine
// can lose data if backing up databases with distinct or divergent
// history, for example if restoring from a backup other than the latest,
// writing to the DB, and creating another backup. Setting to true (default)
// prevents these issues by ensuring that different table files (SSTs) with
// the same number are treated as distinct. See
// share_files_with_checksum_naming and ShareFilesNaming.

I have also removed interim option kFlagMatchInterimNaming, which is no
longer needed and was never needed for correct+compatible operation
(just performance).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8020

Test Plan:
tests updated. Backward+forward compatibility verified with
SHORT_TEST=1 check_format_compatible.sh. ldb uses default backup
options, and I manually verified shared_checksum in
/tmp/rocksdb_format_compatible_peterd/bak/current/ after run.

Reviewed By: ajkr

Differential Revision: D26786331

Pulled By: pdillinger

fbshipit-source-id: 36f968dfef1f5cacbd65154abe1d846151a55130
main
Peter Dillinger 4 years ago committed by Facebook GitHub Bot
parent 0028e3398b
commit 847ca9f964
  1. 1
      HISTORY.md
  2. 5
      db_stress_tool/db_stress_test_base.cc
  3. 35
      include/rocksdb/utilities/backupable_db.h
  4. 16
      utilities/backupable/backupable_db.cc
  5. 69
      utilities/backupable/backupable_db_test.cc

@ -4,6 +4,7 @@
* Fixed the truncation error found in APIs/tools when dumping block-based SST files in a human-readable format. After fix, the block-based table can be fully dumped as a readable file. * Fixed the truncation error found in APIs/tools when dumping block-based SST files in a human-readable format. After fix, the block-based table can be fully dumped as a readable file.
### Public API change ### Public API change
* Changed default `BackupableDBOptions::share_files_with_checksum` to `true` and deprecated `false` because of potential for data loss. Note that accepting this change in behavior can temporarily increase backup data usage because files are not shared between backups using the two different settings. Also removed obsolete option kFlagMatchInterimNaming.
* Add a new option BlockBasedTableOptions::max_auto_readahead_size. RocksDB does auto-readahead for iterators on noticing more than two reads for a table file if user doesn't provide readahead_size. The readahead starts at 8KB and doubles on every additional read upto max_auto_readahead_size and now max_auto_readahead_size can be configured dynamically as well. Found that 256 KB readahead size provides the best performance, based on experiments, for auto readahead. Experiment data is in PR #3282. If value is set 0 then no automatic prefetching will be done by rocksdb. Also changing the value will only affect files opened after the change. * Add a new option BlockBasedTableOptions::max_auto_readahead_size. RocksDB does auto-readahead for iterators on noticing more than two reads for a table file if user doesn't provide readahead_size. The readahead starts at 8KB and doubles on every additional read upto max_auto_readahead_size and now max_auto_readahead_size can be configured dynamically as well. Found that 256 KB readahead size provides the best performance, based on experiments, for auto readahead. Experiment data is in PR #3282. If value is set 0 then no automatic prefetching will be done by rocksdb. Also changing the value will only affect files opened after the change.
* Add suppport to extend DB::VerifyFileChecksums API to also verify blob files checksum. * Add suppport to extend DB::VerifyFileChecksums API to also verify blob files checksum.
* When using the new BlobDB, the amount of data written by flushes/compactions is now broken down into table files and blob files in the compaction statistics; namely, Write(GB) denotes the amount of data written to table files, while Wblob(GB) means the amount of data written to blob files. * When using the new BlobDB, the amount of data written by flushes/compactions is now broken down into table files and blob files in the compaction statistics; namely, Write(GB) denotes the amount of data written to table files, while Wblob(GB) means the amount of data written to blob files.

@ -1297,11 +1297,6 @@ Status StressTest::TestBackupRestore(
backup_opts.share_files_with_checksum_naming | backup_opts.share_files_with_checksum_naming |
BackupableDBOptions::kFlagIncludeFileSize; BackupableDBOptions::kFlagIncludeFileSize;
} }
if (thread->rand.OneIn(2)) {
backup_opts.share_files_with_checksum_naming =
backup_opts.share_files_with_checksum_naming |
BackupableDBOptions::kFlagMatchInterimNaming;
}
} }
} }
BackupEngine* backup_engine = nullptr; BackupEngine* backup_engine = nullptr;

@ -92,16 +92,16 @@ struct BackupableDBOptions {
// Default: nullptr // Default: nullptr
std::shared_ptr<RateLimiter> restore_rate_limiter{nullptr}; std::shared_ptr<RateLimiter> restore_rate_limiter{nullptr};
// Only used if share_table_files is set to true. If true, will consider // Only used if share_table_files is set to true. Setting to false is
// that backups can come from different databases, even differently mutated // DEPRECATED and potentially dangerous because in that case BackupEngine
// databases with the same DB ID. See share_files_with_checksum_naming and // can lose data if backing up databases with distinct or divergent
// ShareFilesNaming for details on how table files names are made // history, for example if restoring from a backup other than the latest,
// unique between databases. // writing to the DB, and creating another backup. Setting to true (default)
// prevents these issues by ensuring that different table files (SSTs) with
// the same number are treated as distinct. See
// share_files_with_checksum_naming and ShareFilesNaming.
// //
// Using 'true' is fundamentally safer, and performance improvements vs. // Default: true
// original design should leave almost no reason to use the 'false' setting.
//
// Default (only for historical reasons): false
bool share_files_with_checksum; bool share_files_with_checksum;
// Up to this many background threads will copy files for CreateNewBackup() // Up to this many background threads will copy files for CreateNewBackup()
@ -167,16 +167,6 @@ struct BackupableDBOptions {
// contribute significantly to naming uniqueness. // contribute significantly to naming uniqueness.
kFlagIncludeFileSize = 1U << 31, kFlagIncludeFileSize = 1U << 31,
// When encountering an SST file from a Facebook-internal early
// release of 6.12, use the default naming scheme in effect for
// when the SST file was generated (assuming full file checksum
// was not set to GetFileChecksumGenCrc32cFactory()). That naming is
// <file_number>_<db_session_id>.sst
// and ignores kFlagIncludeFileSize setting.
// NOTE: This flag is intended to be temporary and should be removed
// in a later release.
kFlagMatchInterimNaming = 1U << 30,
kMaskNamingFlags = ~kMaskNoNamingFlags, kMaskNamingFlags = ~kMaskNoNamingFlags,
}; };
@ -193,7 +183,7 @@ struct BackupableDBOptions {
// directory, under the new shared name in addition to the old shared // directory, under the new shared name in addition to the old shared
// name. // name.
// //
// Default: kUseDbSessionId | kFlagIncludeFileSize | kFlagMatchInterimNaming // Default: kUseDbSessionId | kFlagIncludeFileSize
// //
// Note: This option comes into effect only if both share_files_with_checksum // Note: This option comes into effect only if both share_files_with_checksum
// and share_table_files are true. // and share_table_files are true.
@ -210,8 +200,7 @@ struct BackupableDBOptions {
uint64_t _callback_trigger_interval_size = 4 * 1024 * 1024, uint64_t _callback_trigger_interval_size = 4 * 1024 * 1024,
int _max_valid_backups_to_open = INT_MAX, int _max_valid_backups_to_open = INT_MAX,
ShareFilesNaming _share_files_with_checksum_naming = ShareFilesNaming _share_files_with_checksum_naming =
static_cast<ShareFilesNaming>(kUseDbSessionId | kFlagIncludeFileSize | static_cast<ShareFilesNaming>(kUseDbSessionId | kFlagIncludeFileSize))
kFlagMatchInterimNaming))
: backup_dir(_backup_dir), : backup_dir(_backup_dir),
backup_env(_backup_env), backup_env(_backup_env),
share_table_files(_share_table_files), share_table_files(_share_table_files),
@ -221,7 +210,7 @@ struct BackupableDBOptions {
backup_log_files(_backup_log_files), backup_log_files(_backup_log_files),
backup_rate_limit(_backup_rate_limit), backup_rate_limit(_backup_rate_limit),
restore_rate_limit(_restore_rate_limit), restore_rate_limit(_restore_rate_limit),
share_files_with_checksum(false), share_files_with_checksum(true),
max_background_operations(_max_background_operations), max_background_operations(_max_background_operations),
callback_trigger_interval_size(_callback_trigger_interval_size), callback_trigger_interval_size(_callback_trigger_interval_size),
max_valid_backups_to_open(_max_valid_backups_to_open), max_valid_backups_to_open(_max_valid_backups_to_open),

@ -329,13 +329,6 @@ class BackupEngineImpl : public BackupEngine {
BackupableDBOptions::kLegacyCrc32cAndFileSize || BackupableDBOptions::kLegacyCrc32cAndFileSize ||
sid.empty(); sid.empty();
} }
inline bool UseInterimNaming(const std::string& sid) const {
// The indicator of SST file from early internal 6.12 release
// is a '-' in the DB session id. DB session id was made more
// concise without '-' after that.
return (GetNamingFlags() & BackupableDBOptions::kFlagMatchInterimNaming) &&
sid.find('-') != std::string::npos;
}
inline std::string GetSharedFileWithChecksum( inline std::string GetSharedFileWithChecksum(
const std::string& file, bool has_checksum, const std::string& file, bool has_checksum,
const std::string& checksum_hex, const uint64_t file_size, const std::string& checksum_hex, const uint64_t file_size,
@ -348,8 +341,6 @@ class BackupEngineImpl : public BackupEngine {
file_copy.insert(file_copy.find_last_of('.'), file_copy.insert(file_copy.find_last_of('.'),
"_" + ToString(ChecksumHexToInt32(checksum_hex)) + "_" + "_" + ToString(ChecksumHexToInt32(checksum_hex)) + "_" +
ToString(file_size)); ToString(file_size));
} else if (UseInterimNaming(db_session_id)) {
file_copy.insert(file_copy.find_last_of('.'), "_" + db_session_id);
} else { } else {
file_copy.insert(file_copy.find_last_of('.'), "_s" + db_session_id); file_copy.insert(file_copy.find_last_of('.'), "_s" + db_session_id);
if (GetNamingFlags() & BackupableDBOptions::kFlagIncludeFileSize) { if (GetNamingFlags() & BackupableDBOptions::kFlagIncludeFileSize) {
@ -982,6 +973,13 @@ Status BackupEngineImpl::CreateNewBackupWithMetadata(
ROCKS_LOG_INFO(options_.info_log, ROCKS_LOG_INFO(options_.info_log,
"Started the backup process -- creating backup %u", "Started the backup process -- creating backup %u",
new_backup_id); new_backup_id);
if (options_.share_table_files && !options_.share_files_with_checksum) {
ROCKS_LOG_WARN(options_.info_log,
"BackupableDBOptions::share_files_with_checksum=false is "
"DEPRECATED and could lead to data loss.");
}
if (s.ok()) { if (s.ok()) {
s = backup_env_->CreateDir(private_dir); s = backup_env_->CreateDir(private_dir);
} }

@ -43,10 +43,7 @@ const auto kLegacyCrc32cAndFileSize =
BackupableDBOptions::kLegacyCrc32cAndFileSize; BackupableDBOptions::kLegacyCrc32cAndFileSize;
const auto kUseDbSessionId = BackupableDBOptions::kUseDbSessionId; const auto kUseDbSessionId = BackupableDBOptions::kUseDbSessionId;
const auto kFlagIncludeFileSize = BackupableDBOptions::kFlagIncludeFileSize; const auto kFlagIncludeFileSize = BackupableDBOptions::kFlagIncludeFileSize;
const auto kFlagMatchInterimNaming = const auto kNamingDefault = kUseDbSessionId | kFlagIncludeFileSize;
BackupableDBOptions::kFlagMatchInterimNaming;
const auto kNamingDefault =
kUseDbSessionId | kFlagIncludeFileSize | kFlagMatchInterimNaming;
class DummyDB : public StackableDB { class DummyDB : public StackableDB {
public: public:
@ -1908,63 +1905,6 @@ TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNaming) {
"[0-9]+_s[0-9A-Z]{20}_[0-9]+[.]sst"}, "[0-9]+_s[0-9A-Z]{20}_[0-9]+[.]sst"},
}; };
for (const auto& pair : option_to_expected) {
// kFlagMatchInterimNaming must not matter on new SST files
for (const auto option :
{pair.first, pair.first | kFlagMatchInterimNaming}) {
CloseAndReopenDB();
backupable_options_->share_files_with_checksum_naming = option;
OpenBackupEngine(true /*destroy_old_data*/);
ASSERT_OK(backup_engine_->CreateNewBackup(db_.get()));
CloseDBAndBackupEngine();
AssertBackupConsistency(1, 0, keys_iteration, keys_iteration * 2);
AssertDirectoryFilesMatchRegex(backupdir_ + "/shared_checksum",
std::regex(pair.second),
1 /* minimum_count */);
if (std::string::npos != pair.second.find("_[0-9]+[.]sst")) {
AssertDirectoryFilesSizeIndicators(backupdir_ + "/shared_checksum",
1 /* minimum_count */);
}
}
}
}
// Mimic SST file generated by early internal-only 6.12 release
// and test various naming options. This test can be removed when
// the kFlagMatchInterimNaming feature is removed.
TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsInterimNaming) {
const int keys_iteration = 5000;
// Essentially, reinstate old implementaiton of generating a DB
// session id. This is how we distinguish "interim" SST files from
// newer ones: from the form of the db session id string.
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"DBImpl::SetDbSessionId", [&](void* sid_void_star) {
std::string* sid = static_cast<std::string*>(sid_void_star);
*sid = test_db_env_->GenerateUniqueId();
if (!sid->empty() && sid->back() == '\n') {
sid->pop_back();
}
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
OpenDBAndBackupEngine(true, false, kShareWithChecksum);
FillDB(db_.get(), 0, keys_iteration);
CloseDBAndBackupEngine();
static const std::map<ShareFilesNaming, std::string> option_to_expected = {
{kLegacyCrc32cAndFileSize, "[0-9]+_[0-9]+_[0-9]+[.]sst"},
// kFlagMatchInterimNaming ignored here
{kLegacyCrc32cAndFileSize | kFlagMatchInterimNaming,
"[0-9]+_[0-9]+_[0-9]+[.]sst"},
{kUseDbSessionId, "[0-9]+_s[0-9a-fA-F-]+[.]sst"},
{kUseDbSessionId | kFlagIncludeFileSize,
"[0-9]+_s[0-9a-fA-F-]+_[0-9]+[.]sst"},
{kUseDbSessionId | kFlagMatchInterimNaming, "[0-9]+_[0-9a-fA-F-]+[.]sst"},
{kUseDbSessionId | kFlagIncludeFileSize | kFlagMatchInterimNaming,
"[0-9]+_[0-9a-fA-F-]+[.]sst"},
};
for (const auto& pair : option_to_expected) { for (const auto& pair : option_to_expected) {
CloseAndReopenDB(); CloseAndReopenDB();
backupable_options_->share_files_with_checksum_naming = pair.first; backupable_options_->share_files_with_checksum_naming = pair.first;
@ -1975,10 +1915,11 @@ TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsInterimNaming) {
AssertDirectoryFilesMatchRegex(backupdir_ + "/shared_checksum", AssertDirectoryFilesMatchRegex(backupdir_ + "/shared_checksum",
std::regex(pair.second), std::regex(pair.second),
1 /* minimum_count */); 1 /* minimum_count */);
if (std::string::npos != pair.second.find("_[0-9]+[.]sst")) {
AssertDirectoryFilesSizeIndicators(backupdir_ + "/shared_checksum",
1 /* minimum_count */);
}
} }
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
} }
// Mimic SST file generated by pre-6.12 releases and verify that // Mimic SST file generated by pre-6.12 releases and verify that

Loading…
Cancel
Save