Bug Fix for memtables not trimmed down. (#7296)

Summary:
When a memtable is trimmed in MemTableListVersion, the memtable
is only added to delete list if it is
the last reference. However it is not the last reference as it is held
by the super version. But the super version would not be switched if the
delete list is empty. So the memtable is never destroyed and memory
usage increases beyond write_buffer_size +
max_write_buffer_size_to_maintain.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7296

Test Plan:
1.  ./db_bench -benchmarks=randomtransaction
-optimistic_transaction_db=1 -statistics -stats_interval_seconds=1
-duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000
--transaction_set_snapshot

Reviewed By: ltamasi

Differential Revision: D23267395

Pulled By: akankshamahajan15

fbshipit-source-id: 3a8d437fe9f4015f851ff84c0e29528aa946b650
main
Akanksha Mahajan 4 years ago committed by Facebook GitHub Bot
parent 187964a039
commit 3844612625
  1. 1
      HISTORY.md
  2. 5
      db/db_impl/db_impl_write.cc
  3. 10
      db/memtable_list.cc
  4. 7
      db/memtable_list.h

@ -6,6 +6,7 @@
* Sanitize `recycle_log_file_num` to zero when the user attempts to enable it in combination with `WALRecoveryMode::kTolerateCorruptedTailRecords`. Previously the two features were allowed together, which compromised the user's configured crash-recovery guarantees. * Sanitize `recycle_log_file_num` to zero when the user attempts to enable it in combination with `WALRecoveryMode::kTolerateCorruptedTailRecords`. Previously the two features were allowed together, which compromised the user's configured crash-recovery guarantees.
* Fix a bug where a level refitting in CompactRange() might race with an automatic compaction that puts the data to the target level of the refitting. The bug has been there for years. * Fix a bug where a level refitting in CompactRange() might race with an automatic compaction that puts the data to the target level of the refitting. The bug has been there for years.
* BackupEngine::CreateNewBackup could fail intermittently with non-OK status when backing up a read-write DB configured with a DBOptions::file_checksum_gen_factory. This issue has been worked-around such that CreateNewBackup should succeed, but (until fully fixed) BackupEngine might not see all checksums available in the DB. * BackupEngine::CreateNewBackup could fail intermittently with non-OK status when backing up a read-write DB configured with a DBOptions::file_checksum_gen_factory. This issue has been worked-around such that CreateNewBackup should succeed, but (until fully fixed) BackupEngine might not see all checksums available in the DB.
* Fix a bug where immutable flushed memtable is never destroyed because a memtable is not added to delete list because of refernce hold by super version and super version doesn't switch because of empty delete list. So memory usage increases beyond write_buffer_size + max_write_buffer_size_to_maintain.
### New Features ### New Features
* A new option `std::shared_ptr<FileChecksumGenFactory> file_checksum_gen_factory` is added to `BackupableDBOptions`. The default value for this option is `nullptr`. If this option is null, the default backup engine checksum function (crc32c) will be used for creating, verifying, or restoring backups. If it is not null and is set to the DB custom checksum factory, the custom checksum function used in DB will also be used for creating, verifying, or restoring backups, in addition to the default checksum function (crc32c). If it is not null and is set to a custom checksum factory different than the DB custom checksum factory (which may be null), BackupEngine will return `Status::InvalidArgument()`. * A new option `std::shared_ptr<FileChecksumGenFactory> file_checksum_gen_factory` is added to `BackupableDBOptions`. The default value for this option is `nullptr`. If this option is null, the default backup engine checksum function (crc32c) will be used for creating, verifying, or restoring backups. If it is not null and is set to the DB custom checksum factory, the custom checksum function used in DB will also be used for creating, verifying, or restoring backups, in addition to the default checksum function (crc32c). If it is not null and is set to a custom checksum factory different than the DB custom checksum factory (which may be null), BackupEngine will return `Status::InvalidArgument()`.

@ -1556,11 +1556,14 @@ Status DBImpl::TrimMemtableHistory(WriteContext* context) {
} }
for (auto& cfd : cfds) { for (auto& cfd : cfds) {
autovector<MemTable*> to_delete; autovector<MemTable*> to_delete;
cfd->imm()->TrimHistory(&to_delete, cfd->mem()->ApproximateMemoryUsage()); bool trimmed = cfd->imm()->TrimHistory(
&to_delete, cfd->mem()->ApproximateMemoryUsage());
if (!to_delete.empty()) { if (!to_delete.empty()) {
for (auto m : to_delete) { for (auto m : to_delete) {
delete m; delete m;
} }
}
if (trimmed) {
context->superversion_context.NewSuperVersion(); context->superversion_context.NewSuperVersion();
assert(context->superversion_context.new_superversion.get() != nullptr); assert(context->superversion_context.new_superversion.get() != nullptr);
cfd->InstallSuperVersion(&context->superversion_context, &mutex_); cfd->InstallSuperVersion(&context->superversion_context, &mutex_);

@ -309,14 +309,17 @@ bool MemTableListVersion::MemtableLimitExceeded(size_t usage) {
} }
// Make sure we don't use up too much space in history // Make sure we don't use up too much space in history
void MemTableListVersion::TrimHistory(autovector<MemTable*>* to_delete, bool MemTableListVersion::TrimHistory(autovector<MemTable*>* to_delete,
size_t usage) { size_t usage) {
bool ret = false;
while (MemtableLimitExceeded(usage) && !memlist_history_.empty()) { while (MemtableLimitExceeded(usage) && !memlist_history_.empty()) {
MemTable* x = memlist_history_.back(); MemTable* x = memlist_history_.back();
memlist_history_.pop_back(); memlist_history_.pop_back();
UnrefMemTable(to_delete, x); UnrefMemTable(to_delete, x);
ret = true;
} }
return ret;
} }
// Returns true if there is at least one memtable on which flush has // Returns true if there is at least one memtable on which flush has
@ -549,11 +552,12 @@ void MemTableList::Add(MemTable* m, autovector<MemTable*>* to_delete) {
ResetTrimHistoryNeeded(); ResetTrimHistoryNeeded();
} }
void MemTableList::TrimHistory(autovector<MemTable*>* to_delete, size_t usage) { bool MemTableList::TrimHistory(autovector<MemTable*>* to_delete, size_t usage) {
InstallNewVersion(); InstallNewVersion();
current_->TrimHistory(to_delete, usage); bool ret = current_->TrimHistory(to_delete, usage);
UpdateCachedValuesFromMemTableListVersion(); UpdateCachedValuesFromMemTableListVersion();
ResetTrimHistoryNeeded(); ResetTrimHistoryNeeded();
return ret;
} }
// Returns an estimate of the number of bytes of data in use. // Returns an estimate of the number of bytes of data in use.

@ -148,7 +148,8 @@ class MemTableListVersion {
// REQUIRE: m is an immutable memtable // REQUIRE: m is an immutable memtable
void Remove(MemTable* m, autovector<MemTable*>* to_delete); void Remove(MemTable* m, autovector<MemTable*>* to_delete);
void TrimHistory(autovector<MemTable*>* to_delete, size_t usage); // Return true if memtable is trimmed
bool TrimHistory(autovector<MemTable*>* to_delete, size_t usage);
bool GetFromList(std::list<MemTable*>* list, const LookupKey& key, bool GetFromList(std::list<MemTable*>* list, const LookupKey& key,
std::string* value, std::string* timestamp, Status* s, std::string* value, std::string* timestamp, Status* s,
@ -291,7 +292,9 @@ class MemTableList {
// max_write_buffer_size_to_maintain is used, total size of mutable and // max_write_buffer_size_to_maintain is used, total size of mutable and
// immutable memtables is checked against it to decide whether to trim // immutable memtables is checked against it to decide whether to trim
// memtable list. // memtable list.
void TrimHistory(autovector<MemTable*>* to_delete, size_t usage); //
// Return true if memtable is trimmed
bool TrimHistory(autovector<MemTable*>* to_delete, size_t usage);
// Returns an estimate of the number of bytes of data used by // Returns an estimate of the number of bytes of data used by
// the unflushed mem-tables. // the unflushed mem-tables.

Loading…
Cancel
Save