BlobDB: ignore trivially moved files when updating the SST<->blob file mapping (#6381)

Summary:
BlobDB keeps track of the mapping between SSTs and blob files using
the `OnFlushCompleted` and `OnCompactionCompleted` callbacks of
the `EventListener` interface: upon receiving a flush notification, a link
is added between the newly flushed SST and the corresponding blob file;
for compactions, links are removed for the inputs and added for the outputs.
The earlier code performed this link deletion and addition even for
trivially moved files; the new code walks through the two lists together
(in a fashion that's similar to merge sort) and skips such files.
This should mitigate https://github.com/facebook/rocksdb/issues/6338,
wherein an assertion is triggered with the earlier code when a compaction
notification for a trivial move precedes the flush notification for the
moved SST.
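The merge-style walk described above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the RocksDB implementation. `FileInfo`, `LinkUpdate`, `ComputeLinkUpdates`, and the sentinel `kNoBlobFile` (assumed to be 0) are hypothetical names. Instead of calling `LinkSstToBlobFile`/`UnlinkSstFromBlobFile`, the sketch only records the updates it would make.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two CompactionFileInfo fields the commit
// reads; this is not the RocksDB type.
struct FileInfo {
  uint64_t file_number;
  uint64_t oldest_blob_file_number;
};

// Assumed sentinel meaning "this SST references no blob file".
constexpr uint64_t kNoBlobFile = 0;

// One mapping change: link (or unlink) an SST to (from) a blob file.
struct LinkUpdate {
  bool link;
  uint64_t sst_file_number;
  uint64_t blob_file_number;
};

// Walks the inputs and outputs, both sorted by file number, in lockstep as
// in the merge step of merge sort. A file number present in both lists is a
// trivial move and yields no update.
std::vector<LinkUpdate> ComputeLinkUpdates(std::vector<FileInfo> inputs,
                                           std::vector<FileInfo> outputs) {
  auto cmp = [](const FileInfo& lhs, const FileInfo& rhs) {
    return lhs.file_number < rhs.file_number;
  };
  std::sort(inputs.begin(), inputs.end(), cmp);
  std::sort(outputs.begin(), outputs.end(), cmp);

  std::vector<LinkUpdate> updates;
  // Records an update only for SSTs that actually reference a blob file.
  auto record = [&updates](bool link, const FileInfo& f) {
    if (f.oldest_blob_file_number != kNoBlobFile) {
      updates.push_back({link, f.file_number, f.oldest_blob_file_number});
    }
  };

  auto iit = inputs.begin();
  auto oit = outputs.begin();
  while (iit != inputs.end() && oit != outputs.end()) {
    if (iit->file_number == oit->file_number) {
      ++iit;  // trivial move: skip it on both sides
      ++oit;
    } else if (iit->file_number < oit->file_number) {
      record(false, *iit);  // input only: unlink
      ++iit;
    } else {
      assert(oit->file_number < iit->file_number);
      record(true, *oit);  // output only: link
      ++oit;
    }
  }
  // Whatever remains in either list cannot be a trivial move.
  for (; iit != inputs.end(); ++iit) record(false, *iit);
  for (; oit != outputs.end(); ++oit) record(true, *oit);
  return updates;
}
```

For example, with inputs 7, 10, 12 and outputs 10, 15, 20, file 10 is recognized as a trivial move and left alone. The result is an unlink for 7 and 12 and a link for 15; file 20, which references no blob file, produces nothing. Like the commit, the sketch sorts copies of the two lists, so the caller's vectors are not modified.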
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6381

Test Plan: make check

Differential Revision: D19773729

Pulled By: ltamasi

fbshipit-source-id: ae0f273ded061110dd9334e8fb99b0d7786650b0
Author: Levi Tamasi (5 years ago), committed by Facebook Github Bot
Parent: 107a7ca930
Commit: 1b4be4cac9
Changed files:
1. HISTORY.md (1 line changed)
2. utilities/blob_db/blob_db_impl.cc (64 lines changed)

HISTORY.md
@@ -8,6 +8,7 @@
 * Fix a bug that prevents opening a DB after two consecutive crash with TransactionDB, where the first crash recovers from a corrupted WAL with kPointInTimeRecovery but the second cannot.
 * Fixed issue #6316 that can cause a corruption of the MANIFEST file in the middle when writing to it fails due to no disk space.
 * Add DBOptions::skip_checking_sst_file_sizes_on_db_open. It disables potentially expensive checking of all sst file sizes in DB::Open().
+* BlobDB now ignores trivially moved files when updating the mapping between blob files and SSTs. This should mitigate issue #6338 where out of order flush/compaction notifications could trigger an assertion with the earlier code.
 ### Public API Change
 * The BlobDB garbage collector now emits the statistics `BLOB_DB_GC_NUM_FILES` (number of blob files obsoleted during GC), `BLOB_DB_GC_NUM_NEW_FILES` (number of new blob files generated during GC), `BLOB_DB_GC_FAILURES` (number of failed GC passes), `BLOB_DB_GC_NUM_KEYS_RELOCATED` (number of blobs relocated during GC), and `BLOB_DB_GC_BYTES_RELOCATED` (total size of blobs relocated during GC). On the other hand, the following statistics, which are not relevant for the new GC implementation, are now deprecated: `BLOB_DB_GC_NUM_KEYS_OVERWRITTEN`, `BLOB_DB_GC_NUM_KEYS_EXPIRED`, `BLOB_DB_GC_BYTES_OVERWRITTEN`, `BLOB_DB_GC_BYTES_EXPIRED`, and `BLOB_DB_GC_MICROS`.

utilities/blob_db/blob_db_impl.cc
@@ -476,25 +476,69 @@ void BlobDBImpl::ProcessCompactionJobInfo(const CompactionJobInfo& info) {
   }
   // Note: the same SST file may appear in both the input and the output
-  // file list in case of a trivial move. We process the inputs first
-  // to ensure the blob file still has a link after processing all updates.
+  // file list in case of a trivial move. We walk through the two lists
+  // below in a fashion that's similar to merge sort to detect this.
+  auto cmp = [](const CompactionFileInfo& lhs, const CompactionFileInfo& rhs) {
+    return lhs.file_number < rhs.file_number;
+  };
+  auto inputs = info.input_file_infos;
+  auto iit = inputs.begin();
+  const auto iit_end = inputs.end();
+  std::sort(iit, iit_end, cmp);
+  auto outputs = info.output_file_infos;
+  auto oit = outputs.begin();
+  const auto oit_end = outputs.end();
+  std::sort(oit, oit_end, cmp);
   WriteLock lock(&mutex_);
-  for (const auto& input : info.input_file_infos) {
-    if (input.oldest_blob_file_number == kInvalidBlobFileNumber) {
-      continue;
+  while (iit != iit_end && oit != oit_end) {
+    const auto& input = *iit;
+    const auto& output = *oit;
+    if (input.file_number == output.file_number) {
+      ++iit;
+      ++oit;
+    } else if (input.file_number < output.file_number) {
+      if (input.oldest_blob_file_number != kInvalidBlobFileNumber) {
+        UnlinkSstFromBlobFile(input.file_number, input.oldest_blob_file_number);
+      }
+      ++iit;
+    } else {
+      assert(output.file_number < input.file_number);
+      if (output.oldest_blob_file_number != kInvalidBlobFileNumber) {
+        LinkSstToBlobFile(output.file_number, output.oldest_blob_file_number);
+      }
+      ++oit;
+    }
+  }
+  while (iit != iit_end) {
+    const auto& input = *iit;
+    if (input.oldest_blob_file_number != kInvalidBlobFileNumber) {
+      UnlinkSstFromBlobFile(input.file_number, input.oldest_blob_file_number);
     }
-    UnlinkSstFromBlobFile(input.file_number, input.oldest_blob_file_number);
+    ++iit;
   }
-  for (const auto& output : info.output_file_infos) {
-    if (output.oldest_blob_file_number == kInvalidBlobFileNumber) {
-      continue;
+  while (oit != oit_end) {
+    const auto& output = *oit;
+    if (output.oldest_blob_file_number != kInvalidBlobFileNumber) {
+      LinkSstToBlobFile(output.file_number, output.oldest_blob_file_number);
     }
-    LinkSstToBlobFile(output.file_number, output.oldest_blob_file_number);
+    ++oit;
   }
   MarkUnreferencedBlobFilesObsolete();
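This sketch illustrates why skipping trivially moved files helps with the ordering problem from issue #6338. It models the bookkeeping as a plain map from blob file number to linked SSTs. The names (`SstRef`, `Tracker`, `CompactionOld`, `CompactionNew`) are hypothetical. The sketch also assumes, as an approximation of BlobDB's debug checks, that two operations count as assertion failures: unlinking a link that does not exist, and linking one that already exists. It counts these in `violations` instead of aborting. For brevity, `CompactionNew` detects trivial moves with set lookups rather than the merge-style walk, and SSTs without blob references are left out.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// (SST file number, oldest blob file number) of one SST.
using SstRef = std::pair<uint64_t, uint64_t>;

// Hypothetical model of the mapping: blob file number -> linked SSTs.
// Instead of asserting, it counts the operations assumed to trip debug
// checks: unlinking a link that does not exist, or linking one twice.
struct Tracker {
  std::map<uint64_t, std::set<uint64_t>> links;
  int violations = 0;

  void Link(const SstRef& f) {
    if (!links[f.second].insert(f.first).second) ++violations;
  }
  void Unlink(const SstRef& f) {
    auto it = links.find(f.second);
    if (it == links.end() || it->second.erase(f.first) == 0) ++violations;
  }
};

// Earlier behavior: unlink every input, then link every output.
void CompactionOld(Tracker& t, const std::vector<SstRef>& inputs,
                   const std::vector<SstRef>& outputs) {
  for (const auto& f : inputs) t.Unlink(f);
  for (const auto& f : outputs) t.Link(f);
}

// Newer behavior: files in both lists (trivial moves) are left alone.
// Set lookups stand in for the merge-style walk to keep this short.
void CompactionNew(Tracker& t, const std::vector<SstRef>& inputs,
                   const std::vector<SstRef>& outputs) {
  std::set<uint64_t> in_nums, out_nums;
  for (const auto& f : inputs) in_nums.insert(f.first);
  for (const auto& f : outputs) out_nums.insert(f.first);
  for (const auto& f : inputs)
    if (out_nums.count(f.first) == 0) t.Unlink(f);
  for (const auto& f : outputs)
    if (in_nums.count(f.first) == 0) t.Link(f);
}
```

The test replays the problematic order. First comes the compaction notification for a trivial move of SST 10, which references blob file 5. The flush notification that links SST 10 to blob file 5 arrives after it. Under this model, the earlier approach records two violations: an unlink of a link that was never made, then a duplicate link. The newer approach records none and ends with SST 10 linked to blob file 5.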
