From 146263887fa6ba13a1ace5303d7bc3085416b45a Mon Sep 17 00:00:00 2001 From: Levi Tamasi Date: Sat, 12 Jun 2021 12:08:30 -0700 Subject: [PATCH] Disable subcompactions for user-defined timestamps (#8393) Summary: The subcompaction boundary picking logic does not currently guarantee that all user keys that differ only by timestamp get processed by the same subcompaction. This can cause issues with the `CompactionIterator` state machine: for instance, one subcompaction that processes a subset of such KVs might drop a tombstone based on the KVs it sees, while in reality the tombstone might not have been eligible to be optimized out. (See also https://github.com/facebook/rocksdb/issues/6645, which adjusted the way compaction inputs are picked for the same reason.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8393 Test Plan: Ran `make check` and the crash test script with timestamps enabled. Reviewed By: jay-zhuang Differential Revision: D29071635 Pulled By: ltamasi fbshipit-source-id: f6c72442122b4e581871e096fabe3876a9e8a5a6 --- HISTORY.md | 1 + db/compaction/compaction.cc | 8 ++++++++ db/compaction/compaction_job.cc | 4 ++++ 3 files changed, 13 insertions(+) diff --git a/HISTORY.md b/HISTORY.md index aa394a754..882b8f2e3 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -5,6 +5,7 @@ * Obsolete keys in the bottommost level that were preserved for a snapshot will now be cleaned upon snapshot release in all cases. This form of compaction (snapshot release triggered compaction) previously had an artificial limitation that multiple tombstones needed to be present. ### Bug Fixes * fs_posix.cc GetFreeSpace() always report disk space available to root even when running as non-root. Linux defaults often have disk mounts with 5 to 10 percent of total space reserved only for root. Out of space could result for non-root users. +* Subcompactions are now disabled when user-defined timestamps are used, since the subcompaction boundary picking logic is currently not timestamp-aware, which could lead to incorrect results when different subcompactions process keys that only differ by timestamp. ## 6.21.0 (2021-05-21) ### Bug Fixes diff --git a/db/compaction/compaction.cc b/db/compaction/compaction.cc index 32ab98288..57f814fbc 100644 --- a/db/compaction/compaction.cc +++ b/db/compaction/compaction.cc @@ -563,6 +563,14 @@ bool Compaction::ShouldFormSubcompactions() const { if (max_subcompactions_ <= 1 || cfd_ == nullptr) { return false; } + + // Note: the subcompaction boundary picking logic does not currently guarantee + // that all user keys that differ only by timestamp get processed by the same + // subcompaction. + if (cfd_->user_comparator()->timestamp_size() > 0) { + return false; + } + if (cfd_->ioptions()->compaction_style == kCompactionStyleLevel) { return (start_level_ == 0 || is_manual_compaction_) && output_level_ > 0 && !IsOutputLevelEmpty(); diff --git a/db/compaction/compaction_job.cc b/db/compaction/compaction_job.cc index 7c3f2db3a..38b0a3a80 100644 --- a/db/compaction/compaction_job.cc +++ b/db/compaction/compaction_job.cc @@ -1099,6 +1099,10 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) { // (a) concurrent compactions, // (b) CompactionFilter::Decision::kRemoveAndSkipUntil. read_options.total_order_seek = true; + + // Note: if we're going to support subcompactions for user-defined timestamps, + // the timestamp part will have to be stripped from the bounds here. + assert((!start && !end) || cfd->user_comparator()->timestamp_size() == 0); read_options.iterate_lower_bound = start; read_options.iterate_upper_bound = end;