Allow applying `CompactionFilter` outside of compaction (#8243)

Summary:
From HISTORY.md release note:

- Allow `CompactionFilter`s to apply in more table file creation scenarios such as flush and recovery. For compatibility, `CompactionFilter`s by default apply during compaction. Users can customize this behavior by overriding `CompactionFilterFactory::ShouldFilterTableFileCreation()`.
- Removed unused structure `CompactionFilterContext`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8243

Test Plan: added unit tests

Reviewed By: pdillinger

Differential Revision: D28088089

Pulled By: ajkr

fbshipit-source-id: 0799be7908e3b39fea09fc3f1ab00e13ad817fae
main
Andrew Kryczka 5 years ago committed by Facebook GitHub Bot
parent 242ac6c17c
commit a639c02f8e
  1. 4
      HISTORY.md
  2. 32
      db/builder.cc
  3. 7
      db/compaction/compaction.cc
  4. 14
      db/compaction/compaction_iterator.cc
  5. 2
      db/compaction/compaction_iterator.h
  6. 136
      db/db_compaction_filter_test.cc
  7. 96
      include/rocksdb/compaction_filter.h
  8. 1
      include/rocksdb/listener.h
  9. 28
      include/rocksdb/options.h

@ -12,13 +12,15 @@
### New Features
* Add new option allow_stall passed during instance creation of WriteBufferManager. When allow_stall is set, WriteBufferManager will stall all writers shared across multiple DBs and columns if memory usage goes beyond specified WriteBufferManager::buffer_size (soft limit). Stall will be cleared when memory is freed after flush and memory usage goes down below buffer_size.
* Allow `CompactionFilter`s to apply in more table file creation scenarios such as flush and recovery. For compatibility, `CompactionFilter`s by default apply during compaction. Users can customize this behavior by overriding `CompactionFilterFactory::ShouldFilterTableFileCreation()`.
* Added more fields to FilterBuildingContext with LSM details, for custom filter policies that vary behavior based on where they are in the LSM-tree.
### Performace Improvements
### Performance Improvements
* BlockPrefetcher is used by iterators to prefetch data if they anticipate more data to be used in future. It is enabled implicitly by rocksdb. Added change to take in account read pattern if reads are sequential. This would disable prefetching for random reads in MultiGet and iterators as readahead_size is increased exponential doing large prefetches.
### Public API change
* Removed a parameter from TableFactory::NewTableBuilder, which should not be called by user code because TableBuilder is not a public API.
* Removed unused structure `CompactionFilterContext`.
* The `skip_filters` parameter to SstFileWriter is now considered deprecated. Use `BlockBasedTableOptions::filter_policy` to control generation of filters.
* ClockCache is known to have bugs that could lead to crash or corruption, so should not be used until fixed. Use NewLRUCache instead.
* Deprecated backupable_db.h and BackupableDBOptions in favor of new versions with appropriate names: backup_engine.h and BackupEngineOptions. Old API compatibility is preserved.

@ -109,6 +109,26 @@ Status BuildTable(
TableProperties tp;
if (iter->Valid() || !range_del_agg->IsEmpty()) {
std::unique_ptr<CompactionFilter> compaction_filter;
if (ioptions.compaction_filter_factory != nullptr &&
ioptions.compaction_filter_factory->ShouldFilterTableFileCreation(
tboptions.reason)) {
CompactionFilter::Context context;
context.is_full_compaction = false;
context.is_manual_compaction = false;
context.column_family_id = tboptions.column_family_id;
context.reason = tboptions.reason;
compaction_filter =
ioptions.compaction_filter_factory->CreateCompactionFilter(context);
if (compaction_filter != nullptr &&
!compaction_filter->IgnoreSnapshots()) {
s.PermitUncheckedError();
return Status::NotSupported(
"CompactionFilter::IgnoreSnapshots() = false is not supported "
"anymore.");
}
}
TableBuilder* builder;
std::unique_ptr<WritableFileWriter> file_writer;
{
@ -143,11 +163,11 @@ Status BuildTable(
builder = NewTableBuilder(tboptions, file_writer.get());
}
MergeHelper merge(env, tboptions.internal_comparator.user_comparator(),
ioptions.merge_operator.get(), nullptr, ioptions.logger,
MergeHelper merge(
env, tboptions.internal_comparator.user_comparator(),
ioptions.merge_operator.get(), compaction_filter.get(), ioptions.logger,
true /* internal key corruption is not ok */,
snapshots.empty() ? 0 : snapshots.back(),
snapshot_checker);
snapshots.empty() ? 0 : snapshots.back(), snapshot_checker);
std::unique_ptr<BlobFileBuilder> blob_file_builder(
(mutable_cf_options.enable_blob_files && blob_file_additions)
@ -165,8 +185,8 @@ Status BuildTable(
snapshot_checker, env, ShouldReportDetailedTime(env, ioptions.stats),
true /* internal key corruption is not ok */, range_del_agg.get(),
blob_file_builder.get(), ioptions.allow_data_in_errors,
/*compaction=*/nullptr,
/*compaction_filter=*/nullptr, /*shutting_down=*/nullptr,
/*compaction=*/nullptr, compaction_filter.get(),
/*shutting_down=*/nullptr,
/*preserve_deletes_seqnum=*/0, /*manual_compaction_paused=*/nullptr,
db_options.info_log, full_history_ts_low);

@ -526,10 +526,17 @@ std::unique_ptr<CompactionFilter> Compaction::CreateCompactionFilter() const {
return nullptr;
}
if (!cfd_->ioptions()
->compaction_filter_factory->ShouldFilterTableFileCreation(
TableFileCreationReason::kCompaction)) {
return nullptr;
}
CompactionFilter::Context context;
context.is_full_compaction = is_full_compaction_;
context.is_manual_compaction = is_manual_compaction_;
context.column_family_id = cfd_->GetID();
context.reason = TableFileCreationReason::kCompaction;
return cfd_->ioptions()->compaction_filter_factory->CreateCompactionFilter(
context);
}

@ -100,8 +100,8 @@ CompactionIterator::CompactionIterator(
blob_garbage_collection_cutoff_file_number_(
ComputeBlobGarbageCollectionCutoffFileNumber(compaction_.get())),
current_key_committed_(false),
cmp_with_history_ts_low_(0) {
assert(compaction_filter_ == nullptr || compaction_ != nullptr);
cmp_with_history_ts_low_(0),
level_(compaction_ == nullptr ? 0 : compaction_->level()) {
assert(snapshots_ != nullptr);
bottommost_level_ = compaction_ == nullptr
? false
@ -232,7 +232,7 @@ bool CompactionIterator::InvokeFilterIfNeeded(bool* need_skip,
if (kTypeBlobIndex == ikey_.type) {
blob_value_.Reset();
filter = compaction_filter_->FilterBlobByKey(
compaction_->level(), filter_key, &compaction_filter_value_,
level_, filter_key, &compaction_filter_value_,
compaction_filter_skip_until_.rep());
if (CompactionFilter::Decision::kUndetermined == filter &&
!compaction_filter_->IsStackedBlobDbInternalCompactionFilter()) {
@ -251,6 +251,12 @@ bool CompactionIterator::InvokeFilterIfNeeded(bool* need_skip,
valid_ = false;
return false;
}
if (compaction_ == nullptr) {
status_ =
Status::Corruption("Unexpected blob index outside of compaction");
valid_ = false;
return false;
}
const Version* const version = compaction_->input_version();
assert(version);
@ -271,7 +277,7 @@ bool CompactionIterator::InvokeFilterIfNeeded(bool* need_skip,
}
if (CompactionFilter::Decision::kUndetermined == filter) {
filter = compaction_filter_->FilterV2(
compaction_->level(), filter_key, value_type,
level_, filter_key, value_type,
blob_value_.empty() ? value_ : blob_value_, &compaction_filter_value_,
compaction_filter_skip_until_.rep());
}

@ -340,6 +340,8 @@ class CompactionIterator {
// Saved result of ucmp->CompareTimestamp(current_ts_, *full_history_ts_low_)
int cmp_with_history_ts_low_;
const int level_;
bool IsShuttingDown() {
// This is a best-effort facility, so memory_order_relaxed is sufficient.
return shutting_down_ && shutting_down_->load(std::memory_order_relaxed);

@ -82,6 +82,11 @@ class DeleteFilter : public CompactionFilter {
return true;
}
bool FilterMergeOperand(int /*level*/, const Slice& /*key*/,
const Slice& /*operand*/) const override {
return true;
}
const char* Name() const override { return "DeleteFilter"; }
};
@ -190,18 +195,36 @@ class KeepFilterFactory : public CompactionFilterFactory {
bool compaction_filter_created_;
};
// This filter factory is configured with a `TableFileCreationReason`. Only
// table files created for that reason will undergo filtering. This
// configurability makes it useful to tests for filtering non-compaction table
// files, such as "CompactionFilterFlush" and "CompactionFilterRecovery".
class DeleteFilterFactory : public CompactionFilterFactory {
public:
explicit DeleteFilterFactory(TableFileCreationReason reason)
: reason_(reason) {}
std::unique_ptr<CompactionFilter> CreateCompactionFilter(
const CompactionFilter::Context& context) override {
if (context.is_manual_compaction) {
return std::unique_ptr<CompactionFilter>(new DeleteFilter());
} else {
EXPECT_EQ(reason_, context.reason);
if (context.reason == TableFileCreationReason::kCompaction &&
!context.is_manual_compaction) {
// Table files created by automatic compaction do not undergo filtering.
// Presumably some tests rely on this.
return std::unique_ptr<CompactionFilter>(nullptr);
}
return std::unique_ptr<CompactionFilter>(new DeleteFilter());
}
bool ShouldFilterTableFileCreation(
TableFileCreationReason reason) const override {
return reason_ == reason;
}
const char* Name() const override { return "DeleteFilterFactory"; }
private:
const TableFileCreationReason reason_;
};
// Delete Filter Factory which ignores snapshots
@ -349,7 +372,8 @@ TEST_F(DBTestCompactionFilter, CompactionFilter) {
// create a new database with the compaction
// filter in such a way that it deletes all keys
options.compaction_filter_factory = std::make_shared<DeleteFilterFactory>();
options.compaction_filter_factory = std::make_shared<DeleteFilterFactory>(
TableFileCreationReason::kCompaction);
options.create_if_missing = true;
DestroyAndReopen(options);
CreateAndReopenWithCF({"pikachu"}, options);
@ -421,7 +445,8 @@ TEST_F(DBTestCompactionFilter, CompactionFilter) {
// entries in VersionEdit, but none of the 'AddFile's.
TEST_F(DBTestCompactionFilter, CompactionFilterDeletesAll) {
Options options = CurrentOptions();
options.compaction_filter_factory = std::make_shared<DeleteFilterFactory>();
options.compaction_filter_factory = std::make_shared<DeleteFilterFactory>(
TableFileCreationReason::kCompaction);
options.disable_auto_compactions = true;
options.create_if_missing = true;
DestroyAndReopen(options);
@ -450,6 +475,64 @@ TEST_F(DBTestCompactionFilter, CompactionFilterDeletesAll) {
}
#endif // ROCKSDB_LITE
TEST_F(DBTestCompactionFilter, CompactionFilterFlush) {
// Tests a `CompactionFilterFactory` that filters when table file is created
// by flush.
Options options = CurrentOptions();
options.compaction_filter_factory =
std::make_shared<DeleteFilterFactory>(TableFileCreationReason::kFlush);
options.merge_operator = MergeOperators::CreateStringAppendOperator();
Reopen(options);
// Puts and Merges are purged in flush.
ASSERT_OK(Put("a", "v"));
ASSERT_OK(Merge("b", "v"));
ASSERT_OK(Flush());
ASSERT_EQ("NOT_FOUND", Get("a"));
ASSERT_EQ("NOT_FOUND", Get("b"));
// However, Puts and Merges are preserved by recovery.
ASSERT_OK(Put("a", "v"));
ASSERT_OK(Merge("b", "v"));
Reopen(options);
ASSERT_EQ("v", Get("a"));
ASSERT_EQ("v", Get("b"));
// Likewise, compaction does not apply filtering.
ASSERT_OK(dbfull()->CompactRange(CompactRangeOptions(), nullptr, nullptr));
ASSERT_EQ("v", Get("a"));
ASSERT_EQ("v", Get("b"));
}
TEST_F(DBTestCompactionFilter, CompactionFilterRecovery) {
// Tests a `CompactionFilterFactory` that filters when table file is created
// by recovery.
Options options = CurrentOptions();
options.compaction_filter_factory =
std::make_shared<DeleteFilterFactory>(TableFileCreationReason::kRecovery);
options.merge_operator = MergeOperators::CreateStringAppendOperator();
Reopen(options);
// Puts and Merges are purged in recovery.
ASSERT_OK(Put("a", "v"));
ASSERT_OK(Merge("b", "v"));
Reopen(options);
ASSERT_EQ("NOT_FOUND", Get("a"));
ASSERT_EQ("NOT_FOUND", Get("b"));
// However, Puts and Merges are preserved by flush.
ASSERT_OK(Put("a", "v"));
ASSERT_OK(Merge("b", "v"));
ASSERT_OK(Flush());
ASSERT_EQ("v", Get("a"));
ASSERT_EQ("v", Get("b"));
// Likewise, compaction does not apply filtering.
ASSERT_OK(dbfull()->CompactRange(CompactRangeOptions(), nullptr, nullptr));
ASSERT_EQ("v", Get("a"));
ASSERT_EQ("v", Get("b"));
}
TEST_P(DBTestCompactionFilterWithCompactParam,
CompactionFilterWithValueChange) {
Options options = CurrentOptions();
@ -842,6 +925,49 @@ TEST_F(DBTestCompactionFilter, IgnoreSnapshotsFalse) {
delete options.compaction_filter;
}
class TestNotSupportedFilterFactory : public CompactionFilterFactory {
public:
explicit TestNotSupportedFilterFactory(TableFileCreationReason reason)
: reason_(reason) {}
bool ShouldFilterTableFileCreation(
TableFileCreationReason reason) const override {
return reason_ == reason;
}
std::unique_ptr<CompactionFilter> CreateCompactionFilter(
const CompactionFilter::Context& /* context */) override {
return std::unique_ptr<CompactionFilter>(new TestNotSupportedFilter());
}
const char* Name() const override { return "TestNotSupportedFilterFactory"; }
private:
const TableFileCreationReason reason_;
};
TEST_F(DBTestCompactionFilter, IgnoreSnapshotsFalseDuringFlush) {
Options options = CurrentOptions();
options.compaction_filter_factory =
std::make_shared<TestNotSupportedFilterFactory>(
TableFileCreationReason::kFlush);
Reopen(options);
ASSERT_OK(Put("a", "v10"));
ASSERT_TRUE(Flush().IsNotSupported());
}
TEST_F(DBTestCompactionFilter, IgnoreSnapshotsFalseRecovery) {
Options options = CurrentOptions();
options.compaction_filter_factory =
std::make_shared<TestNotSupportedFilterFactory>(
TableFileCreationReason::kRecovery);
Reopen(options);
ASSERT_OK(Put("a", "v10"));
ASSERT_TRUE(TryReopen(options).IsNotSupported());
}
} // namespace ROCKSDB_NAMESPACE
int main(int argc, char** argv) {

@ -14,23 +14,15 @@
#include <vector>
#include "rocksdb/rocksdb_namespace.h"
#include "rocksdb/types.h"
namespace ROCKSDB_NAMESPACE {
class Slice;
class SliceTransform;
// Context information of a compaction run
struct CompactionFilterContext {
// Does this compaction run include all data files
bool is_full_compaction;
// Is this compaction requested by the client (true),
// or is it occurring as an automatic compaction process
bool is_manual_compaction;
};
// CompactionFilter allows an application to modify/delete a key-value at
// the time of compaction.
// CompactionFilter allows an application to modify/delete a key-value during
// table file creation.
class CompactionFilter {
public:
@ -52,31 +44,33 @@ class CompactionFilter {
enum class BlobDecision { kKeep, kChangeValue, kCorruption, kIOError };
// Context information of a compaction run
// Context information for a table file creation.
struct Context {
// Does this compaction run include all data files
// Whether this table file is created as part of a compaction including all
// table files.
bool is_full_compaction;
// Is this compaction requested by the client (true),
// or is it occurring as an automatic compaction process
// Whether this table file is created as part of a compaction requested by
// the client.
bool is_manual_compaction;
// Which column family this compaction is for.
// The column family that will contain the created table file.
uint32_t column_family_id;
// Reason this table file is being created.
TableFileCreationReason reason;
};
virtual ~CompactionFilter() {}
// The compaction process invokes this
// method for kv that is being compacted. A return value
// of false indicates that the kv should be preserved in the
// output of this compaction run and a return value of true
// indicates that this key-value should be removed from the
// output of the compaction. The application can inspect
// the existing value of the key and make decision based on it.
// The table file creation process invokes this method before adding a kv to
// the table file. A return value of false indicates that the kv should be
// preserved in the new table file and a return value of true indicates
// that this key-value should be removed from the new table file. The
// application can inspect the existing value of the key and make decision
// based on it.
//
// Key-Values that are results of merge operation during compaction are not
// passed into this function. Currently, when you have a mix of Put()s and
// Merge()s on a same key, we only guarantee to process the merge operands
// through the compaction filters. Put()s might be processed, or might not.
// Key-Values that are results of merge operation during table file creation
// are not passed into this function. Currently, when you have a mix of Put()s
// and Merge()s on a same key, we only guarantee to process the merge operands
// through the `CompactionFilter`s. Put()s might be processed, or might not.
//
// When the value is to be preserved, the application has the option
// to modify the existing_value and pass it back through new_value.
@ -85,8 +79,9 @@ class CompactionFilter {
// Note that RocksDB snapshots (i.e. call GetSnapshot() API on a
// DB* object) will not guarantee to preserve the state of the DB with
// CompactionFilter. Data seen from a snapshot might disappear after a
// compaction finishes. If you use snapshots, think twice about whether you
// want to use compaction filter and whether you are using it in a safe way.
// table file created with a `CompactionFilter` is installed. If you use
// snapshots, think twice about whether you want to use `CompactionFilter` and
// whether you are using it in a safe way.
//
// If multithreaded compaction is being used *and* a single CompactionFilter
// instance was supplied via Options::compaction_filter, this method may be
@ -94,7 +89,7 @@ class CompactionFilter {
// that the call is thread-safe.
//
// If the CompactionFilter was created by a factory, then it will only ever
// be used by a single thread that is doing the compaction run, and this
// be used by a single thread that is doing the table file creation, and this
// call does not need to be thread-safe. However, multiple filters may be
// in existence and operating concurrently.
virtual bool Filter(int /*level*/, const Slice& /*key*/,
@ -104,9 +99,9 @@ class CompactionFilter {
return false;
}
// The compaction process invokes this method on every merge operand. If this
// method returns true, the merge operand will be ignored and not written out
// in the compaction output
// The table file creation process invokes this method on every merge operand.
// If this method returns true, the merge operand will be ignored and not
// written out in the new table file.
//
// Note: If you are using a TransactionDB, it is not recommended to implement
// FilterMergeOperand(). If a Merge operation is filtered out, TransactionDB
@ -143,13 +138,14 @@ class CompactionFilter {
// snapshot - beware if you're using TransactionDB or
// DB::GetSnapshot().
// - If value for a key was overwritten or merged into (multiple Put()s
// or Merge()s), and compaction filter skips this key with
// or Merge()s), and `CompactionFilter` skips this key with
// kRemoveAndSkipUntil, it's possible that it will remove only
// the new value, exposing the old value that was supposed to be
// overwritten.
// - Doesn't work with PlainTableFactory in prefix mode.
// - If you use kRemoveAndSkipUntil, consider also reducing
// compaction_readahead_size option.
// - If you use kRemoveAndSkipUntil for table files created by
// compaction, consider also reducing compaction_readahead_size
// option.
//
// Should never return kUndetermined.
// Note: If you are using a TransactionDB, it is not recommended to filter
@ -189,13 +185,13 @@ class CompactionFilter {
}
// This function is deprecated. Snapshots will always be ignored for
// compaction filters, because we realized that not ignoring snapshots doesn't
// provide the guarantee we initially thought it would provide. Repeatable
// reads will not be guaranteed anyway. If you override the function and
// returns false, we will fail the compaction.
// `CompactionFilter`s, because we realized that not ignoring snapshots
// doesn't provide the guarantee we initially thought it would provide.
// Repeatable reads will not be guaranteed anyway. If you override the
// function and returns false, we will fail the table file creation.
virtual bool IgnoreSnapshots() const { return true; }
// Returns a name that identifies this compaction filter.
// Returns a name that identifies this `CompactionFilter`.
// The name will be printed to LOG file on start up for diagnosis.
virtual const char* Name() const = 0;
@ -214,16 +210,28 @@ class CompactionFilter {
}
};
// Each compaction will create a new CompactionFilter allowing the
// application to know about different compactions
// Each thread of work involving creating table files will create a new
// `CompactionFilter` according to `ShouldFilterTableFileCreation()`. This
// allows the application to know about the different ongoing threads of work
// and makes it unnecessary for `CompactionFilter` to provide thread-safety.
class CompactionFilterFactory {
public:
virtual ~CompactionFilterFactory() {}
// Returns whether a thread creating table files for the specified `reason`
// should invoke `CreateCompactionFilter()` and pass KVs through the returned
// filter.
virtual bool ShouldFilterTableFileCreation(
TableFileCreationReason reason) const {
// For backward compatibility, default implementation only applies
// `CompactionFilter` to files generated by compaction.
return reason == TableFileCreationReason::kCompaction;
}
virtual std::unique_ptr<CompactionFilter> CreateCompactionFilter(
const CompactionFilter::Context& context) = 0;
// Returns a name that identifies this compaction filter factory.
// Returns a name that identifies this `CompactionFilter` factory.
virtual const char* Name() const = 0;
};

@ -16,6 +16,7 @@
#include "rocksdb/compression_type.h"
#include "rocksdb/status.h"
#include "rocksdb/table_properties.h"
#include "rocksdb/types.h"
namespace ROCKSDB_NAMESPACE {

@ -128,9 +128,10 @@ struct ColumnFamilyOptions : public AdvancedColumnFamilyOptions {
// Allows an application to modify/delete a key-value during background
// compaction.
//
// If the client requires a new compaction filter to be used for different
// compaction runs, it can specify compaction_filter_factory instead of this
// option. The client should specify only one of the two.
// If the client requires a new `CompactionFilter` to be used for different
// compaction runs and/or requires a `CompactionFilter` for table file
// creations outside of compaction, it can specify compaction_filter_factory
// instead of this option. The client should specify only one of the two.
// compaction_filter takes precedence over compaction_filter_factory if
// client specifies both.
//
@ -141,12 +142,21 @@ struct ColumnFamilyOptions : public AdvancedColumnFamilyOptions {
// Default: nullptr
const CompactionFilter* compaction_filter = nullptr;
// This is a factory that provides compaction filter objects which allow
// an application to modify/delete a key-value during background compaction.
//
// A new filter will be created on each compaction run. If multithreaded
// compaction is being used, each created CompactionFilter will only be used
// from a single thread and so does not need to be thread-safe.
// This is a factory that provides `CompactionFilter` objects which allow
// an application to modify/delete a key-value during table file creation.
//
// Unlike the `compaction_filter` option, which is used when compaction
// creates a table file, this factory allows using a `CompactionFilter` when a
// table file is created for various reasons. The factory can decide what
// `TableFileCreationReason`s use a `CompactionFilter`. For compatibility, by
// default the decision is to use a `CompactionFilter` for
// `TableFileCreationReason::kCompaction` only.
//
// Each thread of work involving creating table files will create a new
// `CompactionFilter` when it will be used according to the above
// `TableFileCreationReason`-based decision. This allows the application to
// know about the different ongoing threads of work and makes it unnecessary
// for `CompactionFilter` to provide thread-safety.
//
// Default: nullptr
std::shared_ptr<CompactionFilterFactory> compaction_filter_factory = nullptr;

Loading…
Cancel
Save