Refactor / clean up / optimize FullFilterBitsReader (#5941)

Summary:
FullFilterBitsReader, after creating in BloomFilterPolicy, was
responsible for decoding metadata bits. This meant that
FullFilterBitsReader::MayMatch had some metadata checks in order to
implement "always true" or "always false" functionality in the case
of inconsistent or trivial metadata. This made for ugly
mixing-of-concerns code and probably had some runtime cost. It also
didn't really support plugging in alternative filter implementations
with extensions to the existing metadata schema.

BloomFilterPolicy::GetFilterBitsReader is now (exclusively) responsible
for decoding filter metadata bits and constructing appropriate instances
deriving from FilterBitsReader. "Always false" and "always true" derived
classes allow FullFilterBitsReader not to be concerned with handling of
trivial or inconsistent metadata. This also makes for easy expansion
to alternative filter implementations in new, alternative derived
classes. This change makes calls to FilterBitsReader::MayMatch
*necessarily* virtual because there's now more than one built-in
implementation. Compared with the previous implementation's extra
'if' checks in MayMatch, there's no consistent performance difference,
measured by (an older revision of) filter_bench (differences here seem
to be within noise):

    Inside queries...
    -  Dry run (407) ns/op: 35.9996
    +  Dry run (407) ns/op: 35.2034
    -  Single filter ns/op: 47.5483
    +  Single filter ns/op: 47.4034
    -  Batched, prepared ns/op: 43.1559
    +  Batched, prepared ns/op: 42.2923
    ...
    -  Random filter ns/op: 150.697
    +  Random filter ns/op: 149.403
    ----------------------------
    Outside queries...
    -  Dry run (980) ns/op: 34.6114
    +  Dry run (980) ns/op: 34.0405
    -  Single filter ns/op: 56.8326
    +  Single filter ns/op: 55.8414
    -  Batched, prepared ns/op: 48.2346
    +  Batched, prepared ns/op: 47.5667
    -  Random filter ns/op: 155.377
    +  Random filter ns/op: 153.942
         Average FP rate %: 1.1386

Also, the FullFilterBitsReader ctor was responsible for a surprising
amount of CPU in production, due in part to inefficient determination of
the CACHE_LINE_SIZE used to construct the filter being read. The
overwhelming common case (same as my CACHE_LINE_SIZE) is now
substantially optimized, as shown with filter_bench with
-new_reader_every=1 (old option - see below) (repeatable result):

    Inside queries...
    -  Dry run (453) ns/op: 118.799
    +  Dry run (453) ns/op: 105.869
    -  Single filter ns/op: 82.5831
    +  Single filter ns/op: 74.2509
    ...
    -  Random filter ns/op: 224.936
    +  Random filter ns/op: 194.833
    ----------------------------
    Outside queries...
    -  Dry run (aa1) ns/op: 118.503
    +  Dry run (aa1) ns/op: 104.925
    -  Single filter ns/op: 90.3023
    +  Single filter ns/op: 83.425
    ...
    -  Random filter ns/op: 220.455
    +  Random filter ns/op: 175.7
         Average FP rate %: 1.13886

However PR#5936 has/will reclaim most of this cost. After that PR, the optimization of this code path is likely negligible, but nonetheless it's clear we aren't making performance any worse.

Also fixed inadequate check of consistency between filter data size and
num_lines. (Unit test updated.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5941

Test Plan:
previously added unit tests FullBloomTest.CorruptFilters and
FullBloomTest.RawSchema

Differential Revision: D18018353

Pulled By: pdillinger

fbshipit-source-id: 8e04c2b4a7d93223f49a237fd52ef2483929ed9c
main
Peter Dillinger 5 years ago committed by Facebook Github Bot
parent fe464bca5c
commit 5f8f2fda0e
  1. 154
      util/bloom.cc
  2. 9
      util/bloom_test.cc

@ -132,39 +132,27 @@ inline void FullFilterBitsBuilder::AddHash(uint32_t h, char* data,
} }
namespace { namespace {
class AlwaysTrueFilter : public FilterBitsReader {
public:
bool MayMatch(const Slice&) override { return true; }
using FilterBitsReader::MayMatch; // inherit overload
};
class AlwaysFalseFilter : public FilterBitsReader {
public:
bool MayMatch(const Slice&) override { return false; }
using FilterBitsReader::MayMatch; // inherit overload
};
class FullFilterBitsReader : public FilterBitsReader { class FullFilterBitsReader : public FilterBitsReader {
public: public:
explicit FullFilterBitsReader(const Slice& contents) FullFilterBitsReader(const char* data, int num_probes, uint32_t num_lines,
: data_(contents.data()), uint32_t log2_cache_line_size)
data_len_(static_cast<uint32_t>(contents.size())), : data_(data),
num_probes_(0), num_probes_(num_probes),
num_lines_(0), num_lines_(num_lines),
log2_cache_line_size_(0) { log2_cache_line_size_(log2_cache_line_size) {}
assert(data_);
GetFilterMeta(contents, &num_probes_, &num_lines_);
// Sanitize broken parameter
if (num_lines_ != 0 && (data_len_-5) % num_lines_ != 0) {
num_lines_ = 0;
num_probes_ = 0;
} else if (num_lines_ != 0) {
while (true) {
uint32_t num_lines_at_curr_cache_size =
(data_len_ - 5) >> log2_cache_line_size_;
if (num_lines_at_curr_cache_size == 0) {
// The cache line size seems not a power of two. It's not supported
// and indicates a corruption so disable using this filter.
// Removed for unit testing corruption: assert(false);
num_lines_ = 0;
num_probes_ = 0;
break;
}
if (num_lines_at_curr_cache_size == num_lines_) {
break;
}
++log2_cache_line_size_;
}
}
}
// No Copy allowed // No Copy allowed
FullFilterBitsReader(const FullFilterBitsReader&) = delete; FullFilterBitsReader(const FullFilterBitsReader&) = delete;
void operator=(const FullFilterBitsReader&) = delete; void operator=(const FullFilterBitsReader&) = delete;
@ -177,11 +165,6 @@ class FullFilterBitsReader : public FilterBitsReader {
// if the key was not on the list, but it should aim to return false with a // if the key was not on the list, but it should aim to return false with a
// high probability. // high probability.
bool MayMatch(const Slice& key) override { bool MayMatch(const Slice& key) override {
if (data_len_ <= 5) { // remain same with original filter
return false;
}
// Other Error params, including a broken filter, regarded as match
if (num_probes_ == 0 || num_lines_ == 0) return true;
uint32_t hash = BloomHash(key); uint32_t hash = BloomHash(key);
uint32_t byte_offset; uint32_t byte_offset;
LegacyFullFilterImpl::PrepareHashMayMatch( LegacyFullFilterImpl::PrepareHashMayMatch(
@ -191,17 +174,6 @@ class FullFilterBitsReader : public FilterBitsReader {
} }
virtual void MayMatch(int num_keys, Slice** keys, bool* may_match) override { virtual void MayMatch(int num_keys, Slice** keys, bool* may_match) override {
if (data_len_ <= 5) { // remain same with original filter
for (int i = 0; i < num_keys; ++i) {
may_match[i] = false;
}
return;
}
for (int i = 0; i < num_keys; ++i) {
may_match[i] = true;
}
// Other Error params, including a broken filter, regarded as match
if (num_probes_ == 0 || num_lines_ == 0) return;
uint32_t hashes[MultiGetContext::MAX_BATCH_SIZE]; uint32_t hashes[MultiGetContext::MAX_BATCH_SIZE];
uint32_t byte_offsets[MultiGetContext::MAX_BATCH_SIZE]; uint32_t byte_offsets[MultiGetContext::MAX_BATCH_SIZE];
for (int i = 0; i < num_keys; ++i) { for (int i = 0; i < num_keys; ++i) {
@ -210,42 +182,20 @@ class FullFilterBitsReader : public FilterBitsReader {
/*out*/ &byte_offsets[i], /*out*/ &byte_offsets[i],
log2_cache_line_size_); log2_cache_line_size_);
} }
for (int i = 0; i < num_keys; ++i) { for (int i = 0; i < num_keys; ++i) {
if (!LegacyFullFilterImpl::HashMayMatchPrepared(hashes[i], num_probes_, may_match[i] = LegacyFullFilterImpl::HashMayMatchPrepared(
data_ + byte_offsets[i], hashes[i], num_probes_, data_ + byte_offsets[i],
log2_cache_line_size_)) { log2_cache_line_size_);
may_match[i] = false;
}
} }
} }
private: private:
// Filter meta data
const char* data_; const char* data_;
uint32_t data_len_; const int num_probes_;
int num_probes_; const uint32_t num_lines_;
uint32_t num_lines_; const uint32_t log2_cache_line_size_;
uint32_t log2_cache_line_size_;
// Get num_probes, and num_lines from filter
// If filter format broken, set both to 0.
void GetFilterMeta(const Slice& filter, int* num_probes, uint32_t* num_lines);
}; };
void FullFilterBitsReader::GetFilterMeta(const Slice& filter, int* num_probes,
uint32_t* num_lines) {
uint32_t len = static_cast<uint32_t>(filter.size());
if (len <= 5) {
// filter is empty or broken
*num_probes = 0;
*num_lines = 0;
return;
}
*num_probes = filter.data()[len - 5];
*num_lines = DecodeFixed32(filter.data() + len - 4);
}
// An implementation of filter policy // An implementation of filter policy
class BloomFilterPolicy : public FilterPolicy { class BloomFilterPolicy : public FilterPolicy {
@ -311,8 +261,60 @@ class BloomFilterPolicy : public FilterPolicy {
return new FullFilterBitsBuilder(bits_per_key_, num_probes_); return new FullFilterBitsBuilder(bits_per_key_, num_probes_);
} }
// Read metadata to determine what kind of FilterBitsReader is needed
// and return a new one.
FilterBitsReader* GetFilterBitsReader(const Slice& contents) const override { FilterBitsReader* GetFilterBitsReader(const Slice& contents) const override {
return new FullFilterBitsReader(contents); uint32_t len_with_meta = static_cast<uint32_t>(contents.size());
if (len_with_meta <= 5) {
// filter is empty or broken. Treat like zero keys added.
return new AlwaysFalseFilter();
}
char raw_num_probes = contents.data()[len_with_meta - 5];
// NB: *num_probes > 30 and < 128 probably have not been used, because of
// BloomFilterPolicy::initialize, unless directly calling
// FullFilterBitsBuilder as an API, but we are leaving those cases in
// limbo with FullFilterBitsReader for now.
if (raw_num_probes < 1) {
// Treat as zero probes (always FP) for now.
// NB: < 0 (or unsigned > 127) effectively reserved for future use.
return new AlwaysTrueFilter();
}
// else attempt decode for FullFilterBitsReader
int num_probes = raw_num_probes;
assert(num_probes >= 1);
assert(num_probes <= 127);
uint32_t len = len_with_meta - 5;
assert(len > 0);
uint32_t num_lines = DecodeFixed32(contents.data() + len_with_meta - 4);
uint32_t log2_cache_line_size;
if (num_lines * CACHE_LINE_SIZE == len) {
// Common case
log2_cache_line_size = folly::constexpr_log2(CACHE_LINE_SIZE);
} else if (num_lines == 0 || len % num_lines != 0) {
// Invalid (no solution to num_lines * x == len)
// Treat as zero probes (always FP) for now.
return new AlwaysTrueFilter();
} else {
// Determine the non-native cache line size (from another system)
log2_cache_line_size = 0;
while ((num_lines << log2_cache_line_size) < len) {
++log2_cache_line_size;
}
if ((num_lines << log2_cache_line_size) != len) {
// Invalid (block size not a power of two)
// Treat as zero probes (always FP) for now.
return new AlwaysTrueFilter();
}
}
// if not early return
return new FullFilterBitsReader(contents.data(), num_probes, num_lines,
log2_cache_line_size);
} }
// If choose to use block based builder // If choose to use block based builder

@ -589,14 +589,11 @@ TEST_F(FullBloomTest, CorruptFilters) {
ASSERT_TRUE(Matches("hello")); ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world")); ASSERT_TRUE(Matches("world"));
// Bad filter bits // Bad filter bits - returns true for safety
// 65 bytes is not a power of two, so not a legal cache line size // 65 bytes is not a power of two, so not a legal cache line size
OpenRaw(cft.Reset(65 * 3, 3, 6, fill)); OpenRaw(cft.Reset(65 * 3, 3, 6, fill));
// ASSERT_TRUE(Matches("hello")); ASSERT_TRUE(Matches("hello"));
// ASSERT_TRUE(Matches("world")); ASSERT_TRUE(Matches("world"));
// NB: NOT PROPERLY CHECKED in implementation
ASSERT_EQ(fill, Matches("hello"));
ASSERT_EQ(fill, Matches("world"));
// Bad filter bits - returns false as if built from zero keys // Bad filter bits - returns false as if built from zero keys
// < 5 bytes overall means missing even metadata // < 5 bytes overall means missing even metadata

Loading…
Cancel
Save