Use size_t for filter APIs, protect against overflow (#7726)

Summary:
Deprecate CalculateNumEntry and replace with
ApproximateNumEntries (better name) using size_t instead of int and
uint32_t, to minimize confusing casts and bad overflow behavior
(possible though probably not realistic). Bloom sizes are now explicitly
capped at the max size supported by the implementations: just under 4GiB for
fv=5 Bloom, and just under 512MiB for fv<5 Legacy Bloom. This
hardening could help set up for fuzzing.

Also, since RocksDB only uses this information as an approximation
for trying to hit certain sizes for partitioned filters, it's more important
that the function be reasonably fast than completely accurate. It's hard
enough to be 100% accurate for Ribbon (which currently reverses
CalculateSpace) that once optimize_filters_for_memory is added to the mix,
it's just not worth trying to compute num entries for bytes with 100%
accuracy.

Also:
- Cleaned up filter_policy.h to remove MSVC warning handling and
potentially unsafe use of exception for "not implemented"
- Correct the number of entries limit beyond which current Ribbon
implementation falls back on Bloom instead.
- Consistently use "num_entries" rather than "num_entry"
- Remove LegacyBloomBitsBuilder::CalculateNumEntry, as it's essentially
obsoleted by the general implementation
BuiltinFilterBitsBuilder::ApproximateNumEntries.
- Fix filter_bench to skip some tests that don't make sense when only
one or a small number of filters have been generated.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7726

Test Plan:
expanded existing unit tests for CalculateSpace /
ApproximateNumEntries. Also manually used filter_bench to verify that the
Legacy and fv=5 Bloom size caps work (much too expensive for a unit test).
Note that the actual bits per key stored is below the requested value due to
the space cap.

    $ ./filter_bench -impl=0 -bits_per_key=20 -average_keys_per_filter=256000000 -vary_key_count_ratio=0 -m_keys_total_max=256 -allow_bad_fp_rate
    ...
    Total size (MB): 511.992
    Bits/key stored: 16.777
    ...
    $ ./filter_bench -impl=2 -bits_per_key=20 -average_keys_per_filter=2000000000 -vary_key_count_ratio=0 -m_keys_total_max=2000
    ...
    Total size (MB): 4096
    Bits/key stored: 17.1799
    ...
    $

Reviewed By: jay-zhuang

Differential Revision: D25239800

Pulled By: pdillinger

fbshipit-source-id: f94e6d065efd31e05ec630ae1a82e6400d8390c4
Branch: main
Author: Peter Dillinger, committed by Facebook GitHub Bot (4 years ago)
Parent: 491779514e
Commit: 003e72b201
Files changed (lines):
  1. HISTORY.md (3)
  2. include/rocksdb/filter_policy.h (35)
  3. table/block_based/block_based_table_builder.cc (5)
  4. table/block_based/filter_policy.cc (133)
  5. table/block_based/filter_policy_internal.h (12)
  6. table/block_based/full_filter_block_test.cc (4)
  7. table/block_based/partitioned_filter_block.cc (7)
  8. util/bloom_test.cc (19)
  9. util/filter_bench.cc (10)
@@ -15,6 +15,9 @@
### New Features
* User defined timestamp feature supports `CompactRange` and `GetApproximateSizes`.
### Public API Change
* Deprecated public but rarely-used FilterBitsBuilder::CalculateNumEntry, which is replaced with ApproximateNumEntries taking a size_t parameter and returning size_t.
## 6.15.0 (11/13/2020)
### Bug Fixes
* Fixed a bug in the following combination of features: indexes with user keys (`format_version >= 3`), indexes are partitioned (`index_type == kTwoLevelIndexSearch`), and some index partitions are pinned in memory (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache`). The bug could cause keys to be truncated when read from the index leading to wrong read results or other unexpected behavior.

@@ -21,6 +21,7 @@
#include <stdlib.h>
#include <algorithm>
#include <memory>
#include <stdexcept>
#include <string>
@@ -50,23 +51,25 @@ class FilterBitsBuilder {
// The ownership of actual data is set to buf
virtual Slice Finish(std::unique_ptr<const char[]>* buf) = 0;
// Calculate num of keys that can be added and generate a filter
// <= the specified number of bytes.
#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 4702) // unreachable code
#endif
virtual int CalculateNumEntry(const uint32_t /*bytes*/) {
#ifndef ROCKSDB_LITE
throw std::runtime_error("CalculateNumEntry not Implemented");
#else
abort();
#endif
return 0;
// Approximate the number of keys that can be added and generate a filter
// <= the specified number of bytes. Callers (including RocksDB) should
// only use this result for optimizing performance and not as a guarantee.
// This default implementation is for compatibility with older custom
// FilterBitsBuilders only implementing deprecated CalculateNumEntry.
virtual size_t ApproximateNumEntries(size_t bytes) {
bytes = std::min(bytes, size_t{0xffffffff});
return static_cast<size_t>(CalculateNumEntry(static_cast<uint32_t>(bytes)));
}
// Old, DEPRECATED version of ApproximateNumEntries. This is not
// called by RocksDB except as the default implementation of
// ApproximateNumEntries for API compatibility.
virtual int CalculateNumEntry(const uint32_t bytes) {
// DEBUG: ideally should not rely on this implementation
assert(false);
// RELEASE: something reasonably conservative: 2 bytes per entry
return static_cast<int>(bytes / 2);
}
#if defined(_MSC_VER)
#pragma warning(pop)
#endif
};
// A class that checks if a key can be in filter

@@ -77,8 +77,9 @@ FilterBlockBuilder* CreateFilterBlockBuilder(
if (table_opt.partition_filters) {
assert(p_index_builder != nullptr);
// Since after partition cut request from filter builder it takes time
// until index builder actually cuts the partition, we take the lower bound
// as partition size.
// until index builder actually cuts the partition, until the end of a
// data block potentially with many keys, we take the lower bound as
// partition size.
assert(table_opt.block_size_deviation <= 100);
auto partition_size =
static_cast<uint32_t>(((table_opt.metadata_block_size *

@@ -24,8 +24,8 @@
namespace ROCKSDB_NAMESPACE {
int BuiltinFilterBitsBuilder::CalculateNumEntry(const uint32_t bytes) {
int cur = 1;
size_t BuiltinFilterBitsBuilder::ApproximateNumEntries(size_t bytes) {
size_t cur = 1;
// Find overestimate
while (CalculateSpace(cur) <= bytes && cur * 2 > cur) {
cur *= 2;
@@ -33,7 +33,7 @@ int BuiltinFilterBitsBuilder::CalculateNumEntry(const uint32_t bytes) {
// Change to underestimate less than factor of two from answer
cur /= 2;
// Binary search
int delta = cur / 2;
size_t delta = cur / 2;
while (delta > 0) {
if (CalculateSpace(cur + delta) <= bytes) {
cur += delta;
@@ -100,18 +100,21 @@ class FastLocalBloomBitsBuilder : public XXH3pFilterBitsBuilder {
~FastLocalBloomBitsBuilder() override {}
virtual Slice Finish(std::unique_ptr<const char[]>* buf) override {
size_t num_entry = hash_entries_.size();
size_t num_entries = hash_entries_.size();
std::unique_ptr<char[]> mutable_buf;
uint32_t len_with_metadata =
CalculateAndAllocate(num_entry, &mutable_buf, /*update_balance*/ true);
size_t len_with_metadata = CalculateAndAllocate(num_entries, &mutable_buf,
/*update_balance*/ true);
assert(mutable_buf);
assert(len_with_metadata >= 5);
// Max size supported by implementation
assert(len_with_metadata <= 0xffffffffU);
// Compute num_probes after any rounding / adjustments
int num_probes = GetNumProbes(num_entry, len_with_metadata);
int num_probes = GetNumProbes(num_entries, len_with_metadata);
uint32_t len = len_with_metadata - 5;
uint32_t len = static_cast<uint32_t>(len_with_metadata - 5);
if (len > 0) {
AddAllEntries(mutable_buf.get(), len, num_probes);
}
@@ -132,29 +135,27 @@ class FastLocalBloomBitsBuilder : public XXH3pFilterBitsBuilder {
return rv;
}
int CalculateNumEntry(const uint32_t bytes) override {
uint32_t bytes_no_meta = bytes >= 5u ? bytes - 5u : 0;
return static_cast<int>(uint64_t{8000} * bytes_no_meta /
millibits_per_key_);
size_t ApproximateNumEntries(size_t bytes) override {
size_t bytes_no_meta = bytes >= 5u ? bytes - 5u : 0;
return static_cast<size_t>(uint64_t{8000} * bytes_no_meta /
millibits_per_key_);
}
uint32_t CalculateSpace(const int num_entry) override {
// NB: the BuiltinFilterBitsBuilder API presumes len fits in uint32_t.
return static_cast<uint32_t>(
CalculateAndAllocate(static_cast<size_t>(num_entry),
/* buf */ nullptr,
/*update_balance*/ false));
size_t CalculateSpace(size_t num_entries) override {
return CalculateAndAllocate(num_entries,
/* buf */ nullptr,
/*update_balance*/ false);
}
// To choose size using malloc_usable_size, we have to actually allocate.
uint32_t CalculateAndAllocate(size_t num_entry, std::unique_ptr<char[]>* buf,
bool update_balance) {
size_t CalculateAndAllocate(size_t num_entries, std::unique_ptr<char[]>* buf,
bool update_balance) {
std::unique_ptr<char[]> tmpbuf;
// If not for cache line blocks in the filter, what would the target
// length in bytes be?
size_t raw_target_len = static_cast<size_t>(
(uint64_t{num_entry} * millibits_per_key_ + 7999) / 8000);
(uint64_t{num_entries} * millibits_per_key_ + 7999) / 8000);
if (raw_target_len >= size_t{0xffffffc0}) {
// Max supported for this data structure implementation
@@ -164,11 +165,10 @@ class FastLocalBloomBitsBuilder : public XXH3pFilterBitsBuilder {
// Round up to nearest multiple of 64 (block size). This adjustment is
// used for target FP rate only so that we don't receive complaints about
// lower FP rate vs. historic Bloom filter behavior.
uint32_t target_len =
static_cast<uint32_t>(raw_target_len + 63) & ~uint32_t{63};
size_t target_len = (raw_target_len + 63) & ~size_t{63};
// Return value set to a default; overwritten in some cases
uint32_t rv = target_len + /* metadata */ 5;
size_t rv = target_len + /* metadata */ 5;
#ifdef ROCKSDB_MALLOC_USABLE_SIZE
if (aggregate_rounding_balance_ != nullptr) {
// Do optimize_filters_for_memory, using malloc_usable_size.
@@ -189,7 +189,7 @@ class FastLocalBloomBitsBuilder : public XXH3pFilterBitsBuilder {
// and relative.
int64_t balance = aggregate_rounding_balance_->load();
double target_fp_rate = EstimatedFpRate(num_entry, target_len + 5);
double target_fp_rate = EstimatedFpRate(num_entries, target_len + 5);
double rv_fp_rate = target_fp_rate;
if (balance < 0) {
@@ -203,9 +203,8 @@ class FastLocalBloomBitsBuilder : public XXH3pFilterBitsBuilder {
for (uint64_t maybe_len64 :
{uint64_t{3} * target_len / 4, uint64_t{13} * target_len / 16,
uint64_t{7} * target_len / 8, uint64_t{15} * target_len / 16}) {
uint32_t maybe_len =
static_cast<uint32_t>(maybe_len64) & ~uint32_t{63};
double maybe_fp_rate = EstimatedFpRate(num_entry, maybe_len + 5);
size_t maybe_len = maybe_len64 & ~size_t{63};
double maybe_fp_rate = EstimatedFpRate(num_entries, maybe_len + 5);
if (maybe_fp_rate <= for_balance_fp_rate) {
rv = maybe_len + /* metadata */ 5;
rv_fp_rate = maybe_fp_rate;
@@ -217,7 +216,7 @@ class FastLocalBloomBitsBuilder : public XXH3pFilterBitsBuilder {
// Filter blocks are loaded into block cache with their block trailer.
// We need to make sure that's accounted for in choosing a
// fragmentation-friendly size.
const uint32_t kExtraPadding = kBlockTrailerSize;
const size_t kExtraPadding = kBlockTrailerSize;
size_t requested = rv + kExtraPadding;
// Allocate and get usable size
@@ -241,9 +240,9 @@ class FastLocalBloomBitsBuilder : public XXH3pFilterBitsBuilder {
usable_len = size_t{0xffffffc0};
}
rv = (static_cast<uint32_t>(usable_len) & ~uint32_t{63}) +
rv = (usable_len & ~size_t{63}) +
/* metadata */ 5;
rv_fp_rate = EstimatedFpRate(num_entry, rv);
rv_fp_rate = EstimatedFpRate(num_entries, rv);
} else {
// Too small means bad malloc_usable_size
assert(usable == requested);
@@ -428,8 +427,7 @@ class Standard128RibbonBitsBuilder : public XXH3pFilterBitsBuilder {
~Standard128RibbonBitsBuilder() override {}
virtual Slice Finish(std::unique_ptr<const char[]>* buf) override {
// More than 2^30 entries (~1 billion) not supported
if (hash_entries_.size() >= (size_t{1} << 30)) {
if (hash_entries_.size() > kMaxRibbonEntries) {
ROCKS_LOG_WARN(info_log_, "Too many keys for Ribbon filter: %llu",
static_cast<unsigned long long>(hash_entries_.size()));
SwapEntriesWith(&bloom_fallback_);
@@ -508,17 +506,20 @@ class Standard128RibbonBitsBuilder : public XXH3pFilterBitsBuilder {
return rv;
}
uint32_t CalculateSpace(const int num_entries) override {
// NB: the BuiltinFilterBitsBuilder API presumes len fits in uint32_t.
size_t CalculateSpace(size_t num_entries) override {
if (num_entries > kMaxRibbonEntries) {
// More entries than supported by this Ribbon
return bloom_fallback_.CalculateSpace(num_entries);
}
uint32_t num_slots =
NumEntriesToNumSlots(static_cast<uint32_t>(num_entries));
uint32_t ribbon = static_cast<uint32_t>(
size_t ribbon =
SolnType::GetBytesForOneInFpRate(num_slots, desired_one_in_fp_rate_,
/*rounding*/ 0) +
/*metadata*/ 5);
/*metadata*/ 5;
// Consider possible Bloom fallback for small filters
if (num_slots < 1024) {
uint32_t bloom = bloom_fallback_.CalculateSpace(num_entries);
size_t bloom = bloom_fallback_.CalculateSpace(num_entries);
return std::min(bloom, ribbon);
} else {
return ribbon;
@@ -527,6 +528,10 @@ class Standard128RibbonBitsBuilder : public XXH3pFilterBitsBuilder {
double EstimatedFpRate(size_t num_entries,
size_t len_with_metadata) override {
if (num_entries > kMaxRibbonEntries) {
// More entries than supported by this Ribbon
return bloom_fallback_.EstimatedFpRate(num_entries, len_with_metadata);
}
uint32_t num_slots =
NumEntriesToNumSlots(static_cast<uint32_t>(num_entries));
SolnType fake_soln(nullptr, len_with_metadata);
@@ -544,6 +549,15 @@ class Standard128RibbonBitsBuilder : public XXH3pFilterBitsBuilder {
return SolnType::RoundUpNumSlots(num_slots1);
}
// Approximate num_entries to ensure number of bytes fits in 32 bits, which
// is not an inherent limitation but does ensure somewhat graceful Bloom
// fallback for crazy high number of entries, since the Bloom implementation
// does not support number of bytes bigger than fits in 32 bits. This is
// within an order of magnitude of implementation limit on num_slots
// fitting in 32 bits, and even closer for num_blocks fitting in 24 bits
// (for filter metadata).
static constexpr size_t kMaxRibbonEntries = 950000000; // ~ 1 billion
// A desired value for 1/fp_rate. For example, 100 -> 1% fp rate.
double desired_one_in_fp_rate_;
@@ -610,12 +624,10 @@ class LegacyBloomBitsBuilder : public BuiltinFilterBitsBuilder {
Slice Finish(std::unique_ptr<const char[]>* buf) override;
int CalculateNumEntry(const uint32_t bytes) override;
uint32_t CalculateSpace(const int num_entry) override {
size_t CalculateSpace(size_t num_entries) override {
uint32_t dont_care1;
uint32_t dont_care2;
return CalculateSpace(num_entry, &dont_care1, &dont_care2);
return CalculateSpace(num_entries, &dont_care1, &dont_care2);
}
double EstimatedFpRate(size_t keys, size_t bytes) override {
@@ -633,11 +645,11 @@ class LegacyBloomBitsBuilder : public BuiltinFilterBitsBuilder {
uint32_t GetTotalBitsForLocality(uint32_t total_bits);
// Reserve space for new filter
char* ReserveSpace(const int num_entry, uint32_t* total_bits,
char* ReserveSpace(size_t num_entries, uint32_t* total_bits,
uint32_t* num_lines);
// Implementation-specific variant of public CalculateSpace
uint32_t CalculateSpace(const int num_entry, uint32_t* total_bits,
uint32_t CalculateSpace(size_t num_entries, uint32_t* total_bits,
uint32_t* num_lines);
// Assuming single threaded access to this function.
@@ -720,14 +732,18 @@ uint32_t LegacyBloomBitsBuilder::GetTotalBitsForLocality(uint32_t total_bits) {
return num_lines * (CACHE_LINE_SIZE * 8);
}
uint32_t LegacyBloomBitsBuilder::CalculateSpace(const int num_entry,
uint32_t LegacyBloomBitsBuilder::CalculateSpace(size_t num_entries,
uint32_t* total_bits,
uint32_t* num_lines) {
assert(bits_per_key_);
if (num_entry != 0) {
uint32_t total_bits_tmp = static_cast<uint32_t>(num_entry * bits_per_key_);
*total_bits = GetTotalBitsForLocality(total_bits_tmp);
if (num_entries != 0) {
size_t total_bits_tmp = num_entries * bits_per_key_;
// total bits, including temporary computations, cannot exceed 2^32
// for compatibility
total_bits_tmp = std::min(total_bits_tmp, size_t{0xffff0000});
*total_bits =
GetTotalBitsForLocality(static_cast<uint32_t>(total_bits_tmp));
*num_lines = *total_bits / (CACHE_LINE_SIZE * 8);
assert(*total_bits > 0 && *total_bits % 8 == 0);
} else {
@@ -742,30 +758,15 @@ uint32_t LegacyBloomBitsBuilder::CalculateSpace(const int num_entry,
return sz;
}
char* LegacyBloomBitsBuilder::ReserveSpace(const int num_entry,
char* LegacyBloomBitsBuilder::ReserveSpace(size_t num_entries,
uint32_t* total_bits,
uint32_t* num_lines) {
uint32_t sz = CalculateSpace(num_entry, total_bits, num_lines);
uint32_t sz = CalculateSpace(num_entries, total_bits, num_lines);
char* data = new char[sz];
memset(data, 0, sz);
return data;
}
int LegacyBloomBitsBuilder::CalculateNumEntry(const uint32_t bytes) {
assert(bits_per_key_);
assert(bytes > 0);
int high = static_cast<int>(bytes * 8 / bits_per_key_ + 1);
int low = 1;
int n = high;
for (; n >= low; n--) {
if (CalculateSpace(n) <= bytes) {
break;
}
}
assert(n < high); // High should be an overestimation
return n;
}
inline void LegacyBloomBitsBuilder::AddHash(uint32_t h, char* data,
uint32_t num_lines,
uint32_t total_bits) {

@@ -25,18 +25,18 @@ class Slice;
class BuiltinFilterBitsBuilder : public FilterBitsBuilder {
public:
// Calculate number of bytes needed for a new filter, including
// metadata. Passing the result to CalculateNumEntry should
// metadata. Passing the result to ApproximateNumEntries should
// return >= the num_entry passed in.
virtual uint32_t CalculateSpace(const int num_entry) = 0;
virtual size_t CalculateSpace(size_t num_entries) = 0;
// A somewhat expensive but workable default implementation
// using binary search on CalculateSpace
int CalculateNumEntry(const uint32_t bytes) override;
size_t ApproximateNumEntries(size_t bytes) override;
// Returns an estimate of the FP rate of the returned filter if
// `keys` keys are added and the filter returned by Finish is `bytes`
// bytes.
virtual double EstimatedFpRate(size_t keys, size_t bytes) = 0;
// `num_entries` keys are added and the filter returned by Finish
// is `bytes` bytes.
virtual double EstimatedFpRate(size_t num_entries, size_t bytes) = 0;
};
// RocksDB built-in filter policy for Bloom or Bloom-like filters.

@@ -224,8 +224,8 @@ class CountUniqueFilterBitsBuilderWrapper : public FilterBitsBuilder {
return rv;
}
int CalculateNumEntry(const uint32_t bytes) override {
return b_->CalculateNumEntry(bytes);
size_t ApproximateNumEntries(size_t bytes) override {
return b_->ApproximateNumEntries(bytes);
}
size_t CountUnique() { return uniq_.size(); }

@@ -34,15 +34,16 @@ PartitionedFilterBlockBuilder::PartitionedFilterBlockBuilder(
use_value_delta_encoding),
p_index_builder_(p_index_builder),
keys_added_to_partition_(0) {
keys_per_partition_ =
filter_bits_builder_->CalculateNumEntry(partition_size);
keys_per_partition_ = static_cast<uint32_t>(
filter_bits_builder_->ApproximateNumEntries(partition_size));
if (keys_per_partition_ < 1) {
// partition_size (minus buffer, ~10%) might be smaller than minimum
// filter size, sometimes based on cache line size. Try to find that
// minimum size without CalculateSpace (not necessarily available).
uint32_t larger = std::max(partition_size + 4, uint32_t{16});
for (;;) {
keys_per_partition_ = filter_bits_builder_->CalculateNumEntry(larger);
keys_per_partition_ = static_cast<uint32_t>(
filter_bits_builder_->ApproximateNumEntries(larger));
if (keys_per_partition_ >= 1) {
break;
}

@@ -422,13 +422,24 @@ TEST_P(FullBloomTest, FilterSize) {
EXPECT_EQ((bpk.second + 500) / 1000, bfp->GetWholeBitsPerKey());
auto bits_builder = GetBuiltinFilterBitsBuilder();
for (int n = 1; n < 100; n++) {
auto space = bits_builder->CalculateSpace(n);
auto n2 = bits_builder->CalculateNumEntry(space);
size_t n = 1;
size_t space = 0;
for (; n < 100; n++) {
// Ensure consistency between CalculateSpace and ApproximateNumEntries
space = bits_builder->CalculateSpace(n);
size_t n2 = bits_builder->ApproximateNumEntries(space);
EXPECT_GE(n2, n);
auto space2 = bits_builder->CalculateSpace(n2);
size_t space2 = bits_builder->CalculateSpace(n2);
EXPECT_EQ(space, space2);
}
// Until size_t overflow
for (; n < (n + n / 3); n += n / 3) {
// Ensure space computation is not overflowing; capped is OK
size_t space2 = bits_builder->CalculateSpace(n);
EXPECT_GE(space2, space);
space = space2;
}
}
// Check that the compiler hasn't optimized our computation into nothing
EXPECT_TRUE(some_computed_less_than_denoted);

@@ -562,15 +562,25 @@ double FilterBench::RandomQueryTest(uint32_t inside_threshold, bool dry_run,
// 100% of queries to 1 filter
num_primary_filters = 1;
} else if (mode == kFiftyOneFilter) {
if (num_infos < 50) {
return 0.0; // skip
}
// 50% of queries
primary_filter_threshold /= 2;
// to 1% of filters
num_primary_filters = (num_primary_filters + 99) / 100;
} else if (mode == kEightyTwentyFilter) {
if (num_infos < 5) {
return 0.0; // skip
}
// 80% of queries
primary_filter_threshold = primary_filter_threshold / 5 * 4;
// to 20% of filters
num_primary_filters = (num_primary_filters + 4) / 5;
} else if (mode == kRandomFilter) {
if (num_infos == 1) {
return 0.0; // skip
}
}
uint32_t batch_size = 1;
std::unique_ptr<Slice[]> batch_slices;
