rocksdb

fork of https://github.com/oxigraph/rocksdb and https://github.com/facebook/rocksdb for nextgraph and oxigraph

Peter Dillinger 8aa99fc71e Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317)	6 years ago

Summary:
With many millions of keys, the old Bloom filter implementation for the block-based table (format_version <= 4) would have excessive FP rate due to the limitations of feeding the Bloom filter with a 32-bit hash. This change computes an estimated inflated FP rate due to this effect and warns in the log whenever an SST filter is constructed (almost certainly a "full" not "partitioned" filter) that exceeds 1.5x FP rate due to this effect. The detailed condition is only checked if 3 million keys or more have been added to a filter, as this should be a lower bound for common bits/key settings (< 20).

Recommended remedies include smaller SST file size, using format_version >= 5 (for new Bloom filter), or using partitioned filters.

This does not change behavior other than generating warnings for some constructed filters using the old implementation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6317

Test Plan:
Example with warning, 15M keys @ 15 bits / key: (working_mem_size_mb is just to stop after building one filter if it's large)

    $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=15000000 2>&1 | grep 'FP rate'
    [WARN] [/block_based/filter_policy.cc:292] Using legacy SST/BBT Bloom filter with excessive key count (15.0M @ 15bpk), causing estimated 1.8x higher filter FP rate. Consider using new Bloom with format_version>=5, smaller SST file size, or partitioned filters.
    Predicted FP rate %: 0.766702
    Average FP rate %: 0.66846

Example without warning (150K keys):

    $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=150000 2>&1 | grep 'FP rate'
    Predicted FP rate %: 0.422857
    Average FP rate %: 0.379301
    $

With more samples at 15 bits/key:

    150K keys -> no warning; actual: 0.379% FP rate (baseline)
    1M keys -> no warning; actual: 0.396% FP rate, 1.045x
    9M keys -> no warning; actual: 0.563% FP rate, 1.485x
    10M keys -> warning (1.5x); actual: 0.564% FP rate, 1.488x
    15M keys -> warning (1.8x); actual: 0.668% FP rate, 1.76x
    25M keys -> warning (2.4x); actual: 0.880% FP rate, 2.32x

At 10 bits/key:

    150K keys -> no warning; actual: 1.17% FP rate (baseline)
    1M keys -> no warning; actual: 1.16% FP rate
    10M keys -> no warning; actual: 1.32% FP rate, 1.13x
    25M keys -> no warning; actual: 1.63% FP rate, 1.39x
    35M keys -> warning (1.6x); actual: 1.81% FP rate, 1.55x

At 5 bits/key:

    150K keys -> no warning; actual: 9.32% FP rate (baseline)
    25M keys -> no warning; actual: 9.62% FP rate, 1.03x
    200M keys -> no warning; actual: 12.2% FP rate, 1.31x
    250M keys -> warning (1.5x); actual: 12.8% FP rate, 1.37x
    300M keys -> warning (1.6x); actual: 13.4% FP rate, 1.43x

The reason for the modest inaccuracy at low bits/key is that the assumption of independence between a collision between 32-hash values feeding the filter and an FP in the filter is not quite true for implementations using "simple" logic to compute indices from the stock hash result. There's math on this in my dissertation, but I don't think it's worth the effort just for these extreme cases (> 100 million keys and low-ish bits/key).

Differential Revision: D19471715

Pulled By: pdillinger

fbshipit-source-id: f80c96893a09bf1152630ff0b964e5cdd7e35c68
aligned_buffer.h	Document AlignedBuffer (#5345)	6 years ago
autovector.h	Fix the constness issues around autovector::iterator_impl's dereference operators (#6057)	6 years ago
autovector_test.cc	Move some memory related files from util/ to memory/ (#5382)	6 years ago
bloom_impl.h	Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317)	6 years ago
bloom_test.cc	Expose and elaborate FilterBuildingContext (#6088)	6 years ago
build_version.cc.in	Add copyright headers per FB open-source checkup tool. (#5199)	7 years ago
build_version.h	Change RocksDB License	8 years ago
cast_util.h	Add a missing "once" in .h	8 years ago
channel.h	Fix build breakage from lock_guard error (#6161)	6 years ago
coding.cc	Enable MSVC W4 with a few exceptions. Fix warnings and bugs	8 years ago
coding.h	Avoid user key copying for Get/Put/Write with user-timestamp (#5502)	6 years ago
coding_test.cc	Move test related files under util/ to test_util/ (#5377)	6 years ago
compaction_job_stats_impl.cc	Refresh snapshot list during long compactions (2nd attempt) (#5278)	7 years ago
comparator.cc	Add support for timestamp in Get/Put (#5079)	6 years ago
compression.h	crash_test to cover bottommost compression and some other changes (#6215)	6 years ago
compression_context_cache.cc	run make format for PR 3838 (#3954)	7 years ago
compression_context_cache.h	run make format for PR 3838 (#3954)	7 years ago
concurrent_task_limiter_impl.cc	Compaction limiter miscs (#4795)	7 years ago
concurrent_task_limiter_impl.h	Apply formatter on recent 45 commits. (#5827)	6 years ago
core_local.h	Change RocksDB License	8 years ago
crc32c.cc	Cleanup the Arm64 CRC32 unused warning (#5565)	6 years ago
crc32c.h	Updated CRC32 Power Optimization Changes	8 years ago
crc32c_arm64.cc	Apply formatter to recent 200+ commits. (#5830)	6 years ago
crc32c_arm64.h	Apply formatter to recent 200+ commits. (#5830)	6 years ago
crc32c_ppc.c	C file should not include <cinttypes>, it is a C++ header. (#5499)	6 years ago
crc32c_ppc.h	Remove PATENTS text from a few straggler files (#5326)	6 years ago
crc32c_ppc_asm.S	Remove PATENTS text from a few straggler files (#5326)	6 years ago
crc32c_ppc_constants.h	Remove PATENTS text from a few straggler files (#5326)	6 years ago
crc32c_test.cc	Move test related files under util/ to test_util/ (#5377)	6 years ago
duplicate_detector.h	simplify include directive involving inttypes (#5402)	6 years ago
dynamic_bloom.cc	Apply formatter to recent 200+ commits. (#5830)	6 years ago
dynamic_bloom.h	MultiGet batching in memtable (#5818)	6 years ago
dynamic_bloom_test.cc	Apply formatter to recent 200+ commits. (#5830)	6 years ago
file_reader_writer_test.cc	Introduce a new storage specific Env API (#5761)	6 years ago
filelock_test.cc	Move some memory related files from util/ to memory/ (#5382)	6 years ago
filter_bench.cc	Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317)	6 years ago
gflags_compat.h	filter_bench - a prelim tool for SST filter benchmarking (#5825)	6 years ago
hash.cc	Add new persistent 64-bit hash (#5984)	6 years ago
hash.h	Add new persistent 64-bit hash (#5984)	6 years ago
hash_map.h	Change RocksDB License	8 years ago
hash_test.cc	Add new persistent 64-bit hash (#5984)	6 years ago
heap.h	Add compaction logic to RangeDelAggregatorV2 (#4758)	7 years ago
heap_test.cc	fix gflags namespace	8 years ago
kv_map.h	Consolidate hash function used for non-persistent data in a new function (#5155)	7 years ago
log_write_bench.cc	Divide file_reader_writer.h and .cc (#5803)	6 years ago
murmurhash.cc	Add GCC 8 to Travis (#3433)	7 years ago
murmurhash.h	Change RocksDB License	8 years ago
mutexlock.h	Apply formatter on recent 45 commits. (#5827)	6 years ago
ppc-opcode.h	Remove PATENTS text from a few straggler files (#5326)	6 years ago
random.cc	Change RocksDB License	8 years ago
random.h	Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154)	6 years ago
random_test.cc	Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154)	6 years ago
rate_limiter.cc	Move some memory related files from util/ to memory/ (#5382)	6 years ago
rate_limiter.h	rate limit auto-tuning	8 years ago
rate_limiter_test.cc	Apply formatter to recent 200+ commits. (#5830)	6 years ago
repeatable_thread.h	Move test related files under util/ to test_util/ (#5377)	6 years ago
repeatable_thread_test.cc	Move some memory related files from util/ to memory/ (#5382)	6 years ago
set_comparator.h	WritePrepared Txn: Move DuplicateDetector to util	8 years ago
slice.cc	Apply modernize-use-override (2nd iteration)	7 years ago
slice_transform_test.cc	Move test related files under util/ to test_util/ (#5377)	6 years ago
status.cc	Work around weird unused errors with Mingw (#6075)	6 years ago
stderr_logger.h	Change RocksDB License	8 years ago
stop_watch.h	Make statistics's stats_level change thread-safe (#5030)	7 years ago
string_util.cc	Refactor trimming logic for immutable memtables (#5022)	6 years ago
string_util.h	Refactor trimming logic for immutable memtables (#5022)	6 years ago
thread_list_test.cc	Move test related files under util/ to test_util/ (#5377)	6 years ago
thread_local.cc	Enable building of ARM32 (#4349)	7 years ago
thread_local.h	Provide a way to override windows memory allocator with jemalloc for ZSTD	7 years ago
thread_local_test.cc	Fix thread_local_test failure caused by recent io_uring change (#6136)	6 years ago
thread_operation.h	Add inline comments to flush job (#4464)	7 years ago
threadpool_imp.cc	Apply formatter to recent 200+ commits. (#5830)	6 years ago
threadpool_imp.h	Support lowering CPU priority of background threads	8 years ago
timer_queue.h	Move test related files under util/ to test_util/ (#5377)	6 years ago
timer_queue_test.cc	Change RocksDB License	8 years ago
user_comparator_wrapper.h	Fix perf_context.user_key_comparison_count for range scan (#5098)	7 years ago
util.h	Add GCC 8 to Travis (#3433)	7 years ago
vector_iterator.h	Make clang-analyzer happy (#5821)	6 years ago
xxh3p.h	Add new persistent 64-bit hash (#5984)	6 years ago
xxhash.cc	Add new persistent 64-bit hash (#5984)	6 years ago
xxhash.h	Misc hashing updates / upgrades (#5909)	6 years ago