rocksdb

Commit Graph

Author	SHA1	Message	Date
Cheng Chang	91b7553293	Enable IO Uring in MultiGet in direct IO mode (#6815 ) Summary: Currently, in direct IO mode, `MultiGet` retrieves the data blocks one by one instead of in parallel, see `BlockBasedTable::RetrieveMultipleBlocks`. Since direct IO is supported in `RandomAccessFileReader::MultiRead` in https://github.com/facebook/rocksdb/pull/6446, this PR applies `MultiRead` to `MultiGet` so that the data blocks can be retrieved in parallel. Also, in direct IO mode and when data blocks are compressed and need to uncompressed, this PR only allocates one continuous aligned buffer to hold the data blocks, and then directly uncompress the blocks to insert into block cache, there is no longer intermediate copies to scratch buffers. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6815 Test Plan: 1. added a new unit test `BlockBasedTableReaderTest::MultiGet`. 2. existing unit tests and stress tests contain tests against `MultiGet` in direct IO mode. Reviewed By: anand1976 Differential Revision: D21426347 Pulled By: cheng-chang fbshipit-source-id: b8446ae0e74152444ef9111e97f8e402ac31b24f	5 years ago
sdong	4a4b8a1344	sst_dump to reduce number of file reads (#6836 ) Summary: sst_dump can issue many file reads from the file system. This doesn't work well with file systems without a OS cache, especially remote file systems. In order to mitigate this problem, several improvements are done: 1. --readahead_size is added, so that users can specify readahead size when scanning the data. 2. Force a 512KB tail readahead, which prevents three I/Os for footer, meta index and property blocks and hopefully index and filter blocks too. 3. Consoldiate SSTDump's I/Os before opening the file for read. Use the same file prefetch buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6836 Test Plan: Add a test that covers this new feature. Reviewed By: pdillinger Differential Revision: D21516607 fbshipit-source-id: 3ae43526286f67b2f4a5bdedfbc92719d579b87e	5 years ago
Ziyue Yang	c384c08a4f	Add tests for compression failure in BlockBasedTableBuilder (#6709 ) Summary: Currently there is no check for whether BlockBasedTableBuilder will expose compression error status if compression fails during the table building. This commit adds fake faulting compressors and a unit test to test such cases. This check finds 5 bugs, and this commit also fixes them: 1. Not handling compression failure well in BlockBasedTableBuilder::BGWorkWriteRawBlock. 2. verify_compression failing in BlockBasedTableBuilder when used with ZSTD. 3. Wrongly passing the same reference of block contents to BlockBasedTableBuilder::CompressAndVerifyBlock in parallel compression. 4. Wrongly setting block_rep->first_key_in_next_block to nullptr in BlockBasedTableBuilder::EnterUnbuffered when there are still incoming data blocks. 5. Not maintaining variables for compression ratio estimation and first_block in BlockBasedTableBuilder::EnterUnbuffered. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6709 Reviewed By: ajkr Differential Revision: D21236254 fbshipit-source-id: 101f6e62b2bac2b7be72be198adf93cd32a1ff46	5 years ago
Peter Dillinger	b27a1448b6	Fix false NotFound from batched MultiGet with kHashSearch (#6821 ) Summary: The error is assigning KeyContext::s to NotFound status in a table reader for a "not found in this table" case, which skips searching in later tables, like only a delete should. (The hash search index iterator is the only one that can return status NotFound even if Valid() == false.) This was detected by intermittent failure in MultiThreadedDBTest.MultiThreaded/5, a kHashSearch configuration. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6821 Test Plan: modified existing unit test to reproduce problem Reviewed By: anand1976 Differential Revision: D21450469 Pulled By: pdillinger fbshipit-source-id: 7478003684d637dbd491cdac81468041a791be2c	5 years ago
mrambacher	394f2bbd13	Add OptionTypeInfo::Enum and related methods (#6423 ) Summary: Add methods and constructors for handling enums to the OptionTypeInfo. This change allows enums to be converted/compared without adding a special "type" to the OptionType. This change addresses a couple of issues: - It allows new enumerated types to be added to the options without editing the OptionType base class (and related methods) - It standardizes the procedure for adding enumerated types to the options, reducing potential mistakes - It moves the enum maps to the location where they are used, allowing them to be static file members rather than global values - It reduces the number of types and cases that need to be handled in the various OptionType methods Pull Request resolved: https://github.com/facebook/rocksdb/pull/6423 Reviewed By: siying Differential Revision: D21408713 Pulled By: zhichao-cao fbshipit-source-id: fc492af285d011822578b95d186a0fce25d35626	5 years ago
sdong	079e50d2ba	Disallow BlockBasedTableBuilder to set status from non-OK (#6776 ) Summary: There is no systematic mechanism to prevent BlockBasedTableBuilder's status to be set from non-OK to OK. Adding a mechanism to force this will help us prevent failures in the future. The solution is to only make it possible to set the status code if the status code to set is not OK. Since the status code passed to CompressAndVerifyBlock() is changed, a mini refactoring is done too so that the output arguments are changed from reference to pointers, based on Google C++ Style. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6776 Test Plan: Run all existing test. Reviewed By: pdillinger Differential Revision: D21314382 fbshipit-source-id: 27000c10f1e4c121661e026548d6882066409375	5 years ago
anand76	ab13d43e1d	Pass a timeout to FileSystem for random reads (#6751 ) Summary: Calculate ```IOOptions::timeout``` using ```ReadOptions::deadline``` and pass it to ```FileSystem::Read/FileSystem::MultiRead```. This allows us to impose a tighter bound on the time taken by Get/MultiGet on FileSystem/Envs that support IO timeouts. Even on those that don't support, check in ```RandomAccessFileReader::Read``` and ```MultiRead``` and return ```Status::TimedOut()``` if the deadline is exceeded. For now, TableReader creation, which might do file opens and reads, are not covered. It will be implemented in another PR. Tests: Update existing unit tests to verify the correct timeout value is being passed Pull Request resolved: https://github.com/facebook/rocksdb/pull/6751 Reviewed By: riversand963 Differential Revision: D21285631 Pulled By: anand1976 fbshipit-source-id: d89af843e5a91ece866e87aa29438b52a65a8567	5 years ago
mrambacher	618bf638aa	Add Functions to OptionTypeInfo (#6422 ) Summary: Added functions for parsing, serializing, and comparing elements to OptionTypeInfo. These functions allow all of the special cases that could not be handled directly in the map of OptionTypeInfo to be moved into the map. Using these functions, every type can be handled via the map rather than special cased. By adding these functions, the code for handling options can become more standardized (fewer special cases) and (eventually) handled completely by common classes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6422 Test Plan: pass make check Reviewed By: siying Differential Revision: D21269005 Pulled By: zhichao-cao fbshipit-source-id: 9ba71c721a38ebf9ee88259d60bd81b3282b9077	5 years ago
Peter Dillinger	bae6f58696	Basic MultiGet support for partitioned filters (#6757 ) Summary: In MultiGet, access each applicable filter partition only once per batch, rather than for each applicable key. Also, * Fix Bloom stats for MultiGet * Fix/refactor MultiGetContext::Range::KeysLeft, including * Add efficient BitsSetToOne implementation * Assert that MultiGetContext::Range does not go beyond shift range Performance test: Generate db: $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10 -partition_index_and_filters=true ... Before (middle performing run of three; note some missing Bloom stats): $ ./db_bench --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 \| egrep 'micros/op\|block.cache.filter.hit\|bloom.filter.(full\|use)\|number.multiget' multireadrandom : 26.403 micros/op 597517 ops/sec; (548427 of 671968 found) rocksdb.block.cache.filter.hit COUNT : 83443275 rocksdb.bloom.filter.useful COUNT : 0 rocksdb.bloom.filter.full.positive COUNT : 0 rocksdb.bloom.filter.full.true.positive COUNT : 7931450 rocksdb.number.multiget.get COUNT : 385984 rocksdb.number.multiget.keys.read COUNT : 12351488 rocksdb.number.multiget.bytes.read COUNT : 793145000 rocksdb.number.multiget.keys.found COUNT : 7931450 After (middle performing run of three): $ ./db_bench_new --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 \| egrep 'micros/op\|block.cache.filter.hit\|bloom.filter.(full\|use)\|number.multiget' multireadrandom : 21.024 micros/op 752963 ops/sec; (705188 of 863968 found) rocksdb.block.cache.filter.hit COUNT : 49856682 rocksdb.bloom.filter.useful COUNT : 45684579 rocksdb.bloom.filter.full.positive COUNT : 10395458 rocksdb.bloom.filter.full.true.positive COUNT : 9908456 rocksdb.number.multiget.get COUNT : 481984 rocksdb.number.multiget.keys.read COUNT : 15423488 rocksdb.number.multiget.bytes.read COUNT : 990845600 rocksdb.number.multiget.keys.found COUNT : 9908456 So that's about 25% higher throughput even for random keys Pull Request resolved: https://github.com/facebook/rocksdb/pull/6757 Test Plan: unit test included Reviewed By: anand1976 Differential Revision: D21243256 Pulled By: pdillinger fbshipit-source-id: 5644a1468d9e8c8575be02f4e04bc5d62dbbb57f	5 years ago
Peter Dillinger	249eff0f30	Stats for redundant insertions into block cache (#6681 ) Summary: Since read threads do not coordinate on loading data into block cache, two threads between Lookup and Insert can end up loading and inserting the same data. This is particularly concerning with cache_index_and_filter_blocks since those are hot and more likely to be race targets if ejected from (or not pre-populated in) the cache. Particularly with moves toward disaggregated / network storage, the cost of redundant retrieval might be high, and we should at least have some hard statistics from which we can estimate impact. Example with full filter thrashing "cliff": $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10 ... $ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((130 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 \| egrep '^rocksdb.block.cache.(.add\|.redundant)' \| grep -v compress \| sort rocksdb.block.cache.add COUNT : 14181 rocksdb.block.cache.add.failures COUNT : 0 rocksdb.block.cache.add.redundant COUNT : 476 rocksdb.block.cache.data.add COUNT : 12749 rocksdb.block.cache.data.add.redundant COUNT : 18 rocksdb.block.cache.filter.add COUNT : 1003 rocksdb.block.cache.filter.add.redundant COUNT : 217 rocksdb.block.cache.index.add COUNT : 429 rocksdb.block.cache.index.add.redundant COUNT : 241 $ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((120 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 \| egrep '^rocksdb.block.cache.(.add\|.redundant)' \| grep -v compress \| sort rocksdb.block.cache.add COUNT : 1182223 rocksdb.block.cache.add.failures COUNT : 0 rocksdb.block.cache.add.redundant COUNT : 302728 rocksdb.block.cache.data.add COUNT : 31425 rocksdb.block.cache.data.add.redundant COUNT : 12 rocksdb.block.cache.filter.add COUNT : 795455 rocksdb.block.cache.filter.add.redundant COUNT : 130238 rocksdb.block.cache.index.add COUNT : 355343 rocksdb.block.cache.index.add.redundant COUNT : 172478 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6681 Test Plan: Some manual testing (above) and unit test covering key metrics is included Reviewed By: ltamasi Differential Revision: D21134113 Pulled By: pdillinger fbshipit-source-id: c11497b5f00f4ffdfe919823904e52d0a1a91d87	5 years ago
anand76	9e7b7e2c08	Silence false alarms in db_stress fault injection (#6741 ) Summary: False alarms are caused by codepaths that intentionally swallow IO errors. Tests: make crash_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/6741 Reviewed By: ltamasi Differential Revision: D21181138 Pulled By: anand1976 fbshipit-source-id: 5ccfbc68eb192033488de6269e59c00f2c65ce00	5 years ago
Ibrahim Jarif	ae77880223	Fix some typos in code comments (#6733 ) Summary: This PR fixes some typos in code comments. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6733 Reviewed By: siying Differential Revision: D21209037 Pulled By: zhichao-cao fbshipit-source-id: d9274611fab1f5e992998c8c4117b8078c4cbc69	5 years ago
mrambacher	4cbc19d2a1	Add a ConfigOptions for use in comparing objects and converting to/from strings (#6389 ) Summary: The methods in convenience.h are used to compare/convert objects to/from strings. There is a mishmash of parameters in use here with more needed in the future. This PR replaces those parameters with a single structure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389 Reviewed By: siying Differential Revision: D21163707 Pulled By: zhichao-cao fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd	5 years ago
Andrew Kryczka	f9155a3404	Prevent uninitialized load in `IndexBlockIter` (#6736 ) Summary: When index block is empty or an error happens while reading it, `Invalidate()` is called rather than `Initialize()`. So `Seek()` must not refer to member variables that are only initialized in `Initialize()` until it is sure `Initialize()` has been called. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6736 Reviewed By: siying Differential Revision: D21139641 Pulled By: ajkr fbshipit-source-id: 71c58cc1adbd795dc3729dd5023bf7df1515ff32	5 years ago
Peter Dillinger	31da5e34c1	C++20 compatibility (#6697 ) Summary: Based on https://github.com/facebook/rocksdb/issues/6648 (CLA Signed), but heavily modified / extended: * Implicit capture of this via [=] deprecated in C++20, and [=,this] not standard before C++20 -> now using explicit capture lists * Implicit copy operator deprecated in gcc 9 -> add explicit '= default' definition * std::random_shuffle deprecated in C++17 and removed in C++20 -> migrated to a replacement in RocksDB random.h API * Add the ability to build with different std version though -DCMAKE_CXX_STANDARD=11/14/17/20 on the cmake command line * Minimal rebuild flag of MSVC is deprecated and is forbidden with /std:c++latest (C++20) * Added MSVC 2019 C++11 & MSVC 2019 C++20 in AppVeyor * Added GCC 9 C++11 & GCC9 C++20 in Travis Pull Request resolved: https://github.com/facebook/rocksdb/pull/6697 Test Plan: make check and CI Reviewed By: cheng-chang Differential Revision: D21020318 Pulled By: pdillinger fbshipit-source-id: 12311be5dbd8675a0e2c817f7ec50fa11c18ab91	5 years ago
Mike Kolupaev	e45673dece	Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621 ) Summary: Context: Index type `kBinarySearchWithFirstKey` added the ability for sst file iterator to sometimes report a key from index without reading the corresponding data block. This is useful when sst blocks are cut at some meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report error instead. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and kBinarySearchWithFirstKey implementation was considered a prototype. Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling. It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas. Note that the deferred value loading only happens for internal iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621 Test Plan: make -j5 check . Will also deploy to some logdevice test clusters and look at stats. Reviewed By: siying Differential Revision: D20786930 Pulled By: al13n321 fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee	5 years ago
Ziyue Yang	41563b61db	Fix data racing of BlockBasedTableBuilder::ParallelCompressionRep::first_block (#6640 ) Summary: BlockBasedTableBuilder::ParallelCompressionRep::first_block can be read in Flush() and written in BGWorkWriteRawBlock() concurrently. This commit fixes the issue by reading first_block out before pushing the block to compression and write. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6640 Test Plan: Run all tests concurrently with TSAN. Reviewed By: cheng-chang Differential Revision: D20851370 fbshipit-source-id: 6f039222e8319d31e15f1b45e05c106527253f72	5 years ago
Andrew Kryczka	9eca6d651d	fix comparison count for format_version=3 indexes (#6650 ) Summary: In index blocks since `format_version=3`, user keys are written rather than internal keys. When reading such blocks, the comparator is obtained via `InternalKeyComparator::user_comparator()`. That function must not return an unwrapped result as the wrapper class provides accounting logic to populate `PerfContext::user_key_comparison_count`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6650 Test Plan: ran db_bench and verified `PerfContext::user_key_comparison_count` became larger. Reviewed By: cheng-chang Differential Revision: D20866325 Pulled By: ajkr fbshipit-source-id: ad755d46bda31157dacc5b66e532279f19ad538c	5 years ago
anand76	79c838eb0f	Fix a few bugs in db_stress fault injection (#6693 ) Summary: Fix the following issues - 1. Output parsing error in db_crashtest.py 2. Memory leak on exit 3. False alarm on filter block read error Pull Request resolved: https://github.com/facebook/rocksdb/pull/6693 Test Plan: asan_crash Reviewed By: cheng-chang Differential Revision: D20990399 Pulled By: anand1976 fbshipit-source-id: 178ee0dd7c69a4bc5db698379db0dedb29281699	5 years ago
anand76	5c19a441c4	Fault injection in db_stress (#6538 ) Summary: This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538 Test Plan: crash_test make check Reviewed By: riversand963 Differential Revision: D20714347 Pulled By: anand1976 fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7	5 years ago
anand76	d600e5b0eb	Fix a Centos build failure reported in #6651 (#6656 ) Summary: Fixes issue https://github.com/facebook/rocksdb/issues/6651 Tests: make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6656 Reviewed By: cheng-chang Differential Revision: D20879084 Pulled By: anand1976 fbshipit-source-id: c2cc508ca2716fcf80dcf9d2ba31c32d211f941e	5 years ago
Yi Wu	eb287c72d7	Fix wrong key being read on ingested file with global seqno and delta encoding (#6669 ) Summary: On reading an ingested SST file, `DataBlockIter` will replace seqno encoded in a key with global seqno. However, if the original seqno was part of the prefix used for the next key, the global seqno is by mistake used as part of the prefix to construct the next key, causing wrong result being returned. Although at this point it is only software error while data in the file is not corrupted, the issue can further cause compaction output out of order and corrupted result when the ingested SST participated in compaction. Fixing the issue by save the actual seqno and restore it before the key being used as prefix to construct next key. The unit test is by Little-Wallace from https://github.com/facebook/rocksdb/issues/6666. Fixing https://github.com/facebook/rocksdb/issues/6666. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6669 Test Plan: New unit test Signed-off-by: Yi Wu <yiwu@pingcap.com> Reviewed By: cheng-chang Differential Revision: D20931808 Pulled By: ajkr fbshipit-source-id: f01959c35d6a493954dca981663766c7a5a9e8ab	5 years ago
sdong	00f8016b36	Fix clang anaylze warning caused by #6262 (#6641 ) Summary: https://github.com/facebook/rocksdb/pull/6262 causes CLANG analyze to complain. Add assertion to suppress the warning. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6641 Test Plan: Run "clang analyze" and make sure it passes. Reviewed By: anand1976 Differential Revision: D20841722 fbshipit-source-id: 5fa6e0c5cfe7a822214c9b898a408df59d4fd2cd	5 years ago
mrambacher	259b6ec8da	Move the OptionTypeMap code closer to home (#6198 ) Summary: This is a predecessor to the Configurable PR. This change moves the OptionTypeInfo maps closer to where they will be used. When the Configurable changes are adopted, these values will become static and not associated with the OptionsHelper. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6198 Reviewed By: siying Differential Revision: D20778108 Pulled By: zhichao-cao fbshipit-source-id: a9f85fc73bc53503656e1958ecc1e764052fd1aa	5 years ago
sdong	d0f3894cf1	In block based table builder, make variables for estimating file size atomic (#6636 ) Summary: With https://github.com/facebook/rocksdb/issues/6262, TSAN complains about data race of some variables. Those variables are used to estimate file size and are accessed in writer and background threads. Since file size estimation doesn't have to be 100% accurate, we make some variables atomic and use relaxed memory order. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6636 Test Plan: Run all tests with TSAN. Reviewed By: anand1976 Differential Revision: D20820635 fbshipit-source-id: 1ea45ff38be15e33674ffe06b7d42fc9fe161ea5	5 years ago
Ziyue Yang	8088482dd6	Fix a division by zero after #6262 (#6633 ) Summary: With https://github.com/facebook/rocksdb/issues/6262, UBSAN fails with "division by zero": [ RUN ] Timestamp/DBBasicTestWithTimestampCompressionSettings.PutAndGetWithCompaction/3 internal_repo_rocksdb/repo/table/block_based/block_based_table_builder.cc:1066:39: runtime error: division by zero #0 0x7ffb3117b071 in rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle, bool) internal_repo_rocksdb/repo/table/block_based/block_based_table_builder.cc:1066 https://github.com/facebook/rocksdb/issues/1 0x7ffb311775e1 in rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::Slice const&, rocksdb::BlockHandle, bool) internal_repo_rocksdb/repo/table/block_based/block_based_table_builder.cc:848 https://github.com/facebook/rocksdb/issues/2 0x7ffb311771a2 in rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::BlockBuilder, rocksdb::BlockHandle, bool) internal_repo_rocksdb/repo/table/block_based/block_based_table_builder.cc:832 This is caused by not returning immediately after CompressAndVerifyBlock call in WriteBlock when rep_->status == kBuffered. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6633 Test Plan: Run all existing test. Reviewed By: anand1976 Differential Revision: D20808366 fbshipit-source-id: 09f24b7c0fbaf4c7a8fc48cac61fa6fcb9b85811	5 years ago
Ziyue Yang	03a781a90c	Add pipelined & parallel compression optimization (#6262 ) Summary: This PR adds support for pipelined & parallel compression optimization for `BlockBasedTableBuilder`. This optimization makes block building, block compression and block appending a pipeline, and uses multiple threads to accelerate block compression. Users can set `CompressionOptions::parallel_threads` greater than 1 to enable compression parallelism. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6262 Reviewed By: ajkr Differential Revision: D20651306 fbshipit-source-id: 62125590a9c15b6d9071def9dc72589c1696a4cb	5 years ago
Levi Tamasi	e6f86cfb36	Revert the recent cache deleter change (#6620 ) Summary: Revert "Use function objects as deleters in the block cache (https://github.com/facebook/rocksdb/issues/6545)" This reverts commit `6301dbe7a7`. Revert "Call out the cache deleter related interface change in HISTORY.md (https://github.com/facebook/rocksdb/issues/6606)" This reverts commit `3a35542f86`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6620 Test Plan: `make check` Reviewed By: zhichao-cao Differential Revision: D20773311 Pulled By: ltamasi fbshipit-source-id: 7637a761f718f323ef0e7da959462e8fb06e7a2b	5 years ago
Zhichao Cao	e8d332d97e	Use FileChecksumGenFactory for SST file checksum (#6600 ) Summary: In the current implementation, sst file checksum is calculated by a shared checksum function object, which may make some checksum function hard to be applied here such as SHA1. In this implementation, each sst file will have its own checksum generator obejct, created by FileChecksumGenFactory. User needs to implement its own FilechecksumGenerator and Factory to plugin the in checksum calculation method. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600 Test Plan: tested with make asan_check Reviewed By: riversand963 Differential Revision: D20717670 Pulled By: zhichao-cao fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6	5 years ago
Zhichao Cao	4246888101	Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487 ) Summary: In the current code base, we use Status to get and store the returned status from the call. Specifically, for IO related functions, the current Status cannot reflect the IO Error details such as error scope, error retryable attribute, and others. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have the new Wrapper for IO, which returns IOStatus instead of Status. However, the IOStatus is purged at the lower level of write path and transferred to Status. The first job of this PR is to pass the IOStatus to the write path (flush, WAL write, and Compaction). The second job is to identify the Retryable IO Error as HardError, and set the bg_error_ as HardError. In this case, the DB Instance becomes read only. User is informed of the Status and need to take actions to deal with it (e.g., call db->Resume()). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487 Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check Reviewed By: anand1976 Differential Revision: D20685017 Pulled By: zhichao-cao fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0	5 years ago
Levi Tamasi	6301dbe7a7	Use function objects as deleters in the block cache (#6545 ) Summary: As the first step of reintroducing eviction statistics for the block cache, the patch switches from using simple function pointers as deleters to function objects implementing an interface. This will enable using deleters that have state, like a smart pointer to the statistics object that is to be updated when an entry is removed from the cache. For now, the patch adds a deleter template class `SimpleDeleter`, which simply casts the `value` pointer to its original type and calls `delete` or `delete[]` on it as appropriate. Note: to prevent object lifecycle issues, deleters must outlive the cache entries referring to them; `SimpleDeleter` ensures this by using the ("leaky") Meyers singleton pattern. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6545 Test Plan: `make asan_check` Reviewed By: siying Differential Revision: D20475823 Pulled By: ltamasi fbshipit-source-id: fe354c33dd96d9bafc094605462352305449a22a	5 years ago
Mike Kolupaev	963af52f15	Fix iterator reading filter block despite read_tier == kBlockCacheTier (#6562 ) Summary: We're seeing iterators with `ReadOptions::read_tier == kBlockCacheTier` sometimes doing file reads. Stack trace: ``` rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long, rocksdb::Slice, char, bool) const rocksdb::BlockFetcher::ReadBlockContents() rocksdb::Status rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>, rocksdb::BlockType, rocksdb::GetContext, rocksdb::BlockCacheLookupContext, rocksdb::BlockContents) const rocksdb::Status rocksdb::BlockBasedTable::RetrieveBlock<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>, rocksdb::BlockType, rocksdb::GetContext, rocksdb::BlockCacheLookupContext, bool, bool) const rocksdb::FilterBlockReaderCommon<rocksdb::ParsedFullFilterBlock>::ReadFilterBlock(rocksdb::BlockBasedTable const, rocksdb::FilePrefetchBuffer, rocksdb::ReadOptions const&, bool, rocksdb::GetContext, rocksdb::BlockCacheLookupContext, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>) rocksdb::FilterBlockReaderCommon<rocksdb::ParsedFullFilterBlock>::GetOrReadFilterBlock(bool, rocksdb::GetContext, rocksdb::BlockCacheLookupContext, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>) const rocksdb::FullFilterBlockReader::MayMatch(rocksdb::Slice const&, bool, rocksdb::GetContext, rocksdb::BlockCacheLookupContext) const rocksdb::FullFilterBlockReader::RangeMayExist(rocksdb::Slice const, rocksdb::Slice const&, rocksdb::SliceTransform const, rocksdb::Comparator const, rocksdb::Slice const, bool, bool, rocksdb::BlockCacheLookupContext) rocksdb::BlockBasedTable::PrefixMayMatch(rocksdb::Slice const&, rocksdb::ReadOptions const&, rocksdb::SliceTransform const, bool, rocksdb::BlockCacheLookupContext) const rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::SeekImpl(rocksdb::Slice const) rocksdb::ForwardIterator::SeekInternal(rocksdb::Slice const&, bool) rocksdb::DBIter::Seek(rocksdb::Slice const&) ``` `BlockBasedTableIterator::CheckPrefixMayMatch` was missing a check for `kBlockCacheTier`. This PR adds it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6562 Test Plan: deployed it to a logdevice test cluster and looked at logdevice's IO tracing. Reviewed By: siying Differential Revision: D20529368 Pulled By: al13n321 fbshipit-source-id: 65bf33964b1951464415c900336635fb20919611	5 years ago
Cheng Chang	4fc216649d	Support direct IO in RandomAccessFileReader::MultiRead (#6446 ) Summary: By supporting direct IO in RandomAccessFileReader::MultiRead, the benefits of parallel IO (IO uring) and direct IO can be combined. In direct IO mode, read requests are aligned and merged together before being issued to RandomAccessFile::MultiRead, so blocks in the original requests might share the same underlying buffer, the shared buffers are returned in `aligned_bufs`, which is a new parameter of the `MultiRead` API. For example, suppose alignment requirement for direct IO is 4KB, one request is (offset: 1KB, len: 1KB), another request is (offset: 3KB, len: 1KB), then since they all belong to page (offset: 0, len: 4KB), `MultiRead` only reads the page with direct IO into a buffer on heap, and returns 2 Slices referencing regions in that same buffer. See `random_access_file_reader_test.cc` for more examples. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6446 Test Plan: Added a new test `random_access_file_reader_test.cc`. Reviewed By: anand1976 Differential Revision: D20097518 Pulled By: cheng-chang fbshipit-source-id: ca48a8faf9c3af146465c102ef6b266a363e78d1	5 years ago
sdong	712bc4b6a2	Fix regression bug in partitioned index reseek caused by #6531 (#6551 ) Summary: https://github.com/facebook/rocksdb/pull/6531 removed some code in partitioned index seek logic. By mistake the logic of storing previous index offset is removed, while the logic of using it is preserved, so that the code might use wrong value to determine reseeking condition. This will trigger a bug, if following a Seek() not going to the last block, SeekToLast() is called, and then Seek() is called which should position the cursor to the block before SeekToLast(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6551 Test Plan: Add a unit test that reproduces the bug. In the same unit test, also some reseek cases are covered to avoid regression. Reviewed By: pdillinger Differential Revision: D20493990 fbshipit-source-id: 3919aa4861c0481ec96844e053048da1a934b91d	5 years ago
Yanqin Jin	098dce2d1a	Fix compiler warning treated as error (#6547 ) Summary: Define a private member variable only in debug mode. Without fix, build will fail ``` In file included from table/block_based/partitioned_index_iterator.cc:9: ./table/block_based/partitioned_index_iterator.h:125:32: error: private field 'icomp_' is not used [-Werror,-Wunused-private-field] const InternalKeyComparator& icomp_; ``` Test plan (dev server) 1. make check 2. Make sure fixed in Travis Pull Request resolved: https://github.com/facebook/rocksdb/pull/6547 Reviewed By: siying Differential Revision: D20480027 Pulled By: pdillinger fbshipit-source-id: 288bc94280e240c3136335b6c73eb1ccb0db459d	5 years ago
sdong	d66908091d	De-template block based table iterator (#6531 ) Summary: Right now block based table iterator is used as both of iterating data for block based table, and for the index iterator for partitioend index. This was initially convenient for introducing a new iterator and block type for new index format, while reducing code change. However, these two usage doesn't go with each other very well. For example, Prev() is never called for partitioned index iterator, and some other complexity is maintained in block based iterators, which is not needed for index iterator but maintainers will always need to reason about it. Furthermore, the template usage is not following Google C++ Style which we are following, and makes a large chunk of code tangled together. This commit separate the two iterators. Right now, here is what it is done: 1. Copy the block based iterator code into partitioned index iterator, and de-template them. 2. Remove some code not needed for partitioned index. The upper bound check and tricks are removed. We never tested performance for those tricks when partitioned index is enabled in the first place. It's unlikelyl to generate performance regression, as creating new partitioned index block is much rarer than data blocks. 3. Separate out the prefetch logic to a helper class and both classes call them. This commit will enable future follow-ups. One direction is that we might separate index iterator interface for data blocks and index blocks, as they are quite different. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6531 Test Plan: build using make and cmake. And build release Differential Revision: D20473108 fbshipit-source-id: e48011783b339a4257c204cc07507b171b834b0f	5 years ago
sdong	674cf41732	Divide block_based_table_reader.cc (#6527 ) Summary: block_based_table_reader.cc is a giant file, which makes it hard for users to navigate the code. Divide the files to multiple files. Some class templates cannot be moved to .cc file. They are moved to .h files. It is still better than including them all in block_based_table_reader.cc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6527 Test Plan: "make all check" and "make release". Also build using cmake. Differential Revision: D20428455 fbshipit-source-id: ca713c698469f07f35bc0c271358c0874ed4eb28	5 years ago
Yanqin Jin	d93812c9ae	Iterator with timestamp (#6255 ) Summary: Preliminary support for iterator with user timestamp. Current implementation does not consider merge operator and reverse iterator. Auto compaction is also disabled in unit tests. Create an iterator with timestamp. ``` ... read_opts.timestamp = &ts; auto* iter = db->NewIterator(read_opts); // target is key without timestamp. for (iter->Seek(target); iter->Valid(); iter->Next()) {} for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {} delete iter; read_opts.timestamp = &ts1; // lower_bound and upper_bound are without timestamp. read_opts.iterate_lower_bound = &lower_bound; read_opts.iterate_upper_bound = &upper_bound; auto* iter1 = db->NewIterator(read_opts); // Do Seek or SeekToFirst() delete iter1; ``` Test plan (dev server) ``` $make check ``` Simple benchmarking (dev server) 1. The overhead introduced by this PR even when timestamp is disabled. key size: 16 bytes value size: 100 bytes Entries: 1000000 Data reside in main memory, and try to stress iterator. Repeated three times on master and this PR. - Seek without next ``` ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 ``` master: 159047.0 ops/sec this PR: 158922.3 ops/sec (2% drop in throughput) - Seek and next 10 times ``` ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 -seek_nexts=10 ``` master: 109539.3 ops/sec this PR: 107519.7 ops/sec (2% drop in throughput) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6255 Differential Revision: D19438227 Pulled By: riversand963 fbshipit-source-id: b66b4979486f8474619f4aa6bdd88598870b0746	5 years ago
Michael R. Crusoe	051696bf98	fix some spelling typos (#6464 ) Summary: Found from Debian's "Lintian" program Pull Request resolved: https://github.com/facebook/rocksdb/pull/6464 Differential Revision: D20162862 Pulled By: zhichao-cao fbshipit-source-id: 06941ee2437b038b2b8045becbe9d2c6fbff3e12	5 years ago
Andrew Kryczka	69679e7375	Fix range deletion tombstone ingestion with global seqno (#6429 ) Summary: Original author: jeffrey-xiao If we are writing a global seqno for an ingested file, the range tombstone metablock gets accessed and put into the cache during ingestion preparation. At the time, the global seqno of the ingested file has not yet been determined, so the cached block will not have a global seqno. When the file is ingested and we read its range tombstone metablock, it will be returned from the cache with no global seqno. In that case, we use the actual seqnos stored in the range tombstones, which are all zero, so the tombstones cover nothing. This commit removes global_seqno_ variable from Block. When iterating over a block, the global seqno for the block is determined by the iterator instead of storing this mutable attribute in Block. Additionally, this commit adds a regression test to check that keys are deleted when ingesting a file with a global seqno and range deletion tombstones. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6429 Differential Revision: D19961563 Pulled By: ajkr fbshipit-source-id: 5cf777397fa3e452401f0bf0364b0750492487b7	5 years ago
Yanqin Jin	890d87fadc	Some minor fix-ups (#6440 ) Summary: Cleanup some code without any real change in functionality. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6440 Differential Revision: D20015891 Pulled By: riversand963 fbshipit-source-id: 33e18754b0f002006a6d4805e9aaf84c0c8ad25a	5 years ago
sdong	fdf882ded2	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 ) Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e	5 years ago
anand76	3e49249d30	Ensure all MultiGet IO errors are propagated to user (#6403 ) Summary: Unrevert the previous fix to propagate error status, and an additional fix to not treat a memtable lookup MergeInProgress status as an error. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6403 Test Plan: Unit tests Tried running stress tests but couldn't repro the stress failure Differential Revision: D19846721 Pulled By: anand1976 fbshipit-source-id: 7db10cccbdc863d9b559497f0a46b608d2488ca4	5 years ago
anand76	35ed530d2c	Revert "Check KeyContext status in MultiGet (#6387 )" (#6401 ) Summary: This reverts commit `d70011bccc`. The commit is causing some stress test failure due to unexpected Status::MergeInProgress() return for some keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6401 Differential Revision: D19826623 Pulled By: anand1976 fbshipit-source-id: edd634cede9cb7bdd2cb8f46e662ea709b16d2f1	5 years ago
Zhichao Cao	4369f2c7bb	Checksum for each SST file and stores in MANIFEST (#6216 ) Summary: In the current code base, RocksDB generate the checksum for each block and verify the checksum at usage. Current PR enable SST file checksum. After a SST file is generated by Flush or Compaction, RocksDB generate the SST file checksum and store the checksum value and checksum method name in the vs_info and MANIFEST as part for the FileMetadata. Added the enable_sst_file_checksum to Options to enable or disable file checksum. Added sst_file_checksum to Options such that user can plugin their own SST file checksum calculate method via overriding the SstFileChecksum class. The checksum information inlcuding uint32_t checksum value and a checksum name (string). A new tool is added to LDB such that user can dump out a list of file checksum information from MANIFEST. If user enables the file checksum but does not provide the sst_file_checksum instance, RocksDB will use the default crc32checksum implemented in table/sst_file_checksum_crc32c.h Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216 Test Plan: Added the testing case in table_test and ldb_cmd_test to verify checksum is correct in different level. Pass make asan_check. Differential Revision: D19171461 Pulled By: zhichao-cao fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87	5 years ago
anand76	d70011bccc	Check KeyContext status in MultiGet (#6387 ) Summary: Currently, any IO errors and checksum mismatches while reading data blocks, are being ignored by the batched MultiGet. Its only looking at the GetContext state. Fix that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6387 Test Plan: Add unit tests Differential Revision: D19799819 Pulled By: anand1976 fbshipit-source-id: 46133dccbb04e64067b9fe6cda73e282203db969	5 years ago
sdong	8f2bee6747	Add ReadOptions.auto_prefix_mode (#6314 ) Summary: Add a new option ReadOptions.auto_prefix_mode. When set to true, iterator should return the same result as total order seek, but may choose to do prefix seek internally, based on iterator upper bounds. Also fix two previous bugs when handling prefix extrator changes: (1) reverse iterator should not rely on upper bound to determine prefix. Fix it with skipping prefix check. (2) block-based filter is not handled properly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6314 Test Plan: (1) add a unit test; (2) add the check to stress test and run see whether it can pass at least one run. Differential Revision: D19458717 fbshipit-source-id: 51c1bcc5cdd826c2469af201979a39600e779bce	5 years ago
sdong	f10f135938	Fix regression bug of hash index with iterator total order seek (#6328 ) Summary: https://github.com/facebook/rocksdb/pull/6028 introduces a bug for hash index in SST files. If a table reader is created when total order seek is used, prefix_extractor might be passed into table reader as null. While later when prefix seek is used, the same table reader used, hash index is checked but prefix extractor is null and the program would crash. Fix the issue by fixing http://github.com/facebook/rocksdb/pull/6028 in the way that prefix_extractor is preserved but ReadOptions.total_order_seek is checked Also, a null pointer check is added so that a bug like this won't cause segfault in the future. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6328 Test Plan: Add a unit test that would fail without the fix. Stress test that reproduces the crash would pass. Differential Revision: D19586751 fbshipit-source-id: 8de77690167ddf5a77a01e167cf89430b1bfba42	5 years ago
Peter Dillinger	986df37135	Clean up PartitionedFilterBlockBuilder (#6299 ) Summary: Remove the redundant PartitionedFilterBlockBuilder::num_added_ and ::NumAdded since the parent class, FullFilterBlockBuilder, already provides them. Also rename filters_in_partition_ and filters_per_partition_ to keys_added_to_partition_ and keys_per_partition_ to improve readability. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6299 Test Plan: make check Differential Revision: D19413278 Pulled By: pdillinger fbshipit-source-id: 04926ee7874477d659cb2b6ae03f2d995fb747e5	5 years ago
Peter Dillinger	8aa99fc71e	Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317 ) Summary: With many millions of keys, the old Bloom filter implementation for the block-based table (format_version <= 4) would have excessive FP rate due to the limitations of feeding the Bloom filter with a 32-bit hash. This change computes an estimated inflated FP rate due to this effect and warns in the log whenever an SST filter is constructed (almost certainly a "full" not "partitioned" filter) that exceeds 1.5x FP rate due to this effect. The detailed condition is only checked if 3 million keys or more have been added to a filter, as this should be a lower bound for common bits/key settings (< 20). Recommended remedies include smaller SST file size, using format_version >= 5 (for new Bloom filter), or using partitioned filters. This does not change behavior other than generating warnings for some constructed filters using the old implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6317 Test Plan: Example with warning, 15M keys @ 15 bits / key: (working_mem_size_mb is just to stop after building one filter if it's large) $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=15000000 2>&1 \| grep 'FP rate' [WARN] [/block_based/filter_policy.cc:292] Using legacy SST/BBT Bloom filter with excessive key count (15.0M @ 15bpk), causing estimated 1.8x higher filter FP rate. Consider using new Bloom with format_version>=5, smaller SST file size, or partitioned filters. Predicted FP rate %: 0.766702 Average FP rate %: 0.66846 Example without warning (150K keys): $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=150000 2>&1 \| grep 'FP rate' Predicted FP rate %: 0.422857 Average FP rate %: 0.379301 $ With more samples at 15 bits/key: 150K keys -> no warning; actual: 0.379% FP rate (baseline) 1M keys -> no warning; actual: 0.396% FP rate, 1.045x 9M keys -> no warning; actual: 0.563% FP rate, 1.485x 10M keys -> warning (1.5x); actual: 0.564% FP rate, 1.488x 15M keys -> warning (1.8x); actual: 0.668% FP rate, 1.76x 25M keys -> warning (2.4x); actual: 0.880% FP rate, 2.32x At 10 bits/key: 150K keys -> no warning; actual: 1.17% FP rate (baseline) 1M keys -> no warning; actual: 1.16% FP rate 10M keys -> no warning; actual: 1.32% FP rate, 1.13x 25M keys -> no warning; actual: 1.63% FP rate, 1.39x 35M keys -> warning (1.6x); actual: 1.81% FP rate, 1.55x At 5 bits/key: 150K keys -> no warning; actual: 9.32% FP rate (baseline) 25M keys -> no warning; actual: 9.62% FP rate, 1.03x 200M keys -> no warning; actual: 12.2% FP rate, 1.31x 250M keys -> warning (1.5x); actual: 12.8% FP rate, 1.37x 300M keys -> warning (1.6x); actual: 13.4% FP rate, 1.43x The reason for the modest inaccuracy at low bits/key is that the assumption of independence between a collision between 32-hash values feeding the filter and an FP in the filter is not quite true for implementations using "simple" logic to compute indices from the stock hash result. There's math on this in my dissertation, but I don't think it's worth the effort just for these extreme cases (> 100 million keys and low-ish bits/key). Differential Revision: D19471715 Pulled By: pdillinger fbshipit-source-id: f80c96893a09bf1152630ff0b964e5cdd7e35c68	5 years ago

1 2 3 4

184 Commits (1bcef3d83c668afa690dd40c89bb1e955be0b1da)