Summary:
RocksDB does auto-readahead for iterators on noticing more than two sequential reads for a table file if user doesn't provide readahead_size. The readahead starts at 8KB and doubles on every additional read up to max_auto_readahead_size. However at each level, if iterator moves over next file, readahead_size starts again from 8KB.
This PR introduces a new ReadOption "adaptive_readahead" which when set true will maintain readahead_size at each level. So when iterator moves from one file to another, new file's readahead_size will continue from previous file's readahead_size instead of scratch. However if reads are not sequential it will fall back to 8KB (default) with no prefetching for that block.
1. If block is found in cache but it was eligible for prefetch (block wasn't in Rocksdb's prefetch buffer), readahead_size will decrease by 8KB.
2. It maintains readahead_size for L1 - Ln levels.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9056
Test Plan:
Added new unit tests
Ran db_bench for "readseq, seekrandom, seekrandomwhilewriting, readrandom" with --adaptive_readahead=true and there was no regression if new feature is enabled.
Reviewed By: anand1976
Differential Revision: D31773640
Pulled By: akankshamahajan15
fbshipit-source-id: 7332d16258b846ae5cea773009195a5af58f8f98
Summary:
https://github.com/facebook/rocksdb/pull/6531 removed some code in partitioned index seek logic. By mistake the logic of storing previous index offset is removed, while the logic of using it is preserved, so that the code might use wrong value to determine reseeking condition.
This will trigger a bug, if following a Seek() not going to the last block, SeekToLast() is called, and then Seek() is called which should position the cursor to the block before SeekToLast().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6551
Test Plan: Add a unit test that reproduces the bug. In the same unit test, also some reseek cases are covered to avoid regression.
Reviewed By: pdillinger
Differential Revision: D20493990
fbshipit-source-id: 3919aa4861c0481ec96844e053048da1a934b91d
Summary:
Right now block based table iterator is used as both of iterating data for block based table, and for the index iterator for partitioend index. This was initially convenient for introducing a new iterator and block type for new index format, while reducing code change. However, these two usage doesn't go with each other very well. For example, Prev() is never called for partitioned index iterator, and some other complexity is maintained in block based iterators, which is not needed for index iterator but maintainers will always need to reason about it. Furthermore, the template usage is not following Google C++ Style which we are following, and makes a large chunk of code tangled together. This commit separate the two iterators. Right now, here is what it is done:
1. Copy the block based iterator code into partitioned index iterator, and de-template them.
2. Remove some code not needed for partitioned index. The upper bound check and tricks are removed. We never tested performance for those tricks when partitioned index is enabled in the first place. It's unlikelyl to generate performance regression, as creating new partitioned index block is much rarer than data blocks.
3. Separate out the prefetch logic to a helper class and both classes call them.
This commit will enable future follow-ups. One direction is that we might separate index iterator interface for data blocks and index blocks, as they are quite different.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6531
Test Plan: build using make and cmake. And build release
Differential Revision: D20473108
fbshipit-source-id: e48011783b339a4257c204cc07507b171b834b0f