rocksdb

Commit Graph

Author	SHA1	Message	Date
Andrew Kryczka	b02d0c238d	Init compression dict handle before reading meta-blocks (#5267 ) Summary: At least one of the meta-block loading functions (`ReadRangeDelBlock`) uses the same block reading function (`NewDataBlockIterator`) as data block reads, which means it uses the dictionary handle. However, the dictionary handle was uninitialized while reading meta-blocks, causing readers to receive an error. This situation was only noticed when `cache_index_and_filter_blocks=true`. This PR initializes the handle to null while reading meta-blocks to prevent the error. It also adds support to `db_stress` / `db_crashtest.py` for `cache_index_and_filter_blocks`. Fixes #5263. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5267 Differential Revision: D15149264 Pulled By: maysamyabandeh fbshipit-source-id: 991d38a306c62db5976778bfb050fa3cd4a0671b	7 years ago
Sagar Vemuri	3548e4220d	Improve explicit user readahead performance (#5246 ) Summary: Improve the iterators performance when the user explicitly sets the readahead size via `ReadOptions.readahead_size`. 1. Stop creating new table readers when the user explicitly sets readahead size. 2. Make use of an internal buffer based on `FilePrefetchBuffer` instead of using `ReadaheadRandomAccessFileReader`, to handle the user readahead requests (for both buffered and direct io cases). 3. Add `readahead_size` to db_bench. Benchmarks: https://gist.github.com/sagar0/53693edc320a18abeaeca94ca32f5737 For 1 MB readahead, Buffered IO performance improves by 28% and Direct IO performance improves by 50%. For 512KB readahead, Buffered IO performance improves by 30% and Direct IO performance improves by 67%. Test Plan: Updated `DBIteratorTest.ReadAhead` test to make sure that: - no new table readers are created for iterators on setting ReadOptions.readahead_size - At least "readahead" number of bytes are actually getting read on each iterator read. TODO later: - Use similar logic for compactions as well. - This ties in nicely with #4052 and paves the way for removing ReadaheadRandomAcessFile later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5246 Differential Revision: D15107946 Pulled By: sagar0 fbshipit-source-id: 2c1149729ca7d779e4e8b7710ba6f4e8cbfd3bea	7 years ago
Maysam Yabandeh	506e8448be	Refresh snapshot list during long compactions (#5099 ) Summary: Part of compaction CPU goes to processing the snapshot list; the larger the list, the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction continue more efficiently with a much smaller snapshot list. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5099 Differential Revision: D15086710 Pulled By: maysamyabandeh fbshipit-source-id: 7649f56c3b6b2fb334962048150142a3bf9c1a12	7 years ago
Yuchi Chen	78a6e07c83	Fix compilation errors for 32bits/LITE/ios build. (#5220 ) Summary: When I build RocksDB for 32bits/LITE/iOS environment, some errors like the following. ` table/block_based_table_reader.cc:971:44: error: implicit conversion loses integer precision: 'uint64_t' (aka 'unsigned long long') to 'size_t' (aka 'unsigned long') [-Werror,-Wshorten-64-to-32] size_t block_size = props_block_handle.size(); ~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~^~~~~~ ./util/file_reader_writer.h:177:8: error: private field 'env_' is not used [-Werror,-Wunused-private-field] Env* env_; ^ ` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5220 Differential Revision: D15023481 Pulled By: siying fbshipit-source-id: 1b5d121d3016f2b0a8a9a2cc1bd638479357f9f7	7 years ago
Mike Kolupaev	df38c1ce66	Add BlockBasedTableOptions::index_shortening (#5174 ) Summary: Introduce BlockBasedTableOptions::index_shortening to give users control over which key shortening techniques are used in building index blocks. Before this patch, both separators and successor keys were shortened in indexes. With this patch, the default is set to kShortenSeparators to only shorten the separators. Since each index block has many separators and only one successor (last key), the change should not have a negative impact on index block size. However, it should prevent many unnecessary block loads where, due to the approximation introduced by a shortened successor, a seek would land on the previous block and then have to fix it by moving to the next one. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5174 Differential Revision: D14884185 Pulled By: al13n321 fbshipit-source-id: 1b08bc8c03edcf09b6b8c16e9a7eea08ad4dd534	7 years ago
anand76	5265c5709e	Remove a couple of non-public includes from public header file (#5219 ) Summary: Cleanup a couple of stray includes left by #5011. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5219 Differential Revision: D15007244 Pulled By: anand1976 fbshipit-source-id: 15ca1d4f977b5b60e99df3bfb8fc3db217d19bdd	7 years ago
Sagar Vemuri	efa948741c	Use creation_time or mtime when file_creation_time=0 (#5184 ) Summary: We found an issue in Periodic Compactions (introduced in #5166) where files were not being picked up for compactions as all the SST files created with older versions of RocksDB have `file_creation_time` as 0. (Note that `file_creation_time` is a new table property introduced in #5166). To address this, Periodic compactions now fall back to looking at the `creation_time` table property or the file's modification time (as given by the Env) when the `file_creation_time` table property is found to be 0. Here is how the file's modification time (and, in turn, the file age) is computed now: 1. Use the `file_creation_time` table property if it is > 0. 2. If not, then use the `creation_time` table property if it is > 0. 3. If not, then use the file's mtime stat metadata given by the underlying Env. Don't consider the file at all for compaction if the modification time cannot be correctly determined based on the above conditions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5184 Differential Revision: D14907795 Pulled By: sagar0 fbshipit-source-id: 4bb2f3631f9a3e04470c674a1d13544584e1e56c	7 years ago
Siying Dong	992dfc7811	Introduce InternalIteratorBase::NextAndGetResult() (#5197 ) Summary: In long scans, virtual function calls of Next(), Valid(), key() and value() are not trivial. By introducing NextAndGetResult(), some of the Next(), Valid() and key() calls are consolidated into one virtual function call to reduce CPU. Also did some inlining tricks and added "final" to some functions. Even without the "final" annotation, most Next() calls are inlined with -O3, but sometimes with "final" they are inlined at -O2 too. It doesn't hurt to add those final annotations. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5197 Differential Revision: D14945977 Pulled By: siying fbshipit-source-id: 7003969f9a5f1d5717f0bda503b91d19ba75ed88	7 years ago
Fosco Marotto	6c2bf9e916	Add copyright headers per FB open-source checkup tool. (#5199 ) Summary: internal task: T35568575 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199 Differential Revision: D14962794 Pulled By: gfosco fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe	7 years ago
Yanqin Jin	d9280ff2d2	Add back NewEmptyIterator (#5203 ) Summary: #4905 removed the implementation of `NewEmptyIterator` but kept its declaration in the public header. This breaks some systems that depend on RocksDB if the systems use `NewEmptyIterator`. Therefore, add it back to fix. cc maysamyabandeh please remind me if I miss anything here. Thanks Pull Request resolved: https://github.com/facebook/rocksdb/pull/5203 Differential Revision: D14968382 Pulled By: riversand963 fbshipit-source-id: 5fb86e99c8cfaf9f7a9473cdb1355d7558ff6e01	7 years ago
yiwu-arbug	f1239d5f10	Avoid per-key upper bound check in BlockBasedTableIterator (#5142 ) Summary: This is the second attempt at #5101. Original commit message: `BlockBasedTableIterator` avoids reading the next block on `Next()` if it detects the iterator will be out of bound, by checking against the index key. The optimization was added in #2239, and at the time it only checked the bound per block. It seems a later change made it a per-key check, which introduces unnecessary key comparisons. This patch comes with two fixes: Fix 1: To optimize checking for bounds, we need to compare the bounds with the index key as well. However, BlockBasedTableIterator doesn't know whether its index iterator is internally using user keys or internal keys. The patch fixes that by extending InternalIterator with a user_key() function that is overridden in IndexBlockIter. Fix 2: In #5101 we return `IsOutOfBound()=true` when the block index key is out of bound. But the index key can be larger than the smallest key of the next file on the level. That file can be within the upper bound and should not be filtered out. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5142 Differential Revision: D14907113 Pulled By: siying fbshipit-source-id: ac95775c5b4e7b700f76ab43e39f45402c98fbfb	7 years ago
Yanqin Jin	3189398c00	Fix bugs detected by clang analyzer (#5185 ) Summary: as titled. False positive included, fixed anyway to make the check pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5185 Differential Revision: D14909384 Pulled By: riversand963 fbshipit-source-id: dc5177e72b1929ccfd6175a60e2cd7bdb9bd80f3	7 years ago
anand76	fefd4b98c5	Introduce a new MultiGet batching implementation (#5011 ) Summary: This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold: the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching. Batching is useful when there is some spatial locality to the keys being queried, as well as larger batch sizes. The main benefits are due to: 1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch() 2. Bloom filter cachelines can be prefetched, hiding the cache miss latency. The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress. Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).
Batch Sizes 1 | 2 | 4 | 8 | 16 | 32
Random pattern (Stride length 0)
4.158 | 4.109 | 4.026 | 4.05 | 4.1 | 4.074 - Get
4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075 - MultiGet (no batching)
4.461 | 4.256 | 4.277 | 4.11 | 4.182 | 4.14 - MultiGet (w/ batching)
Good locality (Stride length 16)
4.048 | 3.659 | 3.248 | 2.99 | 2.84 | 2.753
4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781
4.452 | 3.45 | 2.833 | 2.451 | 2.233 | 2.135
Good locality (Stride length 256)
4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232
4.406 | 4.005 | 3.644 | 3.49 | 3.381 | 3.268
4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62
Medium locality (Stride length 4096)
4.012 | 3.922 | 3.768 | 3.61 | 3.582 | 3.555
4.364 | 4.057 | 3.791 | 3.65 | 3.57 | 3.465
4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891
dbbench command used (on a DB with 4 levels, 12 million keys)-
TEST_TMPDIR=/dev/shm numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011 Differential Revision: D14348703 Pulled By: anand1976 fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b	7 years ago
Sagar Vemuri	d3d20dcdca	Periodic Compactions (#5166 ) Summary: Introducing Periodic Compactions. This feature allows all the files in a CF to be periodically compacted. It could help in catching any corruptions that could creep into the DB proactively as every file is constantly getting re-compacted. And also, of course, it helps to cleanup data older than certain threshold. - Introduced a new option `periodic_compaction_time` to control how long a file can live without being compacted in a CF. - This works across all levels. - The files are put in the same level after going through the compaction. (Related files in the same level are picked up as `ExpandInputstoCleanCut` is used). - Compaction filters, if any, are invoked as usual. - A new table property, `file_creation_time`, is introduced to implement this feature. This property is set to the time at which the SST file was created (and that time is given by the underlying Env/OS). This feature can be enabled on its own, or in conjunction with `ttl`. It is possible to set a different time threshold for the bottom level when used in conjunction with ttl. Since `ttl` works only on 0 to last but one levels, you could set `ttl` to, say, 1 day, and `periodic_compaction_time` to, say, 7 days. Since `ttl < periodic_compaction_time` all files in last but one levels keep getting picked up based on ttl, and almost never based on periodic_compaction_time. The files in the bottom level get picked up for compaction based on `periodic_compaction_time`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5166 Differential Revision: D14884441 Pulled By: sagar0 fbshipit-source-id: 408426cbacb409c06386a98632dcf90bfa1bda47	7 years ago
Levi Tamasi	59ef2ba559	Evict the uncompression dictionary from the block cache upon table close (#5150 ) Summary: The uncompression dictionary object has a Statistics pointer that might dangle if the database is closed. This patch evicts the dictionary from the block cache when a table is closed, similarly to how index and filter readers are handled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5150 Differential Revision: D14782422 Pulled By: ltamasi fbshipit-source-id: 0cec9336c742c479aa92206e04521767f1aa9622	7 years ago
Adam Simpkins	c06c4c01c5	Fix many bugs in log statement arguments (#5089 ) Summary: Annotate all of the logging functions to inform the compiler that these use printf-style formatting arguments. This allows the compiler to emit warnings if the format arguments are incorrect. This also fixes many problems reported now that format string checking is enabled. Many of these are simply mix-ups in the argument type (e.g, int vs uint64_t), but in several cases the wrong number of arguments were being passed in which can cause the code to crash. The primary motivation for this was to fix the log message in `DBImpl::SwitchMemtable()` which caused a segfault due to an extra %s format parameter with no argument supplied. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5089 Differential Revision: D14574795 Pulled By: simpkins fbshipit-source-id: 0921b03f0743652bf4ae21e414ff54b3bb65422a	7 years ago
Zhongyi Xie	26015f3b48	add compression options to table properties (#5081 ) Summary: Since we are planning to use dictionary compression and to use different compression level, it is quite useful to add compression options to TableProperties. For example, in MyRocks, if the feature is available, we can query from information_schema.rocksdb_sst_props to see if all sst files are converted to ZSTD dictionary compressions. Resolves https://github.com/facebook/rocksdb/issues/4992 With this PR, user can query table properties through `GetPropertiesOfAllTables` API and get compression options as std::string: `window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0;` or table_properties->ToString() will also contain it `# data blocks=1; # entries=13; # deletions=0; # merge operands=0; # range deletions=0; raw key size=143; raw average key size=11.000000; raw value size=39; raw average value size=3.000000; data block size=120; index block size (user-key? 0, delta-value? 0)=27; filter block size=0; (estimated) table size=147; filter policy name=N/A; prefix extractor name=nullptr; column family ID=0; column family name=default; comparator name=leveldb.BytewiseComparator; merge operator name=nullptr; property collectors names=[]; SST file compression algo=Snappy; SST file compression options=window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ; creation time=1552946632; time stamp of earliest key=1552946632;` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5081 Differential Revision: D14716692 Pulled By: miasantreble fbshipit-source-id: 7d2f2cf84e052bff876e71b4212cfdebf5be32dd	7 years ago
Siying Dong	ebcc8ae1d3	Revert "Avoid per-key upper bound check in BlockBasedTableIterator (#5101 )" (#5132 ) Summary: This reverts commit `f29dc1b906`. In BlockBasedTableIterator, index_iter_->key() is sometimes a user key, so it is wrong to call ExtractUserKey() against it. This is a bug introduced by #5101. Temporarily revert the diff to keep the branch clean. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5132 Differential Revision: D14718584 Pulled By: siying fbshipit-source-id: 0ac55dc9b5dbc18c7809092146bdf7eb9364b9ad	7 years ago
Remington Brasga	127a850beb	Fix arena allocation size in NewEmptyInternalIterator (#4905 ) Summary: NewEmptyInternalIterator with arena mistakenly used EmptyIterator to allocate the size from the arena but then initialized it to a totally different object: EmptyInternalIterator. The patch fixes that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4905 Differential Revision: D14689840 Pulled By: maysamyabandeh fbshipit-source-id: af64fd8ee93d5a4ad54691c792e5ecc5efabc887	7 years ago
Yi Wu	f29dc1b906	Avoid per-key upper bound check in BlockBasedTableIterator (#5101 ) Summary: `BlockBasedTableIterator` avoids reading the next block on `Next()` if it detects the iterator will be out of bound, by checking against the index key. The optimization was added in #2239, and at the time it only checked the bound per block. It seems a later change made it a per-key check, which introduces unnecessary key comparisons. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5101 Differential Revision: D14678707 Pulled By: siying fbshipit-source-id: 2372446116753c7892ea4cec7b4b49ef87ba463e	7 years ago
Siying Dong	89ab1381f8	Apply automatic formatting to some files (#5114 ) Summary: Following files were run through automatic formatter: db/db_impl.cc db/db_impl.h db/db_impl_compaction_flush.cc db/db_impl_debug.cc db/db_impl_files.cc db/db_impl_readonly.h db/db_impl_write.cc db/dbformat.cc db/dbformat.h table/block.cc table/block.h table/block_based_filter_block.cc table/block_based_filter_block.h table/block_based_filter_block_test.cc table/block_based_table_builder.cc table/block_based_table_reader.cc table/block_based_table_reader.h table/block_builder.cc table/block_builder.h table/block_fetcher.cc table/block_prefix_index.cc table/block_prefix_index.h table/block_test.cc table/format.cc table/format.h I could easily run all the files, but I don't want people to feel that I'm doing it for lines of code changes :) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5114 Differential Revision: D14633040 Pulled By: siying fbshipit-source-id: 3f346cb53bf21e8c10704400da548dfce1e89a52	7 years ago
Yi Wu	d69241586e	Fix perf_context.user_key_comparison_count for range scan (#5098 ) Summary: Currently `perf_context.user_key_comparison_count` is bumped only in `InternalKeyComparator`. In places where the user comparator is used directly, the counter is not bumped. This patch fixes the majority of them. Index iterator and filter code also use the user comparator directly and don't bump the counter; that is not fixed in this patch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5098 Differential Revision: D14603753 Pulled By: siying fbshipit-source-id: 1cd41035644ca9e49b97a51030a5d1e15f5f3cae	7 years ago
Siying Dong	2b4d5ceb47	Remove some "using std::..." from header files. (#5113 ) Summary: The code convention we are following, Google C++ Style, discourages aliases in header files, especially public headers: https://google.github.io/styleguide/cppguide.html#Aliases Remove some of them. Might have removed some from .cc files as well to be consistent. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5113 Differential Revision: D14633030 Pulled By: siying fbshipit-source-id: b990edc919d5de60295992284f980195e501d424	7 years ago
Yi Wu	75133b1b6b	Fix SstFileReader not able to open ingested file (#5097 ) Summary: Since `SstFileReader` doesn't know the largest seqno of a file, it will fail this check when it opens a file with global seqno: `ca89ac2ba9/table/block_based_table_reader.cc (L730)` Changes: * Pass largest_seqno=kMaxSequenceNumber from `SstFileReader` and allow it to bypass the above check. * `BlockBasedTable::VerifyChecksum` also double-checks whether the checksum matches when excluding global seqno (this is to make the new test in sst_table_reader_test pass). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5097 Differential Revision: D14607434 Pulled By: riversand963 fbshipit-source-id: 9008599227c5fccbf9b73fee46b3bf4a1523f023	7 years ago
Yi Wu	7ca9eb7542	Fix BlockBasedTableIterator construction missing index_key_is_full parameter Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5104 Differential Revision: D14619000 Pulled By: maysamyabandeh fbshipit-source-id: c2895794a3f31b826c149dcb698c1952dacc2332	7 years ago
Shobhit Dayal	b45b1cde3e	Feature for sampling and reporting compressibility (#4842 ) Summary: This is a feature to sample data-block compressibility and report it as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms: 1. lz4 or snappy for fast compression 2. zstd or zlib for slow but higher compression. The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType. The db_bench_tool now has a command line option for specifying the sampling rate. Its default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool, varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842 Differential Revision: D13629011 Pulled By: shobhitdayal fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa	7 years ago
Yi Wu	62eb2c23aa	Print data block index options to info log (#5039 ) Summary: Print data block index type related options to info log Pull Request resolved: https://github.com/facebook/rocksdb/pull/5039 Differential Revision: D14387718 Pulled By: miasantreble fbshipit-source-id: 9df8f82eea83a8344c7d12a712486f656691bc4a	7 years ago
Siying Dong	0920bf4e68	Revert "Remove PlainTable's feature store_index_in_file (#4914 )" (#5034 ) Summary: This reverts commit `ee1818081f`. We are not ready to deprecate this feature. revert it for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5034 Differential Revision: D14287246 Pulled By: siying fbshipit-source-id: e4beafdeaee1c94364fdaa6ba198218d158339f7	7 years ago
Siying Dong	aef763b6d6	Make statistics's stats_level change thread-safe (#5030 ) Summary: Right now, users can change statistics.stats_level while the DB is running, but TSAN may report a data race. We make stats_level_ atomic and access it using accessors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5030 Differential Revision: D14267519 Pulled By: siying fbshipit-source-id: 37d7ebeff7a43a406230143422a16af899163f73	7 years ago
Siying Dong	5e298f865b	Add two more StatsLevel (#5027 ) Summary: Statistics cost too much CPU for some use cases. Add two stats levels so that people can choose to skip two types of expensive stats, timers and histograms. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027 Differential Revision: D14252765 Pulled By: siying fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592	7 years ago
Zhongyi Xie	ed995c6a69	add whole key bloom filter support in memtables (#4985 ) Summary: MyRocks calls `GetForUpdate` on `INSERT`, for unique key check, and in almost all cases GetForUpdate returns empty result. For such cases, whole key bloom filter is helpful. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4985 Differential Revision: D14118257 Pulled By: miasantreble fbshipit-source-id: d35cb7109c62fd5ad541a26968e3a3e16d3e85ea	7 years ago
Michael Liu	ca89ac2ba9	Apply modernize-use-override (2nd iteration) Summary: Use C++11’s override and remove virtual where applicable. Changes are automatically generated. Reviewed By: Orvid Differential Revision: D14090024 fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a	7 years ago
Andrew Kryczka	c8c8104d7e	Dictionary compression for files written by SstFileWriter (#4978 ) Summary: If `CompressionOptions::max_dict_bytes` and/or `CompressionOptions::zstd_max_train_bytes` are set, `SstFileWriter` will now generate files respecting those options. I refactored the logic a bit for deciding when to use dictionary compression. Previously we plumbed `is_bottommost_level` down to the table builder and used that. However it was kind of confusing in `SstFileWriter`'s context since we don't know what level the file will be ingested to. Instead, now the higher-level callers (e.g., flush, compaction, file writer) are responsible for building the right `CompressionOptions` to give the table builder. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4978 Differential Revision: D14060763 Pulled By: ajkr fbshipit-source-id: dc802c327896df2b319dc162d6acc82b9cdb452a	7 years ago
Andrew Kryczka	62f70f6d14	Reduce scope of compression dictionary to single SST (#4952 ) Summary: Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio. So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include: - The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called. - After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up. - Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952 Differential Revision: D13967980 Pulled By: ajkr fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f	7 years ago
Peter (Stig) Edwards	79496d71ed	Increment NUMBER_BLOCK_NOT_COMPRESSED when !GoodCompressionRatio (#4929 ) Summary: See https://github.com/facebook/rocksdb/issues/4884 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4929 Differential Revision: D14028333 Pulled By: sagar0 fbshipit-source-id: eed12bceae85385a34aaa6dd303bf0f53c4c7b06	7 years ago
Yanqin Jin	2d049ab7e8	Checksum properties block for block-based table (#4956 ) Summary: Always enable properties block checksum verification for block-based table. For external SST file ingested with 'write_global_seqno==true', we use 'DecodeEntrySlow' to parse its blocks' contents so that the process will not die upon failing the assertion possibly caused by corruption. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4956 Differential Revision: D14012741 Pulled By: riversand963 fbshipit-source-id: 8b766e6f54b36f8f9e074c0e19e0926ec3cce186	7 years ago
Siying Dong	cf3a671733	Remove cuckoo hash memtable (#4953 ) Summary: Cuckoo Hash is less useful than we initially expected. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953 Differential Revision: D13979264 Pulled By: siying fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed	7 years ago
Alexander Zinoviev	32a6dd9a41	Add a new CPU time counter to compaction report (#4889 ) Summary: Measure CPU time consumed for a compaction and report it in the stats report Enable NowCPUNanos() to work for MacOS Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889 Differential Revision: D13701276 Pulled By: zinoale fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055	7 years ago
Yanqin Jin	158da7a6ee	Verify checksum before ingestion (#4916 ) Summary: before file ingestion (in preparation phase), verify the checksums of the blocks of the external SST file, including properties block with global seqno. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4916 Differential Revision: D13863501 Pulled By: riversand963 fbshipit-source-id: dc54697f970e3807832e2460f7228fcc7efe81ee	7 years ago
Siying Dong	ee1818081f	Remove PlainTable's feature store_index_in_file (#4914 ) Summary: Store_index_in_file is a less useful feature. To simplify the code to maintain, we are dropping the feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4914 Differential Revision: D13791883 Pulled By: siying fbshipit-source-id: d187c5d662584866103e4b77d09dfb925509ae2e	7 years ago
Siying Dong	f184bee77b	PlainTable should avoid copying Get() results from immortal source. (#4924 ) Summary: https://github.com/facebook/rocksdb/pull/4053 avoids memcpy for Get() results if files are immortal (read-only DB, max_open_files=-1) and the file is mmapped. The same optimization is being applied to PlainTable here. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4924 Differential Revision: D13827749 Pulled By: siying fbshipit-source-id: 1f2cbfc530b40ce08ccd53f95f6e78de4d1c2f96	7 years ago
Siying Dong	fc53839bfa	Disallow customized hash function in DynamicBloom (#4915 ) Summary: I didn't find where customized hash function is used in DynamicBloom. This can only reduce performance. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4915 Differential Revision: D13794452 Pulled By: siying fbshipit-source-id: e38669b11e01444d2d782da11c7decabbd851819	7 years ago
Andrew Kryczka	8ec3e72551	Cache dictionary used for decompressing data blocks (#4881 ) Summary: - If block cache is disabled or not used for meta-blocks, `BlockBasedTableReader::Rep::uncompression_dict` owns the `UncompressionDict`. It is preloaded during `PrefetchIndexAndFilterBlocks`. - If block cache is enabled and used for meta-blocks, block cache owns the `UncompressionDict`, which holds dictionary and digested dictionary when needed. It is never prefetched, though there is a TODO for this in the code. The cache key is simply the compression dictionary block handle. - New stats for compression dictionary accesses in block cache: "BLOCK_CACHE_COMPRESSION_DICT_*" and "compression_dict_block_read_count" Pull Request resolved: https://github.com/facebook/rocksdb/pull/4881 Differential Revision: D13663801 Pulled By: ajkr fbshipit-source-id: bdcc54044e180855cdcc57639b493b0e016c9a3f	7 years ago
Andrew Kryczka	27054d837b	Call NewDataBlockIterator with correct arguments in DB::Get (#4913 ) Summary: The pointer `get_context` was passed as the value for the boolean argument `index_key_is_full`. Luckily the pointer was always non-null so evaluated to true which is the correct value for the boolean argument. But we were missing out on batch updates to stats since we were not passing anything for the `GetContext*` argument and it defaults to `nullptr`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4913 Differential Revision: D13791449 Pulled By: ajkr fbshipit-source-id: dbe40bf406c64d34cb5298604145d18b9e0ca9be	7 years ago
Andrew Kryczka	01013ae766	Digest ZSTD compression dictionary once when writing SST file (#4849 ) Summary: This is essentially a re-submission of #4251 with a few improvements: - Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict` - Eliminated `Init` functions. Instead do all initialization work in constructors. - Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849 Differential Revision: D13606039 Pulled By: ajkr fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447	7 years ago
Andrew Kryczka	ace543a815	fix accounting for range tombstones in TableProperties (#4841 ) Summary: - To be consistent with the accounting of other optypes in `TableProperties`, we should count range tombstones in `TableProperties::num_entries` and `TableProperties::num_deletions`. - Updated assertions in stress test's `OnTableFileCreated` handler to accept files with range tombstones only. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4841 Differential Revision: D13568424 Pulled By: ajkr fbshipit-source-id: 0139d7806494eda20ece67ec460d2458dbbf6026	7 years ago
Alexander Zinoviev	80bf8975fd	Add a new per level counter for block cache hit (#4796 ) Summary: Add a new per level counter for block cache hits, increase it by one on every successful attempt to get an entry from cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4796 Differential Revision: D13513688 Pulled By: zinoale fbshipit-source-id: 104df038f1232e3356e162eb2d8ca138e34a8281	7 years ago
Andrew Kryczka	e0be1bc4f1	fix DeleteRange memory leak for mmap and block cache (#4810 ) Summary: Previously we were cleaning up range tombstone meta-block by calling `ReleaseCachedEntry`, which wouldn't work if `value != nullptr && cache_handle == nullptr`. This happened at least in the case with mmap reads and block cache both enabled. I noticed `NewDataBlockIterator` intends to handle all these cases, so migrated to that instead of `NewUnfragmentedRangeTombstoneIterator`. Also changed the table-opening logic to fail on `ReadRangeDelBlock` failure, since that can cause data corruption. Added a test case to verify this behavior. Note the test case does not fail on `TryReopen` because failure to preload table handlers is not considered critical. However, it does fail on any read involving that file since it cannot return correct data. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4810 Differential Revision: D13534296 Pulled By: ajkr fbshipit-source-id: 55dde1111717cea6ec4bf38418daab81ccef3599	7 years ago
DorianZheng	2670fe8c73	Get `CompactionJobInfo` from CompactFiles Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716 Differential Revision: D13207677 Pulled By: ajkr fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b	7 years ago
Abhishek Madan	cad248f5c6	Prepare FragmentedRangeTombstoneIterator for use in compaction (#4740 ) Summary: To support the flush/compaction use cases of RangeDelAggregator in v2, FragmentedRangeTombstoneIterator now supports dropping tombstones that cannot be read in the compaction output file. Furthermore, FragmentedRangeTombstoneIterator supports the "snapshot striping" use case by allowing an iterator to be split by a list of snapshots. RangeDelAggregatorV2 will use these changes in a follow-up change. In the process of making these changes, other miscellaneous cleanups were also done in these files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4740 Differential Revision: D13287382 Pulled By: abhimadan fbshipit-source-id: f5aeb03e1b3058049b80c02a558ee48f723fa48c	7 years ago

1472 Commits (baf37a0e818dc334a0ed94f3d315155e2c138c93)