rocksdb

Commit Graph

Author	SHA1	Message	Date
Levi Tamasi	253ae017fa	Update version on main to 7.4 and add 7.3 to the format compatibility checks (#10038 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10038 Reviewed By: riversand963 Differential Revision: D36604533 Pulled By: ltamasi fbshipit-source-id: 54ccd0a4b32a320b5640a658ea6846ee897065d1	3 years ago
Changyu Bi	cc23b46da1	Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857 ) Summary: An untrained dictionary is currently simply the concatenation of several samples. The ZSTD API, ZDICT_finalizeDictionary(), can improve such a dictionary's effectiveness at low cost. This PR changes how dictionary is created by calling the ZSTD ZDICT_finalizeDictionary() API instead of creating raw content dictionary (when max_dict_buffer_bytes > 0), and pass in all buffered uncompressed data blocks as samples. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9857 Test Plan: #### db_bench test for cpu/memory of compression+decompression and space saving on synthetic data: Set up: change the parameter [here](`fb9a167a55/tools/db_bench_tool.cc (L1766)`) to 16384 to make synthetic data more compressible. ``` # linked local ZSTD with version 1.5.2 # DEBUG_LEVEL=0 ROCKSDB_NO_FBCODE=1 ROCKSDB_DISABLE_ZSTD=1 EXTRA_CXXFLAGS="-DZSTD_STATIC_LINKING_ONLY -DZSTD -I/data/users/changyubi/install/include/" EXTRA_LDFLAGS="-L/data/users/changyubi/install/lib/ -l:libzstd.a" make -j32 db_bench dict_bytes=16384 train_bytes=1048576 echo "========== No Dictionary ==========" TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1 TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 2>&1 \| grep elapsed du -hc /dev/shm/dbbench/sst \| grep total echo "========== Raw Content Dictionary ==========" TEST_TMPDIR=/dev/shm ./db_bench_main -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector 
-allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1 TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench_main -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 2>&1 \| grep elapsed du -hc /dev/shm/dbbench/sst \| grep total echo "========== FinalizeDictionary ==========" TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1 TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 2>&1 \| grep elapsed du -hc /dev/shm/dbbench/sst \| grep total echo "========== TrainDictionary ==========" TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1 TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 2>&1 \| grep elapsed du -hc /dev/shm/dbbench/sst \| grep total # Result: TrainDictionary is much better on space saving, but FinalizeDictionary seems to use less memory. 
# before compression data size: 1.2GB dict_bytes=16384 max_dict_buffer_bytes = 1048576 space cpu/memory No Dictionary 468M 14.93user 1.00system 0:15.92elapsed 100%CPU (0avgtext+0avgdata 23904maxresident)k Raw Dictionary 251M 15.81user 0.80system 0:16.56elapsed 100%CPU (0avgtext+0avgdata 156808maxresident)k FinalizeDictionary 236M 11.93user 0.64system 0:12.56elapsed 100%CPU (0avgtext+0avgdata 89548maxresident)k TrainDictionary 84M 7.29user 0.45system 0:07.75elapsed 100%CPU (0avgtext+0avgdata 97288maxresident)k ``` #### Benchmark on 10 sample SST files for spacing saving and CPU time on compression: FinalizeDictionary is comparable to TrainDictionary in terms of space saving, and takes less time in compression. ``` dict_bytes=16384 train_bytes=1048576 for sst_file in `ls ../temp/myrock-sst/` do echo "******** $sst_file ********" echo "========== No Dictionary ==========" ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD echo "========== Raw Content Dictionary ==========" ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes echo "========== FinalizeDictionary ==========" ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes --compression_use_zstd_finalize_dict echo "========== TrainDictionary ==========" ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes done 010240.sst (Size/Time) 011029.sst 013184.sst 021552.sst 185054.sst 185137.sst 191666.sst 7560381.sst 7604174.sst 
7635312.sst No Dictionary 28165569 / 2614419 32899411 / 2976832 32977848 / 3055542 31966329 / 2004590 33614351 / 1755877 33429029 / 1717042 33611933 / 1776936 33634045 / 2771417 33789721 / 2205414 33592194 / 388254 Raw Content Dictionary 28019950 / 2697961 33748665 / 3572422 33896373 / 3534701 26418431 / 2259658 28560825 / 1839168 28455030 / 1846039 28494319 / 1861349 32391599 / 3095649 33772142 / 2407843 33592230 / 474523 FinalizeDictionary 27896012 / 2650029 33763886 / 3719427 33904283 / 3552793 26008225 / 2198033 28111872 / 1869530 28014374 / 1789771 28047706 / 1848300 32296254 / 3204027 33698698 / 2381468 33592344 / 517433 TrainDictionary 28046089 / 2740037 33706480 / 3679019 33885741 / 3629351 25087123 / 2204558 27194353 / 1970207 27234229 / 1896811 27166710 / 1903119 32011041 / 3322315 32730692 / 2406146 33608631 / 570593 ``` #### Decompression/Read test: With FinalizeDictionary/TrainDictionary, some data structures used for decompression are stored in the dictionary, so they are expected to be faster in terms of decompression/reads.
``` dict_bytes=16384 train_bytes=1048576 echo "No Dictionary" TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=0 > /dev/null 2>&1 TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=0 2>&1 \| grep MB/s echo "Raw Dictionary" TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes > /dev/null 2>&1 TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes 2>&1 \| grep MB/s echo "FinalizeDict" TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false > /dev/null 2>&1 TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false 2>&1 \| grep MB/s echo "Train Dictionary" TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes > /dev/null 2>&1 TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes 2>&1 \| grep MB/s No Dictionary readrandom : 12.183 micros/op 82082 ops/sec 12.183 seconds 1000000 operations; 9.1 MB/s (1000000 of 1000000 found) Raw Dictionary readrandom : 12.314 micros/op 81205 ops/sec 12.314 seconds 1000000 operations; 9.0 MB/s (1000000 of 1000000 found) FinalizeDict readrandom : 9.787 micros/op 102180 ops/sec 9.787 seconds 
1000000 operations; 11.3 MB/s (1000000 of 1000000 found) Train Dictionary readrandom : 9.698 micros/op 103108 ops/sec 9.699 seconds 1000000 operations; 11.4 MB/s (1000000 of 1000000 found) ``` Reviewed By: ajkr Differential Revision: D35720026 Pulled By: cbi42 fbshipit-source-id: 24d230fdff0fd28a1bb650658798f00dfcfb2a1f	3 years ago
Peter Dillinger	280b9f371a	Fix auto_prefix_mode performance with partitioned filters (#10012 ) Summary: Essentially refactored the RangeMayExist implementation in FullFilterBlockReader to FilterBlockReaderCommon so that it applies to partitioned filters as well. (The function is not called for the block-based filter case.) RangeMayExist is essentially a series of checks around a possible PrefixMayExist, and I'm confident those checks should be the same for partitioned as for full filters. (I think it's likely that bugs remain in those checks, but this change is overall a simplifying one.) Added auto_prefix_mode support to db_bench Other small fixes as well Fixes https://github.com/facebook/rocksdb/issues/10003 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10012 Test Plan: Expanded unit test that uses statistics to check for filter optimization, fails without the production code changes here Performance: populate two DBs with ``` TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters ``` Observe no measurable change in non-partitioned performance ``` TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20 ``` Before: seekrandom [AVG 15 runs] : 11798 (± 331) ops/sec After: seekrandom 
[AVG 15 runs] : 11724 (± 315) ops/sec Observe big improvement with partitioned (also supported by bloom use statistics) ``` TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20 ``` Before: seekrandom [AVG 12 runs] : 2942 (± 57) ops/sec After: seekrandom [AVG 12 runs] : 7489 (± 184) ops/sec Reviewed By: siying Differential Revision: D36469796 Pulled By: pdillinger fbshipit-source-id: bcf1e2a68d347b32adb2b27384f945434e7a266d	3 years ago
Jay Zhuang	c6d326d3d7	Track SST unique id in MANIFEST and verify (#9990) Summary: Start tracking SST unique id in MANIFEST, which is used to verify with SST properties to make sure the SST file is not overwritten or misplaced. A DB option `try_verify_sst_unique_id` (default: false) is introduced to enable/disable the verification. If enabled, it opens all SST files during DB-open to read the unique_id from table properties, so it's recommended to use it with `max_open_files = -1` to pre-open the files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9990 Test Plan: unittests, format-compatible test, mini-crash Reviewed By: anand1976 Differential Revision: D36381863 Pulled By: jay-zhuang fbshipit-source-id: 89ea2eb6b35ed3e80ead9c724eb096083eaba63f	3 years ago
Hui Xiao	3573558ec5	Rewrite memory-charging feature's option API (#9926) Summary: Context: Previous PRs https://github.com/facebook/rocksdb/pull/9748, https://github.com/facebook/rocksdb/pull/9073, https://github.com/facebook/rocksdb/pull/8428 added a separate flag for each charged memory area. Such an API design does not scale as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache usage related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`. Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options` and this PR serves as the first step by consolidating memory-charging related flags. Summary: - Replaced old API references with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out and added a unit test for that - Added missing db bench/stress test for some memory charging features - Renamed related test suite to indicate they are under the same theme of memory charging - Refactored a commonly used mocked cache component in memory charging related tests to reduce code duplication - Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" for standard description of this feature. 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9926 Test Plan: - New unit test for opt-out `kCompressionDictionaryBuildingBuffer` `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)` - New unit test for option validation/sanitization `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)` - CI - db bench (in case querying new options introduces regression) +0.5% micros/op: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR -charge_compression_dictionary_building_buffer=1(remove this for comparison) -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 \| egrep 'fillseq'` #-run \| (pre-PR) avg micros/op \| std micros/op \| (post-PR) micros/op \| std micros/op \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 3.9711 \| 0.264408 \| 3.9914 \| 0.254563 \| 0.5111933721 20 \| 3.83905 \| 0.0664488 \| 3.8251 \| 0.0695456 \| -0.3633711465 40 \| 3.86625 \| 0.136669 \| 3.8867 \| 0.143765 \| 0.5289363078 - db_stress: `python3 tools/db_crashtest.py blackbox -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal Reviewed By: ajkr Differential Revision: D36054712 Pulled By: hx235 fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af	3 years ago
Yanqin Jin	f6d9730ea1	Fix stress test with best-efforts-recovery (#9986) Summary: This PR: - Sets the column family count to 1 when testing with disable_wal = true and best_efforts_recovery, due to the requirement of `ExpectedState` tracking and replaying logic. - Disables best-efforts-recovery during backup and checkpoint restore. This does not matter now because db_crashtest.py always disables WAL when testing best-efforts-recovery. In the future, if we enable WAL, then not setting `restore_opitions.best_efforts_recovery` will cause the backup db not to recover the WALs, and to differ from the db (that enables WAL). - Prints the key where an inconsistency exists between expected state and db during verification of backup and checkpoint restore. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9986 Test Plan: TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_best_efforts_recovery Reviewed By: siying Differential Revision: D36353105 Pulled By: riversand963 fbshipit-source-id: a484da161273e6216a1f7e245bac15a349693917	3 years ago
Andrew Kryczka	e943bbdd2f	Temporarily disable sync_fault_injection (#9979 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9979 Reviewed By: siying Differential Revision: D36301555 Pulled By: ajkr fbshipit-source-id: ed298d3484b6aad3ef19746e984bf4c52be33a9f	3 years ago
yaphet	26768edb65	Support single delete in ldb (#9469 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9469 Reviewed By: riversand963 Differential Revision: D33953484 fbshipit-source-id: f4e84a2d9865957d744c7e84ff02ffbb0a62b0a8	3 years ago
Peter Dillinger	c5c58708db	Fix format_compatible blowing away its TEST_TMPDIR (#9970 ) Summary: https://github.com/facebook/rocksdb/issues/9961 broke format_compatible check because of `make clean` referencing TEST_TMPDIR. The Makefile behavior seems reasonable to me, so here's a fix in check_format_compatible.sh Apparently I also included removing a redundant part of our CircleCI config. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9970 Test Plan: manual run: SHORT_TEST=1 ./tools/check_format_compatible.sh Reviewed By: riversand963 Differential Revision: D36258172 Pulled By: pdillinger fbshipit-source-id: d46507f04614e888b414ff23b88d040ae2b5c294	3 years ago
sdong	736a7b5433	Remove own ToString() (#9955) Summary: ToString() was created because some platforms don't support std::to_string(). However, we've already used std::to_string() by mistake for 16 months (in db/db_info_dumper.cc). This commit just removes ToString(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9955 Test Plan: Watch CI tests Reviewed By: riversand963 Differential Revision: D36176799 fbshipit-source-id: bdb6dcd0e3a3ab96a1ac810f5d0188f684064471	3 years ago
Andrew Kryczka	62d84e2a2b	db_stress fault injection in release mode (#9957 ) Summary: Previously all fault injection was ignored in release mode. This PR adds it back except for read fault injection (`--read_fault_one_in > 0`) since its dependency (`IGNORE_STATUS_IF_ERROR`) is unavailable in release mode. Other notable changes include: - Moved `EnableWriteErrorInjection()` for `--write_fault_one_in > 0` so it's after `DB::Open()` without depending on `SyncPoint` - Made `--read_fault_one_in > 0` return an error in release mode - Updated `db_crashtest.py` to always set `--read_fault_one_in=0` in release mode Pull Request resolved: https://github.com/facebook/rocksdb/pull/9957 Test Plan: ``` $ DEBUG_LEVEL=0 make -j24 db_stress $ DEBUG_LEVEL=0 TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox ``` Reviewed By: anand1976 Differential Revision: D36193830 Pulled By: ajkr fbshipit-source-id: 0b97946b4e3f06e3e0f6e7833c2763da08ec5321	3 years ago
Andrew Kryczka	a62506aee2	Enable unsynced data loss in crash test (#9947 ) Summary: `db_stress` already tracks expected state history to verify prefix-recoverability when `sync_fault_injection` is enabled. This PR enables `sync_fault_injection` in `db_crashtest.py`. Previously enabling `sync_fault_injection` would cause whole unsynced files to be dropped. This PR adds a more interesting case of losing only the tail of unsynced data by implementing `TestFSWritableFile::RangeSync()` and enabling `{wal_,}bytes_per_sync`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9947 Test Plan: - regular blackbox, blackbox --simple - various commands to stress this new case, such as `TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=100000 --write_buffer_size=2097152 --avoid_flush_during_recovery=1 --disable_wal=0 --interval=10 --db_write_buffer_size=0 --sync_fault_injection=1 --wal_compression=none --delpercent=0 --delrangepercent=0 --prefixpercent=0 --iterpercent=0 --writepercent=100 --readpercent=0 --wal_bytes_per_sync=131072 --duration=36000 --sync=0 --open_write_fault_one_in=16` Reviewed By: riversand963 Differential Revision: D36152775 Pulled By: ajkr fbshipit-source-id: 44b68a7fad0a4cf74af9fe1f39be01baab8141d8	3 years ago
sdong	49628c9a83	Use std::numeric_limits<> (#9954) Summary: Right now we still don't fully use std::numeric_limits but use a macro, mainly for supporting VS 2013. We now only support VS 2017 and up, so this is no longer a problem. The code comment claims that MinGW still needs it. We don't have a CI running MinGW so it's hard to validate. Since we now require C++17, it's hard to imagine MinGW would still build RocksDB but not support std::numeric_limits<>. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9954 Test Plan: See CI Runs. Reviewed By: riversand963 Differential Revision: D36173954 fbshipit-source-id: a35a73af17cdcae20e258cdef57fcf29a50b49e0	3 years ago
Mark Callaghan	bf68d1c93d	Print elapsed time and number of operations completed (#9886 ) Summary: This is inspired by debugging a regression test that runs for ~0.05 seconds and the short running time makes it prone to variance. While db_bench ran for ~60 seconds, 59.95 seconds was spent opening 128 databases (and doing recovery). So it was harder to notice that the benchmark only ran for 0.05 seconds. Normally I add output to the end of the line to make life easier for existing tools that parse it but in this case the output near the end of the line has two optional parts and one of the optional parts adds an extra newline. This is for https://github.com/facebook/rocksdb/issues/9856 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9886 Test Plan: ./db_bench --benchmarks=overwrite,readrandom --num=1000000 --threads=4 old output: DB path: [/tmp/rocksdbtest-2260/dbbench] overwrite : 14.108 micros/op 283338 ops/sec; 31.3 MB/s DB path: [/tmp/rocksdbtest-2260/dbbench] readrandom : 7.994 micros/op 496788 ops/sec; 55.0 MB/s (1000000 of 1000000 found) new output: DB path: [/tmp/rocksdbtest-2260/dbbench] overwrite : 14.117 micros/op 282862 ops/sec 14.141 seconds 4000000 operations; 31.3 MB/s DB path: [/tmp/rocksdbtest-2260/dbbench] readrandom : 8.649 micros/op 458475 ops/sec 8.725 seconds 4000000 operations; 49.8 MB/s (981548 of 1000000 found) Reviewed By: ajkr Differential Revision: D36102269 Pulled By: mdcallag fbshipit-source-id: 5cd8a9e11f5cbe2a46809571afd83335b6b0caa0	3 years ago
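Because the commit above inserts the elapsed seconds and operation count in the middle of the result line, tools that parse `db_bench` output may need a new pattern. The following is a minimal sketch of such a parser, written against the "new output" sample quoted in the commit message only. The regular expression, the function name `parse_db_bench_line`, and the returned field names are illustrative assumptions, not part of RocksDB.

```python
import re

# Pattern for the "new output" layout quoted in the commit message:
#   <name> : <micros/op> micros/op <ops/sec> ops/sec <seconds> seconds <ops> operations; ...
# Hypothetical helper; field names are chosen for illustration only.
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s*:\s*"
    r"(?P<micros_per_op>[\d.]+)\s+micros/op\s+"
    r"(?P<ops_per_sec>\d+)\s+ops/sec\s+"
    r"(?P<seconds>[\d.]+)\s+seconds\s+"
    r"(?P<operations>\d+)\s+operations;"
)


def parse_db_bench_line(line):
    """Return a dict of parsed fields, or None if the line is not in the new layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "name": match["name"],
        "micros_per_op": float(match["micros_per_op"]),
        "ops_per_sec": int(match["ops_per_sec"]),
        "seconds": float(match["seconds"]),
        "operations": int(match["operations"]),
    }


new_line = "overwrite : 14.117 micros/op 282862 ops/sec 14.141 seconds 4000000 operations; 31.3 MB/s"
old_line = "overwrite : 14.108 micros/op 283338 ops/sec; 31.3 MB/s"
parsed_new = parse_db_bench_line(new_line)
parsed_old = parse_db_bench_line(old_line)
```

A line in the old layout does not match and yields `None`, so a caller can fall back to its previous pattern for older binaries.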
Jay Zhuang	270179bb12	Default `try_load_options` to true when DB is specified (#9937) Summary: If the DB path is specified, the user would expect ldb to load the options from that path, but it does not: ``` $ ldb list_live_files_metadata --db=`pwd` ``` Default `try_load_options` to true in that case. The user can still disable that by: ``` $ ldb list_live_files_metadata --db=`pwd` --try_load_options=false ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9937 Test Plan: `ldb list_live_files_metadata --db=`pwd`` is able to work for a db generated with different options.num_levels. Reviewed By: ajkr Differential Revision: D36106708 Pulled By: jay-zhuang fbshipit-source-id: 2732fdc027a4d172436b2c9b6a9787b56b10c710	3 years ago
Mark Callaghan	b6ec3328af	Make --benchmarks=flush flush the default column family (#9887 ) Summary: db_bench --benchmarks=flush wasn't flushing the default column family. This is for https://github.com/facebook/rocksdb/issues/9880 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9887 Test Plan: Confirm that flush works (.log is empty) when "flush" added to benchmark list Confirm that .log is not empty otherwise. Repeat for all combinations for: uses column families, uses multiple databases ./db_bench --benchmarks=overwrite --num=10000 ls -lrt /tmp/rocksdbtest-2260/dbbench/.log -rw-r--r-- 1 me users 1380286 Apr 21 10:47 /tmp/rocksdbtest-2260/dbbench/000004.log ./db_bench --benchmarks=overwrite,flush --num=10000 ls -lrt /tmp/rocksdbtest-2260/dbbench/.log -rw-r--r-- 1 me users 0 Apr 21 10:48 /tmp/rocksdbtest-2260/dbbench/000008.log ./db_bench --benchmarks=overwrite --num=10000 --num_column_families=4 ls -lrt /tmp/rocksdbtest-2260/dbbench/.log -rw-r--r-- 1 me users 1387823 Apr 21 10:49 /tmp/rocksdbtest-2260/dbbench/000004.log ./db_bench --benchmarks=overwrite,flush --num=10000 --num_column_families=4 ls -lrt /tmp/rocksdbtest-2260/dbbench/.log -rw-r--r-- 1 me users 0 Apr 21 10:51 /tmp/rocksdbtest-2260/dbbench/000014.log ./db_bench --benchmarks=overwrite --num=10000 --num_multi_db=2 ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/.log -rw-r--r-- 1 me users 1380838 Apr 21 10:55 /tmp/rocksdbtest-2260/dbbench/0/000004.log -rw-r--r-- 1 me users 1379734 Apr 21 10:55 /tmp/rocksdbtest-2260/dbbench/1/000004.log ./db_bench --benchmarks=overwrite,flush --num=10000 --num_multi_db=2 ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/.log -rw-r--r-- 1 me users 0 Apr 21 10:57 /tmp/rocksdbtest-2260/dbbench/0/000013.log -rw-r--r-- 1 me users 0 Apr 21 10:57 /tmp/rocksdbtest-2260/dbbench/1/000013.log ./db_bench --benchmarks=overwrite --num=10000 --num_column_families=4 --num_multi_db=2 ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/.log -rw-r--r-- 1 me users 1395108 Apr 21 10:52 
/tmp/rocksdbtest-2260/dbbench/1/000004.log -rw-r--r-- 1 me users 1380411 Apr 21 10:52 /tmp/rocksdbtest-2260/dbbench/0/000004.log ./db_bench --benchmarks=overwrite,flush --num=10000 --num_column_families=4 --num_multi_db=2 ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/.log -rw-r--r-- 1 me users 0 Apr 21 10:54 /tmp/rocksdbtest-2260/dbbench/0/000022.log -rw-r--r-- 1 me users 0 Apr 21 10:54 /tmp/rocksdbtest-2260/dbbench/1/000022.log Reviewed By: ajkr Differential Revision: D36026777 Pulled By: mdcallag fbshipit-source-id: d42d3d7efceea7b9a25bbbc0f04461d2b7301122	3 years ago
Yanqin Jin	06394ff4e7	Fix a bug of CompactionIterator/CompactionFilter using `Delete` (#9929 ) Summary: When compaction filter determines that a key should be removed, it updates the internal key's type to `Delete`. If this internal key is preserved in current compaction but seen by a later compaction together with `SingleDelete`, it will cause compaction iterator to return Corruption. To fix the issue, compaction filter should return more information in addition to the intention of removing a key. Therefore, we add a new `kRemoveWithSingleDelete` to `CompactionFilter::Decision`. Seeing `kRemoveWithSingleDelete`, compaction iterator will update the op type of the internal key to `kTypeSingleDelete`. In addition, I updated db_stress_shared_state.[cc\|h] so that `no_overwrite_ids_` becomes `const`. It is easier to reason about thread-safety if accessed from multiple threads. This information is passed to `PrepareTxnDBOptions()` when calling from `Open()` so that we can set up the rollback deletion type callback for transactions. Finally, disable compaction filter for multiops_txn because the key removal logic of `DbStressCompactionFilter` does not quite work with `MultiOpsTxnsStressTest`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9929 Test Plan: make check make crash_test make crash_test_with_txn Reviewed By: anand1976 Differential Revision: D36069678 Pulled By: riversand963 fbshipit-source-id: cedd2f1ba958af59ad3916f1ba6f424307955f92	3 years ago
Anvesh Komuravelli	aafb377bb5	Update protection info on recovered logs data (#9875 ) Summary: Update protection info on recovered logs data Pull Request resolved: https://github.com/facebook/rocksdb/pull/9875 Test Plan: - Benchmark setup: `TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576000` - Benchmark command: `TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=overwrite -write_buffer_size=1048576000 -writes=1 -report_open_timing=true` - Results before this PR ``` OpenDb: 2350.14 milliseconds OpenDb: 2296.94 milliseconds OpenDb: 2184.29 milliseconds OpenDb: 2167.59 milliseconds OpenDb: 2231.24 milliseconds OpenDb: 2109.57 milliseconds OpenDb: 2197.71 milliseconds OpenDb: 2120.8 milliseconds OpenDb: 2148.12 milliseconds OpenDb: 2207.95 milliseconds ``` - Results after this PR ``` OpenDb: 2424.52 milliseconds OpenDb: 2359.84 milliseconds OpenDb: 2317.68 milliseconds OpenDb: 2339.4 milliseconds OpenDb: 2325.36 milliseconds OpenDb: 2321.06 milliseconds OpenDb: 2353.98 milliseconds OpenDb: 2344.64 milliseconds OpenDb: 2384.09 milliseconds OpenDb: 2428.58 milliseconds ``` Mean regressed 7.2% (2201.4 -> 2359.9) Reviewed By: ajkr Differential Revision: D36012787 Pulled By: akomurav fbshipit-source-id: d2aba09f29c6beb2fd0fe8e1e359be910b4ef02a	3 years ago
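The "Mean regressed 7.2% (2201.4 -> 2359.9)" figure in the commit above can be rechecked from the ten `OpenDb` timings listed before and after the PR. This is a small sketch of that arithmetic. The variable and function names are illustrative, and the timings are copied from the commit message.

```python
import statistics

# OpenDb timings in milliseconds, copied from the commit message.
before_ms = [2350.14, 2296.94, 2184.29, 2167.59, 2231.24,
             2109.57, 2197.71, 2120.8, 2148.12, 2207.95]
after_ms = [2424.52, 2359.84, 2317.68, 2339.4, 2325.36,
            2321.06, 2353.98, 2344.64, 2384.09, 2428.58]


def mean_regression(before, after):
    """Return (mean_before, mean_after, percent change of the mean)."""
    mean_before = statistics.mean(before)
    mean_after = statistics.mean(after)
    percent = (mean_after - mean_before) / mean_before * 100.0
    return mean_before, mean_after, percent


mean_before, mean_after, percent = mean_regression(before_ms, after_ms)
print(f"{mean_before:.1f} -> {mean_after:.1f} ({percent:+.1f}%)")
```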
Yanqin Jin	94e245a14d	Improve stress test for MultiOpsTxnsStressTest (#9829 ) Summary: Adds more coverage to `MultiOpsTxnsStressTest` with a focus on write-prepared transactions. 1. Add a hack to manually evict commit cache entries. We currently cannot assign small values to `wp_commit_cache_bits` because it requires a prepared transaction to commit within a certain range of sequence numbers, otherwise it will throw. 2. Add coverage for commit-time-write-batch. If write policy is write-prepared, we need to set `use_only_the_last_commit_time_batch_for_recovery` to true. 3. After each flush/compaction, verify data consistency. This is possible since data size can be small: default numbers of primary/secondary keys are just 1000. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9829 Test Plan: ``` TEST_TMPDIR=/dev/shm/rocksdb_crashtest_blackbox/ make blackbox_crash_test_with_multiops_wp_txn ``` Reviewed By: pdillinger Differential Revision: D35806678 Pulled By: riversand963 fbshipit-source-id: d7fde7a29fda0fb481a61f553e0ca0c47da93616	3 years ago
Jaromir Vanek	fb9a167a55	Add 95% confidence intervals to db_bench output (#9882 ) Summary: Enhancing `db_bench` output with 95% statistical confidence intervals for better performance evaluation. The goal is to unambiguously separate random variance when running benchmark over multiple iterations. Output enhanced with confidence intervals exposed in brackets: ``` $ ./db_bench --benchmarks=fillseq[-X10] Running benchmark for 10 times fillseq : 4.961 micros/op 201578 ops/sec; 22.3 MB/s fillseq : 5.030 micros/op 198824 ops/sec; 22.0 MB/s fillseq [AVG 2 runs] : 200201 (± 2698) ops/sec; 22.1 (± 0.3) MB/sec fillseq : 4.963 micros/op 201471 ops/sec; 22.3 MB/s fillseq [AVG 3 runs] : 200624 (± 1765) ops/sec; 22.2 (± 0.2) MB/sec fillseq : 5.035 micros/op 198625 ops/sec; 22.0 MB/s fillseq [AVG 4 runs] : 200124 (± 1586) ops/sec; 22.1 (± 0.2) MB/sec fillseq : 4.979 micros/op 200861 ops/sec; 22.2 MB/s fillseq [AVG 5 runs] : 200272 (± 1262) ops/sec; 22.2 (± 0.1) MB/sec fillseq : 4.893 micros/op 204367 ops/sec; 22.6 MB/s fillseq [AVG 6 runs] : 200954 (± 1688) ops/sec; 22.2 (± 0.2) MB/sec fillseq : 4.914 micros/op 203502 ops/sec; 22.5 MB/s fillseq [AVG 7 runs] : 201318 (± 1595) ops/sec; 22.3 (± 0.2) MB/sec fillseq : 4.998 micros/op 200074 ops/sec; 22.1 MB/s fillseq [AVG 8 runs] : 201163 (± 1415) ops/sec; 22.3 (± 0.2) MB/sec fillseq : 4.946 micros/op 202188 ops/sec; 22.4 MB/s fillseq [AVG 9 runs] : 201277 (± 1267) ops/sec; 22.3 (± 0.1) MB/sec fillseq : 5.093 micros/op 196331 ops/sec; 21.7 MB/s fillseq [AVG 10 runs] : 200782 (± 1491) ops/sec; 22.2 (± 0.2) MB/sec fillseq [AVG 10 runs] : 200782 (± 1491) ops/sec; 22.2 (± 0.2) MB/sec fillseq [MEDIAN 10 runs] : 201166 ops/sec; 22.3 MB/s ``` For more explicit interval representation, use `--confidence_interval_only` flag: ``` $ ./db_bench --benchmarks=fillseq[-X10] --confidence_interval_only Running benchmark for 10 times fillseq : 4.935 micros/op 202648 ops/sec; 22.4 MB/s fillseq : 5.078 micros/op 196943 ops/sec; 21.8 MB/s fillseq [CI95 2 
runs] : (194205, 205385) ops/sec; (21.5, 22.7) MB/sec fillseq : 5.159 micros/op 193816 ops/sec; 21.4 MB/s fillseq [CI95 3 runs] : (192735, 202869) ops/sec; (21.3, 22.4) MB/sec fillseq : 4.947 micros/op 202158 ops/sec; 22.4 MB/s fillseq [CI95 4 runs] : (194721, 203061) ops/sec; (21.5, 22.5) MB/sec fillseq : 4.908 micros/op 203756 ops/sec; 22.5 MB/s fillseq [CI95 5 runs] : (196113, 203615) ops/sec; (21.7, 22.5) MB/sec fillseq : 5.063 micros/op 197528 ops/sec; 21.9 MB/s fillseq [CI95 6 runs] : (196319, 202631) ops/sec; (21.7, 22.4) MB/sec fillseq : 5.214 micros/op 191799 ops/sec; 21.2 MB/s fillseq [CI95 7 runs] : (194953, 201803) ops/sec; (21.6, 22.3) MB/sec fillseq : 5.260 micros/op 190095 ops/sec; 21.0 MB/s fillseq [CI95 8 runs] : (193749, 200937) ops/sec; (21.4, 22.2) MB/sec fillseq : 5.076 micros/op 196992 ops/sec; 21.8 MB/s fillseq [CI95 9 runs] : (194134, 200474) ops/sec; (21.5, 22.2) MB/sec fillseq : 5.388 micros/op 185603 ops/sec; 20.5 MB/s fillseq [CI95 10 runs] : (192487, 199781) ops/sec; (21.3, 22.1) MB/sec fillseq [AVG 10 runs] : 196134 (± 3647) ops/sec; 21.7 (± 0.4) MB/sec fillseq [MEDIAN 10 runs] : 196968 ops/sec; 21.8 MB/sec ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9882 Reviewed By: pdillinger Differential Revision: D35796148 Pulled By: vanekjar fbshipit-source-id: 8313712d16728ff982b8aff28195ee56622385b8	3 years ago
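The interval arithmetic behind the `±` values above can be sketched as follows. This is a minimal illustration, not the db_bench implementation: it assumes a normal-approximation critical value of 1.96 and the sample standard deviation, and the function name `confidence_interval_95` is hypothetical. Under those assumptions, the first two runs of the sample output give a margin that agrees with the `200201 (± 2698)` line to within rounding.

```python
import math
import statistics

# Assumption: normal-approximation critical value for a 95% interval.
Z_95 = 1.96


def confidence_interval_95(samples):
    """Return (mean, margin); the interval is mean - margin .. mean + margin."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample (n - 1) standard deviation.
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, Z_95 * std_err


# ops/sec of the first two fillseq runs from the sample output above.
mean, margin = confidence_interval_95([201578, 198824])
print(f"{mean:.0f} (± {margin:.0f}) ops/sec")
```

The margin shrinks roughly with the square root of the number of runs, which is why repeating the benchmark tightens the interval.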
yuzhangyu	ac29645743	Add blob dump support to the dump_live_files command (#9896 ) Summary: This patch completes the second part of the task: "Add blob support to the dump and dump_live_files command" Pull Request resolved: https://github.com/facebook/rocksdb/pull/9896 Reviewed By: ltamasi Differential Revision: D35852667 Pulled By: jowlyzhang fbshipit-source-id: a006456c881f468a92da689e895134762e9574e1	4 years ago
yuzhangyu	fff28a7725	Add blob dump support to the dump command (#9881 ) Summary: This patch is the first part of adding blob dump support. It only adds blob dump support to the dump command. A follow up patch will add blob dump support to the dump_live_files command. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9881 Reviewed By: ltamasi Differential Revision: D35796731 Pulled By: jowlyzhang fbshipit-source-id: 2cc5973b222d505a331ac7b969edcf992b47c5ee	4 years ago
Jay Zhuang	2ea4205a69	Add 7.2 to compatible check (#9858 ) Summary: Add 7.2 to compatible check (should change it with version update). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9858 Reviewed By: riversand963 Differential Revision: D35722897 Pulled By: jay-zhuang fbshipit-source-id: 08c782b9344599d7296543eb0c61afcd9a869a1a	4 years ago
yuzhangyu	9b5790f018	Add --decode_blob_index option to idump and dump commands (#9870 ) Summary: This patch completes the first part of the task: "Extend all three commands so they can decode and print blob references if a new option --decode_blob_index is specified" Pull Request resolved: https://github.com/facebook/rocksdb/pull/9870 Reviewed By: ltamasi Differential Revision: D35753932 Pulled By: jowlyzhang fbshipit-source-id: 9d2bbba0eef2ed86b982767eba9de1b4881f35c9	4 years ago
Andrew Kryczka	690f1edf37	Avoid overwriting OPTIONS file settings in db_bench (#9862 ) Summary: `InitializeOptionsGeneral()` was overwriting many options that were already configured by OPTIONS file, potentially with the flag default values. This PR changes that function to only overwrite options in limited scenarios, as described at the top of its definition. Block cache is still a violation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9862 Test Plan: ran under various scenarios (multi-DB, single DB, OPTIONS file, flags) and verified options are set as expected Reviewed By: jay-zhuang Differential Revision: D35736960 Pulled By: ajkr fbshipit-source-id: 75b77740af37e6f5741618f8a8f5685df2417d03	4 years ago
Peter Dillinger	41237dd306	Add "no compression" job to CircleCI (#9850 ) Summary: Since they operate at distinct abstraction layers, I thought it was prudent to combine with EncryptedEnv CI test for each PR, for efficiency in testing. Also added supported compressions to sst_dump --help output so that CI job can verify no compiled-in compression support. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9850 Test Plan: CI, some manual stuff Reviewed By: riversand963 Differential Revision: D35682346 Pulled By: pdillinger fbshipit-source-id: be9879c1533fed304ee32c89fd9ba4b07c2b90cc	4 years ago
yuzhangyu	082eb04200	Add option --decode_blob_index to dump_live_files command (#9842 ) Summary: This change only adds decode blob index support to the dump_live_files command, which is part of a task to add blob support to a few commands. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9842 Reviewed By: ltamasi Differential Revision: D35650167 Pulled By: jowlyzhang fbshipit-source-id: a78151b98bc38ac6f52c6e01ca6927a3429ddd14	4 years ago
gitbw95	f241d082b6	Prevent double caching in the compressed secondary cache (#9747 ) Summary: ### Summary: When both LRU Cache and CompressedSecondaryCache are configured together, some data blocks may be cached twice. Changes include: 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusion. 2. Update SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle has been erased from the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: Test Scripts: 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. Overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 3. 
Run read tests with different cache settings: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 Before this PR \| Cache Size \| Compressed Secondary Cache Size \| Cache Hit Rate \| \|------------\|-------------------------------------\|----------------\| \|520 MB \| 0 MB \| 85.5% \| \|320 MB \| 400 MB \| 96.2% \| \|520 MB \| 400 MB \| 98.3% \| \|20 MB \| 500 MB \| 98.8% \| After this PR \| Cache Size \| Compressed Secondary Cache Size \| Cache Hit Rate \| \|------------\|-------------------------------------\|----------------\| \|520 MB \| 0 MB \| 85.5% \| \|320 MB \| 400 MB \| 99.9% \| \|520 MB \| 400 MB \| 99.9% \| \|20 MB \| 500 MB \| 99.2% \| Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12	4 years ago
Duncan Bellamy	25e31d1a94	tools/db_bench_tool.cc use uint64_t instead of size_t (#9800 ) Summary: to fix compilation for 32bit Pull Request resolved: https://github.com/facebook/rocksdb/pull/9800 Reviewed By: riversand963 Differential Revision: D35404447 fbshipit-source-id: 6a1185bb38f3a718357aa120e3b26a1ea77f023d	4 years ago
anand76	c3d7e16252	Add WAL compression to stress tests (#9811 ) Summary: Add the WAL compression feature to the stress test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9811 Reviewed By: riversand963 Differential Revision: D35414316 Pulled By: anand1976 fbshipit-source-id: 0c17b1ec55679a52f088ad368798b57139bd921a	4 years ago
Hui Xiao	49623f9c8e	Account memory of big memory users in BlockBasedTable in global memory limit (#9748 ) Summary: Context: Through heap profiling, we discovered that `BlockBasedTableReader` objects can accumulate and lead to high memory usage (e.g, `max_open_file = -1`). These memories are currently not saved, not tracked, not constrained and not cache evict-able. As a first step to improve this, similar to https://github.com/facebook/rocksdb/pull/8428, this PR is to track an estimate of `BlockBasedTableReader` object's memory in block cache and fail future creation if the memory usage exceeds the available space of cache at the time of creation. Summary: - Approximate big memory users (`BlockBasedTable::Rep` and `TableProperties` )' memory usage in addition to the existing estimated ones (filter block/index block/un-compression dictionary) - Charge all of these memory usages to block cache on `BlockBasedTable::Open()` and release them on `~BlockBasedTable()` as there is no memory usage fluctuation of concern in between - Refactor on CacheReservationManager (and its call-sites) to add concurrent support for BlockBasedTable used in this PR. 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9748 Test Plan: - New unit tests - db bench: `OpenDb` : -0.52% in ms - Setup `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -write_buffer_size=1048576` - Repeated run with pre-change w/o feature and post-change with feature, benchmark `OpenDb`: `./db_bench -benchmarks=readrandom -use_existing_db=1 -db=/dev/shm/testdb -reserve_table_reader_memory=true (remove this when running w/o feature) -file_opening_threads=3 -open_files=-1 -report_open_timing=true\| egrep 'OpenDb:'` #-run \| (feature-off) avg milliseconds \| std milliseconds \| (feature-on) avg milliseconds \| std milliseconds \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 11.4018 \| 5.95173 \| 9.47788 \| 1.57538 \| -16.87382694 20 \| 9.23746 \| 0.841053 \| 9.32377 \| 1.14074 \| 0.9343477536 40 \| 9.0876 \| 0.671129 \| 9.35053 \| 1.11713 \| 2.893283155 80 \| 9.72514 \| 2.28459 \| 9.52013 \| 1.0894 \| -2.108041632 160 \| 9.74677 \| 0.991234 \| 9.84743 \| 1.73396 \| 1.032752389 320 \| 10.7297 \| 5.11555 \| 10.547 \| 1.97692 \| -1.70275031 640 \| 11.7092 \| 2.36565 \| 11.7869 \| 2.69377 \| 0.6635807741 - db bench on write with cost to cache in WriteBufferManager (just in case this PR's CRM refactoring accidentally slows down anything in WBM) : `fillseq` : +0.54% in micros/op `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -cost_write_buffer_to_cache=true -write_buffer_size=10000000000 \| egrep 'fillseq'` #-run \| (pre-PR) avg micros/op \| std micros/op \| (post-PR) avg micros/op \| std micros/op \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 6.15 \| 0.260187 \| 6.289 \| 0.371192 \| 2.260162602 20 \| 7.28025 \| 0.465402 \| 7.37255 \| 0.451256 \| 1.267813605 40 \| 7.06312 \| 0.490654 \| 7.13803 \| 0.478676 \| 1.060579461 80 \| 7.14035 \| 0.972831 \| 7.14196 \| 0.92971 \| 0.02254791432 - filter bench: `bloom filter`: -0.78% in ms/key - ` ./filter_bench -impl=2 -quick 
-reserve_table_builder_memory=true \| grep 'Build avg'` #-run \| (pre-PR) avg ns/key \| std ns/key \| (post-PR) ns/key \| std ns/key \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 26.4369 \| 0.442182 \| 26.3273 \| 0.422919 \| -0.4145720565 20 \| 26.4451 \| 0.592787 \| 26.1419 \| 0.62451 \| -1.1465262 - Crash test `python3 tools/db_crashtest.py blackbox --reserve_table_reader_memory=1 --cache_size=1` killed as normal Reviewed By: ajkr Differential Revision: D35136549 Pulled By: hx235 fbshipit-source-id: 146978858d0f900f43f4eb09bfd3e83195e3be28	4 years ago
Peter Dillinger	6534c6dea4	Fix remaining uses of "backupable" (#9792 ) Summary: Various renaming and fixes to get rid of remaining uses of "backupable" which is terminology leftover from the original, flawed design of BackupableDB. Now any DB can be backed up, using BackupEngine. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9792 Test Plan: CI Reviewed By: ajkr Differential Revision: D35334386 Pulled By: pdillinger fbshipit-source-id: 2108a42b4575c8cccdfd791c549aae93ec2f3329	4 years ago
Chen Lixiang	cd59b139fc	Fix some typos in comments and HISTORY.md (#9798 ) Summary: compation --> compaction Pull Request resolved: https://github.com/facebook/rocksdb/pull/9798 Reviewed By: ajkr Differential Revision: D35341611 Pulled By: jay-zhuang fbshipit-source-id: 5ea07527c311de75cade219456b6ee52b23020f6	4 years ago
Bo Wang	bcabee737f	Improve comments for some files (#9793 ) Summary: Update the comments, e.g. fixing typo, formatting, etc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9793 Reviewed By: jay-zhuang Differential Revision: D35323989 Pulled By: gitbw95 fbshipit-source-id: 4a72fc02b67abaae8be0d1439b68f9967a68052d	4 years ago
Andrew Kryczka	bfea9e7c02	Add benchmark for GetMergeOperands() (#9785 ) Summary: There's an existing benchmark, "getmergeoperands", but it is unconventional in that it has multiple phases and hardcoded setup parameters. This PR adds a different one, "readrandomoperands", that follows the pattern of other benchmarks of having a single phase and taking its configuration from existing flags. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9785 Test Plan: ``` $ ./db_bench -benchmarks=mergerandom -merge_operator=StringAppendOperator -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -compression_type=none -disable_auto_compactions=true $ ./db_bench -use_existing_db=true -benchmarks=readrandomoperands -merge_operator=StringAppendOperator -disable_auto_compactions=true -duration=10 ... readrandomoperands : 542.082 micros/op 1844 ops/sec; 0.2 MB/s (11980 of 18999 found) ``` Reviewed By: jay-zhuang Differential Revision: D35290412 Pulled By: ajkr fbshipit-source-id: fb367ca614b128cef844a75f0e5d9dd7c3328d85	4 years ago
Akanksha Mahajan	fd66005628	Add 'adaptive_readahead' and 'async_io' options to db_stress (#9750 ) Summary: Same as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/9750 Test Plan: export CRASH_TEST_EXT_ARGS=" --async_io=1 --adaptive_readahead=1"; make -j crash_test Reviewed By: jay-zhuang Differential Revision: D35114326 Pulled By: akankshamahajan15 fbshipit-source-id: 8b05c95be09f7aff6cb9eb757aa20a6520349d45	4 years ago
Hui Xiao	60106b91ac	Add 7.0.fb/7.1.fb to check_format_compatible.sh (#9772 ) Summary: As titled Pull Request resolved: https://github.com/facebook/rocksdb/pull/9772 Test Plan: `./tools/check_format_compatible.sh 7.1.fb` (and manually removed 2.7.fb due to pre-existing assertion failure) passed compatibility test Reviewed By: ajkr Differential Revision: D35233659 Pulled By: hx235 fbshipit-source-id: 6b93263a5724d752347e04f1396628804c24a880	4 years ago
Mark Callaghan	37de4e1d08	Correctly set ThreadState::tid (#9757 ) Summary: Fixes a bug introduced by me in https://github.com/facebook/rocksdb/pull/9733 That PR added a counter so that the per-thread seeds in ThreadState would be unique even when --benchmarks had more than one test. But it incorrectly used this counter as the value for ThreadState::tid as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9757 Test Plan: Confirm that unexpectedly good QPS results on the regression tests return to normal with this fix. I have confirmed that the QPS increase starts with the PR 9733 diff. Reviewed By: jay-zhuang Differential Revision: D35149303 Pulled By: mdcallag fbshipit-source-id: dee5cc36b7faaba6c3be6d6a253d3c2eaad72864	4 years ago
Mark Callaghan	1a130fa3c1	db_bench should use a good seed when --seed is not set or set to 0 (#9740 ) Summary: This is for https://github.com/facebook/rocksdb/issues/9737 I have wasted more than a few hours running db_bench benchmarks where --seed was not set and getting better than expected results because cache hit rates are great because multiple invocations of db_bench used the same value for --seed or did not set it, and then all used 0. The result is that all see the same sequence of keys. Others have done the same. The problem is worse in that it is easy to miss and the result is a benchmark with results that are misleading. A good way to avoid this is to set it to the equivalent of gettimeofday() when either --seed is not set or it is set to 0 (the default). With this change the actual seed is printed when it was 0 at process start: Set seed to 1647992570365606 because --seed was 0 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9740 Test Plan: Perf results: ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 readrandom : 6.469 micros/op 154583 ops/sec; 17.1 MB/s (4000000 of 4000000 found) ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0 readrandom : 6.565 micros/op 152321 ops/sec; 16.9 MB/s (4000000 of 4000000 found) ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1 readrandom : 6.461 micros/op 154777 ops/sec; 17.1 MB/s (4000000 of 4000000 found) ./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2 readrandom : 6.525 micros/op 153244 ops/sec; 17.0 MB/s (4000000 of 4000000 found) Reviewed By: jay-zhuang Differential Revision: D35145361 Pulled By: mdcallag fbshipit-source-id: 2b35b153ccec46b27d7c9405997523555fc51267	4 years ago
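The seed fallback described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the db_bench code: `effective_seed` is a hypothetical helper, and microseconds since the epoch stand in for "the equivalent of gettimeofday()".

```python
import time


def effective_seed(seed_flag: int) -> int:
    """Return the seed to use, replacing 0 (unset) with a time-based value."""
    if seed_flag != 0:
        # An explicit non-zero --seed is respected, so runs stay reproducible.
        return seed_flag
    # Assumption: microseconds since the epoch as the time-based seed.
    seed = time.time_ns() // 1000
    print(f"Set seed to {seed} because --seed was 0")
    return seed
```

Printing the chosen seed keeps a run repeatable after the fact: passing the printed value back as an explicit seed reproduces the same key sequence.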
Mark Callaghan	409635cb2a	Add --slow_usecs option to determine when long op message is printed (#9732 ) Summary: This adds the --slow_usecs option with a default value of 1M. Operations that take this much time have a message printed when --histogram=1, --stats_interval=0 and --stats_interval_seconds=0. The current code hardwired this to 20,000 usecs and for some stress tests that reduced throughput by 20% or more. This is for https://github.com/facebook/rocksdb/issues/9620 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9732 Test Plan: ./db_bench --benchmarks=fillrandom,readrandom --compression_type=lz4 --slow_usecs=100 --histogram=1 ./db_bench --benchmarks=fillrandom,readrandom --compression_type=lz4 --slow_usecs=100000 --histogram=1 Reviewed By: jay-zhuang Differential Revision: D35121522 Pulled By: mdcallag fbshipit-source-id: daf27f937efd748980545d6395db332712fc078b	4 years ago
Mark Callaghan	f219e3d5d8	db_bench should fail on bad values for --compaction_fadvice and --value_size_distribution_type (#9741 ) Summary: db_bench quietly parses and ignores bad values for --compaction_fadvice and --value_size_distribution_type I prefer that it fail for them as it does for bad option values in most other cases. Otherwise a benchmark result will be provided for the wrong configuration and the result will be misleading. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9741 Test Plan: These now fail: ./db_bench --compaction_fadvice=noney Unknown compaction fadvice:noney ./db_bench --value_size_distribution_type=norma Cannot parse distribution type 'norma' While correct values continue to work: ./db_bench --value_size_distribution_type=normal Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags ./db_bench --compaction_fadvice=none Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags Reviewed By: siying Differential Revision: D35115973 Pulled By: mdcallag fbshipit-source-id: c2b10de5c2d1ea7c7539e676f5bd556351f5d370	4 years ago
Mark Callaghan	d583d23d86	Avoid seed reuse when --benchmarks has more than one test (#9733 ) Summary: When --benchmarks has more than one test then the threads in one benchmark will use the same set of seeds as the threads in the previous benchmark. This diff fixes that. This fixes https://github.com/facebook/rocksdb/issues/9632 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9733 Test Plan: For this command line the block cache is 8MB, so it caches at most 1024 8KB blocks. Note that without this diff the second run of readrandom has a much better response time because seed reuse means the second run reads the same 1000 blocks as the first run and they are cached at that point. But with this diff that does not happen. ./db_bench --benchmarks=fillseq,flush,compact0,waitforcompaction,levelstats,readrandom,readrandom --compression_type=zlib --num=10000000 --reads=1000 --block_size=8192 ... ``` Level Files Size(MB) -------------------- 0 0 0 1 11 238 2 9 253 3 0 0 4 0 0 5 0 0 6 0 0 ``` --- perf results without this diff DB path: [/tmp/rocksdbtest-2260/dbbench] readrandom : 46.212 micros/op 21618 ops/sec; 2.4 MB/s (1000 of 1000 found) DB path: [/tmp/rocksdbtest-2260/dbbench] readrandom : 21.963 micros/op 45450 ops/sec; 5.0 MB/s (1000 of 1000 found) --- perf results with this diff DB path: [/tmp/rocksdbtest-2260/dbbench] readrandom : 47.213 micros/op 21126 ops/sec; 2.3 MB/s (1000 of 1000 found) DB path: [/tmp/rocksdbtest-2260/dbbench] readrandom : 42.880 micros/op 23299 ops/sec; 2.6 MB/s (1000 of 1000 found) Reviewed By: jay-zhuang Differential Revision: D35089763 Pulled By: mdcallag fbshipit-source-id: 1b50143a07afe876b8c8e5fa50dd94a8ce57fc6b	4 years ago
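Why seed reuse inflates the second readrandom result can be illustrated with a small sketch. It is not the db_bench code: `thread_seeds` and `keys_read` are hypothetical helpers, and the running counter of already-issued seeds is an assumption modelled on the description above.

```python
import random


def thread_seeds(base_seed, num_threads, seeds_already_issued):
    """Per-thread seeds for one benchmark, offset by a running counter."""
    return [base_seed + seeds_already_issued + t for t in range(num_threads)]


def keys_read(seed, reads=5, num_keys=10_000_000):
    """Keys a thread would look up when its generator starts from `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(num_keys) for _ in range(reads)]


first_run = thread_seeds(1000, num_threads=2, seeds_already_issued=0)
# Without the fix the counter is effectively reset, so the seeds repeat.
reused = thread_seeds(1000, num_threads=2, seeds_already_issued=0)
# With the fix the counter carries over from the previous benchmark.
fixed = thread_seeds(1000, num_threads=2, seeds_already_issued=len(first_run))
```

With `reused`, every thread replays the key sequence of the previous benchmark, so its blocks are already in the block cache. With `fixed`, the seeds do not overlap with the earlier run.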
Yanqin Jin	c18c4a081c	Add new determinators for multiops transactions stress test (#9708 ) Summary: Add determinators for multiops transactions stress test with write-committed and write-prepared policies. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9708 Test Plan: Internal CI Reviewed By: jay-zhuang Differential Revision: D34967263 Pulled By: riversand963 fbshipit-source-id: 170a0842d56dccb6ed6bc0c5adfd33849acd6b31	4 years ago
Mark Callaghan	6904fd0c86	db_bench should fail when an option uses an invalid compression type (#9729 ) Summary: This changes db_bench to fail at startup for invalid compression types. It had been changing them to Snappy. For other invalid options it fails at startup. This is for https://github.com/facebook/rocksdb/issues/9621 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9729 Test Plan: This continues to work: ./db_bench --benchmarks=fillrandom --compression_type=lz4 This now fails rather than changing the compression type to Snappy ./db_bench --benchmarks=fillrandom --compression_type=lz44 Cannot parse compression type 'lz44' Reviewed By: jay-zhuang Differential Revision: D35081323 Pulled By: mdcallag fbshipit-source-id: 9b38c835abddce11aa7feb235df63f53cf829981	4 years ago
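The fail-fast behaviour described above, as opposed to silently falling back to Snappy, can be sketched as follows. The helper `parse_compression_type` and the set of names are illustrative assumptions rather than db_bench's actual list; only the error text is taken from the test plan.

```python
# Assumption: an illustrative subset of compression type names.
SUPPORTED_COMPRESSION_TYPES = {"none", "snappy", "zlib", "lz4", "zstd"}


def parse_compression_type(name: str) -> str:
    """Return the validated name, or raise instead of substituting a default."""
    if name not in SUPPORTED_COMPRESSION_TYPES:
        raise ValueError(f"Cannot parse compression type '{name}'")
    return name
```

Raising at startup means a mistyped flag cannot produce a benchmark result for a configuration the user did not ask for.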
Mark Callaghan	d71e5a5beb	Add number of running flushes & compactions to --stats_per_interval output (#9726 ) Summary: This is for https://github.com/facebook/rocksdb/issues/9709 and adds two lines to the end of DB Stats for num-running-compactions and num-running-flushes. For example ... DB Stats Uptime(secs): 6.0 total, 1.0 interval Cumulative writes: 915K writes, 915K keys, 915K commit groups, 1.0 writes per commit group, ingest: 0.11 GB, 18.95 MB/s Cumulative WAL: 915K writes, 0 syncs, 915000.00 writes per sync, written: 0.11 GB, 18.95 MB/s Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent Interval writes: 133K writes, 133K keys, 133K commit groups, 1.0 writes per commit group, ingest: 16.62 MB, 16.53 MB/s Interval WAL: 133K writes, 0 syncs, 133000.00 writes per sync, written: 0.02 GB, 16.53 MB/s Interval stall: 00:00:0.000 H:M:S, 0.0 percent num-running-compactions: 0 num-running-flushes: 0 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9726 Reviewed By: jay-zhuang Differential Revision: D35066759 Pulled By: mdcallag fbshipit-source-id: c161fadd3c15c5aa715a820dab6bfedb46dc099b	4 years ago
Akanksha Mahajan	f07eec1bf8	Add async_io read option in db_bench (#9735 ) Summary: Add async_io Read option in db_bench Pull Request resolved: https://github.com/facebook/rocksdb/pull/9735 Test Plan: ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1 Reviewed By: riversand963 Differential Revision: D35058482 Pulled By: akankshamahajan15 fbshipit-source-id: 1522b638c79f6d85bb7408c67f6ab76dbabeeee7	4 years ago
Mark Callaghan	63a284a6ad	For db_bench --benchmarks=fillseq with --num_multi_db load databases … (#9713 ) Summary: …in order This fixes https://github.com/facebook/rocksdb/issues/9650 For db_bench --benchmarks=fillseq --num_multi_db=X it loads databases in sequence rather than randomly choosing a database per Put. The benefits are: 1) avoids long delays between flushing memtables 2) avoids flushing memtables for all of them at the same point in time 3) puts same number of keys per database so that query tests will find keys as expected Pull Request resolved: https://github.com/facebook/rocksdb/pull/9713 Test Plan: Using db_bench.1 without the change and db_bench.2 with the change: for i in 1 2; do rm -rf /data/m/rx/* ; time ./db_bench.$i --db=/data/m/rx --benchmarks=fillseq --num_multi_db=4 --num=10000000; du -hs /data/m/rx ; done --- without the change fillseq : 3.188 micros/op 313682 ops/sec; 34.7 MB/s real 2m7.787s user 1m52.776s sys 0m46.549s 2.7G /data/m/rx --- with the change fillseq : 3.149 micros/op 317563 ops/sec; 35.1 MB/s real 2m6.196s user 1m51.482s sys 0m46.003s 2.7G /data/m/rx Also, temporarily added a printf to confirm that the code switches to the next database at the right time ZZ switch to db 1 at 10000000 ZZ switch to db 2 at 20000000 ZZ switch to db 3 at 30000000 for i in 1 2; do rm -rf /data/m/rx/* ; time ./db_bench.$i --db=/data/m/rx --benchmarks=fillseq,readrandom --num_multi_db=4 --num=100000; du -hs /data/m/rx ; done --- without the change, smaller database, note that not all keys are found by readrandom because databases have < and > --num keys fillseq : 3.176 micros/op 314805 ops/sec; 34.8 MB/s readrandom : 1.913 micros/op 522616 ops/sec; 57.7 MB/s (99873 of 100000 found) --- with the change, smaller database, note that all keys are found by readrandom fillseq : 3.110 micros/op 321566 ops/sec; 35.6 MB/s readrandom : 1.714 micros/op 583257 ops/sec; 64.5 MB/s (100000 of 100000 found) Reviewed By: jay-zhuang Differential Revision: D35030168 
Pulled By: mdcallag fbshipit-source-id: 2a18c4ec571d954cf5a57b00a11802a3608823ee	4 years ago
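The in-order loading for fillseq with --num_multi_db can be sketched as a mapping from operation index to database index. `db_index_for_op` is a hypothetical helper, not a db_bench function; the mapping is inferred from the temporary printf output quoted above (switch to db 1 at 10000000, db 2 at 20000000, and so on).

```python
def db_index_for_op(op_index: int, num_per_db: int, num_multi_db: int) -> int:
    """Database that receives the op_index-th Put when loading in order."""
    total_ops = num_per_db * num_multi_db
    if not 0 <= op_index < total_ops:
        raise ValueError("op_index out of range")
    # Each database receives num_per_db consecutive keys before the next starts.
    return op_index // num_per_db
```

Because every database receives exactly `num_per_db` keys, a later readrandom finds all of them, which matches the "100000 of 100000 found" result with the change.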
Mark Callaghan	1ca1562e35	Make mixgraph easier to use (#9711 ) Summary: Changes: * improves monitoring by displaying average size of a Put value and average scan length * forces the minimum value size to be 10. Before this it was 0 if you didn't set the distribution parameters. * uses reasonable defaults for the distribution parameters that determine value size and scan length * includes seeks in "reads ... found" message, before this they were missing This is for https://github.com/facebook/rocksdb/issues/9672 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9711 Test Plan: Before this change: ./db_bench --benchmarks=fillseq,mixgraph --mix_get_ratio=50 --mix_put_ratio=25 --mix_seek_ratio=25 --num=100000 --value_k=0.2615 --value_sigma=25.45 --iter_k=2.517 --iter_sigma=14.236 fillseq : 4.289 micros/op 233138 ops/sec; 25.8 MB/s mixgraph : 18.461 micros/op 54166 ops/sec; 755.0 MB/s ( Gets:50164 Puts:24919 Seek:24917 of 50164 in 75081 found) After this change: ./db_bench --benchmarks=fillseq,mixgraph --mix_get_ratio=50 --mix_put_ratio=25 --mix_seek_ratio=25 --num=100000 --value_k=0.2615 --value_sigma=25.45 --iter_k=2.517 --iter_sigma=14.236 fillseq : 3.974 micros/op 251553 ops/sec; 27.8 MB/s mixgraph : 16.722 micros/op 59795 ops/sec; 833.5 MB/s ( Gets:50164 Puts:24919 Seek:24917, reads 75081 in 75081 found, avg size: 36.0 value, 504.9 scan) Reviewed By: jay-zhuang Differential Revision: D35030190 Pulled By: mdcallag fbshipit-source-id: d8f555f28d869f752ddb674a524108884511b151	4 years ago
Peter Dillinger	a8a422e962	Add manifest fix-up utility for file temperatures (#9683 ) Summary: The goal of this change is to allow changes to the "current" (in FileSystem) file temperatures to feed back into DB metadata, so that they can inform decisions and stats reporting. In part because of modular code factoring, it doesn't seem easy to do this automagically, where opening an SST file and observing current Temperature different from expected would trigger a change in metadata and DB manifest write (essentially giving the deep read path access to the write path). It is also difficult to do this while the DB is open because of the limitations of LogAndApply. This change allows updating file temperature metadata on a closed DB using an experimental utility function UpdateManifestForFilesState() or `ldb update_manifest --update_temperatures`. This should suffice for "migration" scenarios where outside tooling has placed or re-arranged DB files into a (different) tiered configuration without going through RocksDB itself (currently, only compaction can change temperature metadata). Some details: * Refactored and added unit test for `ldb unsafe_remove_sst_file` because of shared functionality * Pulled in autovector.h changes from https://github.com/facebook/rocksdb/issues/9546 to fix SuperVersionContext move constructor (related to an older draft of this change) Possible follow-up work: * Support updating manifest with file checksums, such as when a new checksum function is used and want existing DB metadata updated for it. * It's possible that for some repair scenarios, lighter weight than full repair, we might want to support UpdateManifestForFilesState() to modify critical file details like size or checksum using same algorithm. But let's make sure these are differentiated from modifying file details in ways that don't suspect corruption (or require extreme trust). 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9683 Test Plan: unit tests added Reviewed By: jay-zhuang Differential Revision: D34798828 Pulled By: pdillinger fbshipit-source-id: cfd83e8fb10761d8c9e7f9c020d68c9106a95554	4 years ago
Peter Dillinger	cff0d1e8e6	New backup meta schema, with file temperatures (#9660 ) Summary: The primary goal of this change is to add support for backing up and restoring (applying on restore) file temperature metadata, without committing to either the DB manifest or the FS reported "current" temperatures being exclusive "source of truth". To achieve this goal, we need to add temperature information to backup metadata, which requires updated backup meta schema. Fortunately I prepared for this in https://github.com/facebook/rocksdb/issues/8069, which began forward compatibility in version 6.19.0 for this kind of schema update. (Previously, backup meta schema was not extensible! Making this schema update public will allow some other "nice to have" features like taking backups with hard links, and avoiding crc32c checksum computation when another checksum is already available.) While schema version 2 is newly public, the default schema version is still 1. Until we change the default, users will need to set to 2 to enable features like temperature data backup+restore. New metadata like temperature information will be ignored with a warning in versions before this change and since 6.19.0. The metadata is considered ignorable because a functioning DB can be restored without it. Some detail: * Some renaming because "future schema" is now just public schema 2. * Initialize some atomics in TestFs (linter reported) * Add temperature hint support to SstFileDumper (used by BackupEngine) Pull Request resolved: https://github.com/facebook/rocksdb/pull/9660 Test Plan: related unit test majorly updated for the new functionality, including some shared testing support for tracking temperatures in a FS. Some other tests and testing hooks into production code also updated for making the backup meta schema change public. Reviewed By: ajkr Differential Revision: D34686968 Pulled By: pdillinger fbshipit-source-id: 3ac1fa3e67ee97ca8a5103d79cc87d872c1d862a	4 years ago


1401 Commits (5a5f21c48969ea248cef5f7f35cbba0ba6c84253)