rocksdb

Commit Graph

Author	SHA1	Message	Date
Andrew Kryczka	78ee8564ad	Integrity protection for live updates to WriteBatch (#7748) Summary: This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.). The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer. When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. 
This protection info is verified once the entry is encoded in the `MemTable` buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748 Test Plan: - an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught - add to stress/crash test to verify it works in a variety of configs/operations without intentional corruption - [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc. Reviewed By: pdillinger Differential Revision: D25754492 Pulled By: ajkr fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866	4 years ago
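The XOR-of-seeded-hashes scheme described in the commit above can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: BLAKE2b stands in for whatever hash RocksDB uses, the seed constants and function names are hypothetical, and field encodings are fixed to little-endian only to keep the demo reproducible.

```python
import hashlib

# Hypothetical per-field seeds; RocksDB's real seeds and hash function differ.
SEED_K, SEED_V, SEED_O, SEED_T, SEED_C, SEED_S = range(1, 7)


def seeded_hash64(data: bytes, seed: int) -> int:
    """64-bit hash of `data` under `seed` (BLAKE2b used as a stand-in)."""
    digest = hashlib.blake2b(
        data, digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def protect_kvot(key: bytes, value: bytes, op_type: int, timestamp: bytes) -> int:
    """XOR together independently seeded hashes of key, value, optype, timestamp."""
    return (
        seeded_hash64(key, SEED_K)
        ^ seeded_hash64(value, SEED_V)
        ^ seeded_hash64(bytes([op_type]), SEED_O)
        ^ seeded_hash64(timestamp, SEED_T)
    )


def hash_cf_id(cf_id: int) -> int:
    return seeded_hash64(cf_id.to_bytes(4, "little"), SEED_C)


def hash_seqno(seqno: int) -> int:
    return seeded_hash64(seqno.to_bytes(8, "little"), SEED_S)


def truncate(info: int, nbytes: int) -> int:
    """Keep the low `nbytes` bytes, i.e. drop the most significant bytes."""
    return info & ((1 << (8 * nbytes)) - 1)


# In the batch: coverage is key, value, optype, timestamp, CF ID (KVOTC).
kvotc = protect_kvot(b"k1", b"v1", 1, b"") ^ hash_cf_id(3)

# Moving to the memtable: XOR out the CF ID term, XOR in the sequence number (KVOTS).
kvots = kvotc ^ hash_cf_id(3) ^ hash_seqno(42)

# Verification: recompute from the encoded entry and compare.
expected = protect_kvot(b"k1", b"v1", 1, b"") ^ hash_seqno(42)
corrupted = protect_kvot(b"k1", b"v2", 1, b"") ^ hash_seqno(42)
print(kvots == expected, kvots == corrupted)
```

Because XOR is its own inverse, stripping a field's coverage is the same operation as adding it, which is why changing coverage needs only the field being added or removed rather than the whole entry.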
Zhichao Cao	04b3524ad0	Inject the random write error to stress test (#7653) Summary: Inject random write errors into the stress test. This requires setting reopen=0 and disable_wal=true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653 Test Plan: pass db_stress and python3 db_crashtest.py blackbox Reviewed By: ajkr Differential Revision: D25354132 Pulled By: zhichao-cao fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669	4 years ago
Peter Dillinger	60af964372	Experimental (production candidate) SST schema for Ribbon filter (#7658 ) Summary: Added experimental public API for Ribbon filter: NewExperimentalRibbonFilterPolicy(). This experimental API will take a "Bloom equivalent" bits per key, and configure the Ribbon filter for the same FP rate as Bloom would have but ~30% space savings. (Note: optimize_filters_for_memory is not yet implemented for Ribbon filter. That can be added with no effect on schema.) Internally, the Ribbon filter is configured using a "one_in_fp_rate" value, which is 1 over desired FP rate. For example, use 100 for 1% FP rate. I'm expecting this will be used in the future for configuring Bloom-like filters, as I expect people to more commonly hold constant the filter accuracy and change the space vs. time trade-off, rather than hold constant the space (per key) and change the accuracy vs. time trade-off, though we might make that available. ### Benchmarking ``` $ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 34.1341 Number of filters: 1993 Total size (MB): 238.488 Reported total allocated memory (MB): 262.875 Reported internal fragmentation: 10.2255% Bits/key stored: 10.0029 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 18.7508 Random filter net ns/op: 258.246 Average FP rate %: 0.968672 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 130.851 Number of filters: 1993 Total size (MB): 168.166 Reported total allocated memory (MB): 183.211 Reported internal fragmentation: 8.94626% Bits/key stored: 7.05341 ---------------------------- Mixed inside/outside queries... 
Single filter net ns/op: 58.4523 Random filter net ns/op: 363.717 Average FP rate %: 0.952978 ---------------------------- Done. (For more info, run with -legend or -help.) ``` 168.166 / 238.488 = 0.705 -> 29.5% space reduction 130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. latest Bloom filter (could make that as little as about 2.5x for less space reduction) ### Working around a hashing "flaw" bloom_test discovered a flaw in the simple hashing applied in StandardHasher when num_starts == 1 (num_slots == 128), showing an excessively high FP rate. The problem is that when many entries, on the order of number of hash bits or kCoeffBits, are associated with the same start location, the correlation between the CoeffRow and ResultRow (for efficiency) can lead to a solution that is "universal," or nearly so, for entries mapping to that start location. (Normally, variance in start location breaks the effective association between CoeffRow and ResultRow; the same value for CoeffRow is effectively different if start locations are different.) Without kUseSmash and with num_starts > 1 (thus num_starts ~= num_slots), this flaw should be completely irrelevant. Even with 10M slots, the chances of a single slot having just 16 (or more) entries map to it--not enough to cause an FP problem, which would be local to that slot if it happened--is 1 in millions. This spreadsheet formula shows that: =1/(10000000*(1 - POISSON(15, 1, TRUE))) As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is intended for CPU efficiency of filters with many more entries/slots than kCoeffBits, a very reasonable work-around is to disallow num_starts==1 when !kUseSmash, by making the minimum non-zero number of slots 2*kCoeffBits. This is the work-around I've applied. This also means that the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not space-efficient for less than a few hundred entries. 
Because of this, I have made it fall back on constructing a Bloom filter, under existing schema, when that is more space efficient for small filters. (We can change this in the future if we want.) TODO: better unit tests for this case in ribbon_test, and probably update StandardHasher for kUseSmash case so that it can scale nicely to small filters. ### Other related changes * Add Ribbon filter to stress/crash test * Add Ribbon filter to filter_bench as -impl=3 * Add option string support, as in "filter_policy=experimental_ribbon:5.678;" where 5.678 is the Bloom equivalent bits per key. * Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom * Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on binary searching CalculateSpace (inefficient), so that subclasses (especially experimental ones) don't have to provide an efficient implementation inverting CalculateSpace. * Minor refactor FastLocalBloomBitsBuilder for new base class XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder, which allows the latter to fall back on Bloom construction in some extreme cases. * Mostly updated bloom_test for Ribbon filter, though a test like FullBloomTest::Schema is a next TODO to ensure schema stability (in case this becomes production-ready schema as it is). * Add some APIs to ribbon_impl.h for configuring Ribbon filters. Although these are reasonably covered by bloom_test, TODO more unit tests in ribbon_test * Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data for constructing the linear approximations in GetNumSlotsFor95PctSuccess. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658 Test Plan: Some unit tests updated but other testing is left TODO. This is considered experimental but laying down schema compatibility as early as possible in case it proves production-quality. Also tested in stress/crash test. 
Reviewed By: jay-zhuang Differential Revision: D24899349 Pulled By: pdillinger fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8	4 years ago
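The arithmetic quoted in the Ribbon filter commit above can be reproduced with a short Python sketch. The helper name `poisson_tail` is hypothetical, and summing the tail directly (rather than computing one minus the CDF as the spreadsheet formula does) is a choice made here to limit floating-point cancellation.

```python
import math


def poisson_tail(k_min: int, mean: float, terms: int = 60) -> float:
    """P(X >= k_min) for X ~ Poisson(mean), summed term by term."""
    return sum(
        math.exp(-mean) * mean**k / math.factorial(k)
        for k in range(k_min, k_min + terms)
    )


# Spreadsheet formula from the commit: =1/(10000000*(1 - POISSON(15, 1, TRUE)))
num_slots = 10_000_000
one_in = 1 / (num_slots * poisson_tail(16, 1.0))
print(f"chance of any slot getting 16+ entries: about 1 in {one_in:,.0f}")

# Ratios taken from the two filter_bench runs quoted in the commit.
space_ratio = 168.166 / 238.488   # Ribbon total size / Bloom total size
build_ratio = 130.851 / 34.1341   # Ribbon build ns/key / Bloom build ns/key
print(f"space ratio {space_ratio:.3f}, build time ratio {build_ratio:.2f}x")

# one_in_fp_rate is 1 over the desired FP rate: 100 corresponds to 1%.
one_in_fp_rate = 100
print(f"target FP rate: {1 / one_in_fp_rate:.2%}")
```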
Andrew Kryczka	75d3b6fdf0	Redesign block cache pinning API (#7520 ) Summary: The old flag-based APIs (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`) were insufficient for our needs. For example, it was impossible to pin only unpartitioned meta-blocks, which could prevent block cache contention when turning on dictionary compression or during a migration to partitioned indexes/filters. It was also impossible to pin all meta-blocks in memory while having predictable memory usage via block cache. If we had continued adding flags to address these scenarios, they would have had significant overlap causing confusion. Instead, this PR deprecates the flags and starts a new API with non-overlapping options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7520 Test Plan: - new unit test - added new options to stress/crash test and ran for a while: `$ python tools/db_crashtest.py blackbox --simple --max_key=1000000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0` Reviewed By: pdillinger Differential Revision: D24200034 Pulled By: ajkr fbshipit-source-id: 3fa7cfc71e7960f7a867511dd6ae5834dd73b13e	4 years ago
sdong	aedcaaef99	Stress test to support paranoid_file_checks (#7473) Summary: It's important to make sure no false positives are reported when options.paranoid_file_checks is used. Add it to the stress test, with a placeholder in the crash test. It is disabled in the crash test as there appears to be a bug causing false positives. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7473 Test Plan: Run crash test Reviewed By: ajkr Differential Revision: D24026939 fbshipit-source-id: 89102acb45cf041776775ce44a4eef4b0f3a380c	4 years ago
Peter Dillinger	06ad5dd293	Add file checksum to stress/crash test (#7343 ) Summary: This change has the crash test randomly select from a few file checksum implementations, or nullptr, for DB file_checksum_gen_factory. For compatibility across runs on same DB, each non-null factory can understand all the other functions, but the default changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7343 Test Plan: 'make blackbox_crash_test' for a while, including with some debug output to ensure code is being exercised. Reviewed By: zhichao-cao Differential Revision: D23494580 Pulled By: pdillinger fbshipit-source-id: 73bbc7ca32c1adaf619134c0c830f12894880b8a	4 years ago
Peter Dillinger	499c9448d0	Fix, enable, and enhance backup/restore in db_stress (#7348) Summary: Although added to db_stress, testing of backup/restore was never integrated into the crash test, originally concerned about performance. I've enabled it now and to address the performance concern, testing backup/restore is always skipped once the db exceeds a certain size threshold, default 100MB. This should provide sufficient opportunity for testing BackupEngine without bogging down everything else with heavier and heavier operations. Also fixed backup/restore in db_stress by making sure PurgeOldBackups can remove manifest files, which are normally kept around for db_stress. Added more coverage of backup options, and up to three backups being saved in one backup directory (in some cases). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7348 Test Plan: ran 'make blackbox_crash_test' for a while, with heightened probability of taking backups (1/10k). Also confirmed with some debug output that the code is being covered, TestBackupRestore only takes a few seconds to complete when triggered, and even at 1/10k and ~50MB database, there's <,~ 1 thread testing backups at any time. Reviewed By: ajkr Differential Revision: D23510835 Pulled By: pdillinger fbshipit-source-id: b6b8735591808141f81f10773ac31634cf03b6c0	4 years ago
Hans Holmberg	2a0d3c7054	Add a file system parameter: --fs_uri to db_stress and db_bench (#6878) Summary: This pull request adds the parameter --fs_uri to db_bench and db_stress, creating a composite env combining the default env with a specified registered rocksdb file system. This makes it easier to develop and test new RocksDB FileSystems. The pull request also registers the posix file system for testing purposes. Examples: ``` $ ./db_bench --fs_uri=posix:// --benchmarks=fillseq $ ./db_stress --fs_uri=zenfs://nullb1 ``` zenfs is a RocksDB FileSystem I'm developing to add support for zoned block devices, and in that case the zoned block device is specified in the uri (a zoned null block device in the above example). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6878 Reviewed By: siying Differential Revision: D23023063 Pulled By: ajkr fbshipit-source-id: 8b3fe7193ce45e683043b021779b7a4d547af247	4 years ago
Andrew Kryczka	7eebe6d38a	Mark files for compaction in stress/crash tests (#7231 ) Summary: The mechanism to mark files for compaction is most commonly used in delete-triggered compaction. This PR adds an option to exercise the marking mechanism on random files created by db_stress. This PR also enables that option in db_crashtest.py on its db_stress runs at random. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7231 Test Plan: - ran some minified crash tests; verified they succeed and we see `"compaction_reason": "FilesMarkedForCompaction"` regularly in the logs. ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --duration=600 --interval=30 --max_key=10000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py whitebox --duration=600 --interval=30 --max_key=1000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 --random_kill_odd=8887 ``` Reviewed By: anand1976 Differential Revision: D23025156 Pulled By: ajkr fbshipit-source-id: a404c467ebc12afa94dae35956ea9b372f592a96	4 years ago
Jay Zhuang	fc4d5f5065	Add stress test for GetProperty (#7111 ) Summary: Add stress test coverage for `DB::GetProperty()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7111 Test Plan: ``` ./db_stress -get_property_one_in=1 make crash_test ``` Reviewed By: ajkr Differential Revision: D22487906 Pulled By: jay-zhuang fbshipit-source-id: c118d95cc9b4e2fa669a06e6aa531541fa885dc5	5 years ago
Peter Dillinger	5b2bbacb6f	Minimize memory internal fragmentation for Bloom filters (#6427 ) Summary: New experimental option BBTO::optimize_filters_for_memory builds filters that maximize their use of "usable size" from malloc_usable_size, which is also used to compute block cache charges. Rather than always "rounding up," we track state in the BloomFilterPolicy object to mix essentially "rounding down" and "rounding up" so that the average FP rate of all generated filters is the same as without the option. (YMMV as heavily accessed filters might be unluckily lower accuracy.) Thus, the option near-minimizes what the block cache considers as "memory used" for a given target Bloom filter false positive rate and Bloom filter implementation. There are no forward or backward compatibility issues with this change, though it only works on the format_version=5 Bloom filter. With Jemalloc, we see about 10% reduction in memory footprint (and block cache charge) for Bloom filters, but 1-2% increase in storage footprint, due to encoding efficiency losses (FP rate is non-linear with bits/key). Why not weighted random round up/down rather than state tracking? By only requiring malloc_usable_size, we don't actually know what the next larger and next smaller usable sizes for the allocator are. We pick a requested size, accept and use whatever usable size it has, and use the difference to inform our next choice. This allows us to narrow in on the right balance without tracking/predicting usable sizes. Why not weight history of generated filter false positive rates by number of keys? This could lead to excess skew in small filters after generating a large filter. 
Results from filter_bench with jemalloc (irrelevant details omitted): (normal keys/filter, but high variance) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.6278 Number of filters: 5516 Total size (MB): 200.046 Reported total allocated memory (MB): 220.597 Reported internal fragmentation: 10.2732% Bits/key stored: 10.0097 Average FP rate %: 0.965228 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 30.5104 Number of filters: 5464 Total size (MB): 200.015 Reported total allocated memory (MB): 200.322 Reported internal fragmentation: 0.153709% Bits/key stored: 10.1011 Average FP rate %: 0.966313 (very few keys / filter, optimization not as effective due to ~59 byte internal fragmentation in blocked Bloom filter representation) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.5649 Number of filters: 162950 Total size (MB): 200.001 Reported total allocated memory (MB): 224.624 Reported internal fragmentation: 12.3117% Bits/key stored: 10.2951 Average FP rate %: 0.821534 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 31.8057 Number of filters: 159849 Total size (MB): 200 Reported total allocated memory (MB): 208.846 Reported internal fragmentation: 4.42297% Bits/key stored: 10.4948 Average FP rate %: 0.811006 (high keys/filter) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.7017 Number of filters: 164 Total size (MB): 200.352 Reported total allocated memory (MB): 221.5 Reported internal fragmentation: 10.5552% Bits/key stored: 10.0003 Average FP rate %: 0.969358 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 30.7131 Number of 
filters: 160 Total size (MB): 200.928 Reported total allocated memory (MB): 200.938 Reported internal fragmentation: 0.00448054% Bits/key stored: 10.1852 Average FP rate %: 0.963387 And from db_bench (block cache) with jemalloc: $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false $ (for FILE in /dev/shm/dbbench.no_optimize/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }' 17063835 $ (for FILE in /dev/shm/dbbench/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }' 17430747 $ #^ 2.1% additional filter storage $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000 rocksdb.block.cache.index.add COUNT : 33 rocksdb.block.cache.index.bytes.insert COUNT : 8440400 rocksdb.block.cache.filter.add COUNT : 33 rocksdb.block.cache.filter.bytes.insert COUNT : 21087528 rocksdb.bloom.filter.useful COUNT : 4963889 rocksdb.bloom.filter.full.positive COUNT : 1214081 rocksdb.bloom.filter.full.true.positive COUNT : 1161999 $ #^ 1.04 % observed FP rate $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 
-fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000 rocksdb.block.cache.index.add COUNT : 33 rocksdb.block.cache.index.bytes.insert COUNT : 8448592 rocksdb.block.cache.filter.add COUNT : 33 rocksdb.block.cache.filter.bytes.insert COUNT : 18220328 rocksdb.bloom.filter.useful COUNT : 5360933 rocksdb.bloom.filter.full.positive COUNT : 1321315 rocksdb.bloom.filter.full.true.positive COUNT : 1262999 $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters (Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427 Test Plan: unit test added, 'make check' with gcc, clang and valgrind Reviewed By: siying Differential Revision: D22124374 Pulled By: pdillinger fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e	5 years ago
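The percentages annotated in the db_bench output of the commit above can be rechecked from the raw counters. The sketch below assumes that the observed FP rate is false positives divided by all negative lookups (filter "useful" hits plus false positives). That formula is an inference that happens to reproduce the reported figures, not something the commit message states, and the function name is hypothetical.

```python
def observed_fp_rate(useful: int, full_positive: int, full_true_positive: int) -> float:
    """False positives as a fraction of all lookups for absent keys (assumed formula)."""
    false_positives = full_positive - full_true_positive
    negatives = useful + false_positives
    return false_positives / negatives


# Counters copied from the two readrandom runs quoted in the commit message.
baseline = observed_fp_rate(4963889, 1214081, 1161999)
optimized = observed_fp_rate(5360933, 1321315, 1262999)

# Filter bytes inserted into block cache, with and without the option.
memory_saving = 1 - 18220328 / 21087528

# Total 'filter block' sizes summed from sst_dump, with and without the option.
extra_storage = 17430747 / 17063835 - 1

print(f"baseline FP rate:  {baseline:.2%}")
print(f"optimized FP rate: {optimized:.2%}")
print(f"block cache memory saved for filters: {memory_saving:.1%}")
print(f"additional filter storage: {extra_storage:.2%}")
```

The storage overhead comes out slightly above the 2.1% noted in the message, which appears to be truncated rather than rounded.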
Peter Dillinger	88b4210701	Remove racially charged terms "whitelist" and "blacklist" (#7008 ) Summary: We don't need them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008 Test Plan: "make check" and ensure "make crash_test" starts Reviewed By: ajkr Differential Revision: D22143838 Pulled By: pdillinger fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e	5 years ago
Andrew Kryczka	775dc623ad	add `CompactionFilter` to stress/crash tests (#6988 ) Summary: Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988 Test Plan: running a minified blackbox crash test ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600 ``` Reviewed By: anand1976 Differential Revision: D22072888 Pulled By: ajkr fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320	5 years ago
Yanqin Jin	15d9f28da5	Add stress test for best-efforts recovery (#6819) Summary: Add crash test for the case of best-efforts recovery. After a certain amount of time, we kill the db_stress process, randomly delete some table files and restart db_stress. Given the randomness of file deletion, it is difficult to verify against a reference for data correctness. Therefore, we just check that the db can restart successfully. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6819 Test Plan: ``` ./db_stress -best_efforts_recovery=true -disable_wal=1 -reopen=0 ./db_stress -best_efforts_recovery=true -disable_wal=0 -skip_verifydb=1 -verify_db_one_in=0 -continuous_verification_interval=0 make crash_test_with_best_efforts_recovery ``` Reviewed By: anand1976 Differential Revision: D21436753 Pulled By: riversand963 fbshipit-source-id: 0b3605c922a16c37ed17d5ab6682ca4240e47926	5 years ago
Andrew Kryczka	1f20df2f38	cover single level universal in crash test (#6818 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6818 Test Plan: fast whitebox test and verify there are some single-level universal and some multi-level universal runs. ``` $ python ./tools/db_crashtest.py whitebox --simple -max_key=1000000 -value_size_mult=33 -write_buffer_size=524288 -target_file_size_base=524288 -max_bytes_for_level_base=2097152 --duration=120 --interval=10 --ops_per_thread=1000 --random_kill_odd=887 ``` Reviewed By: riversand963 Differential Revision: D21432138 Pulled By: ajkr fbshipit-source-id: 2fc5ba9f3dfa49bb11e81da7dd00a17b476e64d7	5 years ago
Ziyue Yang	e619a20e93	Add an option for parallel compression for db_stress (#6722) Summary: This commit adds a `compression_parallel_threads` option in db_stress. It also fixes the naming of the parallel compression option in db_bench to keep it aligned with others. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6722 Reviewed By: pdillinger Differential Revision: D21091385 fbshipit-source-id: c9ba8c4e5cc327ff9e6094a6dc6a15fcff70f100	5 years ago
Cheng Chang	0a77617820	Disable O_DIRECT in stress test when db directory does not support direct IO (#6727) Summary: In crash test, the db directory might be set to /dev/shm or /tmp. In certain environments, such as internal testing infrastructure, neither of these directories supports direct IO, so direct IO is never enabled in crash test. This PR sets up SyncPoints in direct IO related code paths to disable the O_DIRECT flag in calls to `open`, so the direct IO code paths will be executed and all direct IO related assertions will be checked, but no real direct IO request will be issued to the file system. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6727 Test Plan: export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0" make -j24 crash_test Reviewed By: zhichao-cao Differential Revision: D21139250 Pulled By: cheng-chang fbshipit-source-id: db9adfe78d91aa4759835b1af91c5db7b27b62ee	5 years ago
sdong	73523baeb1	crash_test to cover options.avoid_flush_during_recovery (#6712) Summary: Options.avoid_flush_during_recovery is uncovered in crash_test. Add the coverage with a chance of 1/8, as it is a less frequently used option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6712 Test Plan: Run crash_test and see the option can be used or not used by chance. Reviewed By: ltamasi Differential Revision: D21056566 fbshipit-source-id: c3b1521517cfc204786e6ef8c6acd7fffda64793	5 years ago
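One way a crash test script could enable an option with a 1/8 chance, as the commit above describes, is sketched below. The function name is hypothetical and the actual expression in db_crashtest.py may differ; this only illustrates the one-in-eight selection.

```python
import random


def pick_avoid_flush_during_recovery(rng: random.Random) -> int:
    """Return 1 with probability 1/8 and 0 otherwise."""
    # One slot out of eight holds a 1, so a uniform choice gives a 1/8 chance.
    return rng.choice([1 if t == 0 else 0 for t in range(8)])


rng = random.Random(2020)  # fixed seed so the demo is repeatable
samples = [pick_avoid_flush_during_recovery(rng) for _ in range(80_000)]
fraction_enabled = sum(samples) / len(samples)
print(f"option enabled in {fraction_enabled:.1%} of simulated runs")
```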
Yueh-Hsuan Chiang	5801af4646	Add env_fault_injection argument to db_stress (#6687) Summary: Add env_fault_injection argument to db_stress. When enabled, FaultInjectionTestEnv will be used instead. Currently this option does not support running with other env settings. This will allow us to later manually produce errors when running db_crashtest. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6687 Test Plan: make db_stress -j32 ./db_stress --env_fault_injection ./db_stress --env_fault_injection --hdfs // expect error message Reviewed By: ajkr Differential Revision: D21014683 Pulled By: yhchiang fbshipit-source-id: 0724aeac37efd57adb72a37defe6dbd3bfa8106a	5 years ago
anand76	5c19a441c4	Fault injection in db_stress (#6538) Summary: This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once it's proven stable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538 Test Plan: crash_test make check Reviewed By: riversand963 Differential Revision: D20714347 Pulled By: anand1976 fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7	5 years ago
Levi Tamasi	217ce20021	Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491 ) Summary: Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`, `GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the command line parameter `get_live_files_and_wal_files_one_in` is specified. The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable in the sense that they can return errors if another thread removes a WAL file while they are executing (which is a perfectly plausible and legitimate scenario). The patch splits this command line parameter into three (one for each API), and changes the crash test script so that only `GetLiveFiles` is tested during our continuous crash test runs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491 Test Plan: ``` make check python tools/db_crashtest.py whitebox ``` Reviewed By: siying Differential Revision: D20312200 Pulled By: ltamasi fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8	5 years ago
Andrew Kryczka	f52db84650	support SstFileManager in db_stress (#6454 ) Summary: Add some flags for configuring an SstFileManager. An SstFileManager is only created when one or more of these flags are set. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6454 Test Plan: - ran it a while: ``` $ python ./tools/db_crashtest.py blackbox --simple -max_key=100000 -write_buffer_size=131072 -target_file_size_base=131072 -max_bytes_for_level_base=524288 -value_size_mult=33 --interval=10 -max_background_compactions=4 -max_background_flushes=2 -sst_file_manager_bytes_per_sec=1048576 ``` - verified with strace the SstFileManager is behaving as configured: ``` $ strace -fp `pidof db_stress` -e ftruncate,unlink ... [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 67423) = 0 [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 51039) = 0 [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 34655) = 0 [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 18271) = 0 [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 1887) = 0 [pid 3074805] unlink("/tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash") = 0 ... ``` Differential Revision: D20103315 Pulled By: ajkr fbshipit-source-id: b3e1092747157459d244b047947a979b85c98f48	5 years ago
sdong	fdf882ded2	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To give users a way to solve the problem, the RocksDB namespace is changed to a flag which can be overridden at build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e	5 years ago
anand76	687119aeaf	Variable key length in db_stress (#6273 ) Summary: Undo https://github.com/facebook/rocksdb/issues/6243 and fix the crash test failures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6273 Test Plan: Run make ubsan_crash_test Differential Revision: D19331472 Pulled By: anand1976 fbshipit-source-id: 30aa4a36c1b0f77a97159d82bbfd1cd767878e28	5 years ago
Peter Dillinger	37fd2b9694	Revert "Generate variable length keys in db_stress (#6165 )" and follow-ups (#6243 ) Summary: This commit is suspected in some crash test failures such as Verification failed for column family 0 key 78438077: Value not found: NotFound: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6243 Test Plan: 'make check' and start 'make crash_test' Differential Revision: D19220495 Pulled By: pdillinger fbshipit-source-id: 6c4709cee80ab4344e06ce360f51e947d79fb3fa	5 years ago
sdong	79cc8dc29b	db_stress: cover approximate size (#6213 ) Summary: db_stress to execute DB::GetApproximateSizes() with randomized keys and options. Return value is not validated but error will be reported. Two ways to generate the range keys: (1) two random keys; (2) a small range. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6213 Test Plan: (1) run "make crash_test" for a while; (2) hack the code to ingest some errors to see it is reported. Differential Revision: D19204665 fbshipit-source-id: 652db36f13bcb5a3bd8fe4a10c0aa22a77a0bce2	5 years ago
anand76	3160edfdc7	Generate variable length keys in db_stress (#6165 ) Summary: Currently, db_stress generates fixed length keys of 8 bytes. This patch adds the ability to generate variable length keys. Most of the db_stress code continues to work with a numeric key randomly generated, and the numeric key also acts as an index into the values_ array. The numeric key is mapped to a variable length string key in a deterministic way. Furthermore, the ordering is preserved. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6165 Test Plan: run make crash_test Differential Revision: D19204646 Pulled By: anand1976 fbshipit-source-id: d2d46a96615b4832a8be2a981f5913905f0e1ca7	5 years ago
sdong	338c149b92	crash_test to cover bottommost compression and some other changes (#6215 ) Summary: Several improvements to crash_test/stress_test: (1) Stress_test to support a parameter for bottommost compression (2) Rename those FLAGS_* variables that are not gflags to avoid confusion (3) Crash_test to randomly generate compression type for bottommost compression with half the chance. (4) Stress_test to sanitize unsupported compression types to snappy, so that crash_test can cover all possible compression types and people don't need to worry if their environment doesn't support all compression types. (5) In crash_test, when generating the db_stress command, sort arguments in alphabetical order, so that it is easier to find the value for a specific argument. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6215 Test Plan: Run "make crash_test" for a while and see the bottommost option shown in LOG files. Differential Revision: D19171255 fbshipit-source-id: d7001e246c4ff9ee5760776eea0be97738650735	5 years ago
Zhichao Cao	f89dea4fec	db_stress: Added the verification for GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile (#6224 ) Summary: Add the verification in operateDB to verify GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile. The test will be called every 1 out of N, N is decided by get_live_files_and_wal_files_one_i, whose default is 1000000. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6224 Test Plan: pass db_stress default run. Differential Revision: D19183358 Pulled By: zhichao-cao fbshipit-source-id: 20073cf72ede77a3e0d3cf5f28304f1f605d2b1a	5 years ago
Levi Tamasi	786c3d45ed	Support BlobDB in db_stress (#6230 ) Summary: The patch adds support for BlobDB to `db_stress`. Note that BlobDB currently does not support (amongst other features) Column Families or the `SingleDelete` API, so for now, those should be disabled on the command line when running `db_stress` in BlobDB mode (using `-column_families=1` and `-nooverwritepercent=0`, respectively). Also, some BlobDB features that do not go well with the verification logic in `db_stress` like TTL and FIFO eviction are not supported currently. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6230 Test Plan: ``` ./db_stress -max_key=100000 -use_blob_db -column_families=1 -nooverwritepercent=0 -reopen=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 -blob_db_enable_gc -blob_db_gc_cutoff=0.1 -blob_db_min_blob_size=10 -blob_db_bytes_per_sync=16384 ``` Differential Revision: D19191476 Pulled By: ltamasi fbshipit-source-id: 35840452af8c5e6095249c7fd9a53a119a0985fc	5 years ago
Yanqin Jin	670a916d01	Add more verification to db_stress (#6173 ) Summary: Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways. 1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests. 2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests. Test plan (devserver) ``` $./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100 $./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0 $make crash_test ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173 Differential Revision: D19047367 Pulled By: riversand963 fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615	5 years ago
anand76	2afea29762	Add VerifyChecksum() to db_stress (#6203 ) Summary: Add an option to db_stress, verify_checksum_one_in, to call DB::VerifyChecksum() once every N ops. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6203 Differential Revision: D19145753 Pulled By: anand1976 fbshipit-source-id: d09edf21f309ad53aa40dd25b7a563d50665fd8b	5 years ago
sdong	bcc372c0c3	Add some new options to crash_test (#6176 ) Summary: Several options are trivially added to crash test and random values are picked. Made simple test run non-dynamic level and normal test run dynamic level. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176 Test Plan: Run crash_test and watch the printing Differential Revision: D19053955 fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502	5 years ago
Zhichao Cao	fbda25f57a	db_stress: generate the key based on Zipfian distribution (hot key) (#6163 ) Summary: In the current db_stress, all the keys are generated randomly and follow a uniform distribution. In order to test corner cases in which some keys are always updated or read, we need to generate the keys based on other distributions. In this PR, the key is generated based on a Zipfian distribution and the skewness can be controlled by setting hot_key_alpha (0.8 to 1.5 is suggested). The larger hot_key_alpha is, the more skewed the distribution will be. Note that, usually, if hot_key_alpha is larger than 2, there might be only 1 or 2 keys that are generated. If hot_key_alpha is 0, keys are generated following a uniform distribution (random key). Testing plan: pass the db_stress and printed the keys to make sure it follows the distribution. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6163 Differential Revision: D18978480 Pulled By: zhichao-cao fbshipit-source-id: e123b4865477f7478e83fb581f9576bada334680	5 years ago
Maysam Yabandeh	4b97812da8	Add long-running snapshots to stress tests (#6171 ) Summary: Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171 Test Plan: ``` make -j32 crash_test ``` Differential Revision: D19038399 Pulled By: maysamyabandeh fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0	5 years ago
Maysam Yabandeh	fec7302a9d	Enable unordered_write in stress tests (#6164 ) Summary: With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164 Test Plan: ``` make -j32 crash_test_with_txn ``` Differential Revision: D18991899 Pulled By: maysamyabandeh fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1	5 years ago
Maysam Yabandeh	8613ee2e94	Enable all txn write policies in crash test (#6158 ) Summary: Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158 Test Plan: ``` make -j32 crash_test_with_txn ``` Differential Revision: D18946307 Pulled By: maysamyabandeh fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950	5 years ago
Yanqin Jin	383f5071f0	Add SyncWAL to db_stress (#6149 ) Summary: Add SyncWAL to db_stress. Specify with `-sync_wal_one_in=N` so that it will be called once every N operations on average. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6149 Test Plan: ``` $make db_stress $./db_stress -sync_wal_one_in=100 -ops_per_thread=100000 ``` Differential Revision: D18922529 Pulled By: riversand963 fbshipit-source-id: 4c0b8cb8fa21852722cffd957deddf688f12ea56	5 years ago
Peter Dillinger	a653857178	Add PauseBackgroundWork() to db_stress (#6148 ) Summary: Worker thread will occasionally call PauseBackgroundWork(), briefly sleep (to avoid stalling itself) and then call ContinueBackgroundWork(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6148 Test Plan: some running of 'make blackbox_crash_test' with temporary printf output to confirm code occasionally reached. Differential Revision: D18913886 Pulled By: pdillinger fbshipit-source-id: ae9356a803390929f3165dfb6a00194692ba92be	5 years ago
Peter Dillinger	6380df5e10	Vary bloom_bits in db_crashtest (#6103 ) Summary: Especially with non-integral bits/key now supported, db_crashtest should vary the bloom_bits configuration. The probabilities look like this: 1/2 chance of a uniform int from 0 to 19. This includes overall 1/40 chance of 0 which disables the bloom filter. 1/2 chance of a float from a lognormal distribution with a median of 10. This always produces positive values but with a decent chance of < 1 (overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced implementation limits. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103 Test Plan: start 'make blackbox_crash_test' several times and look at configuration output Differential Revision: D18734877 Pulled By: pdillinger fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316	5 years ago
sdong	7d79b32618	Break db_stress_tool.cc to a list of source files (#6134 ) Summary: db_stress_tool.cc is now a giant file. In order to make it easier to improve and maintain, break it down into multiple source files. Most classes are turned into their own files. Separate .h and .cc files are created for gflag definitions. Another pair of .h and .cc files is created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory is named this way because if its name ended with either stress or test, .gitignore would ignore any file under it, which makes it prone to issues in development. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134 Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1 Differential Revision: D18876064 fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d	5 years ago


91 Commits (25cc564ff7c396e4a9728a6dfa5fd053a5181545)