rocksdb

Commit Graph

Author	SHA1	Message	Date
Andrew Kryczka	0f42e50fec	Fix `GetLiveFiles()` returning OPTIONS-000000 (#8268 ) Summary: See release note in HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8268 Test Plan: unit test repro Reviewed By: siying Differential Revision: D28227901 Pulled By: ajkr fbshipit-source-id: faf61d13b9e43a761e3d5dcf8203923126b51339	4 years ago
Peter Dillinger	3b981eaa1d	Fix use-after-free threading bug in ClockCache (#8261 ) Summary: In testing for https://github.com/facebook/rocksdb/issues/8225 I found cache_bench would crash with -use_clock_cache, as well as db_bench -use_clock_cache, but not single-threaded. Smaller cache size hits failure much faster. ASAN reported the failuer as calling malloc_usable_size on the `key` pointer of a ClockCache handle after it was reportedly freed. On detailed inspection I found this bad sequence of operations for a cache entry: state=InCache=1,refs=1 [thread 1] Start ClockCacheShard::Unref (from Release, no mutex) [thread 1] Decrement ref count state=InCache=1,refs=0 [thread 1] Suspend before CalcTotalCharge (no mutex) [thread 2] Start UnsetInCache (from Insert, mutex held) [thread 2] clear InCache bit state=InCache=0,refs=0 [thread 2] Calls RecycleHandle (based on pre-updated state) [thread 2] Returns to Insert which calls Cleanup which deletes `key` [thread 1] Resume ClockCacheShard::Unref [thread 1] Read `key` in CalcTotalCharge To fix this, I've added a field to the handle to store the metadata charge so that we can efficiently remember everything we need from the handle in Unref. We must not read from the handle again if we decrement the count to zero with InCache=1, which means we don't own the entry and someone else could eject/overwrite it immediately. Note before this change, on amd64 sizeof(Handle) == 56 even though there are only 48 bytes of data. Grouping together the uint32_t fields would cut it down to 48, but I've added another uint32_t, which takes it back up to 56. Not a big deal. Also fixed DisownData to cooperate with ASAN as in LRUCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8261 Test Plan: Manual + adding use_clock_cache to db_crashtest.py Base performance ./cache_bench -use_clock_cache Complete in 17.060 s; QPS = 2458513 New performance ./cache_bench -use_clock_cache Complete in 17.052 s; QPS = 2459695 Any difference is easily buried in small noise. Crash test shows still more bug(s) in ClockCache, so I'm expecting to disable ClockCache from production code in a follow-up PR (if we can't find and fix the bug(s)) Reviewed By: mrambacher Differential Revision: D28207358 Pulled By: pdillinger fbshipit-source-id: aa7a9322afc6f18f30e462c75dbbe4a1206eb294	4 years ago
sdong	cde69a7cfd	db_stress to add --open_metadata_write_fault_one_in (#8235 ) Summary: DB Stress to add --open_metadata_write_fault_one_in which would randomly fail in some file metadata modification operations during DB Open, including file creation, close, renaming and directory sync. Some operations can fail before and after the operations take place. If DB open fails, db_stress would retry without the failure ingestion, and DB is expected to open successfully. This option is enabled in crash test in half of the time. Some follow up changes would allow write failures in open time, and ingesting those failures in non-DB open cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8235 Test Plan: Run stress tests for a while and see failures got triggered. This can reproduce the bug fixed by https://github.com/facebook/rocksdb/pull/8192 and a similar one that fails when fsyncing parent directory. Reviewed By: anand1976 Differential Revision: D28010944 fbshipit-source-id: 36a96da4dc3633e5f7680cef3ea0a900fcdb5558	4 years ago
Peter Dillinger	95f6add746	Revert Ribbon starting level support from #8198 (#8212 ) Summary: This partially reverts commit `10196d7edc`. The problem with this change is because of important filter use cases: FIFO compaction and SST writer. FIFO "compaction" always uses level 0 so would only use Ribbon filters if specifically including level 0 for the Ribbon filter policy. SST writer sets level_at_creation=-1 to indicate unknown level, and this would be treated the same as level 0 unless fixed. We are keeping the part about committing to permanent schema, which is only changes to API comments and HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8212 Test Plan: CI Reviewed By: jay-zhuang Differential Revision: D27896468 Pulled By: pdillinger fbshipit-source-id: 50a775f7cba5d64fb729d9b982e355864020596e	4 years ago
Peter Dillinger	10196d7edc	Ribbon long-term support, starting level support (#8198 ) Summary: Since the Ribbon filter schema seems good (compatible back to 6.15.0), this change commits to long term support of the SST schema, even though we expect the API for enabling Ribbon to change (still called NewExperimentalRibbonFilterPolicy). This also adds support for "hybrid" configuration in which some levels use Bloom (higher levels, lower numbered) for speed and the rest use Ribbon (lower levels, higher numbered) for memory space efficiency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8198 Test Plan: unit test added, crash test support Reviewed By: jay-zhuang Differential Revision: D27831232 Pulled By: pdillinger fbshipit-source-id: 90e528677689474d293ed6710b42ba89fbd5b5ab	4 years ago
Akanksha Mahajan	0be89e87fd	Enable backup/restore for Integrated BlobDB in stress and crash tests (#8165 ) Summary: Enable backup/restore functionality with Integrated BlobDB in db_stress and crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8165 Test Plan: Ran python3 -u tools/db_crashtest.py --simple whitebox along with : 1. decreased "backup_in_one" value for backups to be more frequent and 2. manually changed code for "enable_blob_file" to be always true and apply blobdb params 100% for testing purpose. Reviewed By: ltamasi Differential Revision: D27636025 Pulled By: akankshamahajan15 fbshipit-source-id: 0d0e0d1479ced163f992872dc998e79c581bfc99	4 years ago
Yanqin Jin	09528f9fa1	Fix a bug for SeekForPrev with partitioned filter and prefix (#8137 ) Summary: According to https://github.com/facebook/rocksdb/issues/5907, each filter partition "should include the bloom of the prefix of the last key in the previous partition" so that SeekForPrev() in prefix mode can return correct result. The prefix of the last key in the previous partition does not necessarily have the same prefix as the first key in the current partition. Regardless of the first key in current partition, the prefix of the last key in the previous partition should be added. The existing code, however, does not follow this. Furthermore, there is another issue: when finishing current filter partition, `FullFilterBlockBuilder::AddPrefix()` is called for the first key in next filter partition, which effectively overwrites `last_prefix_str_` prematurely. Consequently, when the filter block builder proceeds to the next partition, `last_prefix_str_` will be the prefix of its first key, leaving no way of adding the bloom of the prefix of the last key of the previous partition. Prefix extractor is FixedLength.2. ``` [ filter part 1 ] [ filter part 2 ] abc d ``` When SeekForPrev("abcd"), checking the filter partition will land on filter part 2 because "abcd" > "abc" but smaller than "d". If the filter in filter part 2 happens to return false for the test for "ab", then SeekForPrev("abcd") will build incorrect iterator tree in non-total-order mode. Also fix a unit test which starts to fail following this PR. `InDomain` should not fail due to assertion error when checking on an arbitrary key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8137 Test Plan: ``` make check ``` Without this fix, the following command will fail pretty soon. ``` ./db_stress --acquire_snapshot_one_in=10000 --avoid_flush_during_recovery=0 \ --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 \ --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=17 \ --bottommost_compression_type=disable --cache_index_and_filter_blocks=1 --cache_size=1048576 \ --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 \ --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 \ --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 \ --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=0 \ --continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox \ --db_write_buffer_size=8388608 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --enable_blob_files=0 \ --enable_compaction_filter=0 --enable_pipelined_write=1 --file_checksum_impl=big --flush_one_in=1000000 \ --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 \ --get_sorted_wal_files_one_in=0 --index_block_restart_interval=4 --index_type=2 --ingest_external_file_one_in=0 \ --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True \ --log2_keys_per_lock=10 --long_running_snapshots=1 --mark_for_compaction_one_file_in=0 \ --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000000 --max_key_len=3 \ --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 \ --max_write_buffer_size_to_maintain=8388608 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False \ --nooverwritepercent=0 --open_files=500000 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=1 --partition_pinning=0 --pause_background_one_in=1000000 \ --periodic_compaction_seconds=0 --prefixpercent=5 --progress_reports=0 --read_fault_one_in=0 --read_only=0 \ --readpercent=45 --recycle_log_file_num=0 --reopen=20 --secondary_catch_up_one_in=0 \ --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 \ --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False \ --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 \ --top_level_index_pinning=0 --unpartitioned_pinning=1 --use_blob_db=0 --use_block_based_filter=0 \ --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 \ --use_multiget=0 --use_ribbon_filter=0 --use_txn=0 --user_timestamp_size=8 --verify_checksum=1 \ --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=4194304 \ --write_dbid_to_manifest=1 --writepercent=35 ``` Reviewed By: pdillinger Differential Revision: D27553054 Pulled By: riversand963 fbshipit-source-id: 60e391e4a2d8d98a9a3172ec5d6176b90ec3de98	4 years ago
Yanqin Jin	ae7a795686	Disable partitioned filters in ts stress test (#8127 ) Summary: Currently, partitioned filter does not support user-defined timestamp. Disable it for now in ts stress test so that the contrun jobs can proceed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8127 Test Plan: make crash_test_with_ts Reviewed By: ajkr Differential Revision: D27388488 Pulled By: riversand963 fbshipit-source-id: 5ccff18121cb537bd82f2ac072cd25efb625c666	4 years ago
Yanqin Jin	e1aa8c160f	Fix an error while running db_crashtest for non-user-ts tests (#8091 ) Summary: Fix the following error while running `make crash_test` ``` Traceback (most recent call last): File "tools/db_crashtest.py", line 705, in <module> main() File "tools/db_crashtest.py", line 696, in main blackbox_crash_main(args, unknown_args) File "tools/db_crashtest.py", line 479, in blackbox_crash_main + list({'db': dbname}.items())), unknown_args) File "tools/db_crashtest.py", line 414, in gen_cmd finalzied_params = finalize_and_sanitize(params) File "tools/db_crashtest.py", line 331, in finalize_and_sanitize dest_params.get("user_timestamp_size") > 0): TypeError: '>' not supported between instances of 'NoneType' and 'int' ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8091 Test Plan: make crash_test Reviewed By: ltamasi Differential Revision: D27268276 Pulled By: riversand963 fbshipit-source-id: ed2873b9587ecc51e24abc35ef2bd3d91fb1ed1b	4 years ago
Yanqin Jin	08144bc2f5	Add user-defined timestamps to db_stress (#8061 ) Summary: Add some basic test for user-defined timestamp to db_stress. Currently, read with timestamp always tries to read using the current timestamp. Due to the per-key timestamp-sequence ordering constraint, we only add timestamp- related tests to the `NonBatchedOpsStressTest` since this test serializes accesses to the same key and uses a file to cross-check data correctness. The timestamp feature is not supported in a number of components, e.g. Merge, SingleDelete, DeleteRange, CompactionFilter, Readonly instance, secondary instance, SST file ingestion, transaction, etc. Therefore, db_stress should exit if user enables both timestamp and these features at the same time. The (currently) incompatible features can be found in `CheckAndSetOptionsForUserTimestamp`. This PR also fixes a bug triggered when timestamp is enabled together with `index_type=kBinarySearchWithFirstKey`. This bug fix will also be in another separate PR with more unit tests coverage. Fixing it here because I do not want to exclude the index type from crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8061 Test Plan: make crash_test_with_ts Reviewed By: jay-zhuang Differential Revision: D27056282 Pulled By: riversand963 fbshipit-source-id: c3e00ad1023fdb9ebbdf9601ec18270c5e2925a9	4 years ago
Levi Tamasi	0d800dadea	Adjust the set of potential min_blob_size values in stress/crash tests (#8085 ) Summary: Since our stress/crash tests by default generate values of size 8, 16, or 24, it does not make much sense to set `min_blob_size` to 256. The patch updates the set of potential `min_blob_size` values in the crash test script and in `db_stress` where it might be set dynamically using `SetOptions`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8085 Test Plan: Ran `make check` and tried the crash test script. Reviewed By: riversand963 Differential Revision: D27238620 Pulled By: ltamasi fbshipit-source-id: 4a96f9944b1ed9220d3045c5ab0b34c49009aeee	4 years ago
Yanqin Jin	1f11d07f24	Enable compact filter for blob in dbstress and dbbench (#8011 ) Summary: As title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8011 Test Plan: ``` ./db_bench -enable_blob_files=1 -use_keep_filter=1 -disable_auto_compactions=1 /db_stress -enable_blob_files=1 -enable_compaction_filter=1 -acquire_snapshot_one_in=0 -compact_range_one_in=0 -iterpercent=0 -test_batches_snapshots=0 -readpercent=10 -prefixpercent=20 -writepercent=55 -delpercent=15 -continuous_verification_interval=0 ``` Reviewed By: ltamasi Differential Revision: D26736061 Pulled By: riversand963 fbshipit-source-id: 1c7834903c28431ce23324c4f259ed71255614e2	4 years ago
Andrew Kryczka	d904233d2f	Limit buffering for collecting samples for compression dictionary (#7970 ) Summary: For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file. However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage. Related changes include: - Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks - Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary - Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970 Test Plan: - updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level - looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set. Reviewed By: pdillinger Differential Revision: D26467994 Pulled By: ajkr fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465	4 years ago
Levi Tamasi	dab4fe5bcd	Add checkpoint support to BlobDB (#7959 ) Summary: The patch adds checkpoint support to BlobDB. Blob files are hard linked or copied, depending on whether the checkpoint directory is on the same filesystem or not, similarly to table files. TODO: Add support for blob files to `ExportColumnFamily` and to the checksum verification logic used by backup/restore. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7959 Test Plan: Ran `make check` and the crash test for a while. Reviewed By: riversand963 Differential Revision: D26434768 Pulled By: ltamasi fbshipit-source-id: 994be55a8dc08133028250760fca440d2c7c4dc5	4 years ago
Levi Tamasi	0288bdbc53	Add the integrated BlobDB to the stress/crash tests (#7900 ) Summary: The patch adds support for the options related to the new BlobDB implementation to `db_stress`, including support for dynamically adjusting them using `SetOptions` when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically` are specified. (The latter is used to prevent the options from being enabled when incompatible features are in use.) The patch also updates the `db_stress` help messages of the existing stacked BlobDB related options to clarify that they pertain to the old implementation. In addition, it adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN), and to also test BlobDB in conjunction with atomic flush and transactions, the script sets the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900 Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options. Reviewed By: jay-zhuang Differential Revision: D26094913 Pulled By: ltamasi fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814	4 years ago
Andrew Kryczka	78ee8564ad	Integrity protection for live updates to WriteBatch (#7748 ) Summary: This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.). The foundation classes (`ProtectionInfo.`) embed the coverage info in their type, and provide `Protect.()` and `Strip.()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer. When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748 Test Plan: - an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught - add to stress/crash test to verify it works in variety of configs/operations without intentional corruption - [deferred] unit tests for `ProtectionInfo.` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc. Reviewed By: pdillinger Differential Revision: D25754492 Pulled By: ajkr fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866	4 years ago
Zhichao Cao	04b3524ad0	Inject the random write error to stress test (#7653 ) Summary: Inject the random write error to stress test, it requires set reopen=0 and disable_wal=true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653 Test Plan: pass db_stress and python3 db_crashtest.py blackbox Reviewed By: ajkr Differential Revision: D25354132 Pulled By: zhichao-cao fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669	4 years ago
Peter Dillinger	60af964372	Experimental (production candidate) SST schema for Ribbon filter (#7658 ) Summary: Added experimental public API for Ribbon filter: NewExperimentalRibbonFilterPolicy(). This experimental API will take a "Bloom equivalent" bits per key, and configure the Ribbon filter for the same FP rate as Bloom would have but ~30% space savings. (Note: optimize_filters_for_memory is not yet implemented for Ribbon filter. That can be added with no effect on schema.) Internally, the Ribbon filter is configured using a "one_in_fp_rate" value, which is 1 over desired FP rate. For example, use 100 for 1% FP rate. I'm expecting this will be used in the future for configuring Bloom-like filters, as I expect people to more commonly hold constant the filter accuracy and change the space vs. time trade-off, rather than hold constant the space (per key) and change the accuracy vs. time trade-off, though we might make that available. ### Benchmarking ``` $ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 34.1341 Number of filters: 1993 Total size (MB): 238.488 Reported total allocated memory (MB): 262.875 Reported internal fragmentation: 10.2255% Bits/key stored: 10.0029 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 18.7508 Random filter net ns/op: 258.246 Average FP rate %: 0.968672 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 130.851 Number of filters: 1993 Total size (MB): 168.166 Reported total allocated memory (MB): 183.211 Reported internal fragmentation: 8.94626% Bits/key stored: 7.05341 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 58.4523 Random filter net ns/op: 363.717 Average FP rate %: 0.952978 ---------------------------- Done. (For more info, run with -legend or -help.) ``` 168.166 / 238.488 = 0.705 -> 29.5% space reduction 130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. lastest Bloom filter (could make that as little as about 2.5x for less space reduction) ### Working around a hashing "flaw" bloom_test discovered a flaw in the simple hashing applied in StandardHasher when num_starts == 1 (num_slots == 128), showing an excessively high FP rate. The problem is that when many entries, on the order of number of hash bits or kCoeffBits, are associated with the same start location, the correlation between the CoeffRow and ResultRow (for efficiency) can lead to a solution that is "universal," or nearly so, for entries mapping to that start location. (Normally, variance in start location breaks the effective association between CoeffRow and ResultRow; the same value for CoeffRow is effectively different if start locations are different.) Without kUseSmash and with num_starts > 1 (thus num_starts ~= num_slots), this flaw should be completely irrelevant. Even with 10M slots, the chances of a single slot having just 16 (or more) entries map to it--not enough to cause an FP problem, which would be local to that slot if it happened--is 1 in millions. This spreadsheet formula shows that: =1/(10000000(1 - POISSON(15, 1, TRUE))) As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is intended for CPU efficiency of filters with many more entries/slots than kCoeffBits, a very reasonable work-around is to disallow num_starts==1 when !kUseSmash, by making the minimum non-zero number of slots 2kCoeffBits. This is the work-around I've applied. This also means that the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not space-efficient for less than a few hundred entries. Because of this, I have made it fall back on constructing a Bloom filter, under existing schema, when that is more space efficient for small filters. (We can change this in the future if we want.) TODO: better unit tests for this case in ribbon_test, and probably update StandardHasher for kUseSmash case so that it can scale nicely to small filters. ### Other related changes * Add Ribbon filter to stress/crash test * Add Ribbon filter to filter_bench as -impl=3 * Add option string support, as in "filter_policy=experimental_ribbon:5.678;" where 5.678 is the Bloom equivalent bits per key. * Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom * Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on binary searching CalculateSpace (inefficient), so that subclasses (especially experimental ones) don't have to provide an efficient implementation inverting CalculateSpace. * Minor refactor FastLocalBloomBitsBuilder for new base class XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder, which allows the latter to fall back on Bloom construction in some extreme cases. * Mostly updated bloom_test for Ribbon filter, though a test like FullBloomTest::Schema is a next TODO to ensure schema stability (in case this becomes production-ready schema as it is). * Add some APIs to ribbon_impl.h for configuring Ribbon filters. Although these are reasonably covered by bloom_test, TODO more unit tests in ribbon_test * Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data for constructing the linear approximations in GetNumSlotsFor95PctSuccess. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658 Test Plan: Some unit tests updated but other testing is left TODO. This is considered experimental but laying down schema compatibility as early as possible in case it proves production-quality. Also tested in stress/crash test. Reviewed By: jay-zhuang Differential Revision: D24899349 Pulled By: pdillinger fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8	4 years ago
Akanksha Mahajan	202605143b	Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643 ) Summary: crash tests donot run in DEBUG_MODE=0 on tmpfs when use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly because direct I/O is not supported on tmpfs and tests exit. Fix: Sanitize direct I/O read options in DEBUG_LEVEL=0 so that crash tests can run in tmpfs. When mmap_reads is set, direct I/O reads options are unset so we can sanitize direct I/O reads options in case of tmpfs as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643 Test Plan: 1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm"; export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0"; make crash_test -j64 2. In DEBUG_LEVEL=1 mode: make crash_test -j64 Reviewed By: jay-zhuang Differential Revision: D24766550 Pulled By: akankshamahajan15 fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e	4 years ago
Akanksha Mahajan	06a92fcf5c	Add "max_write_buffer_size_to_maintain" to crash test (#7634 ) Summary: Add "max_write_buffer_size_to_maintain" to crash test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7634 Test Plan: make crash_test -j64 Reviewed By: zhichao-cao Differential Revision: D24710401 Pulled By: akankshamahajan15 fbshipit-source-id: 89e0412aaa56b2ef5a75603971b82f4b0b494ab7	4 years ago
Jay Zhuang	9e03e4dd52	db_crashtest preserves expected values file for failed crash tests (#7534 ) Summary: If crash test fails, don't delete the `expected_values_file` for later debug. More details: https://github.com/facebook/rocksdb/issues/7530 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7534 Test Plan: local host Reviewed By: ajkr Differential Revision: D24239655 Pulled By: jay-zhuang fbshipit-source-id: 3566f91a30aae1e27d2f51d910cddd08edb7d4cf	4 years ago
Andrew Kryczka	75d3b6fdf0	Redesign block cache pinning API (#7520 ) Summary: The old flag-based APIs (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`) were insufficient for our needs. For example, it was impossible to pin only unpartitioned meta-blocks, which could prevent block cache contention when turning on dictionary compression or during a migration to partitioned indexes/filters. It was also impossible to pin all meta-blocks in memory while having predictable memory usage via block cache. If we had continued adding flags to address these scenarios, they would have had significant overlap causing confusion. Instead, this PR deprecates the flags and starts a new API with non-overlapping options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7520 Test Plan: - new unit test - added new options to stress/crash test and ran for a while: `$ python tools/db_crashtest.py blackbox --simple --max_key=1000000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0` Reviewed By: pdillinger Differential Revision: D24200034 Pulled By: ajkr fbshipit-source-id: 3fa7cfc71e7960f7a867511dd6ae5834dd73b13e	4 years ago
sdong	82d42606ad	Enable paranoid_file_checks in crash test (#7489 ) Summary: Cover paranoid_file_checks in crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7489 Test Plan: Run crash tests for hours and didn't see any failure. Reviewed By: ajkr Differential Revision: D24063868 fbshipit-source-id: 7b48b110e66ce78ae5d0c99a9f32af86edd34c1e	4 years ago
sdong	aedcaaef99	Stress test to support paranoid_file_checks (#7473 ) Summary: It's important to make sure no false positive is reported when options.paranoid_file_checks is used. Add it to stress test and a place holder in crash test. It is disabled in crash test as there appears to be a bug causing false positive. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7473 Test Plan: Run crash test Reviewed By: ajkr Differential Revision: D24026939 fbshipit-source-id: 89102acb45cf041776775ce44a4eef4b0f3a380c	4 years ago
Peter Dillinger	4e258d3e63	Fix backup/restore in stress/crash test (#7357 ) Summary: (1) Skip check on specific key if restoring an old backup (small minority of cases) because it can fail in those cases. (2) Remove an old assertion about number of column families and number of keys passed in, which is broken by atomic flush (cf_consistency) test. Like other code (for better or worse) assume a single key and iterate over column families. (3) Apply mock_direct_io to NewSequentialFile so that db_stress backup works on /dev/shm. Also add more context to output in case of backup/restore db_stress failure. Also a minor fix to BackupEngine to report first failure status in creating new backup, and drop another clue about the potential source of a "Backup failed" status. Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)" Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357 Test Plan: Using backup_one_in=10000, "USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes "USE_CLANG=1 make blackbox_crash_test" for 30+ minutes And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb Reviewed By: riversand963 Differential Revision: D23567244 Pulled By: pdillinger fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77	4 years ago
Jay Zhuang	ef32f11004	Disable backup/restore stress test (#7350 ) Summary: Seems it's causing some tests failures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7350 Reviewed By: riversand963 Differential Revision: D23544109 Pulled By: jay-zhuang fbshipit-source-id: 798a0ca374a20b6c2d0f29582729ff101c6a2e99	4 years ago
Peter Dillinger	06ad5dd293	Add file checksum to stress/crash test (#7343 ) Summary: This change has the crash test randomly select from a few file checksum implementations, or nullptr, for DB file_checksum_gen_factory. For compatibility across runs on same DB, each non-null factory can understand all the other functions, but the default changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7343 Test Plan: 'make blackbox_crash_test' for a while, including with some debug output to ensure code is being exercised. Reviewed By: zhichao-cao Differential Revision: D23494580 Pulled By: pdillinger fbshipit-source-id: 73bbc7ca32c1adaf619134c0c830f12894880b8a	4 years ago
Peter Dillinger	499c9448d0	Fix, enable, and enhance backup/restore in db_stress (#7348 ) Summary: Although added to db_stress, testing of backup/restore was never integrated into the crash test, originally concerned about performance. I've enabled it now and to address the peformance concern, testing backup/restore is always skipped once the db exceeds a certain size threshold, default 100MB. This should provide sufficient opportunity for testing BackupEngine without bogging down everything else with heavier and heavier operations. Also fixed backup/restore in db_stress by making sure PurgeOldBackups can remove manifest files, which are normally kept around for db_stress. Added more coverage of backup options, and up to three backups being saved in one backup directory (in some cases). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7348 Test Plan: ran 'make blackbox_crash_test' for a while, with heightened probabilitly of taking backups (1/10k). Also confirmed with some debug output that the code is being covered, TestBackupRestore only takes a few seconds to complete when triggered, and even at 1/10k and ~50MB database, there's <,~ 1 thread testing backups at any time. Reviewed By: ajkr Differential Revision: D23510835 Pulled By: pdillinger fbshipit-source-id: b6b8735591808141f81f10773ac31634cf03b6c0	4 years ago
Andrew Kryczka	7eebe6d38a	Mark files for compaction in stress/crash tests (#7231 ) Summary: The mechanism to mark files for compaction is most commonly used in delete-triggered compaction. This PR adds an option to exercise the marking mechanism on random files created by db_stress. This PR also enables that option in db_crashtest.py on its db_stress runs at random. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7231 Test Plan: - ran some minified crash tests; verified they succeed and we see `"compaction_reason": "FilesMarkedForCompaction"` regularly in the logs. ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --duration=600 --interval=30 --max_key=10000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py whitebox --duration=600 --interval=30 --max_key=1000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 --random_kill_odd=8887 ``` Reviewed By: anand1976 Differential Revision: D23025156 Pulled By: ajkr fbshipit-source-id: a404c467ebc12afa94dae35956ea9b372f592a96	4 years ago
Jay Zhuang	fc4d5f5065	Add stress test for GetProperty (#7111 ) Summary: Add stress test coverage for `DB::GetProperty()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7111 Test Plan: ``` ./db_stress -get_property_one_in=1 make crash_test ``` Reviewed By: ajkr Differential Revision: D22487906 Pulled By: jay-zhuang fbshipit-source-id: c118d95cc9b4e2fa669a06e6aa531541fa885dc5	4 years ago
Andrew Kryczka	8efb5cfb6f	add SstFileManager to crash test (#6993 ) Summary: SstFileManager is already supported in the stress test as of https://github.com/facebook/rocksdb/issues/6454. This PR enables the SstFileManager in some of the crash test runs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6993 Reviewed By: riversand963 Differential Revision: D22084406 Pulled By: ajkr fbshipit-source-id: 78b8642682e7570ff6ec3a1c3ccd9940f4362289	4 years ago
Peter Dillinger	5b2bbacb6f	Minimize memory internal fragmentation for Bloom filters (#6427 ) Summary: New experimental option BBTO::optimize_filters_for_memory builds filters that maximize their use of "usable size" from malloc_usable_size, which is also used to compute block cache charges. Rather than always "rounding up," we track state in the BloomFilterPolicy object to mix essentially "rounding down" and "rounding up" so that the average FP rate of all generated filters is the same as without the option. (YMMV as heavily accessed filters might be unluckily lower accuracy.) Thus, the option near-minimizes what the block cache considers as "memory used" for a given target Bloom filter false positive rate and Bloom filter implementation. There are no forward or backward compatibility issues with this change, though it only works on the format_version=5 Bloom filter. With Jemalloc, we see about 10% reduction in memory footprint (and block cache charge) for Bloom filters, but 1-2% increase in storage footprint, due to encoding efficiency losses (FP rate is non-linear with bits/key). Why not weighted random round up/down rather than state tracking? By only requiring malloc_usable_size, we don't actually know what the next larger and next smaller usable sizes for the allocator are. We pick a requested size, accept and use whatever usable size it has, and use the difference to inform our next choice. This allows us to narrow in on the right balance without tracking/predicting usable sizes. Why not weight history of generated filter false positive rates by number of keys? This could lead to excess skew in small filters after generating a large filter. Results from filter_bench with jemalloc (irrelevant details omitted): (normal keys/filter, but high variance) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.6278 Number of filters: 5516 Total size (MB): 200.046 Reported total allocated memory (MB): 220.597 Reported internal fragmentation: 10.2732% Bits/key stored: 10.0097 Average FP rate %: 0.965228 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 30.5104 Number of filters: 5464 Total size (MB): 200.015 Reported total allocated memory (MB): 200.322 Reported internal fragmentation: 0.153709% Bits/key stored: 10.1011 Average FP rate %: 0.966313 (very few keys / filter, optimization not as effective due to ~59 byte internal fragmentation in blocked Bloom filter representation) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.5649 Number of filters: 162950 Total size (MB): 200.001 Reported total allocated memory (MB): 224.624 Reported internal fragmentation: 12.3117% Bits/key stored: 10.2951 Average FP rate %: 0.821534 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 31.8057 Number of filters: 159849 Total size (MB): 200 Reported total allocated memory (MB): 208.846 Reported internal fragmentation: 4.42297% Bits/key stored: 10.4948 Average FP rate %: 0.811006 (high keys/filter) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.7017 Number of filters: 164 Total size (MB): 200.352 Reported total allocated memory (MB): 221.5 Reported internal fragmentation: 10.5552% Bits/key stored: 10.0003 Average FP rate %: 0.969358 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 30.7131 Number of filters: 160 Total size (MB): 200.928 Reported total allocated memory (MB): 200.938 Reported internal fragmentation: 0.00448054% Bits/key stored: 10.1852 Average FP rate %: 0.963387 And from db_bench (block cache) with jemalloc: $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false $ (for FILE in /dev/shm/dbbench.no_optimize/.sst; do ./sst_dump --file=$FILE --show_properties \| grep 'filter block' ; done) \| awk '{ t += $4; } END { print t; }' 17063835 $ (for FILE in /dev/shm/dbbench/.sst; do ./sst_dump --file=$FILE --show_properties \| grep 'filter block' ; done) \| awk '{ t += $4; } END { print t; }' 17430747 $ #^ 2.1% additional filter storage $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000 rocksdb.block.cache.index.add COUNT : 33 rocksdb.block.cache.index.bytes.insert COUNT : 8440400 rocksdb.block.cache.filter.add COUNT : 33 rocksdb.block.cache.filter.bytes.insert COUNT : 21087528 rocksdb.bloom.filter.useful COUNT : 4963889 rocksdb.bloom.filter.full.positive COUNT : 1214081 rocksdb.bloom.filter.full.true.positive COUNT : 1161999 $ #^ 1.04 % observed FP rate $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000 rocksdb.block.cache.index.add COUNT : 33 rocksdb.block.cache.index.bytes.insert COUNT : 8448592 rocksdb.block.cache.filter.add COUNT : 33 rocksdb.block.cache.filter.bytes.insert COUNT : 18220328 rocksdb.bloom.filter.useful COUNT : 5360933 rocksdb.bloom.filter.full.positive COUNT : 1321315 rocksdb.bloom.filter.full.true.positive COUNT : 1262999 $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters (Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427 Test Plan: unit test added, 'make check' with gcc, clang and valgrind Reviewed By: siying Differential Revision: D22124374 Pulled By: pdillinger fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e	4 years ago
Andrew Kryczka	d76eed4839	minor fixes for stress/crash contruns (#7006 ) Summary: Avoid using `cf_consistency` together with `enable_compaction_filter` as the former heavily uses snapshots while the latter is incompatible with snapshots. Also fix a clang-analyze error for a write to a variable that is never read. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7006 Reviewed By: zhichao-cao Differential Revision: D22141679 Pulled By: ajkr fbshipit-source-id: 1840ae238168818a9ab5973f90fd78c067399447	4 years ago
Peter Dillinger	88b4210701	Remove racially charged terms "whitelist" and "blacklist" (#7008 ) Summary: We don't need them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008 Test Plan: "make check" and ensure "make crash_test" starts Reviewed By: ajkr Differential Revision: D22143838 Pulled By: pdillinger fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e	4 years ago
Andrew Kryczka	775dc623ad	add `CompactionFilter` to stress/crash tests (#6988 ) Summary: Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988 Test Plan: running a minified blackbox crash test ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600 ``` Reviewed By: anand1976 Differential Revision: D22072888 Pulled By: ajkr fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320	4 years ago
Yanqin Jin	15d9f28da5	Add stress test for best-efforts recovery (#6819 ) Summary: Add crash test for the case of best-efforts recovery. After a certain amount of time, we kill the db_stress process, randomly delete some certain table files and restart db_stress. Given the randomness of file deletion, it is difficult to verify against a reference for data correctness. Therefore, we just check that the db can restart successfully. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6819 Test Plan: ``` ./db_stress -best_efforts_recovery=true -disable_wal=1 -reopen=0 ./db_stress -best_efforts_recovery=true -disable_wal=0 -skip_verifydb=1 -verify_db_one_in=0 -continuous_verification_interval=0 make crash_test_with_best_efforts_recovery ``` Reviewed By: anand1976 Differential Revision: D21436753 Pulled By: riversand963 fbshipit-source-id: 0b3605c922a16c37ed17d5ab6682ca4240e47926	4 years ago
Peter Dillinger	0c56fc4d66	Allow missing "unversioned" python, as in CentOS 8 (#6883 ) Summary: RocksDB Makefile was assuming existence of 'python' command, which is not present in CentOS 8. We avoid using 'python' if 'python3' is available. Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even with Python3 only (as some CentOS 8 FB machines come equipped) Also, now use just 'python3' for PYTHON if not found so that an informative "command not found" error will result rather than something weird. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883 Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2. Reviewed By: gg814 Differential Revision: D21767029 Pulled By: pdillinger fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9	5 years ago
sdong	12825894a2	Disable "compression_parallel_threads" in crash test for now (#6816 ) Summary: "compressio_parallel_threads" caused several test failure tests. To keep crash test clean, disable it for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6816 Test Plan: "make crash_test" to make sure the python script doesn't break Reviewed By: zhichao-cao Differential Revision: D21462112 fbshipit-source-id: 9eecc764800da82cd19665dc8b167eacead3310b	5 years ago
Andrew Kryczka	1f20df2f38	cover single level universal in crash test (#6818 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6818 Test Plan: fast whitebox test and verify there are some single-level universal and some multi-level universal runs. ``` $ python ./tools/db_crashtest.py whitebox --simple -max_key=1000000 -value_size_mult=33 -write_buffer_size=524288 -target_file_size_base=524288 -max_bytes_for_level_base=2097152 --duration=120 --interval=10 --ops_per_thread=1000 --random_kill_odd=887 ``` Reviewed By: riversand963 Differential Revision: D21432138 Pulled By: ajkr fbshipit-source-id: 2fc5ba9f3dfa49bb11e81da7dd00a17b476e64d7	5 years ago
Ziyue Yang	e619a20e93	Add an option for parallel compression in for db_stress (#6722 ) Summary: This commit adds an `compression_parallel_threads` option in db_stress. It also fixes the naming of parallel compression option in db_bench to keep it aligned with others. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6722 Reviewed By: pdillinger Differential Revision: D21091385 fbshipit-source-id: c9ba8c4e5cc327ff9e6094a6dc6a15fcff70f100	5 years ago
Cheng Chang	0a77617820	Disable O_DIRECT in stress test when db directory does not support direct IO (#6727 ) Summary: In crash test, the db directory might be set to /dev/shm or /tmp, in certain environments such as internal testing infrastructure, neither of these directories support direct IO, so direct IO is never enabled in crash test. This PR sets up SyncPoints in direct IO related code paths to disable O_DIRECT flag in calls to `open`, so the direct IO code paths will be executed, all direct IO related assertions will be checked, but no real direct IO request will be issued to the file system. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6727 Test Plan: export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0" make -j24 crash_test Reviewed By: zhichao-cao Differential Revision: D21139250 Pulled By: cheng-chang fbshipit-source-id: db9adfe78d91aa4759835b1af91c5db7b27b62ee	5 years ago
sdong	fe206f4f7c	crash_test to cover index_type kBinarySearchWithFirstKey (#6721 ) Summary: Recently index_type kBinarySearchWithFirstKey is improved so that the API guarantee is exactly the same as other types and it is ready for wide production. We should cover it in crash tst. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6721 Test Plan: Run crash_test Reviewed By: anand1976 Differential Revision: D21099781 fbshipit-source-id: fda91eba831d9eacbb140c703e9768bb1701f935	5 years ago
sdong	63d82e57b9	crash_test to cover small max_open_files (#6719 ) Summary: RocksDB behavior is different while max_open_files is small or large. Add the coverage to small max_open_files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6719 Test Plan: Run crash_test Reviewed By: pdillinger Differential Revision: D21081021 fbshipit-source-id: e3e211761a9bd25d93d19a61c1f7b62d48cf5e3c	5 years ago
sdong	73523baeb1	crash_test to cover options.avoid_flush_during_recovery (#6712 ) Summary: Options.avoid_flush_during_recovery is uncovered in crash_test. Add the coverage with a chance of 1/8, as it is a less frequently used options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6712 Test Plan: Run crash_test and see the option can be used or not used by chance. Reviewed By: ltamasi Differential Revision: D21056566 fbshipit-source-id: c3b1521517cfc204786e6ef8c6acd7fffda64793	5 years ago
Yueh-Hsuan Chiang	5801af4646	Add env_fault_injection argument to db_stress (#6687 ) Summary: Add env_fault_injection argument to db_stress. When enabled, FaultInjectionTestEnv will be used instead. Currently this option does not support running with other env setting. This will allow us to later manually produce error when running db_crashtest. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6687 Test Plan: make db_stress -j32 ./db_stress --env_fault_injection ./db_stress --env_fault_injection --hdfs // expect error message Reviewed By: ajkr Differential Revision: D21014683 Pulled By: yhchiang fbshipit-source-id: 0724aeac37efd57adb72a37defe6dbd3bfa8106a	5 years ago
anand76	610a09ccff	Remove a printf from db_stress that's not useful info (#6705 ) Summary: This was causing db_crashtest.py to wrongly assume an error by parsing the output. Hopefully this will stabilize the crash tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6705 Test Plan: make blackbox_crash_test Reviewed By: ltamasi Differential Revision: D21043335 Pulled By: anand1976 fbshipit-source-id: 5cddd112b124d4e2ebd11724a17d4ef0f50c1cf8	5 years ago
anand76	79c838eb0f	Fix a few bugs in db_stress fault injection (#6693 ) Summary: Fix the following issues - 1. Output parsing error in db_crashtest.py 2. Memory leak on exit 3. False alarm on filter block read error Pull Request resolved: https://github.com/facebook/rocksdb/pull/6693 Test Plan: asan_crash Reviewed By: cheng-chang Differential Revision: D20990399 Pulled By: anand1976 fbshipit-source-id: 178ee0dd7c69a4bc5db698379db0dedb29281699	5 years ago
anand76	5c19a441c4	Fault injection in db_stress (#6538 ) Summary: This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538 Test Plan: crash_test make check Reviewed By: riversand963 Differential Revision: D20714347 Pulled By: anand1976 fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7	5 years ago
Yanqin Jin	ccf7676455	Update a few scripts to be python3 compatible (#6525 ) Summary: There are a few scripts with python3 compatibility issues that were not detected by automated tool before. Update them now. Test Plan (devserver): python2 tools/ldb_test.py python3 tools/ldb_test.py python2 tools/write_stress_runner.py --runtime_sec=30 python3 tools/write_stress_runner.py --runtime_sec=30 python2 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox python3 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox python2 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox python3 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox Pull Request resolved: https://github.com/facebook/rocksdb/pull/6525 Reviewed By: cheng-chang Differential Revision: D20627820 Pulled By: riversand963 fbshipit-source-id: 4b25a7bd4d001c7f868be8b640ef876523be6ca3	5 years ago
Levi Tamasi	217ce20021	Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491 ) Summary: Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`, `GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the command line parameter `get_live_files_and_wal_files_one_in` is specified. The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable in the sense that they can return errors if another thread removes a WAL file while they are executing (which is a perfectly plausible and legitimate scenario). The patch splits this command line parameter into three (one for each API), and changes the crash test script so that only `GetLiveFiles` is tested during our continuous crash test runs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491 Test Plan: ``` make check python tools/db_crashtest.py whitebox ``` Reviewed By: siying Differential Revision: D20312200 Pulled By: ltamasi fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8	5 years ago

1 2 3 4 5 ...

273 Commits (f6a0065d541a37fdab2870d171a411acc5efe013)