rocksdb

Commit Graph

Author	SHA1	Message	Date
Andrew Kryczka	76de3c85cc	reduce memory usage in CircleCI mini crashtest (#10639 ) Summary: Example flake where CircleCI reports memory at 99% and process gets killed with signal 9 (likely OOM): https://app.circleci.com/pipelines/github/facebook/rocksdb/18085/workflows/bdadbfe6-c40f-4ccb-a5db-fc8c4036f20a/jobs/475628 The previous settings of max_key=25000000, column_families=10, and log2_keys_per_lock=2 resulted in 3GB memory usage just for SharedState. The locks alone consume at least (25000000 keys per CF) * (10 CFs) / (2^2 keys per lock) * (40 bytes per lock) = 2.3GB. This PR reduces it 10x by reducing max_key by that factor. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10639 Reviewed By: cbi42 Differential Revision: D39263804 Pulled By: ajkr fbshipit-source-id: 9b5565bbafcb21a2f5b487c8364808dea2f0bc0c	3 years ago
Andrew Kryczka	36dec11bc6	Disable RateLimiterTest.Rate with valgrind (#10637 ) Summary: Example valgrind flake: https://app.circleci.com/pipelines/github/facebook/rocksdb/18073/workflows/3794e569-45cb-4621-a2b4-df1dcdf5cb19/jobs/475569 ``` util/rate_limiter_test.cc:358 Expected equality of these values: samples_at_minimum Which is: 9 samples Which is: 10 ``` Some other runs of `RateLimiterTest.Rate` already skip this check due to its reliance on a minimum execution speed. We know valgrind slows execution a lot so can disable the check in that case. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10637 Reviewed By: cbi42 Differential Revision: D39251350 Pulled By: ajkr fbshipit-source-id: 41ae1ea4cd91992ea57df902f9f7fd6d182a5932	3 years ago
Andrew Kryczka	fe5fbe32cb	Deflake DBBlockCacheTest1.WarmCacheWithBlocksDuringFlush (#10635 ) Summary: Previously, automatic compaction could be triggered prior to the test invoking CompactRange(). It could lead to the following flaky failure: ``` /root/project/db/db_block_cache_test.cc:753: Failure Expected equality of these values: 1 + kNumBlocks Which is: 11 options.statistics->getTickerCount(BLOCK_CACHE_INDEX_ADD) Which is: 10 ``` A sequence leading to this failure was: * Automatic compaction * files [1] [2] trivially moved * files [3] [4] [5] [6] trivially moved * CompactRange() * files [7] [8] [9] trivially moved * file [10] trivially moved In such a case, the index/filter block adds that the test expected did not happen since there were no new files. This PR just tweaks settings to ensure the `CompactRange()` produces one new file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10635 Reviewed By: cbi42 Differential Revision: D39250869 Pulled By: ajkr fbshipit-source-id: a3c94c49069e28c49c40b4b80dae0059739d19fd	3 years ago
Changyu Bi	30bc495c03	Skip swaths of range tombstone covered keys in merging iterator (2022 edition) (#10449 ) Summary: Delete range logic is moved from `DBIter` to `MergingIterator`, and `MergingIterator` will seek to the end of a range deletion if possible instead of scanning through each key and check with `RangeDelAggregator`. With the invariant that a key in level L (consider memtable as the first level, each immutable and L0 as a separate level) has a larger sequence number than all keys in any level >L, a range tombstone `[start, end)` from level L covers all keys in its range in any level >L. This property motivates optimizations in iterator: - in `Seek(target)`, if level L has a range tombstone `[start, end)` that covers `target.UserKey`, then for all levels > L, we can do Seek() on `end` instead of `target` to skip some range tombstone covered keys. - in `Next()/Prev()`, if the current key is covered by a range tombstone `[start, end)` from level L, we can do `Seek` to `end` for all levels > L. This PR implements the above optimizations in `MergingIterator`. As all range tombstone covered keys are now skipped in `MergingIterator`, the range tombstone logic is removed from `DBIter`. The idea in this PR is similar to https://github.com/facebook/rocksdb/issues/7317, but this PR leaves `InternalIterator` interface mostly unchanged. Credit: the cascading seek optimization and the sentinel key (discussed below) are inspired by [Pebble](https://github.com/cockroachdb/pebble/blob/master/merging_iter.go) and suggested by ajkr in https://github.com/facebook/rocksdb/issues/7317. The two optimizations are mostly implemented in `SeekImpl()/SeekForPrevImpl()` and `IsNextDeleted()/IsPrevDeleted()` in `merging_iterator.cc`. See comments for each method for more detail. One notable change is that the minHeap/maxHeap used by `MergingIterator` now contains range tombstone end keys besides point key iterators. This helps to reduce the number of key comparisons. For example, for a range tombstone `[start, end)`, a `start` and an `end` `HeapItem` are inserted into the heap. When a `HeapItem` for range tombstone start key is popped from the minHeap, we know this range tombstone becomes "active" in the sense that, before the range tombstone's end key is popped from the minHeap, all the keys popped from this heap is covered by the range tombstone's internal key range `[start, end)`. Another major change, delete range sentinel key, is made to `LevelIterator`. Before this PR, when all point keys in an SST file are iterated through in `MergingIterator`, a level iterator would advance to the next SST file in its level. In the case when an SST file has a range tombstone that covers keys beyond the SST file's last point key, advancing to the next SST file would lose this range tombstone. Consequently, `MergingIterator` could return keys that should have been deleted by some range tombstone. We prevent this by pretending that file boundaries in each SST file are sentinel keys. A `LevelIterator` now only advance the file iterator once the sentinel key is processed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10449 Test Plan: - Added many unit tests in db_range_del_test - Stress test: `./db_stress --readpercent=5 --prefixpercent=19 --writepercent=20 -delpercent=10 --iterpercent=44 --delrangepercent=2` - Additional iterator stress test is added to verify against iterators against expected state: https://github.com/facebook/rocksdb/issues/10538. This is based on ajkr's previous attempt https://github.com/facebook/rocksdb/pull/5506#issuecomment-506021913. ``` python3 ./tools/db_crashtest.py blackbox --simple --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152 --compression_type=none --max_background_compactions=8 --value_size_mult=33 --max_key=5000000 --interval=10 --duration=7200 --delrangepercent=3 --delpercent=9 --iterpercent=25 --writepercent=60 --readpercent=3 --prefixpercent=0 --num_iterations=1000 --range_deletion_width=100 --verify_iterator_with_expected_state_one_in=1 ``` - Performance benchmark: I used a similar setup as in the blog [post](http://rocksdb.org/blog/2018/11/21/delete-range.html) that introduced DeleteRange, "a database with 5 million data keys, and 10000 range tombstones (ignoring those dropped during compaction) that were written in regular intervals after 4.5 million data keys were written". As expected, the performance with this PR depends on the range tombstone width. ``` # Setup: TEST_TMPDIR=/dev/shm ./db_bench_main --benchmarks=fillrandom --writes=4500000 --num=5000000 TEST_TMPDIR=/dev/shm ./db_bench_main --benchmarks=overwrite --writes=500000 --num=5000000 --use_existing_db=true --writes_per_range_tombstone=50 # Scan entire DB TEST_TMPDIR=/dev/shm ./db_bench_main --benchmarks=readseq[-X5] --use_existing_db=true --num=5000000 --disable_auto_compactions=true # Short range scan (10 Next()) TEST_TMPDIR=/dev/shm/width-100/ ./db_bench_main --benchmarks=seekrandom[-X5] --use_existing_db=true --num=500000 --reads=100000 --seek_nexts=10 --disable_auto_compactions=true # Long range scan(1000 Next()) TEST_TMPDIR=/dev/shm/width-100/ ./db_bench_main --benchmarks=seekrandom[-X5] --use_existing_db=true --num=500000 --reads=2500 --seek_nexts=1000 --disable_auto_compactions=true ``` Avg over of 10 runs (some slower tests had fews runs): For the first column (tombstone), 0 means no range tombstone, 100-10000 means width of the 10k range tombstones, and 1 means there is a single range tombstone in the entire DB (width is 1000). The 1 tombstone case is to test regression when there's very few range tombstones in the DB, as no range tombstone is likely to take a different code path than with range tombstones. - Scan entire DB \| tombstone width \| Pre-PR ops/sec \| Post-PR ops/sec \| ±% \| \| ------------- \| ------------- \| ------------- \| ------------- \| \| 0 range tombstone \|2525600 (± 43564) \|2486917 (± 33698) \|-1.53% \| \| 100 \|1853835 (± 24736) \|2073884 (± 32176) \|+11.87% \| \| 1000 \|422415 (± 7466) \|1115801 (± 22781) \|+164.15% \| \| 10000 \|22384 (± 227) \|227919 (± 6647) \|+918.22% \| \| 1 range tombstone \|2176540 (± 39050) \|2434954 (± 24563) \|+11.87% \| - Short range scan \| tombstone width \| Pre-PR ops/sec \| Post-PR ops/sec \| ±% \| \| ------------- \| ------------- \| ------------- \| ------------- \| \| 0 range tombstone \|35398 (± 533) \|35338 (± 569) \|-0.17% \| \| 100 \|28276 (± 664) \|31684 (± 331) \|+12.05% \| \| 1000 \|7637 (± 77) \|25422 (± 277) \|+232.88% \| \| 10000 \|1367 \|28667 \|+1997.07% \| \| 1 range tombstone \|32618 (± 581) \|32748 (± 506) \|+0.4% \| - Long range scan \| tombstone width \| Pre-PR ops/sec \| Post-PR ops/sec \| ±% \| \| ------------- \| ------------- \| ------------- \| ------------- \| \| 0 range tombstone \|2262 (± 33) \|2353 (± 20) \|+4.02% \| \| 100 \|1696 (± 26) \|1926 (± 18) \|+13.56% \| \| 1000 \|410 (± 6) \|1255 (± 29) \|+206.1% \| \| 10000 \|25 \|414 \|+1556.0% \| \| 1 range tombstone \|1957 (± 30) \|2185 (± 44) \|+11.65% \| - Microbench does not show significant regression: https://gist.github.com/cbi42/59f280f85a59b678e7e5d8561e693b61 Reviewed By: ajkr Differential Revision: D38450331 Pulled By: cbi42 fbshipit-source-id: b5ef12e8d8c289ed2e163ccdf277f5039b511fca	3 years ago
Peter Dillinger	3770d6b74b	Fix possible NaN StandardDeviation in Histogram (#10586 ) Summary: Appears possible after `5de98f2` introduced possible lost updates. Could be related to `2af132c` also. Simply ensure no sqrt of negative. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10586 Test Plan: test added Reviewed By: ajkr Differential Revision: D39068391 Pulled By: pdillinger fbshipit-source-id: 230b214a41e6c9ae91a1ef3e8b2a17b46bbb17c2	3 years ago
Peter Dillinger	9d5b3dabcf	Increase CircleCI no_output_timeout for macos-java builds (#10627 ) Summary: ... because we are frequently seeing the 10m "no output" timeouts on these Pull Request resolved: https://github.com/facebook/rocksdb/pull/10627 Test Plan: CI Reviewed By: hx235 Differential Revision: D39224922 Pulled By: pdillinger fbshipit-source-id: f54c7adb5de87b2f57ccbc7f4e6c541b9cd37e08	3 years ago
Levi Tamasi	b07217da04	Pin the newly cached blob after insert (#10625 ) Summary: With the current code, when a blob isn't found in the cache and gets read from the blob file and then inserted into the cache, the application gets passed the self-contained `PinnableSlice` resulting from the blob file read. The patch changes this so that the `PinnableSlice` pins the cache entry instead in this case. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10625 Test Plan: `make check` Reviewed By: pdillinger Differential Revision: D39220904 Pulled By: ltamasi fbshipit-source-id: cb9c62881e3523b1e9f614e00bf503bac2fe3b0a	3 years ago
Akanksha Mahajan	4cd16d65ae	Add new option num_file_reads_for_auto_readahead in BlockBasedTableOptions (#10556 ) Summary: RocksDB does auto-readahead for iterators on noticing more than two reads for a table file if user doesn't provide readahead_size and reads are sequential. A new option num_file_reads_for_auto_readahead is added which can be configured and indicates after how many sequential reads prefetching should be start. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10556 Test Plan: Existing and new unit test Reviewed By: anand1976 Differential Revision: D38947147 Pulled By: akankshamahajan15 fbshipit-source-id: c9eeab495f84a8df7f701c42f04894e46440ad97	3 years ago
anand76	5fbcc8c54d	Update MULTIGET_IO_BATCH_SIZE for non-async MultiGet (#10622 ) Summary: This stat was only getting updated in the async (coroutine) version of MultiGet. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10622 Reviewed By: akankshamahajan15 Differential Revision: D39188790 Pulled By: anand1976 fbshipit-source-id: 7e231507f65fc94c8a006c38f79dfba182a2c24a	3 years ago
Changyu Bi	3a75219e5d	Validate option `memtable_protection_bytes_per_key` (#10621 ) Summary: sanity check value for option `memtable_protection_bytes_per_key` in `ColumnFamilyData::ValidateOptions()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10621 Test Plan: `make check`, added unit test in ColumnFamilyTest. Reviewed By: ajkr Differential Revision: D39180133 Pulled By: cbi42 fbshipit-source-id: 009e0da3ccb332d1c9e14d20193304610bd4eb8a	3 years ago
Andrew Kryczka	ccf822492f	Reenable sync_fault_injection in crash test (#10172 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10172 Reviewed By: riversand963 Differential Revision: D37164671 Pulled By: ajkr fbshipit-source-id: 40eb919b8dc261d502510e878ee8ac7874ab35d0	3 years ago
Hui Xiao	e7525a1fff	Disable use_txn=1 with sync_fault_injection=1 in db_crashtest.py (#10605 ) Summary: Context/Summary: `ExpectedState` is not aware of transaction-related concept so `use_txn=1 ` is not compatible with `sync_fault_injection=1`. Therefore this PR disabled this combination until we expand our correctness testing to transaction related features. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10605 Test Plan: - Run the following commands to verify `--use_txn` is correctly sanitized - `python3 ./tools/db_crashtest.py blackbox --use_txn=1 --sync_fault_injection=1 ` - `python3 ./tools/db_crashtest.py blackbox --use_txn=0 --sync_fault_injection=1 ` Reviewed By: ajkr Differential Revision: D39121287 Pulled By: hx235 fbshipit-source-id: 7d5d6dd32479ea1c07df4f38322650f3a60def9c	3 years ago
sdong	9509003503	Option migration tool to break down files for FIFO compaction (#10600 ) Summary: Right now, when the option migration tool migrates to FIFO compaction, it compacts all the data into one single SST file and move to L0. Although it creates a valid LSM-tree for FIFO, for any data to be deleted for FIFO, the giant file will be deleted, which might make the DB almost empty. There is not good solution for it, because usually we don't have enough information to reconstruct the FIFO LSM-tree. This change changes to a solution that compromises the FIFO condition. We hope the solution is more useable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10600 Test Plan: Add unit tests for that. Reviewed By: jay-zhuang Differential Revision: D39106424 fbshipit-source-id: bdfd852c3b343373765b8d9716fefc08fd27145c	3 years ago
Levi Tamasi	228f2c5bf5	Adjust the blob cache printout in db_bench/db_stress (#10614 ) Summary: Currently, `db_bench` and `db_stress` print the blob cache options even if a shared block/blob cache is configured, i.e. when they are not actually in effect. The patch changes this so they are only printed when a separate blob cache is used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10614 Test Plan: Tested manually using `db_bench` and `db_stress`. Reviewed By: akankshamahajan15 Differential Revision: D39144603 Pulled By: ltamasi fbshipit-source-id: f714304c5d46186f8514746c27ee6f52aa3e4af8	3 years ago
Levi Tamasi	01e88dfeb4	Support using cache warming with the secondary blob cache (#10603 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10603 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D39117952 Pulled By: ltamasi fbshipit-source-id: 5e956fa2fc18974876a5c87686acb50718e0edb7	3 years ago
Hui Xiao	8a85946f58	Add missing mutex when reading from shared variable bg_bottom_compaction_scheduled_, bg_compaction_scheduled_ (#10610 ) Summary: Context/Summary: According to https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_job.h#L328-L332, any reading in the form of `bg_compaction_scheduled_` , `bg_bottom_compaction_scheduled_` should be protected by mutex, which isn't the case for some assert statement. This leads to a data race that can be repro-ed by the following command (command coming soon) ``` db=/dev/shm/rocksdb_crashtest_blackbox exp=/dev/shm/rocksdb_crashtest_expected rm -rf $db $exp mkdir -p $exp ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=$db --delpercent=10 --delrangepercent=0 --destroy_db_initially=1 --expected_values_dir=$exp --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=1000000 --max_key_len=3 --prefixpercent=0 --readpercent=0 --reopen=0 --ops_per_thread=100000000 --value_size_mult=32 --writepercent=90 --compaction_pri=4 --use_txn=1 --level_compaction_dynamic_level_bytes=True --compaction_ttl=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --value_size_mult=32 --verify_db_one_in=1000 --write_buffer_size=65536 --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_key=25000000 --max_key_len=3 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --target_file_size_base=2097152 --target_file_size_multiplier=2 ``` ``` WARNING: ThreadSanitizer: data race (pid=73424) Read of size 4 at 0x7b8c0000151c by thread T13: #0 ReleaseSubcompactionResources internal_repo_rocksdb/repo/db/compaction/compaction_job.cc:390 (db_stress+0x630aa3) https://github.com/facebook/rocksdb/issues/1 rocksdb::CompactionJob::Run() internal_repo_rocksdb/repo/db/compaction/compaction_job.cc:741 (db_stress+0x630aa3) https://github.com/facebook/rocksdb/issues/2 rocksdb::DBImpl::BackgroundCompaction(bool, rocksdb::JobContext, rocksdb::LogBuffer, rocksdb::DBImpl::PrepickedCompaction, rocksdb::Env::Priority) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:3436 (db_stress+0x60b2cc) https://github.com/facebook/rocksdb/issues/3 rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction, rocksdb::Env::Priority) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2950 (db_stress+0x606d79) https://github.com/facebook/rocksdb/issues/4 rocksdb::DBImpl::BGWorkCompaction(void) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2693 (db_stress+0x60356a) Previous write of size 4 at 0x7b8c0000151c by thread T12 (mutexes: write M438955329917552448): #0 rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction, rocksdb::Env::Priority) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:3018 (db_stress+0x6072a1) https://github.com/facebook/rocksdb/issues/1 rocksdb::DBImpl::BGWorkCompaction(void) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2693 (db_stress+0x60356a) Location is heap block of size 6720 at 0x7b8c00000000 allocated by main thread: #0 operator new(unsigned long, std::align_val_t) <null> (db_stress+0xbab5bb) https://github.com/facebook/rocksdb/issues/1 rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> >, rocksdb::DB, bool, bool) internal_repo_rocksdb/repo/db/db_impl/db_impl_open.cc:1811 (db_stress+0x69769a) https://github.com/facebook/rocksdb/issues/2 rocksdb::TransactionDB::Open(rocksdb::DBOptions const&, rocksdb::TransactionDBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> >, rocksdb::TransactionDB*) internal_repo_rocksdb/repo/utilities/transactions/pessimistic_transaction_db.cc:258 (db_stress+0x8ae1f4) https://github.com/facebook/rocksdb/issues/3 rocksdb::StressTest::Open(rocksdb::SharedState) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:2611 (db_stress+0x32b927) https://github.com/facebook/rocksdb/issues/4 rocksdb::StressTest::InitDb(rocksdb::SharedState*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:290 (db_stress+0x34712c) ``` This PR added all the missing mutex that should've been in place Pull Request resolved: https://github.com/facebook/rocksdb/pull/10610 Test Plan: - Past repro command - Existing CI Reviewed By: riversand963 Differential Revision: D39143016 Pulled By: hx235 fbshipit-source-id: 51dd4db55ad306f3dbda5d0dd54d6f2513cf70f2	3 years ago
gitbw95	6cd8133035	Fix an import issue in fbcode. (#10604 ) Summary: This should fix an import issue detected in meta internal tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10604 Test Plan: Unit Tests. Reviewed By: hx235 Differential Revision: D39120414 Pulled By: gitbw95 fbshipit-source-id: dbd016d7f47b9f54aab5ea61e8d3cd79734f46af	3 years ago
Yanqin Jin	7c0838e65e	Use std::make_unique when possible (#10578 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10578 Test Plan: make check Reviewed By: ajkr Differential Revision: D39064748 Pulled By: riversand963 fbshipit-source-id: c7c135b7b713608edb14614846050ece6d4cc59d	3 years ago
Hui Xiao	e484b81eee	Sync dir containing CURRENT after RenameFile on CURRENT as much as possible (#10573 ) Summary: Context: Below crash test revealed a bug that directory containing CURRENT file (short for `dir_contains_current_file` below) was not always get synced after a new CURRENT is created and being called with `RenameFile` as part of the creation. This bug exposes a risk that such un-synced directory containing the updated CURRENT can’t survive a host crash (e.g, power loss) hence get corrupted. This then will be followed by a recovery from a corrupted CURRENT that we don't want. The root-cause is that a nullptr `FSDirectory* dir_contains_current_file` sometimes gets passed-down to `SetCurrentFile()` hence in those case `dir_contains_current_file->FSDirectory::FsyncWithDirOptions()` will be skipped (which otherwise will internally call`Env/FS::SyncDic()` ) ``` ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_data_in_errors=True --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 --block_size=16384 --bloom_bits=134.8015470676662 --bottommost_compression_type=disable --cache_size=8388608 --checkpoint_one_in=1000000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=2 --compaction_ttl=100 --compression_max_dict_buffer_bytes=511 --compression_max_dict_bytes=16384 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=1048576 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --expected_values_dir=$exp --fail_if_options_file_error=1 --file_checksum_impl=none --flush_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=4 --ingest_external_file_one_in=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=10000 --max_key_len=3 --max_manifest_file_size=16384 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.001 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --mmap_read=1 --nooverwritepercent=1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=1 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=5 --prefixpercent=5 --prepopulate_block_cache=1 --progress_reports=0 --read_fault_one_in=1000 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=3 --sync_fault_injection=1 --target_file_size_base=2097 --target_file_size_multiplier=2 --test_batches_snapshots=1 --top_level_index_pinning=1 --use_full_merge_v1=1 --use_merge=1 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --write_buffer_size=4194 --writepercent=35 ``` ``` stderr: WARNING: prefix_size is non-zero but memtablerep != prefix_hash db_stress: utilities/fault_injection_fs.cc:748: virtual rocksdb::IOStatus rocksdb::FaultInjectionTestFS::RenameFile(const std::string &, const std::string &, const rocksdb::IOOptions &, rocksdb::IODebugContext ): Assertion `tlist.find(tdn.second) == tlist.end()' failed.` ``` Summary:* The PR ensured the non-test path pass down a non-null dir containing CURRENT (which is by current RocksDB assumption just db_dir) by doing the following: - Renamed `directory_to_fsync` as `dir_contains_current_file` in `SetCurrentFile()` to tighten the association between this directory and CURRENT file - Changed `SetCurrentFile()` API to require `dir_contains_current_file` being passed-in, instead of making it by default nullptr. - Because `SetCurrentFile()`'s `dir_contains_current_file` is passed down from `VersionSet::LogAndApply()` then `VersionSet::ProcessManifestWrites()` (i.e, think about this as a chain of 3 functions related to MANIFEST update), these 2 functions also got refactored to require `dir_contains_current_file` - Updated the non-test-path callers of these 3 functions to obtain and pass in non-nullptr `dir_contains_current_file`, which by current assumption of RocksDB, is the `FSDirectory* db_dir`. - `db_impl` path will obtain `DBImpl::directories_.getDbDir()` while others with no access to such `directories_` are obtained on the fly by creating such object `FileSystem::NewDirectory(..)` and manage it by unique pointers to ensure short life time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10573 Test Plan: - `make check` - Passed the repro db_stress command - For future improvement, since we currently don't assert dir containing CURRENT to be non-nullptr due to https://github.com/facebook/rocksdb/pull/10573#pullrequestreview-1087698899, there is still chances that future developers mistakenly pass down nullptr dir containing CURRENT thus resulting skipped sync dir and cause the bug again. Therefore a smarter test (e.g, such as quoted from ajkr "(make) unsynced data loss to be dropping files corresponding to unsynced directory entries") is still needed. Reviewed By: ajkr Differential Revision: D39005886 Pulled By: hx235 fbshipit-source-id: 336fb9090d0cfa6ca3dd580db86268007dde7f5a	3 years ago
Levi Tamasi	7818560194	Add a dedicated cache entry role for blobs (#10601 ) Summary: The patch adds a dedicated cache entry role for blob values and switches to a registered deleter so that blobs show up as a separate bucket (as opposed to "Misc") in the cache occupancy statistics, e.g. ``` Block cache entry stats(count,size,portion): DataBlock(133515,531.73 MB,13.6866%) BlobValue(1824855,3.10 GB,81.7071%) Misc(1,0.00 KB,0%) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10601 Test Plan: Ran `make check` and tested the cache occupancy statistics using `db_bench`. Reviewed By: riversand963 Differential Revision: D39107915 Pulled By: ltamasi fbshipit-source-id: 8446c3b190a41a144030df73f318eeda4398c125	3 years ago
anand76	72a3fb3424	Update statistics for async scan readaheads (#10585 ) Summary: Imported a fix to "rocksdb.prefetched.bytes.discarded" stat from https://github.com/facebook/rocksdb/issues/10561, and added a new stat "rocksdb.async.prefetch.abort.micros" to measure time spent waiting for async reads to abort. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10585 Reviewed By: akankshamahajan15 Differential Revision: D39067000 Pulled By: anand1976 fbshipit-source-id: d7cda71abb48017239bd5fd832345a16c7024faf	3 years ago
Yanqin Jin	3613d862ba	print value when verification fails (#10587 ) Summary: When verification fails for db_stress, print more information about value read from the db and expected state. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10587 Test Plan: make check ./db_stress Reviewed By: akankshamahajan15, hx235 Differential Revision: D39078511 Pulled By: riversand963 fbshipit-source-id: 77ac8ffae01fc3a9b58a02c2e7bbe141e1a18f0b	3 years ago
Peter Dillinger	c5afbbfe4b	Don't wait for indirect flush in read-only DB (#10569 ) Summary: Some APIs for getting live files, which are used by Checkpoint and BackupEngine, can optionally trigger and wait for a flush. These would deadlock when used on a read-only DB. Here we fix that by assuming the user wants the overall operation to succeed and is OK without flushing (because the DB is read-only). Follow-up work: the same or other issues can be hit by directly invoking some DB functions that are clearly not appropriate for read-only instance, but are not covered by overrides in DBImplReadOnly and CompactedDBImpl. These should be fixed to avoid similar problems on accidental misuse. (Long term, it would be nice to have a DBReadOnly class without those members, like BackupEngineReadOnly.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/10569 Test Plan: tests updated to catch regression (hang before the fix) Reviewed By: riversand963 Differential Revision: D38995759 Pulled By: pdillinger fbshipit-source-id: f5f8bc7123e13cb45bd393dd974d7d6eda20bc68	3 years ago
Changyu Bi	5532b462c4	Verify Iterator/Get() against expected state in only `no_batched_ops_test` (#10590 ) Summary: https://github.com/facebook/rocksdb/issues/10538 added `TestIterateAgainstExpected()` in `no_batched_ops_test` to verify iterator correctness against the in memory expected state. It is not compatible when run after some other stress tests, e.g. `TestPut()` in `batched_op_stress`, that either do not set expected state when writing to DB or use keys that cannot be parsed by `GetIntVal()`. The assert [here](`d17be55aab/db_stress_tool/db_stress_common.h (L520)`) could fail. This PR fixed this issue by setting iterator upperbound to `max_key` when `destroy_db_initially=0` to avoid the key space that `batched_op_stress` touches. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10590 Test Plan: ``` # set up DB with batched_op_stress ./db_stress --test_batches_snapshots=1 --verify_iterator_with_expected_state_one_in=1 --max_key_len=3 --max_key=100000000 --skip_verifydb=1 --continuous_verification_interval=0 --writepercent=85 --delpercent=3 --delrangepercent=0 --iterpercent=10 --nooverwritepercent=1 --prefixpercent=0 --readpercent=2 --key_len_percent_dist=1,30,69 # Before this PR, the following test will fail the asserts with error msg like the following # Assertion failed: (size_key <= key_gen_ctx.weights.size() * sizeof(uint64_t)), function GetIntVal, file db_stress_common.h, line 524. ./db_stress --verify_iterator_with_expected_state_one_in=1 --max_key_len=3 --max_key=100000000 --skip_verifydb=1 --continuous_verification_interval=0 --writepercent=0 --delpercent=3 --delrangepercent=0 --iterpercent=95 --nooverwritepercent=1 --prefixpercent=0 --readpercent=2 --key_len_percent_dist=1,30,69 --destroy_db_initially=0 ``` Reviewed By: ajkr Differential Revision: D39085243 Pulled By: cbi42 fbshipit-source-id: a7dfee2320c330773b623b442d730fd014ec7056	3 years ago
Levi Tamasi	64e74723f7	Use the default metadata charge policy when creating an LRU cache via the Java API (#10577 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10577 Reviewed By: akankshamahajan15 Differential Revision: D39035884 Pulled By: ltamasi fbshipit-source-id: 48f116f8ca172b7eb5eb3651f39ddb891a7ffade	3 years ago
Andrew Hutchings	ce529a4ce1	Fix FreeBSD building (#10575 ) Summary: FreeBSD doesn't have `JEMALLOC_USABLE_SIZE_CONST` so we need to define it. This fixes MariaDB MDEV-20248. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10575 Reviewed By: jay-zhuang Differential Revision: D39057665 Pulled By: ajkr fbshipit-source-id: 3874779d12a1dd5036324947f6372e6ad57a7b08	3 years ago
zhangenming	d17be55aab	Make header more natural. (#10580 ) Summary: Fixed #10381 for blog's navigation bar UI. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10580 Reviewed By: hx235 Differential Revision: D39079045 Pulled By: cbi42 fbshipit-source-id: 922cf2624f201c0af42815b23d97361fc0151d93	3 years ago
Levi Tamasi	23376aa576	Improve the accounting of memory used by cached blobs (#10583 ) Summary: The patch improves the bookkeeping around the memory usage of cached blobs in two ways: 1) it uses `malloc_usable_size`, which accounts for allocator bin sizes etc., and 2) it also considers the memory usage of the `BlobContents` object in addition to the blob itself. Note: some unit tests had been relying on the cache charge being equal to the size of the cached blob; these were updated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10583 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D39060680 Pulled By: ltamasi fbshipit-source-id: 3583adce2b4ce6e84861f3fadccbfd2e5a3cc482	3 years ago
bilyz	7670fdd690	fix trace_analyzer_tool args column position (#10576 ) Summary: The column meaning explanation is not correct according to the parsed human-readable trace file. Following are the results data from parsed trace human-readable file format. The key is in the first column. ``` 0x00000005 6 1 0 1661317998095439 0x00000007 0 1 0 1661317998095479 0x00000008 6 1 0 1661317998095493 0x0000000300000001 1 1 6 1661317998101508 0x0000000300000000 1 1 6 1661317998101508 0x0000000300000001 0 1 0 1661317998106486 0x0000000300000000 0 1 0 1661317998106498 0x0000000A 6 1 0 1661317998106515 0x00000007 0 1 0 1661317998111887 0x00000001 6 1 0 1661317998111923 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10576 Reviewed By: ajkr Differential Revision: D39039110 Pulled By: jay-zhuang fbshipit-source-id: eade6394c7870005b717846af09a848be6f677ce	3 years ago
Jay Zhuang	d9e71fb2c5	Fix periodic_task unable to re-register the same task type (#10379 ) Summary: Timer has a limitation that it cannot re-register a task with the same name, because the cancel only mark the task as invalid and wait for the Timer thread to clean it up later, before the task is cleaned up, the same task name cannot be added. Which makes the task option update likely to fail, which basically cancel and re-register the same task name. Change the periodic task name to a random unique id and store it in periodic_task_scheduler. Also refactor the `periodic_work` to `periodic_task` to make each job function as a `task`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10379 Test Plan: unittests Reviewed By: ajkr Differential Revision: D38000615 Pulled By: jay-zhuang fbshipit-source-id: e4135f9422e3b53aaec8eda54f4e18ce633a279e	3 years ago
Levi Tamasi	3f57d84af4	Introduce a dedicated class to represent blob values (#10571 ) Summary: The patch introduces a new class called `BlobContents`, which represents a single uncompressed blob value. We currently use `std::string` for this purpose; `BlobContents` is somewhat smaller but the primary reason for a dedicated class is that it enables certain improvements and optimizations like eliding a copy when inserting a blob into the cache, using custom allocators, or more control over and better accounting of the memory usage of cached blobs (see https://github.com/facebook/rocksdb/issues/10484). (We plan to implement these in subsequent PRs.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/10571 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D39000965 Pulled By: ltamasi fbshipit-source-id: f296eddf9dec4fc3e11cad525b462bdf63c78f96	3 years ago
Brendan MacDonell	418b36a9bc	Support CompactionPri::kRoundRobin in RocksJava (#10572 ) Summary: Pretty trivial — this PR just adds the new compaction priority to the Java API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10572 Reviewed By: hx235 Differential Revision: D39006523 Pulled By: ajkr fbshipit-source-id: ea8d665817e7b05826c397afa41c3abcda81484e	3 years ago
Brendan MacDonell	9f290a5d15	Update the javadoc for setforceConsistencyChecks (#10574 ) Summary: As of v6.14 (released in 2020), force_consistency_checks is enabled by default. However, the Java documentation does not seem to have been updated to reflect the change at the time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10574 Reviewed By: hx235 Differential Revision: D39006566 Pulled By: ajkr fbshipit-source-id: c7b029484d62deaa1f260ec55084049fe39eb84a	3 years ago
Andrew Kryczka	7ad4b38617	Ensure writes to WAL tail during `FlushWAL(true /* sync /)` will be synced (#10560 ) Summary: WAL append and switch can both happen between `FlushWAL(true / sync */)`'s sync operations and its call to `MarkLogsSynced()`. We permit this since locks need to be released for the sync operations. Such an appended/switched WAL is both inactive and incompletely synced at the time `MarkLogsSynced()` processes it. Prior to this PR, `MarkLogsSynced()` assumed all inactive WALs were fully synced and removed them from consideration for future syncs. That was wrong in the scenario described above and led to the latest append(s) never being synced. This PR changes `MarkLogsSynced()` to only remove inactive WALs from consideration for which all flushed data has been synced. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10560 Test Plan: repro unit test for the scenario described above. Without this PR, it fails on "key2" not found Reviewed By: riversand963 Differential Revision: D38957391 Pulled By: ajkr fbshipit-source-id: da77175eba97ff251a4219b227b3bb2d4843ed26	3 years ago
Alan Paxton	7fbee01f0c	CI benchmarks refine configuration (#10514 ) Summary: CI benchmarks refine configuration Run only “essential” benchmarks, but for longer Fix (reduce) the NUM_KEYS to ensure cached behaviour Reduce level size to try to ensure more levels Refine test durations again, more time per test, but fewer tests. In CI benchmark mode, the only read test is readrandom. There are still 3 mostly-read tests. Goal is to squeeze complete run a little bit inside 1 hour so it doesn’t clash with the next run (cron scheduled for main branch), but it gets to run as long as possible, so that results are as credible as possible. Reduce thread count to physical capacity, in an attempt to reduce throughput variance for write heavy tests. See Mark Callaghan’s comments in related documentation.. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10514 Reviewed By: ajkr Differential Revision: D38952469 Pulled By: jay-zhuang fbshipit-source-id: 72fa6bba897cc47066ced65facd1fd36e28f30a8	3 years ago
Andrew Kryczka	d95e376368	Disable db_stress features incompatible with unsynced data dropping when sync_fault_injection=1 (#10559 ) Summary: The features that cannot work with disable_wal=1 due to unsynced data dropping (ingest_external_file_one_in and enable_compaction_filter) similarly cannot work with sync_fault_injection=1. This PR prevents those features from being used together with sync_fault_injection=1. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10559 Reviewed By: hx235 Differential Revision: D38953019 Pulled By: ajkr fbshipit-source-id: 7e2c7644ec84d7323f632cf976bcee00502d0ed7	3 years ago
Changyu Bi	d140fbfd7d	Add Iterator test against expected state to stress test (#10538 ) Summary: As mentioned in https://github.com/facebook/rocksdb/pull/5506#issuecomment-506021913, `db_stress` does not have much verification for iterator correctness. It has a `TestIterate()` function, but that is mainly for comparing results between two iterators, one with `total_order_seek` and the other optionally sets auto_prefix, upper/lower bounds. Commit 49a0581ad2462e31aa3f768afa769e0d33390f33 added a new `TestIterateAgainstExpected()` function that compares iterator against expected state. It locks a range of keys, creates an iterator, does a random sequence of `Next/Prev` and compares against expected state. This PR is based on that commit, the main changes include some logs (for easier debugging if a test fails), a forward and backward scan to cover the entire locked key range, and a flag for optionally turning on this version of Iterator testing. Added constraint that the checks against expected state in `TestIterateAgainstExpected()` and in `TestGet()` are only turned on when `--skip_verifydb` flag is not set. Remove the change log introduced in https://github.com/facebook/rocksdb/issues/10553. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10538 Test Plan: Run `db_stress` with `--verify_iterator_with_expected_state_one_in=1`, and a large `--iterpercent` and `--num_iterations`. Checked `op_logs` manually to ensure expected coverage. Tweaked part of the code in https://github.com/facebook/rocksdb/issues/10449 and stress test was able to catch it. - internally run various flavor of crash test Reviewed By: ajkr Differential Revision: D38847269 Pulled By: cbi42 fbshipit-source-id: 8b4402a9bba9f6cfa08051943cd672579d489599	3 years ago
muthukrishnan.s	79ed4be80f	Add get_name, get_id for column family handle in C API (#10499 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10499 Reviewed By: hx235 Differential Revision: D38523859 Pulled By: ajkr fbshipit-source-id: 268bba1fcce4a3e20c51e498a79d7b476f663aea	3 years ago
Levi Tamasi	78bbdef530	Fix a typo in BlobSecondaryCacheTest (#10566 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10566 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D38989926 Pulled By: ltamasi fbshipit-source-id: 6402635fe745e4e7eb3083ef9ad9f04c0177d762	3 years ago
sdong	4915f89513	WritableFileWriter to allow operation after failure when SyncWithoutFlush() is involved (#10555 ) Summary: https://github.com/facebook/rocksdb/pull/10489 adds an assertion in most functions in WritableFileWriter to check no previous error. However, it only works without calling SyncWithoutFlush(). The nature of SyncWithoutFlush() makes two concurrent call fails to check status code of each other and causing assertion failure. Fix the problem by skipping the check after SyncWithoutFlush() is called and not check status code in SyncWithoutFlush(). Since the original change was not officially released yet, the fix isn't added to HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10555 Test Plan: Make sure existing tests still pass Reviewed By: anand1976 Differential Revision: D38946208 fbshipit-source-id: 63566732d3f25c8a8342840499cf7b7d745f27c2	3 years ago
Changyu Bi	198e5d8ee9	Update `TestGet()` to verify against expected state (#10553 ) Summary: updated `TestGet()` in `no_batched_op_stress` to check the result of `Get()` operations against expected state (`expected_state_manager_`). More specifically, if `Get()` finds a key, expected state should not have `DELETION_SENTINEL` for the same key, and if `Get()` returns NotFound for a key, expected state should not have the key. One intention for this change it to verify correctness of code path change regarding range tombstones. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10553 Test Plan: run db_stress with nonzero readpercent: `./db_stress_branch --readpercent=57 --prefixpercent=4 --writepercent=25 -delpercent=5 --iterpercent=5 --delrangepercent=4`. When I initially used wrong column family in `thread->shared->Get`, the test reported inconsistencies. Reviewed By: ajkr Differential Revision: D38927007 Pulled By: cbi42 fbshipit-source-id: f9f61b312ad0b4c21a799329609ba8526169b048	3 years ago
Mohamed Issa	cbe2c6d2d2	Remove unnecessary append to PLUGINS variable in top-level CMakeLists.txt (#10494 ) Summary: The PLUGINS variable already contains a semicolon separated list of plugins to compile, so there is no need to append the space separated list in ROCKSDB_PLUGINS passed in as compile argument. Removing this unnecessary append now allows CMake based compiles for two or more plugins at a time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10494 Reviewed By: hx235 Differential Revision: D38482094 Pulled By: ajkr fbshipit-source-id: 61565f7cae2717e70a92132c972b25692ce6f0e8	3 years ago
muthukrishnan.s	616f3bd02e	Add grocksdb in Go language bindings (#10498 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10498 Reviewed By: hx235 Differential Revision: D38523574 Pulled By: ajkr fbshipit-source-id: 4df46fe3bfe49335a278594dfe6fd887879e71ec	3 years ago
lhsoft	38bf569ee7	Fix build error with NIOSTATS_CONTEXT (#10506 ) Summary: Fix https://github.com/facebook/rocksdb/issues/10475 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10506 Reviewed By: hx235 Differential Revision: D38549337 Pulled By: ajkr fbshipit-source-id: fba864fba1b584c41419ca6015d5d62051539812	3 years ago
EdvardD	6e93d24935	Expose set_checksum function to C api (#10537 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10537 Reviewed By: hx235 Differential Revision: D38797662 Pulled By: ajkr fbshipit-source-id: a8db723c3eb9d5592cd78f8be7e442e4826686ad	3 years ago
Ryan Mack	06f73d2575	Fix autovector::emplace_back return type for C++17 (#10542 ) Summary: C++17 changes emplace_back API to return the new object. Needed to compile rocksdb on recent compilers. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10542 Reviewed By: hx235 Differential Revision: D38896019 Pulled By: ajkr fbshipit-source-id: cd7ddf34c0dcd449ecedc41e89a37b3a270a5603	3 years ago
Chen Lixiang	9593fd1c82	Fix wrong compression type and options in universal compaction picker (#10515 ) Summary: In UniversalCompactionBuilder::PickCompactionToReduceSortedRuns, we passed start_level to get compression type and options. I think that is wrong and we should use output_level instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10515 Reviewed By: hx235 Differential Revision: D38611335 Pulled By: ajkr fbshipit-source-id: bb860caed4b6c6bbde8f75fc50cf875a9f04723d	3 years ago
Peter Dillinger	db7606a41a	Fix "Behavior Changes" in 7.6 HISTORY.md (#10557 ) Summary: see diff Pull Request resolved: https://github.com/facebook/rocksdb/pull/10557 Test Plan: no functional change Reviewed By: gitbw95 Differential Revision: D38950531 Pulled By: pdillinger fbshipit-source-id: af72e80a31d7df38f6e633fa7115984c2274ed60	3 years ago
Changyu Bi	7b9e970042	Optionally issue `DeleteRange` in `whilewriting` benchmarks (#10552 ) Summary: Optionally issue DeleteRange in `whilewriting` benchmarks. This happens in `BGWriter` and uses similar logic as in `DoWrite` to issue DeleteRange operations. I added this when I was benchmarking https://github.com/facebook/rocksdb/issues/10547, but this should be an independent PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10552 Test Plan: ran some benchmarks with various delete range options, e.g. `./db_bench --benchmarks=readwhilewriting --writes_per_range_tombstone=100 --writes=200000 --reads=1000000 --disable_auto_compactions --max_num_range_tombstones=10000` Reviewed By: ajkr Differential Revision: D38927020 Pulled By: cbi42 fbshipit-source-id: 31ee20cb8127f7173f0816ea0cc2a204ec02aad6	3 years ago
Hui Xiao	b16655a547	Add missing synchronization in TestFSWritableFile (#10544 ) Summary: Context: ajkr's command revealed an existing TSAN data race between `TestFSWritableFile::Append` and `TestFSWritableFile::Sync` on `TestFSWritableFile::state_` ``` $ make clean && COMPILE_WITH_TSAN=1 make -j56 db_stress $ python3 tools/db_crashtest.py blackbox --simple --duration=3600 --interval=10 --sync_fault_injection=1 --disable_wal=0 --max_key=10000 --checkpoint_one_in=1000 ``` The race is due to concurrent access from [checkpoint's WAL sync](https://github.com/facebook/rocksdb/blob/7.4.fb/utilities/fault_injection_fs.cc#L324) and [db put's WAL write when ‘sync_fault_injection=1 ‘](https://github.com/facebook/rocksdb/blob/7.4.fb/utilities/fault_injection_fs.cc#L208) to the `state_` on the same WAL `TestFSWritableFile` under the missing synchronization. ``` WARNING: ThreadSanitizer: data race (pid=11275) Write of size 8 at 0x7b480003d850 by thread T23 (mutexes: write M69230): #0 rocksdb::TestFSWritableFile::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext) internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:297 (db_stress+0x716004) https://github.com/facebook/rocksdb/issues/1 rocksdb::(anonymous namespace)::CompositeWritableFileWrapper::Sync() internal_repo_rocksdb/repo/env/composite_env.cc:154 (db_stress+0x4dfa78) https://github.com/facebook/rocksdb/issues/2 rocksdb::(anonymous namespace)::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext) internal_repo_rocksdb/repo/env/env.cc:280 (db_stress+0x6dfd24) https://github.com/facebook/rocksdb/issues/3 rocksdb::WritableFileWriter::SyncInternal(bool) internal_repo_rocksdb/repo/file/writable_file_writer.cc:460 (db_stress+0xa1b98c) https://github.com/facebook/rocksdb/issues/4 rocksdb::WritableFileWriter::SyncWithoutFlush(bool) internal_repo_rocksdb/repo/file/writable_file_writer.cc:435 (db_stress+0xa1e441) https://github.com/facebook/rocksdb/issues/5 rocksdb::DBImpl::SyncWAL() internal_repo_rocksdb/repo/db/db_impl/db_impl.cc:1385 (db_stress+0x529458) https://github.com/facebook/rocksdb/issues/6 rocksdb::DBImpl::FlushWAL(bool) internal_repo_rocksdb/repo/db/db_impl/db_impl.cc:1339 (db_stress+0x54f82a) https://github.com/facebook/rocksdb/issues/7 rocksdb::DBImpl::GetLiveFilesStorageInfo(rocksdb::LiveFilesStorageInfoOptions const&, std::vector<rocksdb::LiveFileStorageInfo, std::allocator<rocksdb::LiveFileStorageInfo> >) internal_repo_rocksdb/repo/db/db_filesnapshot.cc:387 (db_stress+0x5c831d) https://github.com/facebook/rocksdb/issues/8 rocksdb::CheckpointImpl::CreateCustomCheckpoint(std::function<rocksdb::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileType)>, std::function<rocksdb::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, rocksdb::FileType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Temperature)>, std::function<rocksdb::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileType)>, unsigned long, unsigned long, bool) internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:214 (db_stress+0x4c0343) https://github.com/facebook/rocksdb/issues/9 rocksdb::CheckpointImpl::CreateCheckpoint(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long) internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:123 (db_stress+0x4c237e) https://github.com/facebook/rocksdb/issues/10 rocksdb::StressTest::TestCheckpoint(rocksdb::ThreadState, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:1699 (db_stress+0x328340) https://github.com/facebook/rocksdb/issues/11 rocksdb::StressTest::OperateDb(rocksdb::ThreadState) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:825 (db_stress+0x33921f) https://github.com/facebook/rocksdb/issues/12 rocksdb::ThreadBody(void) internal_repo_rocksdb/repo/db_stress_tool/db_stress_driver.cc:33 (db_stress+0x354857) https://github.com/facebook/rocksdb/issues/13 rocksdb::(anonymous namespace)::StartThreadWrapper(void) internal_repo_rocksdb/repo/env/env_posix.cc:447 (db_stress+0x6eb2ad) Previous read of size 8 at 0x7b480003d850 by thread T64 (mutexes: write M980798978697532600, write M253744503184415024, write M1262): #0 memcpy <null> (db_stress+0xbc9696) https://github.com/facebook/rocksdb/issues/1 operator= internal_repo_rocksdb/repo/utilities/fault_injection_fs.h:35 (db_stress+0x70d5f1) https://github.com/facebook/rocksdb/issues/2 rocksdb::FaultInjectionTestFS::WritableFileAppended(rocksdb::FSFileState const&) internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:827 (db_stress+0x70d5f1) https://github.com/facebook/rocksdb/issues/3 rocksdb::TestFSWritableFile::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext) internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:173 (db_stress+0x7143af) https://github.com/facebook/rocksdb/issues/4 rocksdb::(anonymous namespace)::CompositeWritableFileWrapper::Append(rocksdb::Slice const&) internal_repo_rocksdb/repo/env/composite_env.cc:115 (db_stress+0x4de3ab) https://github.com/facebook/rocksdb/issues/5 rocksdb::(anonymous namespace)::LegacyWritableFileWrapper::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext) internal_repo_rocksdb/repo/env/env.cc:248 (db_stress+0x6df44b) https://github.com/facebook/rocksdb/issues/6 rocksdb::WritableFileWriter::WriteBuffered(char const, unsigned long, rocksdb::Env::IOPriority) internal_repo_rocksdb/repo/file/writable_file_writer.cc:551 (db_stress+0xa1a953) https://github.com/facebook/rocksdb/issues/7 rocksdb::WritableFileWriter::Flush(rocksdb::Env::IOPriority) internal_repo_rocksdb/repo/file/writable_file_writer.cc:327 (db_stress+0xa16ee8) https://github.com/facebook/rocksdb/issues/8 rocksdb::log::Writer::AddRecord(rocksdb::Slice const&, rocksdb::Env::IOPriority) internal_repo_rocksdb/repo/db/log_writer.cc:147 (db_stress+0x7f121f) https://github.com/facebook/rocksdb/issues/9 rocksdb::DBImpl::WriteToWAL(rocksdb::WriteBatch const&, rocksdb::log::Writer, unsigned long, unsigned long, rocksdb::Env::IOPriority, rocksdb::DBImpl::LogFileNumberSize&) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:1285 (db_stress+0x695042) https://github.com/facebook/rocksdb/issues/10 rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer, unsigned long, bool, bool, unsigned long, rocksdb::DBImpl::LogFileNumberSize&) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:1328 (db_stress+0x6907e8) https://github.com/facebook/rocksdb/issues/11 rocksdb::DBImpl::PipelinedWriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch, rocksdb::WriteCallback, unsigned long, unsigned long, bool, unsigned long) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:731 (db_stress+0x68e8a7) https://github.com/facebook/rocksdb/issues/12 rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch, rocksdb::WriteCallback, unsigned long, unsigned long, bool, unsigned long, unsigned long, rocksdb::PreReleaseCallback, rocksdb::PostMemTableCallback) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:283 (db_stress+0x688370) https://github.com/facebook/rocksdb/issues/13 rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:126 (db_stress+0x69a7b5) https://github.com/facebook/rocksdb/issues/14 rocksdb::DB::Put(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:2247 (db_stress+0x698634) https://github.com/facebook/rocksdb/issues/15 rocksdb::DBImpl::Put(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:37 (db_stress+0x699868) https://github.com/facebook/rocksdb/issues/16 rocksdb::NonBatchedOpsStressTest::TestPut(rocksdb::ThreadState, rocksdb::WriteOptions&, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&, char (&) [100], std::unique_ptr<rocksdb::MutexLock, std::default_delete<rocksdb::MutexLock> >&) internal_repo_rocksdb/repo/db_stress_tool/no_batched_ops_stress.cc:681 (db_stress+0x38d20c) https://github.com/facebook/rocksdb/issues/17 rocksdb::StressTest::OperateDb(rocksdb::ThreadState) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:897 (db_stress+0x3399ec) https://github.com/facebook/rocksdb/issues/18 rocksdb::ThreadBody(void) internal_repo_rocksdb/repo/db_stress_tool/db_stress_driver.cc:33 (db_stress+0x354857) https://github.com/facebook/rocksdb/issues/19 rocksdb::(anonymous namespace)::StartThreadWrapper(void) internal_repo_rocksdb/repo/env/env_posix.cc:447 (db_stress+0x6eb2ad) Location is heap block of size 352 at 0x7b480003d800 allocated by thread T23: #0 operator new(unsigned long) <null> (db_stress+0xb685dc) https://github.com/facebook/rocksdb/issues/1 rocksdb::FaultInjectionTestFS::NewWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileOptions const&, std::unique_ptr<rocksdb::FSWritableFile, std::default_delete<rocksdb::FSWritableFile> >, rocksdb::IODebugContext) internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:506 (db_stress+0x711192) https://github.com/facebook/rocksdb/issues/2 rocksdb::CompositeEnv::NewWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<rocksdb::WritableFile, std::default_delete<rocksdb::WritableFile> >, rocksdb::EnvOptions const&) internal_repo_rocksdb/repo/env/composite_env.cc:329 (db_stress+0x4d33fa) https://github.com/facebook/rocksdb/issues/3 rocksdb::EnvWrapper::NewWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<rocksdb::WritableFile, std::default_delete<rocksdb::WritableFile> >, rocksdb::EnvOptions const&) internal_repo_rocksdb/repo/include/rocksdb/env.h:1425 (db_stress+0x300662) ... ``` Summary: - Added the missing lock in functions mentioned above along with three other functions with a similar need in TestFSWritableFile - Added clarification comment Pull Request resolved: https://github.com/facebook/rocksdb/pull/10544 Test Plan: - Past the above race condition repro Reviewed By: ajkr Differential Revision: D38886634 Pulled By: hx235 fbshipit-source-id: 0571bae9615f35b16fbd8168204607e306b1b486	3 years ago

1 2 3 4 5 ...

11468 Commits (80d010a5e78baa7254c2e985cce593519266008c) All Branches Search

11468 Commits (80d010a5e78baa7254c2e985cce593519266008c)

All Branches