Tag:
					 Branch:
					 Tree:
					75aca74017
				
			
			
		
		main
				
					oxigraph-8.1.1
				
					oxigraph-8.3.2
				
					oxigraph-main
				
			
			
				
			
		${ noResults }
		
	
		
			11605 Commits (75aca74017ca730391edce733381dac5498b3c53)
		
	
	
		
			
		
	
	| Author | SHA1 | Message | Date | 
|---|---|---|---|
|  Levi Tamasi | 228f2c5bf5 | Adjust the blob cache printout in db_bench/db_stress (#10614) Summary: Currently, `db_bench` and `db_stress` print the blob cache options even if a shared block/blob cache is configured, i.e. when they are not actually in effect. The patch changes this so they are only printed when a separate blob cache is used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10614 Test Plan: Tested manually using `db_bench` and `db_stress`. Reviewed By: akankshamahajan15 Differential Revision: D39144603 Pulled By: ltamasi fbshipit-source-id: f714304c5d46186f8514746c27ee6f52aa3e4af8 | 3 years ago | 
|  Levi Tamasi | 01e88dfeb4 | Support using cache warming with the secondary blob cache (#10603) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10603 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D39117952 Pulled By: ltamasi fbshipit-source-id: 5e956fa2fc18974876a5c87686acb50718e0edb7 | 3 years ago | 
|  Hui Xiao | 8a85946f58 | Add missing mutex when reading from shared variable bg_bottom_compaction_scheduled_, bg_compaction_scheduled_ (#10610) Summary: **Context/Summary:** According to https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_job.h#L328-L332, any reading in the form of `*bg_compaction_scheduled_` , `*bg_bottom_compaction_scheduled_` should be protected by mutex, which isn't the case for some assert statement. This leads to a data race that can be repro-ed by the following command (command coming soon) ``` db=/dev/shm/rocksdb_crashtest_blackbox exp=/dev/shm/rocksdb_crashtest_expected rm -rf $db $exp mkdir -p $exp ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=$db --delpercent=10 --delrangepercent=0 --destroy_db_initially=1 --expected_values_dir=$exp --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=1000000 --max_key_len=3 --prefixpercent=0 --readpercent=0 --reopen=0 --ops_per_thread=100000000 --value_size_mult=32 --writepercent=90 --compaction_pri=4 --use_txn=1 --level_compaction_dynamic_level_bytes=True --compaction_ttl=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --value_size_mult=32 --verify_db_one_in=1000 --write_buffer_size=65536 --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_key=25000000 --max_key_len=3 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --target_file_size_base=2097152 --target_file_size_multiplier=2 ``` ``` WARNING: ThreadSanitizer: data race (pid=73424) Read of size 4 at 0x7b8c0000151c by thread T13: #0 ReleaseSubcompactionResources internal_repo_rocksdb/repo/db/compaction/compaction_job.cc:390 (db_stress+0x630aa3) https://github.com/facebook/rocksdb/issues/1 rocksdb::CompactionJob::Run() internal_repo_rocksdb/repo/db/compaction/compaction_job.cc:741 (db_stress+0x630aa3) https://github.com/facebook/rocksdb/issues/2 rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:3436 (db_stress+0x60b2cc) https://github.com/facebook/rocksdb/issues/3 rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2950 (db_stress+0x606d79) https://github.com/facebook/rocksdb/issues/4 rocksdb::DBImpl::BGWorkCompaction(void*) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2693 (db_stress+0x60356a) Previous write of size 4 at 0x7b8c0000151c by thread T12 (mutexes: write M438955329917552448): #0 rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:3018 (db_stress+0x6072a1) https://github.com/facebook/rocksdb/issues/1 rocksdb::DBImpl::BGWorkCompaction(void*) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2693 (db_stress+0x60356a) Location is heap block of size 6720 at 0x7b8c00000000 allocated by main thread: #0 operator new(unsigned long, std::align_val_t) <null> (db_stress+0xbab5bb) https://github.com/facebook/rocksdb/issues/1 rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool) internal_repo_rocksdb/repo/db/db_impl/db_impl_open.cc:1811 (db_stress+0x69769a) https://github.com/facebook/rocksdb/issues/2 rocksdb::TransactionDB::Open(rocksdb::DBOptions const&, rocksdb::TransactionDBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::TransactionDB**) internal_repo_rocksdb/repo/utilities/transactions/pessimistic_transaction_db.cc:258 (db_stress+0x8ae1f4) https://github.com/facebook/rocksdb/issues/3 rocksdb::StressTest::Open(rocksdb::SharedState*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:2611 (db_stress+0x32b927) https://github.com/facebook/rocksdb/issues/4 rocksdb::StressTest::InitDb(rocksdb::SharedState*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:290 (db_stress+0x34712c) ``` This PR added all the missing mutex that should've been in place Pull Request resolved: https://github.com/facebook/rocksdb/pull/10610 Test Plan: - Past repro command - Existing CI Reviewed By: riversand963 Differential Revision: D39143016 Pulled By: hx235 fbshipit-source-id: 51dd4db55ad306f3dbda5d0dd54d6f2513cf70f2 | 3 years ago | 
|  gitbw95 | 6cd8133035 | Fix an import issue in fbcode. (#10604) Summary: This should fix an import issue detected in meta internal tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10604 Test Plan: Unit Tests. Reviewed By: hx235 Differential Revision: D39120414 Pulled By: gitbw95 fbshipit-source-id: dbd016d7f47b9f54aab5ea61e8d3cd79734f46af | 3 years ago | 
|  Yanqin Jin | 7c0838e65e | Use std::make_unique when possible (#10578) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10578 Test Plan: make check Reviewed By: ajkr Differential Revision: D39064748 Pulled By: riversand963 fbshipit-source-id: c7c135b7b713608edb14614846050ece6d4cc59d | 3 years ago | 
|  Hui Xiao | e484b81eee | Sync dir containing CURRENT after RenameFile on CURRENT as much as possible (#10573) Summary:
**Context:**
Below crash test revealed a bug that directory containing CURRENT file (short for `dir_contains_current_file` below) was not always get synced after a new CURRENT is created and being called with `RenameFile` as part of the creation.
This bug exposes a risk that such un-synced directory containing the updated CURRENT can’t survive a host crash (e.g, power loss) hence get corrupted. This then will be followed by a recovery from a corrupted CURRENT that we don't want.
The root-cause is that a nullptr `FSDirectory* dir_contains_current_file` sometimes gets passed-down to `SetCurrentFile()` hence in those case `dir_contains_current_file->FSDirectory::FsyncWithDirOptions()` will be skipped  (which otherwise will internally call`Env/FS::SyncDic()` )
```
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_data_in_errors=True --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 --block_size=16384 --bloom_bits=134.8015470676662 --bottommost_compression_type=disable --cache_size=8388608 --checkpoint_one_in=1000000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=2 --compaction_ttl=100 --compression_max_dict_buffer_bytes=511 --compression_max_dict_bytes=16384 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=1048576 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --expected_values_dir=$exp --fail_if_options_file_error=1 --file_checksum_impl=none --flush_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=4 --ingest_external_file_one_in=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=10000 --max_key_len=3 --max_manifest_file_size=16384 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.001 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --mmap_read=1 --nooverwritepercent=1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=1 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=5 --prefixpercent=5 --prepopulate_block_cache=1 --progress_reports=0 --read_fault_one_in=1000 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=3 --sync_fault_injection=1 --target_file_size_base=2097 --target_file_size_multiplier=2 --test_batches_snapshots=1 --top_level_index_pinning=1 --use_full_merge_v1=1 --use_merge=1 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --write_buffer_size=4194 --writepercent=35
```
```
stderr:
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
db_stress: utilities/fault_injection_fs.cc:748: virtual rocksdb::IOStatus rocksdb::FaultInjectionTestFS::RenameFile(const std::string &, const std::string &, const rocksdb::IOOptions &, rocksdb::IODebugContext *): Assertion `tlist.find(tdn.second) == tlist.end()' failed.`
```
**Summary:**
The PR ensured the non-test path pass down a non-null dir containing CURRENT (which is by current RocksDB assumption just db_dir) by doing the following:
- Renamed `directory_to_fsync` as `dir_contains_current_file` in `SetCurrentFile()` to tighten the association between this directory and CURRENT file
- Changed `SetCurrentFile()` API to require `dir_contains_current_file` being passed-in, instead of making it by default nullptr.
    -  Because `SetCurrentFile()`'s `dir_contains_current_file` is passed down from `VersionSet::LogAndApply()` then `VersionSet::ProcessManifestWrites()` (i.e, think about this as a chain of 3 functions related to MANIFEST update), these 2 functions also got refactored to require `dir_contains_current_file`
- Updated the non-test-path callers of these 3 functions to obtain and pass in non-nullptr `dir_contains_current_file`, which by current assumption of RocksDB, is the `FSDirectory* db_dir`.
    - `db_impl` path will obtain `DBImpl::directories_.getDbDir()` while others with no access to such `directories_` are obtained on the fly by creating such object `FileSystem::NewDirectory(..)` and manage it by unique pointers to ensure short life time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10573
Test Plan:
- `make check`
- Passed the repro db_stress command
- For future improvement, since we currently don't assert dir containing CURRENT to be non-nullptr due to https://github.com/facebook/rocksdb/pull/10573#pullrequestreview-1087698899, there is still chances that future developers mistakenly pass down nullptr dir containing CURRENT thus resulting skipped sync dir and cause the bug again. Therefore a smarter test (e.g, such as quoted from ajkr  "(make) unsynced data loss to be dropping files corresponding to unsynced directory entries") is still needed.
Reviewed By: ajkr
Differential Revision: D39005886
Pulled By: hx235
fbshipit-source-id: 336fb9090d0cfa6ca3dd580db86268007dde7f5a | 3 years ago | 
|  Levi Tamasi | 7818560194 | Add a dedicated cache entry role for blobs (#10601) Summary: The patch adds a dedicated cache entry role for blob values and switches to a registered deleter so that blobs show up as a separate bucket (as opposed to "Misc") in the cache occupancy statistics, e.g. ``` Block cache entry stats(count,size,portion): DataBlock(133515,531.73 MB,13.6866%) BlobValue(1824855,3.10 GB,81.7071%) Misc(1,0.00 KB,0%) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10601 Test Plan: Ran `make check` and tested the cache occupancy statistics using `db_bench`. Reviewed By: riversand963 Differential Revision: D39107915 Pulled By: ltamasi fbshipit-source-id: 8446c3b190a41a144030df73f318eeda4398c125 | 3 years ago | 
|  anand76 | 72a3fb3424 | Update statistics for async scan readaheads (#10585) Summary: Imported a fix to "rocksdb.prefetched.bytes.discarded" stat from https://github.com/facebook/rocksdb/issues/10561, and added a new stat "rocksdb.async.prefetch.abort.micros" to measure time spent waiting for async reads to abort. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10585 Reviewed By: akankshamahajan15 Differential Revision: D39067000 Pulled By: anand1976 fbshipit-source-id: d7cda71abb48017239bd5fd832345a16c7024faf | 3 years ago | 
|  Yanqin Jin | 3613d862ba | print value when verification fails (#10587) Summary: When verification fails for db_stress, print more information about value read from the db and expected state. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10587 Test Plan: make check ./db_stress Reviewed By: akankshamahajan15, hx235 Differential Revision: D39078511 Pulled By: riversand963 fbshipit-source-id: 77ac8ffae01fc3a9b58a02c2e7bbe141e1a18f0b | 3 years ago | 
|  Peter Dillinger | c5afbbfe4b | Don't wait for indirect flush in read-only DB (#10569) Summary: Some APIs for getting live files, which are used by Checkpoint and BackupEngine, can optionally trigger and wait for a flush. These would deadlock when used on a read-only DB. Here we fix that by assuming the user wants the overall operation to succeed and is OK without flushing (because the DB is read-only). Follow-up work: the same or other issues can be hit by directly invoking some DB functions that are clearly not appropriate for read-only instance, but are not covered by overrides in DBImplReadOnly and CompactedDBImpl. These should be fixed to avoid similar problems on accidental misuse. (Long term, it would be nice to have a DBReadOnly class without those members, like BackupEngineReadOnly.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/10569 Test Plan: tests updated to catch regression (hang before the fix) Reviewed By: riversand963 Differential Revision: D38995759 Pulled By: pdillinger fbshipit-source-id: f5f8bc7123e13cb45bd393dd974d7d6eda20bc68 | 3 years ago | 
|  Changyu Bi | 5532b462c4 | Verify Iterator/Get() against expected state in only `no_batched_ops_test` (#10590) Summary:
https://github.com/facebook/rocksdb/issues/10538 added `TestIterateAgainstExpected()` in `no_batched_ops_test` to verify iterator correctness against the in memory expected state. It is not compatible when run after some other stress tests, e.g. `TestPut()` in `batched_op_stress`, that either do not set expected state when writing to DB or use keys that cannot be parsed by `GetIntVal()`. The assert [here]( | 3 years ago | 
|  Levi Tamasi | 64e74723f7 | Use the default metadata charge policy when creating an LRU cache via the Java API (#10577) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10577 Reviewed By: akankshamahajan15 Differential Revision: D39035884 Pulled By: ltamasi fbshipit-source-id: 48f116f8ca172b7eb5eb3651f39ddb891a7ffade | 3 years ago | 
|  Andrew Hutchings | ce529a4ce1 | Fix FreeBSD building (#10575) Summary: FreeBSD doesn't have `JEMALLOC_USABLE_SIZE_CONST` so we need to define it. This fixes MariaDB MDEV-20248. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10575 Reviewed By: jay-zhuang Differential Revision: D39057665 Pulled By: ajkr fbshipit-source-id: 3874779d12a1dd5036324947f6372e6ad57a7b08 | 3 years ago | 
|  zhangenming | d17be55aab | Make header more natural. (#10580) Summary: Fixed #10381 for blog's navigation bar UI. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10580 Reviewed By: hx235 Differential Revision: D39079045 Pulled By: cbi42 fbshipit-source-id: 922cf2624f201c0af42815b23d97361fc0151d93 | 3 years ago | 
|  Levi Tamasi | 23376aa576 | Improve the accounting of memory used by cached blobs (#10583) Summary: The patch improves the bookkeeping around the memory usage of cached blobs in two ways: 1) it uses `malloc_usable_size`, which accounts for allocator bin sizes etc., and 2) it also considers the memory usage of the `BlobContents` object in addition to the blob itself. Note: some unit tests had been relying on the cache charge being equal to the size of the cached blob; these were updated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10583 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D39060680 Pulled By: ltamasi fbshipit-source-id: 3583adce2b4ce6e84861f3fadccbfd2e5a3cc482 | 3 years ago | 
|  bilyz | 7670fdd690 | fix trace_analyzer_tool args column position (#10576) Summary: The column meaning explanation is not correct according to the parsed human-readable trace file. Following are the results data from parsed trace human-readable file format. The key is in the first column. ``` 0x00000005 6 1 0 1661317998095439 0x00000007 0 1 0 1661317998095479 0x00000008 6 1 0 1661317998095493 0x0000000300000001 1 1 6 1661317998101508 0x0000000300000000 1 1 6 1661317998101508 0x0000000300000001 0 1 0 1661317998106486 0x0000000300000000 0 1 0 1661317998106498 0x0000000A 6 1 0 1661317998106515 0x00000007 0 1 0 1661317998111887 0x00000001 6 1 0 1661317998111923 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10576 Reviewed By: ajkr Differential Revision: D39039110 Pulled By: jay-zhuang fbshipit-source-id: eade6394c7870005b717846af09a848be6f677ce | 3 years ago | 
|  Jay Zhuang | d9e71fb2c5 | Fix periodic_task unable to re-register the same task type (#10379) Summary: Timer has a limitation that it cannot re-register a task with the same name, because the cancel only mark the task as invalid and wait for the Timer thread to clean it up later, before the task is cleaned up, the same task name cannot be added. Which makes the task option update likely to fail, which basically cancel and re-register the same task name. Change the periodic task name to a random unique id and store it in periodic_task_scheduler. Also refactor the `periodic_work` to `periodic_task` to make each job function as a `task`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10379 Test Plan: unittests Reviewed By: ajkr Differential Revision: D38000615 Pulled By: jay-zhuang fbshipit-source-id: e4135f9422e3b53aaec8eda54f4e18ce633a279e | 3 years ago | 
|  Levi Tamasi | 3f57d84af4 | Introduce a dedicated class to represent blob values (#10571) Summary: The patch introduces a new class called `BlobContents`, which represents a single uncompressed blob value. We currently use `std::string` for this purpose; `BlobContents` is somewhat smaller but the primary reason for a dedicated class is that it enables certain improvements and optimizations like eliding a copy when inserting a blob into the cache, using custom allocators, or more control over and better accounting of the memory usage of cached blobs (see https://github.com/facebook/rocksdb/issues/10484). (We plan to implement these in subsequent PRs.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/10571 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D39000965 Pulled By: ltamasi fbshipit-source-id: f296eddf9dec4fc3e11cad525b462bdf63c78f96 | 3 years ago | 
|  Brendan MacDonell | 418b36a9bc | Support CompactionPri::kRoundRobin in RocksJava (#10572) Summary: Pretty trivial — this PR just adds the new compaction priority to the Java API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10572 Reviewed By: hx235 Differential Revision: D39006523 Pulled By: ajkr fbshipit-source-id: ea8d665817e7b05826c397afa41c3abcda81484e | 3 years ago | 
|  Brendan MacDonell | 9f290a5d15 | Update the javadoc for setforceConsistencyChecks (#10574) Summary: As of v6.14 (released in 2020), force_consistency_checks is enabled by default. However, the Java documentation does not seem to have been updated to reflect the change at the time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10574 Reviewed By: hx235 Differential Revision: D39006566 Pulled By: ajkr fbshipit-source-id: c7b029484d62deaa1f260ec55084049fe39eb84a | 3 years ago | 
|  Andrew Kryczka | 7ad4b38617 | Ensure writes to WAL tail during `FlushWAL(true /* sync */)` will be synced (#10560) Summary: WAL append and switch can both happen between `FlushWAL(true /* sync */)`'s sync operations and its call to `MarkLogsSynced()`. We permit this since locks need to be released for the sync operations. Such an appended/switched WAL is both inactive and incompletely synced at the time `MarkLogsSynced()` processes it. Prior to this PR, `MarkLogsSynced()` assumed all inactive WALs were fully synced and removed them from consideration for future syncs. That was wrong in the scenario described above and led to the latest append(s) never being synced. This PR changes `MarkLogsSynced()` to only remove inactive WALs from consideration for which all flushed data has been synced. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10560 Test Plan: repro unit test for the scenario described above. Without this PR, it fails on "key2" not found Reviewed By: riversand963 Differential Revision: D38957391 Pulled By: ajkr fbshipit-source-id: da77175eba97ff251a4219b227b3bb2d4843ed26 | 3 years ago | 
|  Alan Paxton | 7fbee01f0c | CI benchmarks refine configuration (#10514) Summary: CI benchmarks refine configuration Run only “essential” benchmarks, but for longer Fix (reduce) the NUM_KEYS to ensure cached behaviour Reduce level size to try to ensure more levels Refine test durations again, more time per test, but fewer tests. In CI benchmark mode, the only read test is readrandom. There are still 3 mostly-read tests. Goal is to squeeze complete run a little bit inside 1 hour so it doesn’t clash with the next run (cron scheduled for main branch), but it gets to run as long as possible, so that results are as credible as possible. Reduce thread count to physical capacity, in an attempt to reduce throughput variance for write heavy tests. See Mark Callaghan’s comments in related documentation.. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10514 Reviewed By: ajkr Differential Revision: D38952469 Pulled By: jay-zhuang fbshipit-source-id: 72fa6bba897cc47066ced65facd1fd36e28f30a8 | 3 years ago | 
|  Andrew Kryczka | d95e376368 | Disable db_stress features incompatible with unsynced data dropping when sync_fault_injection=1 (#10559) Summary: The features that cannot work with disable_wal=1 due to unsynced data dropping (ingest_external_file_one_in and enable_compaction_filter) similarly cannot work with sync_fault_injection=1. This PR prevents those features from being used together with sync_fault_injection=1. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10559 Reviewed By: hx235 Differential Revision: D38953019 Pulled By: ajkr fbshipit-source-id: 7e2c7644ec84d7323f632cf976bcee00502d0ed7 | 3 years ago | 
|  Changyu Bi | d140fbfd7d | Add Iterator test against expected state to stress test (#10538) Summary: As mentioned in https://github.com/facebook/rocksdb/pull/5506#issuecomment-506021913, `db_stress` does not have much verification for iterator correctness. It has a `TestIterate()` function, but that is mainly for comparing results between two iterators, one with `total_order_seek` and the other optionally sets auto_prefix, upper/lower bounds. Commit 49a0581ad2462e31aa3f768afa769e0d33390f33 added a new `TestIterateAgainstExpected()` function that compares iterator against expected state. It locks a range of keys, creates an iterator, does a random sequence of `Next/Prev` and compares against expected state. This PR is based on that commit, the main changes include some logs (for easier debugging if a test fails), a forward and backward scan to cover the entire locked key range, and a flag for optionally turning on this version of Iterator testing. Added constraint that the checks against expected state in `TestIterateAgainstExpected()` and in `TestGet()` are only turned on when `--skip_verifydb` flag is not set. Remove the change log introduced in https://github.com/facebook/rocksdb/issues/10553. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10538 Test Plan: Run `db_stress` with `--verify_iterator_with_expected_state_one_in=1`, and a large `--iterpercent` and `--num_iterations`. Checked `op_logs` manually to ensure expected coverage. Tweaked part of the code in https://github.com/facebook/rocksdb/issues/10449 and stress test was able to catch it. - internally run various flavor of crash test Reviewed By: ajkr Differential Revision: D38847269 Pulled By: cbi42 fbshipit-source-id: 8b4402a9bba9f6cfa08051943cd672579d489599 | 3 years ago | 
|  muthukrishnan.s | 79ed4be80f | Add get_name, get_id for column family handle in C API (#10499) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10499 Reviewed By: hx235 Differential Revision: D38523859 Pulled By: ajkr fbshipit-source-id: 268bba1fcce4a3e20c51e498a79d7b476f663aea | 3 years ago | 
|  Levi Tamasi | 78bbdef530 | Fix a typo in BlobSecondaryCacheTest (#10566) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10566 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D38989926 Pulled By: ltamasi fbshipit-source-id: 6402635fe745e4e7eb3083ef9ad9f04c0177d762 | 3 years ago | 
|  sdong | 4915f89513 | WritableFileWriter to allow operation after failure when SyncWithoutFlush() is involved (#10555) Summary: https://github.com/facebook/rocksdb/pull/10489 adds an assertion in most functions in WritableFileWriter to check no previous error. However, it only works without calling SyncWithoutFlush(). The nature of SyncWithoutFlush() makes two concurrent call fails to check status code of each other and causing assertion failure. Fix the problem by skipping the check after SyncWithoutFlush() is called and not check status code in SyncWithoutFlush(). Since the original change was not officially released yet, the fix isn't added to HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10555 Test Plan: Make sure existing tests still pass Reviewed By: anand1976 Differential Revision: D38946208 fbshipit-source-id: 63566732d3f25c8a8342840499cf7b7d745f27c2 | 3 years ago | 
|  Changyu Bi | 198e5d8ee9 | Update `TestGet()` to verify against expected state (#10553) Summary: updated `TestGet()` in `no_batched_op_stress` to check the result of `Get()` operations against expected state (`expected_state_manager_`). More specifically, if `Get()` finds a key, expected state should not have `DELETION_SENTINEL` for the same key, and if `Get()` returns NotFound for a key, expected state should not have the key. One intention for this change it to verify correctness of code path change regarding range tombstones. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10553 Test Plan: run db_stress with nonzero readpercent: `./db_stress_branch --readpercent=57 --prefixpercent=4 --writepercent=25 -delpercent=5 --iterpercent=5 --delrangepercent=4`. When I initially used wrong column family in `thread->shared->Get`, the test reported inconsistencies. Reviewed By: ajkr Differential Revision: D38927007 Pulled By: cbi42 fbshipit-source-id: f9f61b312ad0b4c21a799329609ba8526169b048 | 3 years ago | 
|  Mohamed Issa | cbe2c6d2d2 | Remove unnecessary append to PLUGINS variable in top-level CMakeLists.txt (#10494) Summary: The PLUGINS variable already contains a semicolon separated list of plugins to compile, so there is no need to append the space separated list in ROCKSDB_PLUGINS passed in as compile argument. Removing this unnecessary append now allows CMake based compiles for two or more plugins at a time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10494 Reviewed By: hx235 Differential Revision: D38482094 Pulled By: ajkr fbshipit-source-id: 61565f7cae2717e70a92132c972b25692ce6f0e8 | 3 years ago | 
|  muthukrishnan.s | 616f3bd02e | Add grocksdb in Go language bindings (#10498) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10498 Reviewed By: hx235 Differential Revision: D38523574 Pulled By: ajkr fbshipit-source-id: 4df46fe3bfe49335a278594dfe6fd887879e71ec | 3 years ago | 
|  lhsoft | 38bf569ee7 | Fix build error with NIOSTATS_CONTEXT (#10506) Summary: Fix https://github.com/facebook/rocksdb/issues/10475 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10506 Reviewed By: hx235 Differential Revision: D38549337 Pulled By: ajkr fbshipit-source-id: fba864fba1b584c41419ca6015d5d62051539812 | 3 years ago | 
|  EdvardD | 6e93d24935 | Expose set_checksum function to C api (#10537) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10537 Reviewed By: hx235 Differential Revision: D38797662 Pulled By: ajkr fbshipit-source-id: a8db723c3eb9d5592cd78f8be7e442e4826686ad | 3 years ago | 
|  Ryan Mack | 06f73d2575 | Fix autovector::emplace_back return type for C++17 (#10542) Summary: C++17 changes emplace_back API to return the new object. Needed to compile rocksdb on recent compilers. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10542 Reviewed By: hx235 Differential Revision: D38896019 Pulled By: ajkr fbshipit-source-id: cd7ddf34c0dcd449ecedc41e89a37b3a270a5603 | 3 years ago | 
|  Chen Lixiang | 9593fd1c82 | Fix wrong compression type and options in universal compaction picker (#10515) Summary: In UniversalCompactionBuilder::PickCompactionToReduceSortedRuns, we passed start_level to get compression type and options. I think that is wrong and we should use output_level instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10515 Reviewed By: hx235 Differential Revision: D38611335 Pulled By: ajkr fbshipit-source-id: bb860caed4b6c6bbde8f75fc50cf875a9f04723d | 3 years ago | 
|  Peter Dillinger | db7606a41a | Fix "Behavior Changes" in 7.6 HISTORY.md (#10557) Summary: see diff Pull Request resolved: https://github.com/facebook/rocksdb/pull/10557 Test Plan: no functional change Reviewed By: gitbw95 Differential Revision: D38950531 Pulled By: pdillinger fbshipit-source-id: af72e80a31d7df38f6e633fa7115984c2274ed60 | 3 years ago | 
|  Changyu Bi | 7b9e970042 | Optionally issue `DeleteRange` in `*whilewriting` benchmarks (#10552) Summary: Optionally issue DeleteRange in `*whilewriting` benchmarks. This happens in `BGWriter` and uses similar logic as in `DoWrite` to issue DeleteRange operations. I added this when I was benchmarking https://github.com/facebook/rocksdb/issues/10547, but this should be an independent PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10552 Test Plan: ran some benchmarks with various delete range options, e.g. `./db_bench --benchmarks=readwhilewriting --writes_per_range_tombstone=100 --writes=200000 --reads=1000000 --disable_auto_compactions --max_num_range_tombstones=10000` Reviewed By: ajkr Differential Revision: D38927020 Pulled By: cbi42 fbshipit-source-id: 31ee20cb8127f7173f0816ea0cc2a204ec02aad6 | 3 years ago | 
|  Hui Xiao | b16655a547 | Add missing synchronization in TestFSWritableFile (#10544) Summary: **Context:** ajkr's command revealed an existing TSAN data race between `TestFSWritableFile::Append` and `TestFSWritableFile::Sync` on `TestFSWritableFile::state_` ``` $ make clean && COMPILE_WITH_TSAN=1 make -j56 db_stress $ python3 tools/db_crashtest.py blackbox --simple --duration=3600 --interval=10 --sync_fault_injection=1 --disable_wal=0 --max_key=10000 --checkpoint_one_in=1000 ``` The race is due to concurrent access from [checkpoint's WAL sync](https://github.com/facebook/rocksdb/blob/7.4.fb/utilities/fault_injection_fs.cc#L324) and [db put's WAL write when ‘sync_fault_injection=1 ‘](https://github.com/facebook/rocksdb/blob/7.4.fb/utilities/fault_injection_fs.cc#L208) to the `state_` on the same WAL `TestFSWritableFile` under the missing synchronization. ``` WARNING: ThreadSanitizer: data race (pid=11275) Write of size 8 at 0x7b480003d850 by thread T23 (mutexes: write M69230): #0 rocksdb::TestFSWritableFile::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:297 (db_stress+0x716004) https://github.com/facebook/rocksdb/issues/1 rocksdb::(anonymous namespace)::CompositeWritableFileWrapper::Sync() internal_repo_rocksdb/repo/env/composite_env.cc:154 (db_stress+0x4dfa78) https://github.com/facebook/rocksdb/issues/2 rocksdb::(anonymous namespace)::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/env/env.cc:280 (db_stress+0x6dfd24) https://github.com/facebook/rocksdb/issues/3 rocksdb::WritableFileWriter::SyncInternal(bool) internal_repo_rocksdb/repo/file/writable_file_writer.cc:460 (db_stress+0xa1b98c) https://github.com/facebook/rocksdb/issues/4 rocksdb::WritableFileWriter::SyncWithoutFlush(bool) internal_repo_rocksdb/repo/file/writable_file_writer.cc:435 (db_stress+0xa1e441) https://github.com/facebook/rocksdb/issues/5 rocksdb::DBImpl::SyncWAL() internal_repo_rocksdb/repo/db/db_impl/db_impl.cc:1385 (db_stress+0x529458) https://github.com/facebook/rocksdb/issues/6 rocksdb::DBImpl::FlushWAL(bool) internal_repo_rocksdb/repo/db/db_impl/db_impl.cc:1339 (db_stress+0x54f82a) https://github.com/facebook/rocksdb/issues/7 rocksdb::DBImpl::GetLiveFilesStorageInfo(rocksdb::LiveFilesStorageInfoOptions const&, std::vector<rocksdb::LiveFileStorageInfo, std::allocator<rocksdb::LiveFileStorageInfo> >*) internal_repo_rocksdb/repo/db/db_filesnapshot.cc:387 (db_stress+0x5c831d) https://github.com/facebook/rocksdb/issues/8 rocksdb::CheckpointImpl::CreateCustomCheckpoint(std::function<rocksdb::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileType)>, std::function<rocksdb::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, rocksdb::FileType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Temperature)>, std::function<rocksdb::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileType)>, unsigned long*, unsigned long, bool) internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:214 (db_stress+0x4c0343) https://github.com/facebook/rocksdb/issues/9 rocksdb::CheckpointImpl::CreateCheckpoint(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, unsigned long*) internal_repo_rocksdb/repo/utilities/checkpoint/checkpoint_impl.cc:123 (db_stress+0x4c237e) https://github.com/facebook/rocksdb/issues/10 rocksdb::StressTest::TestCheckpoint(rocksdb::ThreadState*, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:1699 (db_stress+0x328340) https://github.com/facebook/rocksdb/issues/11 rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:825 (db_stress+0x33921f) https://github.com/facebook/rocksdb/issues/12 rocksdb::ThreadBody(void*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_driver.cc:33 (db_stress+0x354857) https://github.com/facebook/rocksdb/issues/13 rocksdb::(anonymous namespace)::StartThreadWrapper(void*) internal_repo_rocksdb/repo/env/env_posix.cc:447 (db_stress+0x6eb2ad) Previous read of size 8 at 0x7b480003d850 by thread T64 (mutexes: write M980798978697532600, write M253744503184415024, write M1262): #0 memcpy <null> (db_stress+0xbc9696) https://github.com/facebook/rocksdb/issues/1 operator= internal_repo_rocksdb/repo/utilities/fault_injection_fs.h:35 (db_stress+0x70d5f1) https://github.com/facebook/rocksdb/issues/2 rocksdb::FaultInjectionTestFS::WritableFileAppended(rocksdb::FSFileState const&) internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:827 (db_stress+0x70d5f1) https://github.com/facebook/rocksdb/issues/3 rocksdb::TestFSWritableFile::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:173 (db_stress+0x7143af) https://github.com/facebook/rocksdb/issues/4 rocksdb::(anonymous namespace)::CompositeWritableFileWrapper::Append(rocksdb::Slice const&) internal_repo_rocksdb/repo/env/composite_env.cc:115 (db_stress+0x4de3ab) https://github.com/facebook/rocksdb/issues/5 rocksdb::(anonymous namespace)::LegacyWritableFileWrapper::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/env/env.cc:248 (db_stress+0x6df44b) https://github.com/facebook/rocksdb/issues/6 rocksdb::WritableFileWriter::WriteBuffered(char const*, unsigned long, rocksdb::Env::IOPriority) internal_repo_rocksdb/repo/file/writable_file_writer.cc:551 (db_stress+0xa1a953) https://github.com/facebook/rocksdb/issues/7 rocksdb::WritableFileWriter::Flush(rocksdb::Env::IOPriority) internal_repo_rocksdb/repo/file/writable_file_writer.cc:327 (db_stress+0xa16ee8) https://github.com/facebook/rocksdb/issues/8 rocksdb::log::Writer::AddRecord(rocksdb::Slice const&, rocksdb::Env::IOPriority) internal_repo_rocksdb/repo/db/log_writer.cc:147 (db_stress+0x7f121f) https://github.com/facebook/rocksdb/issues/9 rocksdb::DBImpl::WriteToWAL(rocksdb::WriteBatch const&, rocksdb::log::Writer*, unsigned long*, unsigned long*, rocksdb::Env::IOPriority, rocksdb::DBImpl::LogFileNumberSize&) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:1285 (db_stress+0x695042) https://github.com/facebook/rocksdb/issues/10 rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long, rocksdb::DBImpl::LogFileNumberSize&) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:1328 (db_stress+0x6907e8) https://github.com/facebook/rocksdb/issues/11 rocksdb::DBImpl::PipelinedWriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:731 (db_stress+0x68e8a7) https://github.com/facebook/rocksdb/issues/12 rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*, rocksdb::PostMemTableCallback*) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:283 (db_stress+0x688370) https://github.com/facebook/rocksdb/issues/13 rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:126 (db_stress+0x69a7b5) https://github.com/facebook/rocksdb/issues/14 rocksdb::DB::Put(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:2247 (db_stress+0x698634) https://github.com/facebook/rocksdb/issues/15 rocksdb::DBImpl::Put(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&) internal_repo_rocksdb/repo/db/db_impl/db_impl_write.cc:37 (db_stress+0x699868) https://github.com/facebook/rocksdb/issues/16 rocksdb::NonBatchedOpsStressTest::TestPut(rocksdb::ThreadState*, rocksdb::WriteOptions&, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&, char (&) [100], std::unique_ptr<rocksdb::MutexLock, std::default_delete<rocksdb::MutexLock> >&) internal_repo_rocksdb/repo/db_stress_tool/no_batched_ops_stress.cc:681 (db_stress+0x38d20c) https://github.com/facebook/rocksdb/issues/17 rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:897 (db_stress+0x3399ec) https://github.com/facebook/rocksdb/issues/18 rocksdb::ThreadBody(void*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_driver.cc:33 (db_stress+0x354857) https://github.com/facebook/rocksdb/issues/19 rocksdb::(anonymous namespace)::StartThreadWrapper(void*) internal_repo_rocksdb/repo/env/env_posix.cc:447 (db_stress+0x6eb2ad) Location is heap block of size 352 at 0x7b480003d800 allocated by thread T23: #0 operator new(unsigned long) <null> (db_stress+0xb685dc) https://github.com/facebook/rocksdb/issues/1 rocksdb::FaultInjectionTestFS::NewWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileOptions const&, std::unique_ptr<rocksdb::FSWritableFile, std::default_delete<rocksdb::FSWritableFile> >*, rocksdb::IODebugContext*) internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:506 (db_stress+0x711192) https://github.com/facebook/rocksdb/issues/2 rocksdb::CompositeEnv::NewWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<rocksdb::WritableFile, std::default_delete<rocksdb::WritableFile> >*, rocksdb::EnvOptions const&) internal_repo_rocksdb/repo/env/composite_env.cc:329 (db_stress+0x4d33fa) https://github.com/facebook/rocksdb/issues/3 rocksdb::EnvWrapper::NewWritableFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<rocksdb::WritableFile, std::default_delete<rocksdb::WritableFile> >*, rocksdb::EnvOptions const&) internal_repo_rocksdb/repo/include/rocksdb/env.h:1425 (db_stress+0x300662) ... ``` **Summary:** - Added the missing lock in functions mentioned above along with three other functions with a similar need in TestFSWritableFile - Added clarification comment Pull Request resolved: https://github.com/facebook/rocksdb/pull/10544 Test Plan: - Past the above race condition repro Reviewed By: ajkr Differential Revision: D38886634 Pulled By: hx235 fbshipit-source-id: 0571bae9615f35b16fbd8168204607e306b1b486 | 3 years ago | 
|  Bo Wang | b0048b673c | Post 7.6 branch cut changes (#10546) Summary: After branch 7.6.fb branch is cut, following release process, upgrade version number to 7.7 and add 7.6.fb to format compatibility check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10546 Test Plan: Watch CI Reviewed By: ajkr Differential Revision: D38892023 Pulled By: gitbw95 fbshipit-source-id: 94e96dedbd973f5f9713e73d3bed336e4678565b | 3 years ago | 
|  gitbw95 | a9c2c7778d | Update HISTORY.md for the upcoming 7.6 release (#10543) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10543 Reviewed By: anand1976 Differential Revision: D38877168 Pulled By: gitbw95 fbshipit-source-id: d6888f7dbb1f2a5bef144ad2443429a61663c1e8 | 3 years ago | 
|  anand76 | 35cdd3e71e | MultiGet async IO across multiple levels (#10535) Summary: This PR exploits parallelism in MultiGet across levels. It applies only to the coroutine version of MultiGet. Previously, MultiGet file reads from SST files in the same level were parallelized. With this PR, MultiGet batches with keys distributed across multiple levels are read in parallel. This is accomplished by splitting the keys not present in a level (determined by bloom filtering) into a separate batch, and processing the new batch in parallel with the original batch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10535 Test Plan: 1. Ensure existing MultiGet unit tests pass, updating them as necessary 2. New unit tests - TODO 3. Run stress test - TODO No noticeable regression (<1%) without async IO - Without PR: `multireadrandom : 7.261 micros/op 1101724 ops/sec 60.007 seconds 66110936 operations; 571.6 MB/s (8168992 of 8168992 found)` With PR: `multireadrandom : 7.305 micros/op 1095167 ops/sec 60.007 seconds 65717936 operations; 568.2 MB/s (8271992 of 8271992 found)` For a fully cached DB, but with async IO option on, no regression observed (<1%) - Without PR: `multireadrandom : 5.201 micros/op 1538027 ops/sec 60.005 seconds 92288936 operations; 797.9 MB/s (11540992 of 11540992 found) ` With PR: `multireadrandom : 5.249 micros/op 1524097 ops/sec 60.005 seconds 91452936 operations; 790.7 MB/s (11649992 of 11649992 found) ` Reviewed By: akankshamahajan15 Differential Revision: D38774009 Pulled By: anand1976 fbshipit-source-id: c955e259749f1c091590ade73105b3ee46cd0007 | 3 years ago | 
|  Levi Tamasi | 81388b36e0 | Add support for wide-column point lookups (#10540) Summary: The patch adds a new API `GetEntity` that can be used to perform wide-column point lookups. It also extends the `Get` code path and the `MemTable` / `MemTableList` and `Version` / `GetContext` logic accordingly so that wide-column entities can be served from both memtables and SSTs. If the result of a lookup is a wide-column entity (`kTypeWideColumnEntity`), it is passed to the application in deserialized form; if it is a plain old key-value (`kTypeValue`), it is presented as a wide-column entity with a single default (anonymous) column. (In contrast, regular `Get` returns plain old key-values as-is, and returns the value of the default column for wide-column entities, see https://github.com/facebook/rocksdb/issues/10483 .) The result of `GetEntity` is a self-contained `PinnableWideColumns` object. `PinnableWideColumns` contains a `PinnableSlice`, which either stores the underlying data in its own buffer or holds on to a cache handle. It also contains a `WideColumns` instance, which indexes the contents of the `PinnableSlice`, so applications can access the values of columns efficiently. There are several pieces of functionality which are currently not supported for wide-column entities: there is currently no `MultiGetEntity` or wide-column iterator; also, `Merge` and `GetMergeOperands` are not supported, and there is no `GetEntity` implementation for read-only and secondary instances. We plan to implement these in future PRs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10540 Test Plan: `make check` Reviewed By: akankshamahajan15 Differential Revision: D38847474 Pulled By: ltamasi fbshipit-source-id: 42311a34ccdfe88b3775e847a5e2a5296e002b5b | 3 years ago | 
|  anand76 | 2553d1efa1 | Revert "Avoid dynamic memory allocation on read path (#10453)" (#10541) Summary:
This reverts commit  | 3 years ago | 
|  Bo Wang | 13cb7a84b6 | Fix the memory leak in db_stress tests that are caused by `FaultInjectionSecondaryCache` and add `CompressedSecondaryCache` into stress tests. (#10523) Summary:
1. Fix the memory leak in db_stress tests that are caused by `FaultInjectionSecondaryCache`. To address the test requirements for both CompressedSecondaryCache and CachlibWrapper, a new class variable `base_is_compressed_sec_cache_` is added to determine the different behaviors in `Lookup()` and `WaitAll()`.
2. Add `CompressedSecondaryCache` into stress tests.
Before this PR, memory leak is reported during crash tests if  `CompressedSecondaryCache` is in stress tests. One example is shown as follows:
```
==70722==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 6648240 byte(s) in 83103 object(s) allocated from:
    #0 0x13de9d7 in operator new(unsigned long) (/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dbgo/gen/aab7ed39/internal_repo_rocksdb/repo/db_stress+0x13de9d7)
    https://github.com/facebook/rocksdb/issues/1 0x9084c7 in rocksdb::BlocklikeTraits<rocksdb::Block>::Create(rocksdb::BlockContents&&, unsigned long, rocksdb::Statistics*, bool, rocksdb::FilterPolicy const*) internal_repo_rocksdb/repo/table/block_based/block_like_traits.h:128
    https://github.com/facebook/rocksdb/issues/2 0x9084c7 in std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> rocksdb::GetCreateCallback<rocksdb::Block>(unsigned long, rocksdb::Statistics*, bool, rocksdb::FilterPolicy const*)::'lambda'(void const*, unsigned long, void**, unsigned long*)::operator()(void const*, unsigned long, void**, unsigned long*) const internal_repo_rocksdb/repo/table/block_based/block_like_traits.h:34
    https://github.com/facebook/rocksdb/issues/3 0x9082c9 in rocksdb::Block std::__invoke_impl<rocksdb::Status, std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> rocksdb::GetCreateCallback<rocksdb::Block>(unsigned long, rocksdb::Statistics*, bool, rocksdb::FilterPolicy const*)::'lambda'(void const*, unsigned long, void**, unsigned long*)&, void const*, unsigned long, void**, unsigned long*>(std::__invoke_other, std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> rocksdb::GetCreateCallback<rocksdb::Block>(unsigned long, rocksdb::Statistics*, bool, rocksdb::FilterPolicy const*)::'lambda'(void const*, unsigned long, void**, unsigned long*)&, void const*&&, unsigned long&&, void**&&, unsigned long*&&) third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/invoke.h:61
    https://github.com/facebook/rocksdb/issues/4 0x90825d in std::enable_if<is_invocable_r_v<rocksdb::Block, std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> rocksdb::GetCreateCallback<rocksdb::Block>(unsigned long, rocksdb::Statistics*, bool, rocksdb::FilterPolicy const*)::'lambda'(void const*, unsigned long, void**, unsigned long*)&, void const*, unsigned long, void**, unsigned long*>, rocksdb::Block>::type std::__invoke_r<rocksdb::Status, std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> rocksdb::GetCreateCallback<rocksdb::Block>(unsigned long, rocksdb::Statistics*, bool, rocksdb::FilterPolicy const*)::'lambda'(void const*, unsigned long, void**, unsigned long*)&, void const*, unsigned long, void**, unsigned long*>(std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> rocksdb::GetCreateCallback<rocksdb::Block>(unsigned long, rocksdb::Statistics*, bool, rocksdb::FilterPolicy const*)::'lambda'(void const*, unsigned long, void**, unsigned long*)&, void const*&&, unsigned long&&, void**&&, unsigned long*&&) third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/invoke.h:114
    https://github.com/facebook/rocksdb/issues/5 0x9081b0 in std::_Function_handler<rocksdb::Status (void const*, unsigned long, void**, unsigned long*), std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> rocksdb::GetCreateCallback<rocksdb::Block>(unsigned long, rocksdb::Statistics*, bool, rocksdb::FilterPolicy const*)::'lambda'(void const*, unsigned long, void**, unsigned long*)>::_M_invoke(std::_Any_data const&, void const*&&, unsigned long&&, void**&&, unsigned long*&&) third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_function.h:291
    https://github.com/facebook/rocksdb/issues/6 0x991f2c in std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)>::operator()(void const*, unsigned long, void**, unsigned long*) const third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_function.h:560
    https://github.com/facebook/rocksdb/issues/7 0x990277 in rocksdb::CompressedSecondaryCache::Lookup(rocksdb::Slice const&, std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> const&, bool, bool&) internal_repo_rocksdb/repo/cache/compressed_secondary_cache.cc:77
    https://github.com/facebook/rocksdb/issues/8 0xd3aa4d in rocksdb::FaultInjectionSecondaryCache::Lookup(rocksdb::Slice const&, std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> const&, bool, bool&) internal_repo_rocksdb/repo/utilities/fault_injection_secondary_cache.cc:92
    https://github.com/facebook/rocksdb/issues/9 0xeadaab in rocksdb::lru_cache::LRUCacheShard::Lookup(rocksdb::Slice const&, unsigned int, rocksdb::Cache::CacheItemHelper const*, std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> const&, rocksdb::Cache::Priority, bool, rocksdb::Statistics*) internal_repo_rocksdb/repo/cache/lru_cache.cc:445
    https://github.com/facebook/rocksdb/issues/10 0x1064573 in rocksdb::ShardedCache::Lookup(rocksdb::Slice const&, rocksdb::Cache::CacheItemHelper const*, std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> const&, rocksdb::Cache::Priority, bool, rocksdb::Statistics*) internal_repo_rocksdb/repo/cache/sharded_cache.cc:89
    https://github.com/facebook/rocksdb/issues/11 0x8be0df in rocksdb::BlockBasedTable::GetEntryFromCache(rocksdb::CacheTier const&, rocksdb::Cache*, rocksdb::Slice const&, rocksdb::BlockType, bool, rocksdb::GetContext*, rocksdb::Cache::CacheItemHelper const*, std::function<rocksdb::Status (void const*, unsigned long, void**, unsigned long*)> const&, rocksdb::Cache::Priority) const internal_repo_rocksdb/repo/table/block_based/block_based_table_reader.cc:389
    https://github.com/facebook/rocksdb/issues/12 0x905790 in rocksdb::Status rocksdb::BlockBasedTable::GetDataBlockFromCache<rocksdb::Block>(rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::CachableEntry<rocksdb::Block>*, rocksdb::UncompressionDict const&, rocksdb::BlockType, bool, rocksdb::GetContext*) const internal_repo_rocksdb/repo/table/block_based/block_based_table_reader.cc:1263
    https://github.com/facebook/rocksdb/issues/13 0x8b9259 in rocksdb::Status rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::Block>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, bool, bool, rocksdb::CachableEntry<rocksdb::Block>*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::BlockContents*, bool) const internal_repo_rocksdb/repo/table/block_based/block_based_table_reader.cc:1559
    https://github.com/facebook/rocksdb/issues/14 0x8b710c in rocksdb::Status rocksdb::BlockBasedTable::RetrieveBlock<rocksdb::Block>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<rocksdb::Block>*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, bool, bool, bool, bool) const internal_repo_rocksdb/repo/table/block_based/block_based_table_reader.cc:1726
    https://github.com/facebook/rocksdb/issues/15 0x8c329f in rocksdb::DataBlockIter* rocksdb::BlockBasedTable::NewDataBlockIterator<rocksdb::DataBlockIter>(rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::DataBlockIter*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::FilePrefetchBuffer*, bool, bool, rocksdb::Status&) const internal_repo_rocksdb/repo/table/block_based/block_based_table_reader_impl.h:58
    https://github.com/facebook/rocksdb/issues/16 0x920117 in rocksdb::BlockBasedTableIterator::InitDataBlock() internal_repo_rocksdb/repo/table/block_based/block_based_table_iterator.cc:262
    https://github.com/facebook/rocksdb/issues/17 0x920d42 in rocksdb::BlockBasedTableIterator::MaterializeCurrentBlock() internal_repo_rocksdb/repo/table/block_based/block_based_table_iterator.cc:332
    https://github.com/facebook/rocksdb/issues/18 0xc6a201 in rocksdb::IteratorWrapperBase<rocksdb::Slice>::PrepareValue() internal_repo_rocksdb/repo/table/iterator_wrapper.h:78
    https://github.com/facebook/rocksdb/issues/19 0xc6a201 in rocksdb::IteratorWrapperBase<rocksdb::Slice>::PrepareValue() internal_repo_rocksdb/repo/table/iterator_wrapper.h:78
    https://github.com/facebook/rocksdb/issues/20 0xef9f6c in rocksdb::MergingIterator::PrepareValue() internal_repo_rocksdb/repo/table/merging_iterator.cc:260
    https://github.com/facebook/rocksdb/issues/21 0xc6a201 in rocksdb::IteratorWrapperBase<rocksdb::Slice>::PrepareValue() internal_repo_rocksdb/repo/table/iterator_wrapper.h:78
    https://github.com/facebook/rocksdb/issues/22 0xc67bcd in rocksdb::DBIter::FindNextUserEntryInternal(bool, rocksdb::Slice const*) internal_repo_rocksdb/repo/db/db_iter.cc:326
    https://github.com/facebook/rocksdb/issues/23 0xc66d36 in rocksdb::DBIter::FindNextUserEntry(bool, rocksdb::Slice const*) internal_repo_rocksdb/repo/db/db_iter.cc:234
    https://github.com/facebook/rocksdb/issues/24 0xc7ab47 in rocksdb::DBIter::Next() internal_repo_rocksdb/repo/db/db_iter.cc:161
    https://github.com/facebook/rocksdb/issues/25 0x70d938 in rocksdb::BatchedOpsStressTest::TestPrefixScan(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) internal_repo_rocksdb/repo/db_stress_tool/batched_ops_stress.cc:320
    https://github.com/facebook/rocksdb/issues/26 0x6dc6a8 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:907
    https://github.com/facebook/rocksdb/issues/27 0x6867de in rocksdb::ThreadBody(void*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_driver.cc:33
    https://github.com/facebook/rocksdb/issues/28 0xce4cc2 in rocksdb::(anonymous namespace)::StartThreadWrapper(void*) internal_repo_rocksdb/repo/env/env_posix.cc:461
    https://github.com/facebook/rocksdb/issues/29 0x7f23f9068c0e in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434:8
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10523
Test Plan:
```
$COMPILE_WITH_ASAN=1  make -j 24
$db_stress J=40 crash_test_with_txn
```
Reviewed By: anand1976
Differential Revision: D38646839
Pulled By: gitbw95
fbshipit-source-id: 9452895c7dc95481a9d7afe83b15193cf5b1c43e | 3 years ago | 
|  Akanksha Mahajan | 5956ef0089 | Add initial_auto_readahead_size and max_auto_readahead_size to db_bench (#10539) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10539 Reviewed By: anand1976 Differential Revision: D38837111 Pulled By: akankshamahajan15 fbshipit-source-id: eb845c6e15a3c823ff6113395817388ff15a20b1 | 3 years ago | 
|  Andrew Kryczka | 91166012c8 | Prevent a case of WriteBufferManager flush thrashing (#6364) Summary: Previously, the flushes triggered by `WriteBufferManager` could affect the same CF repeatedly if it happens to get consecutive writes. Such flushes are not particularly useful for reducing memory usage since they switch nearly-empty memtables to immutable while they've just begun filling their first arena block. In fact they may not even reduce the mutable memory count if they involve replacing one mutable memtable containing one arena block with a new mutable memtable containing one arena block. Further, if such switches happen even a few times before a flush finishes, the immutable memtable limit will be reached and writes will stall. This PR adds a heuristic to not switch memtables to immutable for CFs that already have one or more immutable memtables awaiting flush. There is a memory usage regression if the user continues writing to the same CF, that DB does not have any CFs eligible for switching, flushes are not finishing, and the `WriteBufferManager` was constructed with `allow_stall=false`. Before, it would grow by switching nearly empty memtables until writes stall. Now, it would grow by filling memtables until writes stall. This feels like an acceptable behavior change because users who prefer to stall over violate the memory limit should be using `allow_stall=true`, which is unaffected by this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6364 Test Plan: - Command: `rm -rf /dev/shm/dbbench/ && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num_multi_db=8 -num_column_families=2 -write_buffer_size=4194304 -db_write_buffer_size=16777216 -compression_type=none -statistics=true -target_file_size_base=4194304 -max_bytes_for_level_base=16777216` - `rocksdb.db.write.stall` count before this PR: 175 - `rocksdb.db.write.stall` count after this PR: 0 Reviewed By: jay-zhuang Differential Revision: D20167197 Pulled By: ajkr fbshipit-source-id: 4a64064e9bc33d57c0a35f15547542d0191d0cb7 | 3 years ago | 
|  anand76 | 65814a4ae6 | Fix range deletion handling in async MultiGet (#10534) Summary: The fix in https://github.com/facebook/rocksdb/issues/10513 was not complete w.r.t range deletion handling. It didn't handle the case where a file with a range tombstone covering a key also overlapped another key in the batch. In that case, ```mget_range``` would be non-empty. However, ```mget_range``` would only have the second key and, therefore, the first key would be skipped when iterating through the range tombstones in ```TableCache::MultiGet```. Test plan - 1. Add a unit test 2. Run stress tests Pull Request resolved: https://github.com/facebook/rocksdb/pull/10534 Reviewed By: akankshamahajan15 Differential Revision: D38773880 Pulled By: anand1976 fbshipit-source-id: dae491dbe52e18bbce5179b77b63f20771a66c00 | 3 years ago | 
|  Gang Liao | 275cd80cdb | Add a blob-specific cache priority (#10461) Summary: RocksDB's `Cache` abstraction currently supports two priority levels for items: high (used for frequently accessed/highly valuable SST metablocks like index/filter blocks) and low (used for SST data blocks). Blobs are typically lower-value targets for caching than data blocks, since 1) with BlobDB, data blocks containing blob references conceptually form an index structure which has to be consulted before we can read the blob value, and 2) cached blobs represent only a single key-value, while cached data blocks generally contain multiple KVs. Since we would like to make it possible to use the same backing cache for the block cache and the blob cache, it would make sense to add a new, lower-than-low cache priority level (bottom level) for blobs so data blocks are prioritized over them. This task is a part of https://github.com/facebook/rocksdb/issues/10156 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10461 Reviewed By: siying Differential Revision: D38672823 Pulled By: ltamasi fbshipit-source-id: 90cf7362036563d79891f47be2cc24b827482743 | 3 years ago | 
|  sdong | bc575c614c | Fix two extra headers (#10525) Summary: Fix copyright for two more extra headers to make internal tool happy. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10525 Reviewed By: jay-zhuang Differential Revision: D38661390 fbshipit-source-id: ab2d055bfd145dfe82b5bae7a6c25cc338c8de94 | 3 years ago | 
|  Changyu Bi | fd165c869d | Add memtable per key-value checksum (#10281) Summary: Append per key-value checksum to internal key. These checksums are verified on read paths including Get, Iterator and during Flush. Get and Iterator will return `Corruption` status if there is a checksum verification failure. Flush will make DB become read-only upon memtable entry checksum verification failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10281 Test Plan: - Added new unit test cases: `make check` - Benchmark on memtable insert ``` TEST_TMPDIR=/dev/shm/memtable_write ./db_bench -benchmarks=fillseq -disable_wal=true -max_write_buffer_number=100 -num=10000000 -min_write_buffer_number_to_merge=100 # avg over 10 runs Baseline: 1166936 ops/sec memtable 2 bytes kv checksum : 1.11674e+06 ops/sec (-4%) memtable 2 bytes kv checksum + write batch 8 bytes kv checksum: 1.08579e+06 ops/sec (-6.95%) write batch 8 bytes kv checksum: 1.17979e+06 ops/sec (+1.1%) ``` - Benchmark on only memtable read: ops/sec dropped 31% for `readseq` due to time spend on verifying checksum. ops/sec for `readrandom` dropped ~6.8%. ``` # Readseq sudo TEST_TMPDIR=/dev/shm/memtable_read ./db_bench -benchmarks=fillseq,readseq"[-X20]" -disable_wal=true -max_write_buffer_number=100 -num=10000000 -min_write_buffer_number_to_merge=100 readseq [AVG 20 runs] : 7432840 (± 212005) ops/sec; 822.3 (± 23.5) MB/sec readseq [MEDIAN 20 runs] : 7573878 ops/sec; 837.9 MB/sec With -memtable_protection_bytes_per_key=2: readseq [AVG 20 runs] : 5134607 (± 119596) ops/sec; 568.0 (± 13.2) MB/sec readseq [MEDIAN 20 runs] : 5232946 ops/sec; 578.9 MB/sec # Readrandom sudo TEST_TMPDIR=/dev/shm/memtable_read ./db_bench -benchmarks=fillrandom,readrandom"[-X10]" -disable_wal=true -max_write_buffer_number=100 -num=1000000 -min_write_buffer_number_to_merge=100 readrandom [AVG 10 runs] : 140236 (± 3938) ops/sec; 9.8 (± 0.3) MB/sec readrandom [MEDIAN 10 runs] : 140545 ops/sec; 9.8 MB/sec With -memtable_protection_bytes_per_key=2: readrandom [AVG 10 runs] : 130632 (± 2738) ops/sec; 9.1 (± 0.2) MB/sec readrandom [MEDIAN 10 runs] : 130341 ops/sec; 9.1 MB/sec ``` - Stress test: `python3 -u tools/db_crashtest.py whitebox --duration=1800` Reviewed By: ajkr Differential Revision: D37607896 Pulled By: cbi42 fbshipit-source-id: fdaefb475629d2471780d4a5f5bf81b44ee56113 | 3 years ago | 
|  Peter Dillinger | 86a1e3e0e7 | Derive cache keys from SST unique IDs (#10394) Summary: ... so that cache keys can be derived from DB manifest data before reading the file from storage--so that every part of the file can potentially go in a persistent cache. See updated comments in cache_key.cc for technical details. Importantly, the new cache key encoding uses some fancy but efficient math to pack data into the cache key without depending on the sizes of the various pieces. This simplifies some existing code creating cache keys, like cache warming before the file size is known. This should provide us an essentially permanent mapping between SST unique IDs and base cache keys, with the ability to "upgrade" SST unique IDs (and thus cache keys) with new SST format_versions. These cache keys are of similar, perhaps indistinguishable quality to the previous generation. Before this change (see "corrected" days between collision): ``` ./cache_bench -stress_cache_key -sck_keep_bits=43 18 collisions after 2 x 90 days, est 10 days between (1.15292e+19 corrected) ``` After this change (keep 43 bits, up through 50, to validate "trajectory" is ok on "corrected" days between collision): ``` 19 collisions after 3 x 90 days, est 14.2105 days between (1.63836e+19 corrected) 16 collisions after 5 x 90 days, est 28.125 days between (1.6213e+19 corrected) 15 collisions after 7 x 90 days, est 42 days between (1.21057e+19 corrected) 15 collisions after 17 x 90 days, est 102 days between (1.46997e+19 corrected) 15 collisions after 49 x 90 days, est 294 days between (2.11849e+19 corrected) 15 collisions after 62 x 90 days, est 372 days between (1.34027e+19 corrected) 15 collisions after 53 x 90 days, est 318 days between (5.72858e+18 corrected) 15 collisions after 309 x 90 days, est 1854 days between (1.66994e+19 corrected) ``` However, the change does modify (probably weaken) the "guaranteed unique" promise from this > SST files generated in a single process are guaranteed to have unique cache keys, unless/until number session ids * max file number = 2**86 to this (see https://github.com/facebook/rocksdb/issues/10388) > With the DB id limitation, we only have nice guaranteed unique cache keys for files generated in a single process until biggest session_id_counter and offset_in_file reach combined 64 bits I don't think this is a practical concern, though. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10394 Test Plan: unit tests updated, see simulation results above Reviewed By: jay-zhuang Differential Revision: D38667529 Pulled By: pdillinger fbshipit-source-id: 49af3fe7f47e5b61162809a78b76c769fd519fba | 3 years ago |