rocksdb

Fork of https://github.com/oxigraph/rocksdb and https://github.com/facebook/rocksdb for NextGraph and Oxigraph.


Peter Dillinger 9d0cae7104 Eliminate unnecessary (slow) block cache Ref()ing in MultiGet (#9899)

Summary: When MultiGet() determines that multiple query keys can be served by examining the same data block in block cache (one Lookup()), each PinnableSlice referring to data in that data block needs to hold on to the block in cache so that they can be released at arbitrary times by the API user. Historically this is accomplished with extra calls to Ref() on the Handle from Lookup(), with each PinnableSlice cleanup calling Release() on the Handle, but this creates extra contention on the block cache for the extra Ref()s and Release()es, especially because they hit the same cache shard repeatedly.

In the case of merge operands (possibly more cases?), the problem was compounded by doing an extra Ref()+eventual Release() for each merge operand for a key reusing a block (which could be the same key!), rather than one Ref() per key. (Note: the non-shared case with `biter` was already one per key.)

This change optimizes MultiGet not to rely on these extra, contentious Ref()+Release() calls by instead, in the shared block case, wrapping the cache Release() cleanup in a refcounted object referenced by the PinnableSlices, such that after the last wrapped reference is released, the cache entry is Release()ed. Relaxed atomic refcounts should be much faster than mutex-guarded Ref() and Release(), and much less prone to a performance cliff when MultiGet() does a lot of block sharing.

Note that I did not use std::shared_ptr, because that would require an extra indirection object (shared_ptr itself new/delete) in order to associate a ref increment/decrement with a Cleanable cleanup entry. (If I assumed it was the size of two pointers, I could do some hackery to make it work without the extra indirection, but that's too fragile.)
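The shared-block cleanup described above can be sketched in self-contained C++. This is a minimal illustration, not RocksDB's code: `ToyCleanable`, `SharedCleanup`, `RegisterCopyWith` and `RunSharedCleanupDemo` are hypothetical names standing in for `Cleanable`, the refcounted wrapper and the PinnableSlices. One simulated cache Release() is registered on a refcounted holder; each pinned value takes one reference through an ordinary cleanup entry, so the Release() runs exactly once, after the last value goes away. The commit text only says "relaxed atomic refcounts"; this sketch assumes relaxed increments and an acq_rel decrement (the usual refcount convention), which may differ from the exact orderings RocksDB uses.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a Cleanable: runs its registered cleanup
// functions, in registration order, when destroyed.
class ToyCleanable {
 public:
  using CleanupFn = void (*)(void* arg1, void* arg2);

  ToyCleanable() = default;
  ToyCleanable(const ToyCleanable&) = delete;
  ToyCleanable& operator=(const ToyCleanable&) = delete;
  ~ToyCleanable() { DoCleanup(); }

  void RegisterCleanup(CleanupFn fn, void* arg1, void* arg2) {
    cleanups_.push_back({fn, arg1, arg2});
  }

 private:
  void DoCleanup() {
    for (const Entry& e : cleanups_) e.fn(e.arg1, e.arg2);
    cleanups_.clear();
  }
  struct Entry {
    CleanupFn fn;
    void* arg1;
    void* arg2;
  };
  std::vector<Entry> cleanups_;
};

// Hypothetical refcounted holder: cleanups registered on it (e.g. one cache
// Release()) run once, when the last reference is dropped.
class SharedCleanup : public ToyCleanable {
 public:
  // The creator holds the initial reference.
  static SharedCleanup* Create() { return new SharedCleanup(); }

  void Ref() { refs_.fetch_add(1, std::memory_order_relaxed); }

  void Unref() {
    // acq_rel so the thread dropping the last reference sees prior writes.
    if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
  }

  // Takes one more reference and registers its release on `target`,
  // so destroying `target` drops that reference.
  void RegisterCopyWith(ToyCleanable* target) {
    Ref();
    target->RegisterCleanup(&SharedCleanup::UnrefWrapper, this, nullptr);
  }

 private:
  SharedCleanup() = default;
  ~SharedCleanup() = default;  // ~ToyCleanable then runs the wrapped cleanup

  static void UnrefWrapper(void* arg1, void* /*arg2*/) {
    static_cast<SharedCleanup*>(arg1)->Unref();
  }

  std::atomic<int> refs_{1};
};

// Simulates three values served from one cached block and returns how many
// times the simulated cache Release() ran.
int RunSharedCleanupDemo() {
  int releases = 0;
  SharedCleanup* shared = SharedCleanup::Create();
  // The single (simulated) cache Release() for the shared block.
  shared->RegisterCleanup(
      [](void* counter, void* /*unused*/) { ++*static_cast<int*>(counter); },
      &releases, nullptr);
  {
    ToyCleanable value_a, value_b, value_c;  // stand-ins for PinnableSlices
    shared->RegisterCopyWith(&value_a);
    shared->RegisterCopyWith(&value_b);
    shared->RegisterCopyWith(&value_c);
    shared->Unref();         // creator drops its reference
    assert(releases == 0);   // the values still pin the block
  }  // destroying the last value drops the last reference
  return releases;
}
```

Called from a `main()`, `RunSharedCleanupDemo()` should return 1. Each additional holder costs one atomic increment and one cleanup entry instead of an extra cache Ref()/Release() pair on the same shard, which is the contention this commit removes.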
Some details:
* Fixed (removed) extra block cache tracing entries in cases of cache entry reuse in MultiGet, but it's likely that in some other cases traces are missing (XXX comment inserted)
* Moved existing implementations for cleanable.h from iterator.cc to new cleanable.cc
* Improved API comments on Cleanable
* Added a public SharedCleanablePtr class to cleanable.h in case others could benefit from the same pattern (potentially many Cleanables and/or smart pointers referencing a shared Cleanable)
* Add a typedef for MultiGetContext::Mask
* Some variable renaming for clarity

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9899

Test Plan: Added unit tests for SharedCleanablePtr. Greatly enhanced ability of existing tests to detect cache use-after-free.
* Release PinnableSlices from MultiGet as they are read rather than in bulk (in db_test_util wrapper).
* In ASAN build, default to using a trivially small LRUCache for block_cache so that entries are immediately erased when unreferenced. (Updated two tests that depend on caching.)

New ASAN testsuite running time seems OK to me. If I introduce a bug into my implementation where we skip the shared cleanups on block reuse, ASAN detects the bug in `db_basic_test MultiGet`. If I remove either of the above testing enhancements, the bug is not detected.

Consider for follow-up work: manipulate or randomize ordering of PinnableSlice use and release from MultiGet db_test_util wrapper. But in typical cases, natural ordering gives pretty good functional coverage.

Performance test: In the extreme (but possible) case of MultiGetting the same or adjacent keys in a batch, throughput can improve by an order of magnitude.
`./db_bench -benchmarks=multireadrandom -db=/dev/shm/testdb -readonly -num=5 -duration=10 -threads=20 -multiread_batched -batch_size=200`

Before ops/sec, num=5: 1,384,394
Before ops/sec, num=500: 6,423,720
After ops/sec, num=500: 10,658,794
After ops/sec, num=5: 16,027,257

Also note that previously, with high parallelism, having query keys concentrated in a single block was worse than spreading them out a bit. Now concentrated in a single block is faster than spread out, which is hopefully consistent with natural expectation.

Random query performance: with num=1000000, over 999 x 10s runs running before & after simultaneously (each -threads=12):
Before: multireadrandom [AVG 999 runs] : 1088699 (± 7344) ops/sec; 120.4 (± 0.8 ) MB/sec
After: multireadrandom [AVG 999 runs] : 1090402 (± 7230) ops/sec; 120.6 (± 0.8 ) MB/sec

Possibly better, possibly in the noise.

Reviewed By: anand1976
Differential Revision: D35907003
Pulled By: pdillinger
fbshipit-source-id: bbd244d703649a8ca12d476f2d03853ed9d1a17e		4 years ago
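As a quick check of the benchmark figures quoted in this commit message, the ratios work out as follows. This is arithmetic on the reported numbers only, with no new measurements; the meaning of the "±" values is not stated in the source, so they are compared only as reported spreads.

```latex
\begin{aligned}
\text{num}=5:&\quad \frac{16{,}027{,}257}{1{,}384{,}394} \approx 11.6\times \\
\text{num}=500:&\quad \frac{10{,}658{,}794}{6{,}423{,}720} \approx 1.66\times \\
\text{random, num}=10^6:&\quad \frac{1{,}090{,}402 - 1{,}088{,}699}{1{,}088{,}699} = \frac{1{,}703}{1{,}088{,}699} \approx 0.16\%
\end{aligned}
```

The num=5 ratio matches the "order of magnitude" claim, and the 1,703 ops/sec random-query difference is smaller than either reported spread (7,344 and 7,230), consistent with "possibly in the noise".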
aligned_buffer.h	Fix wrong comments about function TruncateToPageBoundary. (#6975 )	5 years ago
autovector.h	Add manifest fix-up utility for file temperatures (#9683 )	4 years ago
autovector_test.cc	Replace most typedef with using= (#8751 )	4 years ago
bloom_impl.h	FilterPolicy API changes for 7.0 (#9501 )	4 years ago
bloom_test.cc	Account memory of big memory users in BlockBasedTable in global memory limit (#9748 )	4 years ago
build_version.cc.in	Plugin Registry (#7949 )	4 years ago
cast_util.h	More refactoring ahead of footer & meta changes (#9240 )	4 years ago
channel.h	Fix and detect headers with missing dependencies (#8893 )	4 years ago
cleanable.cc	Eliminate unnecessary (slow) block cache Ref()ing in MultiGet (#9899 )	4 years ago
coding.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
coding.h	More refactoring ahead of footer & meta changes (#9240 )	4 years ago
coding_lean.h	New stable, fixed-length cache keys (#9126 )	4 years ago
coding_test.cc	Fix potential overflow of unsigned type in for loop (#6902 )	5 years ago
compaction_job_stats_impl.cc	Update compaction statistics to include the amount of data read from blob files (#8022 )	5 years ago
comparator.cc	Return different Status based on ObjectRegistry::NewObject calls (#9333 )	4 years ago
compression.cc	Integrate WAL compression into log reader/writer. (#9642 )	4 years ago
compression.h	Fix minimum libzstd version that supports ZSTD_STREAMING (#9841 )	4 years ago
compression_context_cache.cc	Remove using namespace (#9369 )	4 years ago
compression_context_cache.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
concurrent_task_limiter_impl.cc	Remove TaskLimiterToken::ReleaseOnce for fix (#8567 )	4 years ago
concurrent_task_limiter_impl.h	Remove TaskLimiterToken::ReleaseOnce for fix (#8567 )	4 years ago
core_local.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
crc32c.cc	Replace most typedef with using= (#8751 )	4 years ago
crc32c.h	Implementation of Crc32c combine function (#8305 )	4 years ago
crc32c_arm64.cc	Mac M1 crc32 intrinsics ARM64 check support proposal (#7893 )	5 years ago
crc32c_arm64.h	Fix compilation on Apple Silicon (#7714 )	5 years ago
crc32c_ppc.c	Fix Compilation on ppc64le using Clang 11 (#7713 )	5 years ago
crc32c_ppc.h	Fix and detect headers with missing dependencies (#8893 )	4 years ago
crc32c_ppc_asm.S	Fix Compilation on ppc64le using Clang 11 (#7713 )	5 years ago
crc32c_ppc_constants.h	Remove PATENTS text from a few straggler files (#5326 )	7 years ago
crc32c_test.cc	Implementation of Crc32c combine function (#8305 )	4 years ago
defer.h	Fix and detect headers with missing dependencies (#8893 )	4 years ago
defer_test.cc	Fix insecure internal API for GetImpl (#8590 )	4 years ago
duplicate_detector.h	Cleanup includes in dbformat.h (#8930 )	4 years ago
dynamic_bloom.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
dynamic_bloom.h	Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453 )	4 years ago
dynamic_bloom_test.cc	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
fastrange.h	Fix and detect headers with missing dependencies (#8893 )	4 years ago
file_checksum_helper.cc	Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362 )	4 years ago
file_checksum_helper.h	New stable, fixed-length cache keys (#9126 )	4 years ago
file_reader_writer_test.cc	Disallow a combination of options (#9348 )	4 years ago
filelock_test.cc	Fix MSVC-related build issues (#7439 )	5 years ago
filter_bench.cc	Refactor FilterPolicies toward Customizable (#9567 )	4 years ago
gflags_compat.h	Require C++17 (#9481 )	4 years ago
hash.cc	Experimental support for SST unique IDs (#8990 )	4 years ago
hash.h	Experimental support for SST unique IDs (#8990 )	4 years ago
hash128.h	Upgrade xxhash, add Hash128 (#8634 )	4 years ago
hash_containers.h	Meta-internal folly integration with F14FastMap (#9546 )	4 years ago
hash_map.h	Change HashMap::Insert()'s value to a const reference (#6567 )	6 years ago
hash_test.cc	Meta-internal folly integration with F14FastMap (#9546 )	4 years ago
heap.h	Avoid self-move-assign in pop operation of binary heap. (#7942 )	5 years ago
heap_test.cc	Revert "Update googletest from 1.8.1 to 1.10.0 (#6808 )" (#6923 )	5 years ago
kv_map.h	Replace most typedef with using= (#8751 )	4 years ago
log_write_bench.cc	Add a SystemClock class to capture the time functions of an Env (#7858 )	5 years ago
math.h	Meta-internal folly integration with F14FastMap (#9546 )	4 years ago
math128.h	New stable, fixed-length cache keys (#9126 )	4 years ago
murmurhash.cc	C++20 compatibility (#6697 )	6 years ago
murmurhash.h	Replace most typedef with using= (#8751 )	4 years ago
mutexlock.h	Prevents Table Cache to open same files more times (#6707 )	6 years ago
ppc-opcode.h	Remove PATENTS text from a few straggler files (#5326 )	7 years ago
random.cc	Add De/Serialization for CompactionInput/Result (#8247 )	5 years ago
random.h	Experimental support for SST unique IDs (#8990 )	4 years ago
random_test.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
rate_limiter.cc	remove unused instance variable in GenericRateLimiter (#9484 )	4 years ago
rate_limiter.h	remove unused instance variable in GenericRateLimiter (#9484 )	4 years ago
rate_limiter_test.cc	Make RateLimiter Customizable (#9141 )	4 years ago
repeatable_thread.h	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
repeatable_thread_test.cc	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
ribbon_alg.h	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
ribbon_config.cc	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
ribbon_config.h	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
ribbon_impl.h	Account Bloom/Ribbon filter construction memory in global memory limit (#9073 )	4 years ago
ribbon_test.cc	Upgrade xxhash, add Hash128 (#8634 )	4 years ago
set_comparator.h	Fix and detect headers with missing dependencies (#8893 )	4 years ago
slice.cc	Fix LITE build for SliceTransform::AsString (#9460 )	4 years ago
slice_test.cc	Require C++17 (#9481 )	4 years ago
slice_transform_test.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
status.cc	Combine data members of IOStatus with Status (#9549 )	4 years ago
stderr_logger.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
stop_watch.h	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
string_util.cc	Remove some unneeded code (#8736 )	4 years ago
string_util.h	Experimental support for SST unique IDs (#8990 )	4 years ago
thread_guard.h	Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112 )	5 years ago
thread_list_test.cc	fix thread status synchronization in thread_list_test (#7825 )	5 years ago
thread_local.cc	Fix typo in ThreadData comment (#7131 )	5 years ago
thread_local.h	Replace most typedef with using= (#8751 )	4 years ago
thread_local_test.cc	Add StartThread type checking wrapper (#8303 )	4 years ago
thread_operation.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
threadpool_imp.cc	Remove incremental ID from background thread pool names (#9165 )	4 years ago
threadpool_imp.h	Make it able to lower cpu priority to specific level in threadpool (#6969 )	5 years ago
timer.h	Fix a timer crash caused by invalid memory management (#9656 )	4 years ago
timer_queue.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
timer_queue_test.cc	Change RocksDB License	8 years ago
timer_test.cc	Fix a timer crash caused by invalid memory management (#9656 )	4 years ago
user_comparator_wrapper.h	Enable backward iterator for keys with user-defined timestamp (#8035 )	5 years ago
vector_iterator.h	Cleanup multiple implementations of VectorIterator (#8901 )	4 years ago
work_queue.h	Fix and detect headers with missing dependencies (#8893 )	4 years ago
work_queue_test.cc	Add pipelined & parallel compression optimization (#6262 )	6 years ago
xxhash.cc	Upgrade xxhash, add Hash128 (#8634 )	4 years ago
xxhash.h	Upgrade xxhash, add Hash128 (#8634 )	4 years ago
xxph3.h	Fix and detect headers with missing dependencies (#8893 )	4 years ago