rocksdb

Fork of https://github.com/oxigraph/rocksdb and https://github.com/facebook/rocksdb for NextGraph and Oxigraph.


Andrew Kryczka 504fe4de80 Avoid allocations/copies for large `GetMergeOperands()` results (#10458)		3 years ago

Summary: This PR avoids allocations and copies for the result of `GetMergeOperands()` when the average operand size is at least 256 bytes and the total operands size is at least 32KB. The `GetMergeOperands()` already included `PinnableSlice` but was calling `PinSelf()` (i.e., allocating and copying) for each operand. When this optimization takes effect, we instead call `PinSlice()` to skip that allocation and copy.

Resources are pinned in order for the `PinnableSlice` to point to valid memory even after `GetMergeOperands()` returns. The pinned resources include a referenced `SuperVersion`, a `MergingContext`, and a `PinnedIteratorsManager`. They are bundled into a `GetMergeOperandsState`. We use `SharedCleanablePtr` to share that bundle among all `PinnableSlice`s populated by `GetMergeOperands()`. That way, the last `PinnableSlice` to be `Reset()` will cleanup the bundle, including unreferencing the `SuperVersion`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10458

Test Plan:
- new DB level test
- measured benefit/regression in a number of memtable scenarios

Setup command:
```
$ ./db_bench -benchmarks=mergerandom -merge_operator=StringAppendOperator -num=$num -writes=16384 -key_size=16 -value_size=$value_sz -compression_type=none -write_buffer_size=1048576000
```

Benchmark command:
```
./db_bench -threads=$threads -use_existing_db=true -avoid_flush_during_recovery=true -write_buffer_size=1048576000 -benchmarks=readrandomoperands -merge_operator=StringAppendOperator -num=$num -duration=10
```

Worst regression is when a key has many tiny operands:
- Parameters: num=1 (implying 16384 operands per key), value_sz=8, threads=1
- `GetMergeOperands()` latency increases 682 micros -> 800 micros (+17%)

The regression disappears into the noise (<1% difference) if we remove the `Reset()` loop and the size counting loop. The former is arguably needed regardless of this PR as the convention in `Get()` and `MultiGet()` is to `Reset()` the input `PinnableSlice`s at the start. The latter could be optimized to count the size as we accumulate operands rather than after the fact.

Best improvement is when a key has large operands and high concurrency:
- Parameters: num=4 (implying 4096 operands per key), value_sz=2KB, threads=32
- `GetMergeOperands()` latency decreases 11492 micros -> 437 micros (-96%).

Reviewed By: cbi42
Differential Revision: D38336578
Pulled By: ajkr
fbshipit-source-id: 48146d127e04cb7f2d4d2939a2b9dff3aba18258
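
The two ideas in the commit message above (a size threshold for pinning, and one bundle of pinned resources released by the last result to be reset) can be sketched in isolation. This is a minimal standalone illustration, not RocksDB's implementation: `ShouldPinOperands`, `PinnedState`, `OperandView`, `PinAll`, and the constant names are hypothetical, and `std::shared_ptr` stands in for `SharedCleanablePtr`. Only the 256-byte and 32KB figures come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Thresholds quoted in the commit message; the constant names are hypothetical.
constexpr std::size_t kMinAvgOperandBytes = 256;
constexpr std::size_t kMinTotalOperandBytes = 32 * 1024;

// Returns true when the operands are large enough that referencing them in
// place is expected to be cheaper than copying each one.
bool ShouldPinOperands(const std::vector<std::string>& operands) {
  if (operands.empty()) return false;
  std::size_t total = 0;
  for (const auto& op : operands) total += op.size();
  // "average >= 256" is written as "total >= 256 * count" to avoid division.
  return total >= kMinTotalOperandBytes &&
         total >= kMinAvgOperandBytes * operands.size();
}

// Hypothetical stand-in for the bundle of pinned resources. It owns the
// memory the views point into and records when it is released.
struct PinnedState {
  PinnedState(std::vector<std::string> ops, int* counter)
      : operands(std::move(ops)), release_count(counter) {}
  ~PinnedState() { ++*release_count; }

  std::vector<std::string> operands;
  int* release_count;
};

// Hypothetical stand-in for a pinned result: a non-owning view plus a shared
// reference that keeps the bundle alive.
struct OperandView {
  const char* data;
  std::size_t size;
  std::shared_ptr<PinnedState> keep_alive;

  void Reset() {
    data = nullptr;
    size = 0;
    keep_alive.reset();  // drops this view's reference to the bundle
  }
};

// Builds one view per operand without copying any operand bytes. Every view
// shares ownership of the same bundle.
std::vector<OperandView> PinAll(std::shared_ptr<PinnedState> state) {
  std::vector<OperandView> views;
  views.reserve(state->operands.size());
  for (const auto& op : state->operands) {
    views.push_back(OperandView{op.data(), op.size(), state});
  }
  return views;
}
```

In this sketch the bundle is destroyed only when the last `OperandView` is `Reset()` (or destroyed), which mirrors the behavior the commit message describes for the last `PinnableSlice`. The reference counting in `std::shared_ptr` is a design convenience here and says nothing about how `SharedCleanablePtr` is implemented.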
advisor	Update branch as "main" in tools/advisor/README.md (#8744)	4 years ago
block_cache_analyzer	Use std::numeric_limits<> (#9954)	4 years ago
dump	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433)	6 years ago
CMakeLists.txt	Mark dependencies as PRIVATE and fix missing dependencies in tools. (#6790)	6 years ago
Dockerfile	adding docker build script and dockerfile	11 years ago
analyze_txn_stress_test.sh	Add copyright headers per FB open-source checkup tool. (#5199)	7 years ago
auto_sanity_test.sh	Add copyright headers per FB open-source checkup tool. (#5199)	7 years ago
backup_db.sh	Revamp check_format_compatible.sh (#8012)	5 years ago
benchmark.sh	Revert "Add a blob-specific cache priority (#10309)" (#10434)	3 years ago
benchmark_ci.py	Run new benchmark script in branch. (#10303)	3 years ago
benchmark_compare.sh	Run new benchmark script in branch. (#10303)	3 years ago
benchmark_leveldb.sh	Add copyright headers per FB open-source checkup tool. (#5199)	7 years ago
blob_dump.cc	Remove using namespace (#9369)	4 years ago
check_all_python.py	Allow missing "unversioned" python, as in CentOS 8 (#6883)	5 years ago
check_format_compatible.sh	Post 7.5 branch cut changes (#10376)	3 years ago
db_bench.cc	Add (& fix) some simple source code checks (#8821)	4 years ago
db_bench_tool.cc	Avoid allocations/copies for large `GetMergeOperands()` results (#10458)	3 years ago
db_bench_tool_test.cc	Support prepopulating/warming the blob cache (#10298)	3 years ago
db_crashtest.py	Add CompressedSecondaryCache into stress test (#10442)	3 years ago
db_repl_stress.cc	Remove using namespace (#9369)	4 years ago
db_sanity_test.cc	Remove own ToString() (#9955)	4 years ago
dbench_monitor	Fix /bin/bash shebangs	8 years ago
generate_random_db.sh	Add copyright headers per FB open-source checkup tool. (#5199)	7 years ago
ingest_external_sst.sh	Add copyright headers per FB open-source checkup tool. (#5199)	7 years ago
io_tracer_parser.cc	Add IO Tracer Parser (#7333)	5 years ago
io_tracer_parser_test.cc	Cleanup includes in dbformat.h (#8930)	4 years ago
io_tracer_parser_tool.cc	Add request_id in IODebugContext. (#8045)	5 years ago
io_tracer_parser_tool.h	Add IO Tracer Parser (#7333)	5 years ago
ldb.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433)	6 years ago
ldb_cmd.cc	ldb to display public unique id and dump work with key range (#10417)	3 years ago
ldb_cmd_impl.h	Support single delete in ldb (#9469)	4 years ago
ldb_cmd_test.cc	Add blob source to retrieve blobs in RocksDB (#10198)	3 years ago
ldb_test.py	Make it possible to enable blob files starting from a certain LSM tree level (#10077)	3 years ago
ldb_tool.cc	Default `try_load_options` to true when DB is specified (#9937)	4 years ago
pflag	Fix /bin/bash shebangs	8 years ago
reduce_levels_test.cc	Remove own ToString() (#9955)	4 years ago
regression_test.sh	regression_test.sh: kill very old db_bench (and more) (#10441)	3 years ago
restore_db.sh	Revamp check_format_compatible.sh (#8012)	5 years ago
rocksdb_dump_test.sh	Add copyright headers per FB open-source checkup tool. (#5199)	7 years ago
run_blob_bench.sh	Support prepopulating/warming the blob cache (#10298)	3 years ago
run_flash_bench.sh	Add copyright headers per FB open-source checkup tool. (#5199)	7 years ago
run_leveldb.sh	Add copyright headers per FB open-source checkup tool. (#5199)	7 years ago
sample-dump.dmp	First version of rocksdb_dump and rocksdb_undump.	10 years ago
simulated_hybrid_file_system.cc	Improve SimulatedHybridFileSystem (#9301)	4 years ago
simulated_hybrid_file_system.h	Improve SimulatedHybridFileSystem (#9301)	4 years ago
sst_dump.cc	Implement a new subcommand "identify" for sst_dump (#6943)	5 years ago
sst_dump_test.cc	Use the comparator from the sst file table properties in sst_dump_tool (#9491)	4 years ago
sst_dump_tool.cc	Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857)	3 years ago
trace_analyzer.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433)	6 years ago
trace_analyzer_test.cc	Support read rate-limiting in SequentialFileReader (#9973)	3 years ago
trace_analyzer_tool.cc	Support read rate-limiting in SequentialFileReader (#9973)	3 years ago
trace_analyzer_tool.h	Add commit marker with timestamp (#9266)	4 years ago
verify_random_db.sh	Fix some bugs in verify_random_db.sh (#10112)	3 years ago
write_external_sst.sh	Revamp check_format_compatible.sh (#8012)	5 years ago
write_stress.cc	Add a SystemClock class to capture the time functions of an Env (#7858)	5 years ago
write_stress_runner.py	Allow missing "unversioned" python, as in CentOS 8 (#6883)	5 years ago