rocksdb

fork of https://github.com/oxigraph/rocksdb and https://github.com/facebook/rocksdb for nextgraph and oxigraph

History

Andrew Kryczka 82b81dc8b5 Simplify GenericRateLimiter algorithm (#8602)

Summary:
`GenericRateLimiter` slow path handles requests that cannot be satisfied immediately. Such requests enter a queue, and their thread stays in `Request()` until they are granted or the rate limiter is stopped. These threads are responsible for unblocking themselves. The work to do so is split into two main duties. (1) Waiting for the next refill time. (2) Refilling the bytes and granting requests.

Prior to this PR, the slow path logic involved a leader election algorithm to pick one thread to perform (1) followed by (2). It elected the thread whose request was at the front of the highest priority non-empty queue since that request was most likely to be granted. This algorithm was efficient in terms of reducing intermediate wakeups, which is a thread waking up only to resume waiting after finding its request is not granted. However, the conceptual complexity of this algorithm was too high. It took me a long time to draw a timeline to understand how it works for just one edge case yet there were so many.

This PR drops the leader election to reduce conceptual complexity. Now, the two duties can be performed by whichever thread acquires the lock first. The risk of this change is increasing the number of intermediate wakeups, however, we took steps to mitigate that.

- `wait_until_refill_pending_` flag ensures only one thread performs (1). This prevents the thundering herd problem at the next refill time. The remaining threads wait on their condition variable with an unbounded duration -- thus we must remember to notify them to ensure forward progress.
- (1) is typically done by a thread at the front of a queue. This is trivial when the queues are initially empty as the first choice that arrives must be the only entry in its queue. When queues are initially non-empty, we achieve this by having (2) notify a thread at the front of a queue (preferring higher priority) to perform the next duty.
- We do not require any additional wakeup for (2). Typically it will just be done by the thread that finished (1).

Combined, the second and third bullet points above suggest the refill/granting will typically be done by a request at the front of its queue. This is important because one wakeup is saved when a granted request happens to be in an already running thread.

Note there are a few cases that still lead to intermediate wakeup, however. The first two are existing issues that also apply to the old algorithm, however, the third (including both subpoints) is new.

- No request may be granted (only possible when rate limit dynamically decreases).
- Requests from a different queue may be granted.
- (2) may be run by a non-front request thread causing it to not be granted even if some requests in that same queue are granted. It can happen for a couple (unlikely) reasons.
  - A new request may sneak in and grab the lock at the refill time, before the thread finishing (1) can wake up and grab it.
  - A new request may sneak in and grab the lock and execute (1) before (2)'s chosen candidate can wake up and grab the lock. Then that non-front request thread performing (1) can carry over to perform (2).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8602

Test Plan:
- Use existing tests. The edge cases listed in the comment are all performance related; I could not really think of any related to correctness. The logic looks the same whether a thread wakes up/finishes its work early/on-time/late, or whether the thread is chosen vs. "steals" the work.
- Verified write throughput and CPU overhead are basically the same with and without this change, even in a rate limiter heavy workload:

Test command:
```
$ rm -rf /dev/shm/dbbench/ && TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -benchmarks=fillrandom -num_multi_db=64 -num_low_pri_threads=64 -num_high_pri_threads=64 -write_buffer_size=262144 -target_file_size_base=262144 -max_bytes_for_level_base=1048576 -rate_limiter_bytes_per_sec=16777216 -key_size=24 -value_size=1000 -num=10000 -compression_type=none -rate_limiter_refill_period_us=1000
```

Results before this PR:
```
fillrandom : 108.463 micros/op 9219 ops/sec; 9.0 MB/s
7.40user 8.84system 1:26.20elapsed 18%CPU (0avgtext+0avgdata 256140maxresident)k
```

Results after this PR:
```
fillrandom : 108.108 micros/op 9250 ops/sec; 9.0 MB/s
7.45user 8.23system 1:26.68elapsed 18%CPU (0avgtext+0avgdata 255688maxresident)k
```

Reviewed By: hx235

Differential Revision: D30048013

Pulled By: ajkr

fbshipit-source-id: 6741bba9d9dfbccab359806d725105817fef818b

4 years ago
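The slow path described in the summary can be sketched roughly as follows. This is a simplified, hypothetical model and not the code in `rate_limiter.cc`: the class `SketchRateLimiter`, its single FIFO queue (the real limiter keeps one queue per priority), the choice to reset rather than accumulate the budget on refill, and the helper `RunDemo` are all assumptions made for illustration. Only the flag name `wait_until_refill_pending_` and the two duties come from the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified model of the slow path; names are illustrative.
class SketchRateLimiter {
  using Clock = std::chrono::steady_clock;
  struct Req {
    explicit Req(int64_t b) : bytes(b) {}
    int64_t bytes;
    bool granted = false;
    std::condition_variable cv;  // each waiting thread has its own CV
  };

 public:
  SketchRateLimiter(int64_t bytes_per_refill, std::chrono::microseconds period)
      : refill_bytes_(bytes_per_refill), period_(period),
        available_(bytes_per_refill), next_refill_(Clock::now() + period) {}

  // Blocks until `bytes` are granted. Assumes bytes <= bytes_per_refill.
  void Request(int64_t bytes) {
    assert(bytes <= refill_bytes_);
    std::unique_lock<std::mutex> lock(mu_);
    if (queue_.empty() && available_ >= bytes) {  // fast path
      available_ -= bytes;
      return;
    }
    Req r(bytes);
    queue_.push_back(&r);
    while (!r.granted) {
      if (!wait_until_refill_pending_) {
        // Duty (1): whichever thread gets here first waits for the refill
        // time. The flag keeps every other thread out of this branch.
        wait_until_refill_pending_ = true;
        const Clock::time_point deadline = next_refill_;
        r.cv.wait_until(lock, deadline);
        wait_until_refill_pending_ = false;
        // Duty (2): the same thread usually refills and grants.
        if (Clock::now() >= next_refill_) RefillAndGrant();
      } else {
        // Unbounded wait: relies on being notified by RefillAndGrant().
        r.cv.wait(lock);
      }
    }
  }

 private:
  // Called with mu_ held.
  void RefillAndGrant() {
    next_refill_ = Clock::now() + period_;
    available_ = refill_bytes_;  // sketch: reset, do not accumulate
    while (!queue_.empty() && queue_.front()->bytes <= available_) {
      Req* front = queue_.front();
      queue_.pop_front();
      available_ -= front->bytes;
      front->granted = true;
      front->cv.notify_one();
    }
    // Hand the next duty to the thread at the front of the queue.
    if (!queue_.empty()) queue_.front()->cv.notify_one();
  }

  const int64_t refill_bytes_;
  const std::chrono::microseconds period_;
  int64_t available_;
  Clock::time_point next_refill_;
  bool wait_until_refill_pending_ = false;
  std::deque<Req*> queue_;
  std::mutex mu_;
};

// Hypothetical demo helper: each thread requests `bytes` once; returns the
// total number of bytes granted across all threads.
int64_t RunDemo(int num_threads, int64_t bytes) {
  SketchRateLimiter limiter(1000, std::chrono::microseconds(1000));
  std::atomic<int64_t> total{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&] { limiter.Request(bytes); total += bytes; });
  }
  for (auto& t : threads) t.join();
  return total.load();
}
```

In this sketch a late-arriving request can take over duty (1) before the notified front thread wakes up, which mirrors the "sneak in" cases listed in the summary: the front thread then finds the flag set and goes back to an unbounded wait, which costs an intermediate wakeup but does not affect correctness.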
..
aligned_buffer.h	Fix wrong comments about function TruncateToPageBoundary. (#6975 )	5 years ago
autovector.h	Change autovector to have a reserved size in LITE mode (#6868 )	5 years ago
autovector_test.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
bloom_impl.h	Ribbon: InterleavedSolutionStorage (#7598 )	5 years ago
bloom_test.cc	fix several MSVC build errors (#8519 )	4 years ago
build_version.cc.in	Make builds reproducible (#7866 )	5 years ago
cast_util.h	Replace reinterpret_cast with static_cast_with_check (#7067 )	5 years ago
channel.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
coding.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
coding.h	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
coding_lean.h	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
coding_test.cc	Fix potential overflow of unsigned type in for loop (#6902 )	5 years ago
compaction_job_stats_impl.cc	Update compaction statistics to include the amount of data read from blob files (#8022 )	5 years ago
comparator.cc	Add customizable_util.h to the public API (#8301 )	4 years ago
compression.h	rocksdb: don't call LZ4_loadDictHC with null dictionary	4 years ago
compression_context_cache.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
compression_context_cache.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
concurrent_task_limiter_impl.cc	Remove TaskLimiterToken::ReleaseOnce for fix (#8567 )	4 years ago
concurrent_task_limiter_impl.h	Remove TaskLimiterToken::ReleaseOnce for fix (#8567 )	4 years ago
core_local.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
crc32c.cc	Using existing crc32c checksum in checksum handoff for Manifest and WAL (#8412 )	4 years ago
crc32c.h	Implementation of Crc32c combine function (#8305 )	4 years ago
crc32c_arm64.cc	Mac M1 crc32 intrinsics ARM64 check support proposal (#7893 )	5 years ago
crc32c_arm64.h	Fix compilation on Apple Silicon (#7714 )	5 years ago
crc32c_ppc.c	Fix Compilation on ppc64le using Clang 11 (#7713 )	5 years ago
crc32c_ppc.h	Fix Compilation on ppc64le using Clang 11 (#7713 )	5 years ago
crc32c_ppc_asm.S	Fix Compilation on ppc64le using Clang 11 (#7713 )	5 years ago
crc32c_ppc_constants.h	Remove PATENTS text from a few straggler files (#5326 )	7 years ago
crc32c_test.cc	Implementation of Crc32c combine function (#8305 )	4 years ago
defer.h	Fix insecure internal API for GetImpl (#8590 )	4 years ago
defer_test.cc	Fix insecure internal API for GetImpl (#8590 )	4 years ago
duplicate_detector.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
dynamic_bloom.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
dynamic_bloom.h	Genericize and clean up FastRange (#7436 )	5 years ago
dynamic_bloom_test.cc	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
fastrange.h	Genericize and clean up FastRange (#7436 )	5 years ago
file_checksum_helper.cc	Refactor with VersionEditHandler (#6581 )	5 years ago
file_checksum_helper.h	Real fix for race in backup custom checksum checking (#7309 )	5 years ago
file_reader_writer_test.cc	Fix a minor issue with initializing the test path (#8555 )	4 years ago
filelock_test.cc	Fix MSVC-related build issues (#7439 )	5 years ago
filter_bench.cc	Rename variables in ImmutableCFOptions to avoid conflicts with ImmutableDBOptions (#8227 )	5 years ago
gflags_compat.h	Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566 )	5 years ago
hash.cc	Integrity protection for live updates to WriteBatch (#7748 )	5 years ago
hash.h	Integrity protection for live updates to WriteBatch (#7748 )	5 years ago
hash_map.h	Change HashMap::Insert()'s value to a const reference (#6567 )	6 years ago
hash_test.cc	Use NPHash64 in more places (#7632 )	5 years ago
heap.h	Avoid self-move-assign in pop operation of binary heap. (#7942 )	5 years ago
heap_test.cc	Revert "Update googletest from 1.8.1 to 1.10.0 (#6808 )" (#6923 )	5 years ago
kv_map.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
log_write_bench.cc	Add a SystemClock class to capture the time functions of an Env (#7858 )	5 years ago
math.h	Fix MSVC-related build issues (#7439 )	5 years ago
math128.h	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
murmurhash.cc	C++20 compatibility (#6697 )	6 years ago
murmurhash.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
mutexlock.h	Prevents Table Cache to open same files more times (#6707 )	6 years ago
ppc-opcode.h	Remove PATENTS text from a few straggler files (#5326 )	7 years ago
random.cc	Add De/Serialization for CompactionInput/Result (#8247 )	5 years ago
random.h	Add De/Serialization for CompactionInput/Result (#8247 )	5 years ago
random_test.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
rate_limiter.cc	Simplify GenericRateLimiter algorithm (#8602 )	4 years ago
rate_limiter.h	Simplify GenericRateLimiter algorithm (#8602 )	4 years ago
rate_limiter_test.cc	Simplify GenericRateLimiter algorithm (#8602 )	4 years ago
repeatable_thread.h	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
repeatable_thread_test.cc	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
ribbon_alg.h	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
ribbon_config.cc	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
ribbon_config.h	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
ribbon_impl.h	Refine Ribbon configuration, improve testing, add Homogeneous (#7879 )	5 years ago
ribbon_test.cc	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
set_comparator.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
slice.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
slice_test.cc	Handoff checksum Implementation (#7523 )	5 years ago
slice_transform_test.cc	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
status.cc	Add remote compaction public API (#8300 )	4 years ago
stderr_logger.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
stop_watch.h	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
string_util.cc	Add remote compaction public API (#8300 )	4 years ago
string_util.h	Add remote compaction public API (#8300 )	4 years ago
thread_guard.h	Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112 )	5 years ago
thread_list_test.cc	fix thread status synchronization in thread_list_test (#7825 )	5 years ago
thread_local.cc	Fix typo in ThreadData comment (#7131 )	5 years ago
thread_local.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
thread_local_test.cc	Add StartThread type checking wrapper (#8303 )	4 years ago
thread_operation.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
threadpool_imp.cc	Prevent joining detached thread in ThreadPoolImpl (#8635 )	4 years ago
threadpool_imp.h	Make it able to lower cpu priority to specific level in threadpool (#6969 )	5 years ago
timer.h	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
timer_queue.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
timer_queue_test.cc	Change RocksDB License	8 years ago
timer_test.cc	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 )	5 years ago
user_comparator_wrapper.h	Enable backward iterator for keys with user-defined timestamp (#8035 )	5 years ago
vector_iterator.h	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 )	6 years ago
work_queue.h	Revamp cache_bench to resemble a real workload (#6629 )	6 years ago
work_queue_test.cc	Add pipelined & parallel compression optimization (#6262 )	6 years ago
xxh3p.h	Fix MSVC-related build issues (#7439 )	5 years ago
xxhash.cc	Remove unused includes (#7604 )	5 years ago
xxhash.h	Misc hashing updates / upgrades (#5909 )	6 years ago