Summary:
The previous API comments for LockWAL didn't provide much about why you might want to use it, and didn't really meet what one would infer its contract was. Also, LockWAL was not in db_stress / crash test. In this change:
* Implement a counting semantics for LockWAL()+UnlockWAL(), so that they can safely be used concurrently across threads or recursively within a thread. This should make the API much less bug-prone and easier to use.
* Make sure no UnlockWAL() is needed after non-OK LockWAL() (to match RocksDB conventions)
* Make UnlockWAL() reliably return non-OK when there's no matching LockWAL() (for debug-ability)
* Clarify API comments on LockWAL(), UnlockWAL(), FlushWAL(), and SyncWAL(). Their exact meanings are not obvious, and I don't think it's appropriate to talk about implementation mutexes in the API comments, but about what operations might block each other.
* Add LockWAL()/UnlockWAL() to db_stress and crash test, mostly to check for assertion failures, but also checks that latest seqno doesn't change while WAL is locked. This is simpler to add when LockWAL() is allowed in multiple threads.
* Remove unnecessary use of sync points in test DBWALTest::LockWal. There was a bug during development of above changes that caused this test to fail sporadically, with and without this sync point change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11143
Test Plan: unit tests added / updated, added to stress/crash test
Reviewed By: ajkr
Differential Revision: D42848627
Pulled By: pdillinger
fbshipit-source-id: 6d976c51791941a31fd8fbf28b0f82e888d9f4b4
Summary:
folly DistributedMutex is faster than standard mutexes though
imposes some static obligations on usage. See
https://github.com/facebook/folly/blob/main/folly/synchronization/DistributedMutex.h
for details. Here we use this alternative for our Cache implementations
(especially LRUCache) for better locking performance, when RocksDB is
compiled with folly.
Also added information about which distributed mutex implementation is
being used to cache_bench output and to DB LOG.
Intended follow-up:
* Use DMutex in more places, perhaps improving API to support non-scoped
locking
* Fix linking with fbcode compiler (needs ROCKSDB_NO_FBCODE=1 currently)
Credit: Thanks Siying for reminding me about this line of work that was previously
left unfinished.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10179
Test Plan:
for correctness, existing tests. CircleCI config updated.
Also Meta-internal buck build updated.
For performance, ran simultaneous before & after cache_bench. Out of three
comparison runs, the middle improvement to ops/sec was +21%:
Baseline: USE_CLANG=1 DEBUG_LEVEL=0 make -j24 cache_bench (fbcode
compiler)
```
Complete in 20.201 s; Rough parallel ops/sec = 1584062
Thread ops/sec = 107176
Operation latency (ns):
Count: 32000000 Average: 9257.9421 StdDev: 122412.04
Min: 134 Median: 3623.0493 Max: 56918500
Percentiles: P50: 3623.05 P75: 10288.02 P99: 30219.35 P99.9: 683522.04 P99.99: 7302791.63
```
New: (add USE_FOLLY=1)
```
Complete in 16.674 s; Rough parallel ops/sec = 1919135 (+21%)
Thread ops/sec = 135487
Operation latency (ns):
Count: 32000000 Average: 7304.9294 StdDev: 108530.28
Min: 132 Median: 3777.6012 Max: 91030902
Percentiles: P50: 3777.60 P75: 10169.89 P99: 24504.51 P99.9: 59721.59 P99.99: 1861151.83
```
Reviewed By: anand1976
Differential Revision: D37182983
Pulled By: pdillinger
fbshipit-source-id: a17eb05f25b832b6a2c1356f5c657e831a5af8d1
Summary:
This PR implements a coroutine version of batched MultiGet in order to concurrently read from multiple SST files in a level using async IO, thus reducing the latency of the MultiGet. The API from the user perspective is still synchronous and single threaded, with the RocksDB part of the processing happening in the context of the caller's thread. In Version::MultiGet, the decision is made whether to call synchronous or coroutine code.
A good way to review this PR is to review the first 4 commits in order - de773b3, 70c2f70, 10b50e1, and 377a597 - before reviewing the rest.
TODO:
1. Figure out how to build it in CircleCI (requires some dependencies to be installed)
2. Do some stress testing with coroutines enabled
No regression in synchronous MultiGet between this branch and main -
```
./db_bench -use_existing_db=true --db=/data/mysql/rocksdb/prefix_scan -benchmarks="readseq,multireadrandom" -key_size=32 -value_size=512 -num=5000000 -batch_size=64 -multiread_batched=true -use_direct_reads=false -duration=60 -ops_between_duration_checks=1 -readonly=true -adaptive_readahead=true -threads=16 -cache_size=10485760000 -async_io=false -multiread_stride=40000 -statistics
```
Branch - ```multireadrandom : 4.025 micros/op 3975111 ops/sec 60.001 seconds 238509056 operations; 2062.3 MB/s (14767808 of 14767808 found)```
Main - ```multireadrandom : 3.987 micros/op 4013216 ops/sec 60.001 seconds 240795392 operations; 2082.1 MB/s (15231040 of 15231040 found)```
More benchmarks in various scenarios are given below. The measurements were taken with ```async_io=false``` (no coroutines) and ```async_io=true``` (use coroutines). For an IO bound workload (with every key requiring an IO), the coroutines version shows a clear benefit, being ~2.6X faster. For CPU bound workloads, the coroutines version has ~6-15% higher CPU utilization, depending on how many keys overlap an SST file.
1. Single thread IO bound workload on remote storage with sparse MultiGet batch keys (~1 key overlap/file) -
No coroutines - ```multireadrandom : 831.774 micros/op 1202 ops/sec 60.001 seconds 72136 operations; 0.6 MB/s (72136 of 72136 found)```
Using coroutines - ```multireadrandom : 318.742 micros/op 3137 ops/sec 60.003 seconds 188248 operations; 1.6 MB/s (188248 of 188248 found)```
2. Single thread CPU bound workload (all data cached) with ~1 key overlap/file -
No coroutines - ```multireadrandom : 4.127 micros/op 242322 ops/sec 60.000 seconds 14539384 operations; 125.7 MB/s (14539384 of 14539384 found)```
Using coroutines - ```multireadrandom : 4.741 micros/op 210935 ops/sec 60.000 seconds 12656176 operations; 109.4 MB/s (12656176 of 12656176 found)```
3. Single thread CPU bound workload with ~2 key overlap/file -
No coroutines - ```multireadrandom : 3.717 micros/op 269000 ops/sec 60.000 seconds 16140024 operations; 139.6 MB/s (16140024 of 16140024 found)```
Using coroutines - ```multireadrandom : 4.146 micros/op 241204 ops/sec 60.000 seconds 14472296 operations; 125.1 MB/s (14472296 of 14472296 found)```
4. CPU bound multi-threaded (16 threads) with ~4 key overlap/file -
No coroutines - ```multireadrandom : 4.534 micros/op 3528792 ops/sec 60.000 seconds 211728728 operations; 1830.7 MB/s (12737024 of 12737024 found) ```
Using coroutines - ```multireadrandom : 4.872 micros/op 3283812 ops/sec 60.000 seconds 197030096 operations; 1703.6 MB/s (12548032 of 12548032 found) ```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9968
Reviewed By: akankshamahajan15
Differential Revision: D36348563
Pulled By: anand1976
fbshipit-source-id: c0ce85a505fd26ebfbb09786cbd7f25202038696
Summary:
The P95 and P99 metrics are flaky, similar to DBGet ones which removed
in https://github.com/facebook/rocksdb/issues/9742 .
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9844
Test Plan: `$ ./buckifier/buckify_rocksdb.py`
Reviewed By: ajkr
Differential Revision: D35655531
Pulled By: jay-zhuang
fbshipit-source-id: c1409f0fba4e23d461a65f988c27ac5e2ae85d13
Summary:
Especially after updating to C++17, I don't see a compelling case for
*requiring* any folly components in RocksDB. I was able to purge the existing
hard dependencies, and it can be quite difficult to strip out non-trivial components
from folly for use in RocksDB. (The prospect of doing that on F14 has changed
my mind on the best approach here.)
But this change creates an optional integration where we can plug in
components from folly at compile time, starting here with F14FastMap to replace
std::unordered_map when possible (probably no public APIs for example). I have
replaced the biggest CPU users of std::unordered_map with compile-time
pluggable UnorderedMap which will use F14FastMap when USE_FOLLY is set.
USE_FOLLY is always set in the Meta-internal buck build, and a simulation of
that is in the Makefile for public CI testing. A full folly build is not needed, but
checking out the full folly repo is much simpler for getting the dependency,
and anything else we might want to optionally integrate in the future.
Some picky details:
* I don't think the distributed mutex stuff is actually used, so it was easy to remove.
* I implemented an alternative to `folly::constexpr_log2` (which is much easier
in C++17 than C++11) so that I could pull out the hard dependencies on
`ConstexprMath.h`
* I had to add noexcept move constructors/operators to some types to make
F14's complainUnlessNothrowMoveAndDestroy check happy, and I added a
macro to make that easier in some common cases.
* Updated Meta-internal buck build to use folly F14Map (always)
No updates to HISTORY.md nor INSTALL.md as this is not (yet?) considered a
production integration for open source users.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9546
Test Plan:
CircleCI tests updated so that a couple of them use folly.
Most internal unit & stress/crash tests updated to use Meta-internal latest folly.
(Note: they should probably use buck but they currently use Makefile.)
Example performance improvement: when filter partitions are pinned in cache,
they are tracked by PartitionedFilterBlockReader::filter_map_ and we can build
a test that exercises that heavily. Build DB with
```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -partition_index_and_filters
```
and test with (simultaneous runs with & without folly, ~20 times each to see
convergence)
```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench_folly -readonly -use_existing_db -benchmarks=readrandom -num=10000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -partition_index_and_filters -duration=40 -pin_l0_filter_and_index_blocks_in_cache
```
Average ops/s no folly: 26229.2
Average ops/s with folly: 26853.3 (+2.4%)
Reviewed By: ajkr
Differential Revision: D34181736
Pulled By: pdillinger
fbshipit-source-id: ffa6ad5104c2880321d8a1aa7187e00ab0d02e94
Summary:
This PR adds a ```-secondary_cache_uri``` option to the cache_bench and db_bench tools to allow the user to specify a custom secondary cache URI. The object registry is used to create an instance of the ```SecondaryCache``` object of the type specified in the URI.
The main cache_bench code is packaged into a separate library, similar to db_bench.
An example invocation of db_bench with a secondary cache URI -
```db_bench --env_uri=ws://ws.flash_sandbox.vll1_2/ -db=anand/nvm_cache_2 -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=67108864 -cache_index_and_filter_blocks=true -secondary_cache_uri='cachelibwrapper://filename=/home/anand76/nvm_cache/cache_file;size=2147483648;regionSize=16777216;admPolicy=random;admProbability=1.0;volatileSize=8388608;bktPower=20;lockPower=12' -partition_index_and_filters=true -duration=1800```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8312
Reviewed By: zhichao-cao
Differential Revision: D28544325
Pulled By: anand1976
fbshipit-source-id: 8f209b9af900c459dc42daa7a610d5f00176eeed
Summary:
New tests should by default be expected to be parallelizeable
and passing with ASSERT_STATUS_CHECKED. Thus, I'm changing those two
lists to exclusions rather than inclusions.
For the set of exclusions, I only listed things that currently failed
for me when attempting not to exclude, or had some other documented
reason. This marks many more tests as "parallel," which will potentially
cause some failures from self-interference, but we can address those as
they are discovered.
Also changed CircleCI ASC test to be parallelized; the easy way to do
that is to exclude building tests that don't pass ASC, which is now a
small set.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8146
Test Plan: Watch CI, etc.
Reviewed By: riversand963
Differential Revision: D27542782
Pulled By: pdillinger
fbshipit-source-id: bdd74bcd912a963ee33f3fc0d2cad2567dc7740f
Summary:
If the platform is ppc64 and the libc is not GNU libc, then we exclude the range_tree from compilation.
See https://jira.percona.com/browse/PS-7559
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8070
Reviewed By: jay-zhuang
Differential Revision: D27246004
Pulled By: mrambacher
fbshipit-source-id: 59d8433242ce7ce608988341becb4f83312445f5
Summary:
TIL we have different versions of TARGETS file generated with
options passed to buckifier. Someone thought they were totally fine to
squash the file by re-running the command to generate (pretty reasonable
assumption) but the command was incorrect due to missing the extra
argument used to generate THAT TARGETS file.
This change includes in the command written in the TARGETS header the
extra argument passed to buckify (when used).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7902
Test Plan:
manual, as in the (now fixed) comments at the top of
buckify_rocksdb.py
Reviewed By: ajkr
Differential Revision: D26108317
Pulled By: pdillinger
fbshipit-source-id: 46e93dc1465e27bd18e0e0baa8eeee1b591c765d
Summary:
We would like to build a shared library with all fbcode dependencies statically linked within.
This resulting .so should not drop any symbols definitions in the building process.
To ensure that, we use `link_whole=True` according to
https://buck.build/rule/cxx_library.html#link_whole.
Since `link_whole` is `False` by default, adding a `link_whole=False` to existing libraries won't
change any behavior.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7466
Test Plan: build a .so and test internally.
Reviewed By: pdillinger
Differential Revision: D24009780
Pulled By: riversand963
fbshipit-source-id: d18804d495da7195ed72a2040e1a5de4fd336519
Summary:
This is to fix special logic to run tests inside FB.
Buck test is broken after moving to cpp_unittest(). Move c_test back to the previous approach.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7076
Test Plan: Watch the Sandcastle run
Reviewed By: ajkr
Differential Revision: D22370096
fbshipit-source-id: 4a464d0903f2c76ae2de3a8ad373ffc9bedec64c
Summary:
Change the linking of tests/tools to be against a library rather than a list of objects. This change substantially reduces the size of the objects produced.
peterd clean repo size: 264M
Before this change, with make all: 40G
After this change, with make all: 28G
With make LIB_MODE=shared all: 7.0G
The list of TESTS was changed from being hard-coded to generated from the test sources variable. Note that there are some test sources that are not built as tests (though the set of tests is identical to the previous version).
Added OBJ_DIR option to Makefile to allow objects to be placed in an alternative location. By default, OBJ_DIR is the same as before ("./").
This change is a precursor to being able to build/run the tests/tools linked against static libraries. Additionally, it should be possible to clean up and merge some of the rules for building tests and the like if so desired.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6660
Reviewed By: riversand963
Differential Revision: D22244463
Pulled By: pdillinger
fbshipit-source-id: db9c6341d81ed62c2270374f4ede02fb9604c754
Summary:
Make RocksDB run a predefined unit test so that it can be integrated with better tools.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6926
Test Plan: Watch tests
Reviewed By: pdillinger
Differential Revision: D21866216
fbshipit-source-id: cafca82efdf0b72671be8d30b665e88a75ae6000
Summary:
RocksDB Makefile was assuming existence of 'python' command,
which is not present in CentOS 8. We avoid using 'python' if 'python3' is available.
Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even with Python3 only (as some CentOS 8 FB machines come equipped)
Also, now use just 'python3' for PYTHON if not found so that an informative
"command not found" error will result rather than something weird.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883
Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2.
Reviewed By: gg814
Differential Revision: D21767029
Pulled By: pdillinger
fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9
Summary:
... so that we have freedom to upgrade it (see https://github.com/facebook/rocksdb/issues/6808).
As a side benefit, gtest will no longer be linked into main library in
buck build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6858
Test Plan: fb internal build & link
Reviewed By: riversand963
Differential Revision: D21652061
Pulled By: pdillinger
fbshipit-source-id: 6018104af944debde576b5beda6c134e737acedb
Summary:
In buck build with opt mode, target should not include rocksdb_test_lib.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6847
Test Plan: Watch for internal cont build.
Reviewed By: ajkr
Differential Revision: D21586803
Pulled By: riversand963
fbshipit-source-id: 76d253c18d16fac6cab86a8c3f6b471ad5b6efb3
Summary:
Before this PR, extra deps passed in from cmd line to buckifier will be parsed
and used to populate a dict. Using this dict and printing to TARGETS file will
lead to printing u'', disallowed by build tools. This PR removes the u''.
Test Plan (local dev server):
```
python buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fake/module:mock1"], "extra_compiler_flags": ["-Os", "-DROCKSDB_LITE"]}}'
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6841
Reviewed By: siying
Differential Revision: D21538155
Pulled By: riversand963
fbshipit-source-id: 09403668a4aa1a15bad7dac229c2bc8ce8ee1349
Summary:
Some recent PRs added new source files or modified TARGETS file manually.
During next internal release, executing the following command will revert the
manual changes.
Update buckifier so that the following command
```
python buckfier/buckify_rocksdb.py
```
does not change TARGETS file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6726
Test Plan:
```
python buckifier/buckify_rocksdb.py
```
Reviewed By: siying
Differential Revision: D21098930
Pulled By: riversand963
fbshipit-source-id: e884f507fefef88163363c9097a460c98f1ed850
Summary:
include db_stress_tool in rocksdb tools lib
Test Plan (on devserver):
```
$make db_stress
$./db_stress
$make all && make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5950
Differential Revision: D18044399
Pulled By: riversand963
fbshipit-source-id: 895585abbbdfd8b954965921dba4b1400b7af1b1
Summary:
Users may desire to specify extra dependencies via buck. This PR allows users to pass additional dependencies as a JSON object so that the buckifier script can generate TARGETS file with desired extra dependencies.
Test plan (on dev server)
```
$python buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fakes/module:mock1"], "extra_compiler_flags": ["-DROCKSDB_LITE", "-Os"]}}'
Generating TARGETS
Extra dependencies:
{'': {'extra_compiler_flags': [], 'extra_deps': []}, 'test_dep1': {'extra_compiler_flags': ['-O2', '-DROCKSDB_LITE'], 'extra_deps': [':fake', '//dep1/mock']}}
Generated TARGETS Summary:
- 5 libs
- 0 binarys
- 296 tests
```
Verify the TARGETS file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5648
Differential Revision: D16565043
Pulled By: riversand963
fbshipit-source-id: a6ef02274174fcf159692d7b846e828454d01e89
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377
Differential Revision: D15551366
Pulled By: siying
fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues.
Reviewed By: yiwu-arbug
Differential Revision: D6821806
fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
Summary:
1. The buckifier script assume each test "foo" comes with a .cc file of the same name (i.e. foo.cc). Update cassandra tests to follow this pattern so that the buckifier script can recognize them.
2. add blob_db_test
Closes https://github.com/facebook/rocksdb/pull/2506
Differential Revision: D5331517
Pulled By: yiwu-arbug
fbshipit-source-id: 86f3eba471fc621186ab44cbd073b6162cde8e57