Summary:
Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways.
1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests.
2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests.
Test plan (devserver)
```
$./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100
$./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0
$make crash_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173
Differential Revision: D19047367
Pulled By: riversand963
fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615
Summary:
Complete some refactoring called for in https://github.com/facebook/rocksdb/issues/6148. Somehow I got some 'make format' in here for files I didn't change, but that should be OK. (I'm not sure why "hide whitespace changes" doesn't seem to help in review.)
Not addressed in this PR: some operations simply print to stdout rather than failing on discovering a bad status or inconsistency.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6195
Differential Revision: D19131067
Pulled By: pdillinger
fbshipit-source-id: 4f416e6b792023989ef119f385fe122426cb825d
Summary:
Add an option to db_stress, verify_checksum_one_in, to call DB::VerifyChecksum() once every N ops.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6203
Differential Revision: D19145753
Pulled By: anand1976
fbshipit-source-id: d09edf21f309ad53aa40dd25b7a563d50665fd8b
Summary:
In https://github.com/facebook/rocksdb/issues/6174 we fixed the stress test to respect the CancelAllBackgroundWork + Close order for WritePrepared transactions. The fix missed to take into account that some invocation of CancelAllBackgroundWork are with wait=false parameter which essentially breaks the order.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6191
Differential Revision: D19102709
Pulled By: maysamyabandeh
fbshipit-source-id: f4e7b5fdae47ff1c1ac284ba1cf67d5d3f3d03eb
Summary:
Several options are trivially added to crash test and random values are picked.
Made simple test run non-dynamic level and normal test run dynamic level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176
Test Plan: Run crash_test and watch the printing
Differential Revision: D19053955
fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502
Summary:
compaction history is stored in manifest files. Preserve all of them in db_stress would help debugging.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6142
Test Plan: Run db_stress and observe that manifest files are preserved. Run whole crash_test and see how DB directory looks like.
Differential Revision: D19047026
fbshipit-source-id: f83c3e0bb5332b1b4768be5dcee56a24f9b760a9
Summary:
In the current db_stress, all the keys are generated randomly and follows the uniform distribution. In order to test some corner cases that some key are always updated or read, we need to generate the key based on other distributions. In this PR, the key is generated based on Zipfian distribution and the skewness can be controlled by setting hot_key_alpha (0.8 to 1.5 is suggested). The larger hot_key_alpha is, the more skewed will be. Not that, usually, if hot_key_alpha is larger than 2, there might be only 1 or 2 keys that are generated. If hot_key_alpha is 0, it generate the key follows uniform distribution (random key)
Testing plan: pass the db_stress and printed the keys to make sure it follows the distribution.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6163
Differential Revision: D18978480
Pulled By: zhichao-cao
fbshipit-source-id: e123b4865477f7478e83fb581f9576bada334680
Summary:
Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171
Test Plan:
```
make -j32 crash_test
Differential Revision: D19038399
Pulled By: maysamyabandeh
fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0
Summary:
Close asserts that there is no unreleased snapshots. For WritePrepared transaction, this means that the background work that holds on a snapshot must be canceled first. Update the stress tests to respect the sequence.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6174
Test Plan:
```
make -j32 crash_test
Differential Revision: D19057322
Pulled By: maysamyabandeh
fbshipit-source-id: c9e9e24f779bbfb0ab72c2717e34576c01bc6362
Summary:
And clean up related code, especially in stress test.
(More clean up of db_stress_test_base.cc coming after this.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154
Test Plan: make check, make blackbox_crash_test for a bit
Differential Revision: D18938180
Pulled By: pdillinger
fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e
Summary:
With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164
Test Plan:
```
make -j32 crash_test_with_txn
Differential Revision: D18991899
Pulled By: maysamyabandeh
fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1
Summary:
Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158
Test Plan:
```
make -j32 crash_test_with_txn
```
Differential Revision: D18946307
Pulled By: maysamyabandeh
fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950
Summary:
Add SyncWAL to db_stress. Specify with `-sync_wal_one_in=N` so that it will be
called once every N operations on average.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6149
Test Plan:
```
$make db_stress
$./db_stress -sync_wal_one_in=100 -ops_per_thread=100000
```
Differential Revision: D18922529
Pulled By: riversand963
fbshipit-source-id: 4c0b8cb8fa21852722cffd957deddf688f12ea56
Summary:
CancelAllBackgroundWork() and Close() are frequently used features but we don't cover it in stress test. Simply execute them before closing the DB with 1/2 chance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6141
Test Plan: Run "db_stress".
Differential Revision: D18900861
fbshipit-source-id: 49b46ccfae120d0f9de3e0543b82fb6d715949d0
Summary:
Add an option to explicitly disable building shared versions of the
RocksDB libraries. The shared libraries cannot be built in cases where
some dependencies are only available as static libraries. This allows
still building RocksDB in these situations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6122
Differential Revision: D18920740
fbshipit-source-id: d24f66d93c68a1e65635e6e0b663bae62c903bca
Summary:
Right now, in db_stress, compact range is simply executed without any immediate data validation. Add a simply validation which compares hash for all keys within the compact range to stay the same against the same snapshot before and after the compaction.
Also, randomly tune most knobs of CompactRangeOptions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6140
Test Plan: Run db_stress with "--compact_range_one_in=2000 --compact_range_width=100000000" for a while. Manually ingest some hacky code and observe the error path.
Differential Revision: D18900230
fbshipit-source-id: d96e75bc8c38dd5ec702571ffe7cf5f4ea93ee10
Summary:
Especially with non-integral bits/key now supported,
db_crashtest should vary the bloom_bits configuration. The probabilities
look like this:
1/2 chance of a uniform int from 0 to 19. This includes overall 1/40
chance of 0 which disables the bloom filter.
1/2 chance of a float from a lognormal distribution with a median of 10.
This always produces positive values but with a decent chance of < 1
(overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced
implementation limits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103
Test Plan:
start 'make blackbox_crash_test' several times and look at
configuration output
Differential Revision: D18734877
Pulled By: pdillinger
fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316
Summary:
Two changes:
1. Prevent static variables in a header file
2. Add "override" keyword when virtual functions are overridden.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6137
Test Plan: Build db_stress with or without LITE.
Differential Revision: D18892007
fbshipit-source-id: 295356427a34473b23ed36d6ed4ef3ae35a32db0
Summary:
db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files.
Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134
Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1
Differential Revision: D18876064
fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d