Summary:
The fractional cascading index is not correctly generated when two files at the same level contains the same smallest or largest user key.
The result would be that it would hit an assertion in debug mode and lower level files might be skipped.
This might cause wrong results when the same user keys are of merge operands and Get() is called using the exact user key. In that case, the lower files would need to further checked.
The fix is to fix the fractional cascading index.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6285
Test Plan: Add a unit test which would cause the assertion which would be fixed.
Differential Revision: D19358426
fbshipit-source-id: 39b2b1558075fd95e99491d462a67f9f2298c48e
Summary:
This makes it easier to call the functions from Rust as otherwise they require mutable types.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6283
Differential Revision: D19349991
Pulled By: wqfish
fbshipit-source-id: e8da7a75efe8cd97757baef8ca844a054f2519b4
Summary:
Look at all compaction input files to compute the oldest ancestor time.
In https://github.com/facebook/rocksdb/issues/5992 we changed how creation_time (aka oldest-ancestor-time) table property of compaction output files is computed from max(creation-time-of-all-compaction-inputs) to min(creation-time-of-all-inputs). This exposed a bug where, during compaction, the creation_time:s of only the L0 compaction inputs were being looked at, and all other input levels were being ignored. This PR fixes the issue.
Some TTL compactions when using Level-Style compactions might not have run due to this bug.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6279
Test Plan: Enhanced the unit tests to validate that the correct time is propagated to the compaction outputs.
Differential Revision: D19337812
Pulled By: sagar0
fbshipit-source-id: edf8a72f11e405e93032ff5f45590816debe0bb4
Summary:
unordered_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led to stress tests to fail with this combination is tried in ::SetOptions.
The patch fixes that and also reverts the changes performed by https://github.com/facebook/rocksdb/pull/6254, in which max_successive_merges was mistakenly declared incompatible with unordered_write.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6284
Differential Revision: D19356115
Pulled By: maysamyabandeh
fbshipit-source-id: f06dadec777622bd75f267361c022735cf8cecb6
Summary:
Fix compilation under LITE by putting `#ifndef ROCKSDB_LITE` around a code block.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6277
Differential Revision: D19334157
Pulled By: riversand963
fbshipit-source-id: 947111ed68aa550f5ea424b216c1442a8af9e32b
Summary:
Some shadow warning shows up when using gcc 4.8. An example:
./utilities/blob_db/blob_compaction_filter.h: In constructor ‘rocksdb::blob_db::BlobIndexCompactionFilterFactoryBase::BlobIndexCompactionFilterFactoryBase(rocksdb::blob_db::lobDBImpl*, rocksdb::Env*, rocksdb::Statistics*)’:
./utilities/blob_db/blob_compaction_filter.h:121:7: error: declaration of ‘blob_db_impl’ shadows a member of 'this' [-Werror=shadow]
: blob_db_impl_(blob_db_impl), env_(_env), statistics_(_statistics) {}
^
Fix them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6242
Test Plan: Build and see the warnings go away.
Differential Revision: D19217789
fbshipit-source-id: 8ef631941f23dab47a388e060adec24b72efd65e
Summary:
Right now, when validating prefix iterator, if control iterator is invalidate but prefix iterator shows value, we determine it as a test failure. However, this fails to consider the case where a file or memtable containing a tombstone is filtered out by a prefix bloom filter. The fix is to relax the check in this case. If we are out of prefix range, then ignore the check.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6269
Test Plan: Run crash_test for a short while and it still passes.
Differential Revision: D19317594
fbshipit-source-id: b964a1cdc1df5efe439d4b32f8023e1fbc8598c1
Summary:
The crash test is failing with non-ok status after TransactionDB::Open. This patch adds more debugging information.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6272
Differential Revision: D19314527
Pulled By: maysamyabandeh
fbshipit-source-id: d45ecb0f2144e052fb4b5fdd483150440991a3b4
Summary:
This is the start of some JMH microbenchmarks for RocksJava.
Such benchmarks can help us decide on performance improvements of the Java API.
At the moment, I have only added benchmarks for various Comparator options, as that is one of the first areas where I want to improve performance. I plan to expand this to many more tests.
Details of how to compile and run the benchmarks are in the `README.md`.
A run of these on a XEON 3.5 GHz 4vCPU (QEMU Virtual CPU version 2.5+) / 8GB RAM KVM with Ubuntu 18.04, OpenJDK 1.8.0_232, and gcc 8.3.0 produced the following:
```
# Run complete. Total time: 01:43:17
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark (comparatorName) Mode Cnt Score Error Units
ComparatorBenchmarks.put native_bytewise thrpt 25 122373.920 ± 2200.538 ops/s
ComparatorBenchmarks.put java_bytewise_adaptive_mutex thrpt 25 17388.201 ± 1444.006 ops/s
ComparatorBenchmarks.put java_bytewise_non-adaptive_mutex thrpt 25 16887.150 ± 1632.204 ops/s
ComparatorBenchmarks.put java_direct_bytewise_adaptive_mutex thrpt 25 15644.572 ± 1791.189 ops/s
ComparatorBenchmarks.put java_direct_bytewise_non-adaptive_mutex thrpt 25 14869.601 ± 2252.135 ops/s
ComparatorBenchmarks.put native_reverse_bytewise thrpt 25 116528.735 ± 4168.797 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_adaptive_mutex thrpt 25 10651.975 ± 545.998 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-adaptive_mutex thrpt 25 10514.224 ± 930.069 ops/s
```
Indicating a ~7x difference between comparators implemented natively (C++) and those implemented in Java. Let's see if we can't improve on that in the near future...
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6241
Differential Revision: D19290410
Pulled By: pdillinger
fbshipit-source-id: 25d44bf3a31de265502ed0c5d8a28cf4c7cb9c0b
Summary:
WritePreparedTxnDB calls CancelAllBackgroundWork in its destructor to avoid dangling references to it from background job's SnapshotChecker callback. However, if the DBImpl is already closed, the info log might be closed with it, which causes memory leak when CancelAllBackgroundWork tries to print to the info log. The patch fixes that by calling CancelAllBackgroundWork only if the db is not closed already.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6268
Differential Revision: D19303439
Pulled By: maysamyabandeh
fbshipit-source-id: 4228a6be7e78d43c90630347baa89b008200bd15
Summary:
This is a continuation of https://github.com/facebook/rocksdb/pull/5320/files
I open a new mr for these purposes, half a year has past since the old mr is posted so it's almost impossible to fulfill some points below on the old mr, especially 5)
1) add validation modes for optimistic txns
2) modify unittests to test both modes
3) make format
4) refine hash functor
5) push to master
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6240
Differential Revision: D19301296
fbshipit-source-id: 5b5b3cbd39558f43947f7d2dec6cd31a06386edb
Summary:
A new interface method Env::GetFreeSpace was added in https://github.com/facebook/rocksdb/issues/4164. It needs to be implemented for Windows port. Some error_handler_test cases fail on Windows because recovery cannot succeed without free space being reported.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6265
Differential Revision: D19303065
fbshipit-source-id: 1f1a83e53f334284781cf61feabc996e87b945ca
Summary:
Currently, the recently-added test DBTest2.SwitchMemtableRaceWithNewManifest
fails in LITE mode since SetOptions() returns "Not supported". I do not want to
put `#ifndef ROCKSDB_LITE` because it reduces test coverage. Instead, just
trigger compaction on a different column family. The bg compaction thread
calling LogAndApply() may race with thread calling SwitchMemtable().
Test Plan (dev server):
make check
OPT=-DROCKSDB_LITE make check
or run DBTest2.SwitchMemtableRaceWithNewManifest 100 times.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6267
Differential Revision: D19301309
Pulled By: riversand963
fbshipit-source-id: 88cedcca2f985968ed3bb234d324ffa2aa04ca50
Summary:
Fix an error message when CURRENT is not found.
Test plan (dev server)
```
make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6264
Differential Revision: D19300699
Pulled By: riversand963
fbshipit-source-id: 303fa206386a125960ecca1dbdeff07422690caf
Summary:
Add oldest snapshot sequence property, so we can use `db.GetProperty("rocksdb.oldest-snapshot-sequence")` to get the sequence number of the oldest snapshot.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6228
Differential Revision: D19264145
Pulled By: maysamyabandeh
fbshipit-source-id: 67fbe5304d89cbc475bd404e30d1299f7b11c010
Summary:
Reword the error message when keys are not added in strict ascending order.
Specifically, original error message is not clear when application tries to
call SstFileWriter::Merge() with duplicate keys.
Test plan (dev server)
```
make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6261
Differential Revision: D19290398
Pulled By: riversand963
fbshipit-source-id: 4dc30a701414e6894db2eb024e3734470c22b371
Summary:
It seems that the C-API doesn't expose the range delete functionality at the moment, so add the API.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6259
Differential Revision: D19290320
Pulled By: pdillinger
fbshipit-source-id: 3f403a4c3446d2042d55f1ece7cdc9c040f40c27
Summary:
When measure_io_stats_ is enabled, the volume of logging is beyond the default limit of 512 size. The patch allows the EventLoggerStream to change the limit, and also sets it to 1024 for FlushJob.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6258
Differential Revision: D19279269
Pulled By: maysamyabandeh
fbshipit-source-id: 3fb5d468dad488f289ac99d713378177eb7504d6
Summary:
When called on transactions, MultiGet could return a legit MergeInProgress status. The patch excludes this case from errors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6257
Differential Revision: D19275787
Pulled By: maysamyabandeh
fbshipit-source-id: f7158229422af015947e592ae066b4273c9fb9a4
Summary:
Stress tests count number of errors and report them at the end. Not all the cases are accompanied with a log line which makes debugging difficult. The patch adds a log line to the remaining cases.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6256
Differential Revision: D19268785
Pulled By: maysamyabandeh
fbshipit-source-id: bdabcaa5c5c7edcb4ce4f25e38fd8a3fd9c7700b
Summary:
allow_concurrent_memtable_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led to stress tests to fail with this combination is tried in ::SetOptions. The patch fixes that.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6254
Differential Revision: D19265819
Pulled By: maysamyabandeh
fbshipit-source-id: 47f2e2dc26fe0972c7152f4da15dadb9703f1179
Summary:
Clang analyzer was falsely reporting on use of txn=nullptr.
Added a new const variable so that it can properly prune impossible
control flows.
Also, 'make analyze' previously required setting USE_CLANG=1 as an
environment variable, not a make variable, or else compilation errors
like
g++: error: unrecognized command line option ‘-Wshorten-64-to-32’
Now USE_CLANG is not required for 'make analyze' (it's implied) and you
can do an incremental analysis (recompile what has changed) with
'USE_CLANG=1 make analyze_incremental'
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6244
Test Plan: 'make -j24 analyze', 'make crash_test'
Differential Revision: D19225950
Pulled By: pdillinger
fbshipit-source-id: 14f4039aa552228826a2de62b2671450e0fed3cb
Summary:
This commit is suspected in some crash test failures such as
Verification failed for column family 0 key 78438077: Value not found: NotFound:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6243
Test Plan: 'make check' and start 'make crash_test'
Differential Revision: D19220495
Pulled By: pdillinger
fbshipit-source-id: 6c4709cee80ab4344e06ce360f51e947d79fb3fa
Summary:
Call Transaction::MultiGet from TestMultiGet() in db_stress. We add some Puts/Merges/Deletes into the transaction in order to exercise MultiGetFromBatchAndDB. There is no data verification on read, just check status. There is currently no read data verification in db_stress as it requires synchronization with writes. It needs to be tackled separately.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6227
Test Plan: make crash_test_with_txn
Differential Revision: D19204611
Pulled By: anand1976
fbshipit-source-id: 770d0e30d002e88626c264c58103f1d709bb060c
Summary:
Recently db_stress starts to use a special Env that keeps all manifest files. This should not apply to checkpoint directory and causes test failure like this:
Verification failed: Checkpoint gave inconsistent state. Status is IO error: While mkdir: /dev/shm/rocksdb/rocksdb_crashtest_whitebox/.checkpoint27.tmp: File exists
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6233
Test Plan: Run crash_test with high chance of checkpoint and make sure it doesn't reproduce.
Differential Revision: D19207250
fbshipit-source-id: 12a931379e2e0572bb84aa658b6d03770c8551d4
Summary:
Listners are one source of bugs because we frequently release some mutex to invoke them, which introduce race conditions. Implement all callback functions in db_stress's listener class, and randomly sleep.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6197
Test Plan: Run crash_test for a while and see no obvious problem.
Differential Revision: D19134015
fbshipit-source-id: b9ea8be9366e4501759119520cd4f204943538f6
Summary:
db_stress to execute DB::GetApproximateSizes() with randomized keys and options. Return value is not validated but error will be reported.
Two ways to generate the range keys: (1) two random keys; (2) a small range.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6213
Test Plan: (1) run "make crash_test" for a while; (2) hack the code to ingest some errors to see it is reported.
Differential Revision: D19204665
fbshipit-source-id: 652db36f13bcb5a3bd8fe4a10c0aa22a77a0bce2
Summary:
Currently, db_stress generates fixed length keys of 8 bytes. This patch adds the ability to generate variable length keys. Most of the db_stress code continues to work with a numeric key randomly generated, and the numeric key also acts as an index into the values_ array. The numeric key is mapped to a variable length string key in a deterministic way. Furthermore, the ordering is preserved.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6165
Test Plan: run make crash_test
Differential Revision: D19204646
Pulled By: anand1976
fbshipit-source-id: d2d46a96615b4832a8be2a981f5913905f0e1ca7
Summary:
Several improvements to crash_test/stress_test:
(1) Stress_test to support an parameter of bottommost compression
(2) Rename those FLAGS_* variables that are not gflags to avoid confusion
(3) Crash_test to randomly generate compression type for bottommost compression with half the chance.
(4) Stress_test to sanitize unsupported compression type to snappy, so that crash_test to cover all possible compression types and people don't need to worry about they don't support all comrpession types in their environment.
(5) In crash_test, when generating db_stress command, sort arguments in alphabeta order, so that it is easier to find value for a specific argument.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6215
Test Plan: Run "make crash_test" for a while and see the botommost option shown in LOG files.
Differential Revision: D19171255
fbshipit-source-id: d7001e246c4ff9ee5760776eea0be97738650735
Summary:
1. Cover SeekToFirst() and SeekToLast().
2. Try to record the history of iterator operations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6166
Test Plan: Do some manual changes in the code to cover the failure cases and see the error printing is correct and SeekToFirst() and SeekToLast() sometimes show up.
Differential Revision: D19047079
fbshipit-source-id: 1ed616f919fe4d32c0a021fc37932a7bd3063bcd
Summary:
There are no API changes ;-)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6218
Differential Revision: D19200373
Pulled By: pdillinger
fbshipit-source-id: 58d34b01ea53b75a1eccbd72f8b14d6256a7380f
Summary:
Add the verification in operateDB to verify GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile. The test will be called every 1 out of N, N is decided by get_live_files_and_wal_files_one_i, whose default is 1000000.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6224
Test Plan: pass db_stress default run.
Differential Revision: D19183358
Pulled By: zhichao-cao
fbshipit-source-id: 20073cf72ede77a3e0d3cf5f28304f1f605d2b1a
Summary:
As title. We can run non-cf-consistency stress tests with verify_db_one_in>0,
thus remove the check added previously.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6231
Test Plan:
```
make crash_test
```
Differential Revision: D19198295
Pulled By: riversand963
fbshipit-source-id: e874c701bb03ab76eaab00f059dd4032bb2f537f
Summary:
The patch adds support for BlobDB to `db_stress`. Note that BlobDB currently does
not support (amongst other features) Column Families or the `SingleDelete` API,
so for now, those should be disabled on the command line when running `db_stress` in
BlobDB mode (using `-column_families=1` and `-nooverwritepercent=0`,
respectively). Also, some BlobDB features that do not go well with the verification logic
in `db_stress` like TTL and FIFO eviction are not supported currently.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6230
Test Plan:
```
./db_stress -max_key=100000 -use_blob_db -column_families=1 -nooverwritepercent=0 -reopen=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 -blob_db_enable_gc -blob_db_gc_cutoff=0.1 -blob_db_min_blob_size=10 -blob_db_bytes_per_sync=16384
```
Differential Revision: D19191476
Pulled By: ltamasi
fbshipit-source-id: 35840452af8c5e6095249c7fd9a53a119a0985fc
Summary:
Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways.
1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests.
2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests.
Test plan (devserver)
```
$./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100
$./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0
$make crash_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173
Differential Revision: D19047367
Pulled By: riversand963
fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615
Summary:
BlobDB currently only supports using the default column family. The earlier
code enforces this by comparing the `ColumnFamilyHandle` passed to the
`Get`/`Put`/etc. call with the handle returned by `DefaultColumnFamily`
(which, at the end of the day, comes from `DBImpl::default_cf_handle_`).
Since other `ColumnFamilyHandle`s can also point to the default column
family, this can reject legitimate requests as well. (As an example,
with the earlier code, the handle returned by `BlobDB::Open` cannot
actually be used in API calls.) The patch fixes this by comparing only
the IDs of the column family handles instead of the pointers themselves.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6226
Test Plan: `make check`
Differential Revision: D19187461
Pulled By: ltamasi
fbshipit-source-id: 54ce2e12ebb1f07e6d1e70e3b1e0213dfa94bda2
Summary:
The new Python syntax check could fail if external entities
were cloned or symlinked to a subdir in a rocksdb git clone. (E.g.
Facebook internal LITE build.) Only look for Python files in specific
subdirs
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6225
Test Plan: python tools/check_all_python.py (still 34 files checked)
Reviewed By: gfosco
Differential Revision: D19186110
Pulled By: pdillinger
fbshipit-source-id: 1fefa54e36b32cd5d96d3d1a43e8a2a694c22ea5
Summary:
Right now BlockBasedTable::ApproximateSize() uses default setting about whether to use total order seek. There is no reason for that. There is no reason to do any filtering for approximate size boundary key, and it may introduce bugs. Disable it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6222
Test Plan: Run existing tests
Differential Revision: D19184787
fbshipit-source-id: 64180660bd2800914fff75104172b61c06f0b1c9
Summary:
Complete some refactoring called for in https://github.com/facebook/rocksdb/issues/6148. Somehow I got some 'make format' in here for files I didn't change, but that should be OK. (I'm not sure why "hide whitespace changes" doesn't seem to help in review.)
Not addressed in this PR: some operations simply print to stdout rather than failing on discovering a bad status or inconsistency.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6195
Differential Revision: D19131067
Pulled By: pdillinger
fbshipit-source-id: 4f416e6b792023989ef119f385fe122426cb825d