Summary:
Atomic flush is incompatible with pipelined write, at least for now.
If pipelined write is enabled, a thread performing a write can exit the write
thread and start inserting into memtables. Consequently, a thread performing a
flush can enter the write thread and race with the memtable insertion by the former.
This leads to undefined results in terms of data persistence.
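A minimal C++ sketch of the kind of options check this implies. `SketchOptions` and `ValidateFlushOptions` are hypothetical; the two fields only mirror the `atomic_flush` and `enable_pipelined_write` DB options, and the real check in RocksDB may live elsewhere and return a `Status` rather than a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in mirroring two DB options.
struct SketchOptions {
  bool atomic_flush = false;
  bool enable_pipelined_write = false;
};

// Returns an empty string when the combination is allowed, otherwise an
// error message (a stand-in for an InvalidArgument-style status).
std::string ValidateFlushOptions(const SketchOptions& opts) {
  if (opts.atomic_flush && opts.enable_pipelined_write) {
    // With pipelined write, a writer may leave the write thread and keep
    // inserting into memtables, so an atomic flush that enters the write
    // thread could race with that insertion.
    return "atomic_flush is incompatible with enable_pipelined_write";
  }
  return "";
}
```

Rejecting the combination up front avoids the race entirely instead of trying to synchronize the flush with writers that have already left the write thread.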
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5860
Test Plan:
```
$make all && make check
```
Differential Revision: D17638944
Pulled By: riversand963
fbshipit-source-id: abc578dc49a5dbe41bc5adcecf448f8e042a6d49
Summary:
When prefix_size = -1, the stress test crashes with a runtime error because of overflow. Fix it by using 7 instead of -1 in prefix scan mode.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5862
Test Plan:
Run
python -u tools/db_crashtest.py --simple whitebox --random_kill_odd \
888887 --compression_type=zstd
and see it doesn't crash.
Differential Revision: D17642313
fbshipit-source-id: f029e7651498c905af1b1bee6d310ae50cdcda41
Summary:
Currently, crash_test is not able to report any failure in the logic related to iterators, their upper and lower bounds, or reseek. These features are error-prone. Improve db_stress in several ways:
(1) For each iterator run, reseek up to 3 times.
(2) For every iterator, create a control iterator with an upper or lower bound and total order seek, and compare its results with those of the iterator.
(3) Make the simple crash test avoid setting a prefix size, to get more coverage.
(4) Make prefix_size = 0 a valid size, and use -1 to indicate that the prefix extractor is disabled.
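Item (2) can be illustrated with standard containers. In this hypothetical sketch, `BoundedScan` stands in for the iterator under test (it relies on the container to stop at the upper bound), while `ControlScan` walks every key in total order and applies the bounds itself; `FirstMismatch` reports where they disagree. None of these names come from db_stress.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using KV = std::map<std::string, std::string>;

// Stand-in for the iterator under test: seek to `target` and let the
// container stop at the exclusive `upper` bound.
std::vector<std::string> BoundedScan(const KV& db, const std::string& target,
                                     const std::string& upper) {
  std::vector<std::string> keys;
  if (target >= upper) return keys;  // empty range
  const auto end = db.lower_bound(upper);
  for (auto it = db.lower_bound(target); it != end; ++it) {
    keys.push_back(it->first);
  }
  return keys;
}

// Control: visit every key in total order and apply the bounds explicitly.
std::vector<std::string> ControlScan(const KV& db, const std::string& target,
                                     const std::string& upper) {
  std::vector<std::string> keys;
  for (const auto& entry : db) {
    if (entry.first >= target && entry.first < upper) {
      keys.push_back(entry.first);
    }
  }
  return keys;
}

// Index of the first position where the two scans differ, or -1 if equal.
long FirstMismatch(const std::vector<std::string>& a,
                   const std::vector<std::string>& b) {
  const size_t n = std::min(a.size(), b.size());
  for (size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) return static_cast<long>(i);
  }
  return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

Because the control path never uses the optimized bound handling, any disagreement points at the bounded iterator rather than at the test itself.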
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5846
Test Plan: Manually hack the code to create wrong results and see they are caught by the tool.
Differential Revision: D17631760
fbshipit-source-id: acd460a177bd2124a5ffd7fff490702dba63030b
Summary:
Add the unordered_write option API and related unit tests to RocksJava.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5839
Differential Revision: D17604446
Pulled By: maysamyabandeh
fbshipit-source-id: c6b07e85ca9d5e3a92973ddb6ab2bc079e53c9c1
Summary:
as title.
Test Plan (on devserver):
```
$make all && make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5855
Differential Revision: D17615125
Pulled By: riversand963
fbshipit-source-id: bd6ed8cf59eafff41f0d1fc044f39e8f3573172a
Summary:
This is a bug that occasionally shows up in the crash test, and this unit test reproduces it. The bug is as follows:
1. The database has multiple CFs.
2. Before a DB restart, the last log file is corrupted in the middle (not at the tail).
3. During restart, the DB crashes between the flushes of two CFs.
The DB then fails to open again, with the error "SST file is ahead of WALs".
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5851
Test Plan: Run the test itself.
Differential Revision: D17614721
fbshipit-source-id: 1b0abce49b203a76a039e38e76bc940429975f20
Summary:
Partitioned filters use a top-level index to find the partition in which the filter resides. The top-level index has one key per partition, and that key is guaranteed to be greater than or equal to any key in the partition. With format_version 3, which excludes the sequence number from index keys, the separator key in the index can be equal to the prefix of the keys in the next partition. In that case, a search for such a key is routed by the top-level index to the previous partition, which has no key with that prefix. The prefix Bloom check therefore returns false, even though the prefix exists in the Bloom filter of the next partition.
The patch fixes this with a hack: it always adds the prefix of the first key of the next partition to the Bloom filter of the current partition. This way, in the corner cases where the index leads us to the previous partition, the prefix can still be found in that partition's Bloom filter.
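The workaround can be sketched with an exact `std::set` of prefixes standing in for each partition's prefix Bloom filter. `BuildPartitionedFilters` and its fixed number of keys per partition are assumptions for illustration; the real builder decides partition boundaries differently and writes actual Bloom filters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Exact set of prefixes standing in for one partition's prefix Bloom filter.
using PrefixFilter = std::set<std::string>;

// Hypothetical builder: splits sorted `keys` into partitions of
// `keys_per_partition` keys and builds one prefix filter per partition.
std::vector<PrefixFilter> BuildPartitionedFilters(
    const std::vector<std::string>& keys, size_t keys_per_partition,
    size_t prefix_len) {
  assert(keys_per_partition > 0);
  std::vector<PrefixFilter> filters;
  for (size_t start = 0; start < keys.size(); start += keys_per_partition) {
    const size_t end = std::min(start + keys_per_partition, keys.size());
    PrefixFilter filter;
    for (size_t i = start; i < end; ++i) {
      filter.insert(keys[i].substr(0, prefix_len));
    }
    // The workaround: also add the prefix of the next partition's first key,
    // so a lookup that the top-level index routes here still matches it.
    if (end < keys.size()) filter.insert(keys[end].substr(0, prefix_len));
    filters.push_back(std::move(filter));
  }
  return filters;
}
```

The cost is one extra prefix per partition, which slightly raises the false-positive rate but removes the false negative.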
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5835
Differential Revision: D17513585
Pulled By: maysamyabandeh
fbshipit-source-id: e2d1ff26c759e6e03875c4d57f4228316ecf50e9
Summary:
Comparing a va_list with nullptr is always false on any architecture, and on aarch64 it fails to compile with an invalid-operands error (`error: invalid operands of types ‘va_list {aka __va_list}’ and ‘std::nullptr_t’ to binary ‘operator!=’`).
This patch removes this invalid assert.
Closes: https://github.com/facebook/rocksdb/issues/4277
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5836
Differential Revision: D17532470
fbshipit-source-id: ca98078ecbc6a9416c69de3bd6ffcfa33a0f0185
Summary:
format-diff.sh, a.k.a. 'make format', used 'master'
to decide which commits are probably unpublished. It is much better to use
the facebook remote master, since local master may not be caught up and may
have its own unpublished commits. The script now tries to compare against
the facebook remote master branch (whose pointer is updated by any fetch
or pull), because those differences are what would be considered the
differences for a pull request.
Also, the script compared against the *parent* of the merge-base with that
reference point, which is simply wrong, since that includes the last
published commit.
In case of problems, you can now customize the reference point by
setting the FORMAT_UPSTREAM variable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5831
Test Plan: manual
Differential Revision: D17528462
Pulled By: pdillinger
fbshipit-source-id: 50fdb8795d683bf3c14d449669c1a5299e0dfa8b
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830
Test Plan: Run all existing tests.
Differential Revision: D17488031
fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
Summary:
Some recent commits might not have passed through the formatter. I formatted the 45 most recent commits. The script hangs with more commits, so I stopped there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5827
Test Plan: Run all existing tests.
Differential Revision: D17483727
fbshipit-source-id: af23113ee63015d8a43d89a3bc2c1056189afe8f
Summary:
clang-analyzer has uncovered a bunch of places where the code is relying
on pointers being valid and one case (in VectorIterator) where a moved-from
object is being used:
In file included from db/range_tombstone_fragmenter.cc:17:
./util/vector_iterator.h:23:18: warning: Method called on moved-from object 'keys' of type 'std::vector'
current_(keys.size()) {
^~~~~~~~~~~
1 warning generated.
utilities/persistent_cache/block_cache_tier_file.cc:39:14: warning: Called C++ object pointer is null
Status s = env->NewRandomAccessFile(filepath, file, opt);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
utilities/persistent_cache/block_cache_tier_file.cc:47:19: warning: Called C++ object pointer is null
Status status = env_->GetFileSize(Path(), size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
utilities/persistent_cache/block_cache_tier_file.cc:290:14: warning: Called C++ object pointer is null
Status s = env_->FileExists(Path());
^~~~~~~~~~~~~~~~~~~~~~~~
utilities/persistent_cache/block_cache_tier_file.cc:363:35: warning: Called C++ object pointer is null
CacheWriteBuffer* const buf = alloc_->Allocate();
^~~~~~~~~~~~~~~~~~
utilities/persistent_cache/block_cache_tier_file.cc:399:41: warning: Called C++ object pointer is null
const uint64_t file_off = buf_doff_ * alloc_->BufferSize();
^~~~~~~~~~~~~~~~~~~~
utilities/persistent_cache/block_cache_tier_file.cc:463:33: warning: Called C++ object pointer is null
size_t start_idx = lba.off_ / alloc_->BufferSize();
^~~~~~~~~~~~~~~~~~~~
utilities/persistent_cache/block_cache_tier_file.cc:515:5: warning: Called C++ object pointer is null
alloc_->Deallocate(bufs_[i]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
7 warnings generated.
ar: creating librocksdb_debug.a
utilities/memory/memory_test.cc:68:25: warning: Called C++ object pointer is null
cache_set->insert(db->GetDBOptions().row_cache.get());
^~~~~~~~~~~~~~~~~~
1 warning generated.
The patch fixes these by adding assertions and explicitly passing in zero
when initializing VectorIterator::current_ (which preserves the existing
behavior).
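A simplified, hypothetical version of the VectorIterator fix: the constructor moves `keys` into the member and then sets `current_` to an explicit 0 instead of reading `keys.size()` from the moved-from parameter. The class below is a stand-in, not RocksDB's VectorIterator, and like an unpositioned iterator it expects a Seek call before use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class SketchVectorIterator {
 public:
  explicit SketchVectorIterator(std::vector<std::string> keys)
      : keys_(std::move(keys)),
        // Do not call keys.size() here: `keys` has already been moved from.
        current_(0) {}

  bool Valid() const { return current_ < keys_.size(); }
  void SeekToFirst() { current_ = 0; }
  void Next() {
    assert(Valid());
    ++current_;
  }
  const std::string& key() const {
    assert(Valid());
    return keys_[current_];
  }

 private:
  std::vector<std::string> keys_;
  size_t current_;
};
```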
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5821
Test Plan: Ran make check and make analyze to make sure the warnings have disappeared.
Differential Revision: D17455949
Pulled By: ltamasi
fbshipit-source-id: 363619618ea649a0674287f9f3b3393e390571ee
Summary:
Make class ObsoleteFilesTest inherit from DBTestBase.
Test plan (on devserver):
```
$COMPILE_WITH_ASAN=1 make obsolete_files_test
$./obsolete_files_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5820
Differential Revision: D17452348
Pulled By: riversand963
fbshipit-source-id: b09f4581a18022ca2bfd79f2836c0bf7083f5f25
Summary:
Originally, the loop that closes WALs in PurgeObsoleteFiles resided inside a loop
iterating over the candidate files. It should be moved out.
Test plan (devserver)
```
$COMPILE_WITH_ASAN=1 make -j32 all
$make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5804
Differential Revision: D17374350
Pulled By: riversand963
fbshipit-source-id: 2bee7343fc0481d9a385a87c7676491522285c96
Summary:
We are seeing wrong results caused by the merging iterator's reseek-avoidance feature in combination with a prefix extractor. Disable this optimization for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5815
Test Plan: Validated the same MyRocks case was fixed; run all existing tests.
Differential Revision: D17430776
fbshipit-source-id: aef664277ba0ab8a2e68331ff0db6ae682535371
Summary:
purge_queue_ may contain thousands of SST files, for example after manually compacting a range. If a full scan is triggered at the same time and the total number of SST files is large, RocksDB will be blocked at https://github.com/facebook/rocksdb/blob/master/db/db_impl_files.cc#L150 for several seconds. In our environment, we have 140,000 SST files, and a manual compaction deleted about 1,000 of them; RocksDB was blocked for about 2 minutes.
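Assuming the blocking comes from a linear search per candidate file against a large collection (the summary does not say how it was removed), one common remedy is a hash-set lookup. The sketch below assumes that remedy; `FilterAlreadyQueued` and its arguments are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical: return the candidate file numbers that are not already
// queued for purging. Searching a vector for each of N candidates costs
// O(N * M) for M queued files; an unordered_set lookup is O(1) on average.
std::vector<uint64_t> FilterAlreadyQueued(
    const std::vector<uint64_t>& candidates,
    const std::unordered_set<uint64_t>& queued) {
  std::vector<uint64_t> result;
  for (uint64_t number : candidates) {
    if (queued.count(number) == 0) result.push_back(number);
  }
  return result;
}
```

With 140,000 candidates and about 1,000 queued files, a per-candidate linear search means on the order of 10^8 comparisons, while the hash lookup keeps the work close to linear in the number of candidates.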
Commandeering https://github.com/facebook/rocksdb/issues/5290.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5796
Differential Revision: D17357775
Pulled By: riversand963
fbshipit-source-id: 20eacca917355b8de975ccc7b1c9a3e7bd5b201a
Summary:
https://github.com/facebook/rocksdb/issues/5797 charges the block cache with the sum of the user-provided charge and the metadata charge. It had a bug where MaintainPoolSize used the user-provided charge instead of the total charge. The patch fixes that.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5813
Differential Revision: D17412783
Pulled By: maysamyabandeh
fbshipit-source-id: 45c0ac9f1e2233760db5ccd61399605cd74edc87
Summary:
Reorder some code in DBIter::Seek() and DBIter::SeekForPrev().
The logic largely remains the same, except for a slight difference in handling some stats when valid_ = false, in which case they are not supposed to be used anyway.
Also remove prefix_start_key_, which sometimes points to part of the seek target and sometimes to prefix_start_buf_, which is confusing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5794
Test Plan: Run all tests.
Differential Revision: D17375257
fbshipit-source-id: 7339a23898cecd3a8475bf72340fcd6f82b933c5
Summary:
Manual compaction may bring in very high load, because the amount of data involved in a compaction can be large, which may affect online services. So it would be good if a running compaction that is making the server busy could be stopped immediately. In this implementation, the condition for stopping manual compaction is only checked in the slow process. Deletion compactions and trivial moves are allowed to go through.
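A minimal sketch of a cancellable manual compaction loop, assuming a process-wide atomic flag that another thread sets. The flag, enum, and function names are hypothetical and do not reflect RocksDB's actual API; the flag is polled only inside the per-key loop, mirroring the statement that the check happens in the slow process, while deletion compactions and trivial moves would never enter this loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical flag a user thread sets to stop running manual compactions.
std::atomic<bool> g_manual_compaction_paused{false};

enum class SketchResult { kCompleted, kAborted };

// Hypothetical compaction loop over `num_keys` keys; the pause flag is
// polled before each key so a busy server can stop the job promptly.
SketchResult RunManualCompaction(size_t num_keys, size_t* processed) {
  *processed = 0;
  for (size_t i = 0; i < num_keys; ++i) {
    if (g_manual_compaction_paused.load(std::memory_order_relaxed)) {
      return SketchResult::kAborted;
    }
    ++*processed;  // stand-in for reading, merging, and writing one key
  }
  return SketchResult::kCompleted;
}
```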
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3971
Test Plan: add tests at more spots.
Differential Revision: D17369043
fbshipit-source-id: 575a624fb992ce0bb07d9443eb209e547740043c
Summary:
Update version of dependencies.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5777
Test Plan: make release
Differential Revision: D17269421
fbshipit-source-id: e76dbe5389e1d7f811739d3bc1e404b482dfce34
Summary:
The unity build fails because of a name conflict on IsFileSectorAligned() after a recent refactoring. Consolidate the function.
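For context, a unity build concatenates many .cc files into one translation unit, so two file-local definitions with the same name collide. A consolidated helper might look like the hypothetical sketch below, defined once (for example as an inline function in a shared header); the modulo-based check and the signature are assumptions, not necessarily the exact RocksDB code.

```cpp
#include <cassert>
#include <cstddef>

// Single shared definition replacing per-file copies.
inline bool IsFileSectorAligned(size_t off, size_t sector_size) {
  assert(sector_size > 0);
  return off % sector_size == 0;
}
```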
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5812
Test Plan: make unity. At least the failure goes away. Also "make all", "make release" and see no regression in normal cases.
Differential Revision: D17411403
fbshipit-source-id: 09d5653471ae2c3a4d898e120a024f7dd08d9c9d
Summary:
Refactoring to consolidate implementation details of legacy
Bloom filters. This helps to organize and document some related,
obscure code.
Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to
compile and run unit tests for non-native cache line size. (Fixed a
related test failure in db_properties_test.)
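The summary does not spell out the legacy schema, so the following is only a generic sketch of a cache-line-local Bloom filter, the kind of structure whose bit layout depends on the cache line size. The FNV-1a hash, the rotate-and-add probe sequence, and the `SKETCH_CACHE_LINE_SIZE` macro (loosely analogous to TEST_CACHE_LINE_SIZE) are assumptions, not RocksDB's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

#ifndef SKETCH_CACHE_LINE_SIZE
#define SKETCH_CACHE_LINE_SIZE 64  // bytes; e.g. -DSKETCH_CACHE_LINE_SIZE=128
#endif

// Hypothetical cache-local Bloom filter: all probes for a key land in one
// cache line, so a query touches at most one line of memory.
class CacheLocalBloom {
 public:
  static constexpr size_t kLineBits = SKETCH_CACHE_LINE_SIZE * 8;

  CacheLocalBloom(size_t num_lines, int num_probes)
      : bits_(num_lines * kLineBits / 8),
        num_lines_(num_lines),
        num_probes_(num_probes) {
    assert(num_lines > 0 && num_probes > 0);
  }

  void Add(const std::string& key) {
    uint32_t h = Hash(key);
    const size_t base = (h % num_lines_) * kLineBits;
    const uint32_t delta = (h >> 17) | (h << 15);  // rotate right by 17
    for (int i = 0; i < num_probes_; ++i) {
      const size_t bit = base + h % kLineBits;
      bits_[bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
      h += delta;
    }
  }

  // May return true for a key never added, never false for an added key.
  bool MayContain(const std::string& key) const {
    uint32_t h = Hash(key);
    const size_t base = (h % num_lines_) * kLineBits;
    const uint32_t delta = (h >> 17) | (h << 15);
    for (int i = 0; i < num_probes_; ++i) {
      const size_t bit = base + h % kLineBits;
      if ((bits_[bit / 8] & (1u << (bit % 8))) == 0) return false;
      h += delta;
    }
    return true;
  }

 private:
  // FNV-1a, chosen only for brevity.
  static uint32_t Hash(const std::string& key) {
    uint32_t h = 2166136261u;
    for (unsigned char c : key) {
      h ^= c;
      h *= 16777619u;
    }
    return h;
  }

  std::vector<uint8_t> bits_;
  size_t num_lines_;
  int num_probes_;
};
```

Compiling with a different line size changes which bits a key sets, which is why such tests are worth running for non-native sizes.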
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5784
Test Plan:
make check, including recently added Bloom schema unit tests
(in ./plain_table_db_test && ./bloom_test), and including with
TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the
schema tests with temporary fault injection into new implementations.
Some performance testing with modified unit tests suggests a small to moderate
improvement in speed.
Differential Revision: D17381384
Pulled By: pdillinger
fbshipit-source-id: ee42586da996798910fc45ac0b6289147f16d8df
Summary:
For our default block cache, each additional entry has extra memory overhead. This includes the LRUHandle (currently 72 bytes) and the cache key (two varint64s: file id and offset). This usage is not negligible: for example, with block_size=4k, the overhead accounts for an extra 2% of memory usage for the cache. The patch charges the cache for this extra usage, reducing untracked memory usage outside the block cache. The feature is enabled by default and can be disabled by passing kDontChargeCacheMetadata to the cache constructor.
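A back-of-the-envelope sketch of the charging rule, using the numbers quoted above. The enum mirrors the `kDontChargeCacheMetadata` and `kFullChargeCacheMetadata` names from this commit message, but `TotalCharge` and the caller-supplied key size are assumptions; the real cache derives the key size from the actual key.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the two metadata charging modes.
enum SketchMetadataChargePolicy {
  kDontChargeCacheMetadata,
  kFullChargeCacheMetadata
};

// Per-entry handle overhead quoted in the summary.
constexpr size_t kHandleBytes = 72;

// Charge for one entry: the user-provided charge, plus handle and key bytes
// when metadata charging is enabled.
size_t TotalCharge(size_t user_charge, size_t key_bytes,
                   SketchMetadataChargePolicy policy) {
  if (policy == kDontChargeCacheMetadata) return user_charge;
  return user_charge + kHandleBytes + key_bytes;
}
```

With a 4096-byte block and a 10-byte key, the metadata adds 82 bytes, roughly 2% of the block, consistent with the estimate above.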
This PR builds up on https://github.com/facebook/rocksdb/issues/4258
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5797
Test Plan:
- Existing tests are updated to either disable the feature when the test has too much dependency on the old way of accounting the usage or increasing the cache capacity to account for the additional charge of metadata.
- The Usage tests in cache_test.cc are augmented to test the cache usage under kFullChargeCacheMetadata.
Differential Revision: D17396833
Pulled By: maysamyabandeh
fbshipit-source-id: 7684ccb9f8a40ca595e4f5efcdb03623afea0c6f
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 enabled partitioned indexes/filters in stress tests; however,
this causes assertion failures in BatchedOpsStressTest. This patch
disables them until we can root-cause the failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5811
Test Plan: Ran the script and made sure it only uses the binary search index.
Differential Revision: D17399366
Pulled By: ltamasi
fbshipit-source-id: adb116e6297f9c6ccd7ac15b6a16c9aa91f21ac5
Summary:
This will allow us to fix history by having the code changes for PR#5784 properly attributed to it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5810
Differential Revision: D17400231
Pulled By: pdillinger
fbshipit-source-id: 2da8b1cdf2533cfedb35b5526eadefb38c291f09
Summary:
Several functions of UniversalCompactionPicker share most of their parameters. Move these functions into a class with the shared arguments as class members. Hopefully this makes the code slightly easier to maintain.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5639
Test Plan: Run all existing tests.
Differential Revision: D16996403
fbshipit-source-id: fffafd1897ab132b420b1dec073542cffb5c44de
Summary:
file_reader_writer.h and .cc contain several classes and helper functions, which makes them hard to navigate. Separate them into multiple files and put them under file/.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803
Test Plan: Build whole project using make and cmake.
Differential Revision: D17374550
fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
Summary:
DynamicBloom unit test now tests non-sequential as well as
sequential keys in testing FP rates. Also now verifies larger structures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5805
Test Plan: this is the test
Differential Revision: D17398109
Pulled By: pdillinger
fbshipit-source-id: 374074206c76d242efa378afc27830448a0e892a
Summary:
1. Move the similar logic of adding a valid iterator to the heap and checking an invalid iterator's status code into the same helper functions.
2. Because of 1, in the direction-changing case, move the places where we check status around a little so that we can call the helper function there too. The logic only diverges when an iterator is valid but its status is not OK, which is not expected to happen. Add an assertion for that.
3. Move the logic of changing direction from forward to backward into a separate function, so the unlikely code path is not in Prev().
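Item 1 can be pictured as a single helper that every call site uses. In this hypothetical sketch, `ChildIter` is a toy cursor with a separate OK flag and `PushOrRecordStatus` is an illustrative name: it pushes a valid child onto a min-heap, records a non-OK status otherwise, and asserts against the unexpected "valid but not OK" case from item 2.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Toy child iterator: a cursor over sorted keys plus a separate status flag.
struct ChildIter {
  std::vector<std::string> keys;
  size_t pos = 0;
  bool ok = true;  // stand-in for status().ok()
  bool Valid() const { return pos < keys.size(); }
  const std::string& key() const { return keys[pos]; }
};

// Orders the priority_queue so that the smallest key is on top.
struct GreaterKey {
  bool operator()(const ChildIter* a, const ChildIter* b) const {
    return a->key() > b->key();
  }
};
using MinHeap =
    std::priority_queue<ChildIter*, std::vector<ChildIter*>, GreaterKey>;

// Shared helper: a valid child goes onto the heap; an invalid child has its
// status checked, and a non-OK status is recorded.
void PushOrRecordStatus(ChildIter* child, MinHeap* heap, bool* status_ok) {
  if (child->Valid()) {
    assert(child->ok);  // valid but not OK is not expected to happen
    heap->push(child);
  } else if (!child->ok) {
    *status_ok = false;
  }
}
```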
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5793
Test Plan: run all existing tests.
Differential Revision: D17374397
fbshipit-source-id: d595ffcf156095c4bd0f5532bacba854482a2332
Summary:
Currently IngestExternalFile() fails when its input files' ranges overlap. This condition doesn't need to hold for files that are to be ingested in L0, though.
This commit allows overlapping files and forces their target level to L0.
Additionally, the ingest job's completion is logged to EventLogger, analogous to flush and compaction jobs.
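A sketch of the level decision, under the assumption that each input file is described by an inclusive user-key range. `FileRange`, `AnyOverlap`, and `PickIngestionLevel` are hypothetical; `normal_level` stands in for whatever level the existing picking logic would choose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct FileRange {
  std::string smallest, largest;  // inclusive user-key range
};

// True if any two input files overlap.
bool AnyOverlap(std::vector<FileRange> files) {
  std::sort(files.begin(), files.end(),
            [](const FileRange& a, const FileRange& b) {
              return a.smallest < b.smallest;
            });
  for (size_t i = 1; i < files.size(); ++i) {
    if (files[i].smallest <= files[i - 1].largest) return true;
  }
  return false;
}

// Overlapping inputs can only go to L0, where files may overlap.
int PickIngestionLevel(const std::vector<FileRange>& files, int normal_level) {
  return AnyOverlap(files) ? 0 : normal_level;
}
```

Sorting by smallest key means only neighbors need to be compared: if file i overlaps some later file j, it must also overlap file i+1, whose smallest key lies between those of i and j.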
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5539
Differential Revision: D17370660
Pulled By: riversand963
fbshipit-source-id: 749a3899b17d1be267a5afd5b0a99d96b38ab2f3
Summary:
Move definition and implementation for ArenaWrappedDBIter into its own .h/.cc files. Also, change inlining of functions to better comply with the Google C++ style guide.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5801
Test Plan: make check
Differential Revision: D17371012
Pulled By: anand1976
fbshipit-source-id: c1361abc2851575111e357a63d88be3b3d6cb341
Summary:
In preparing to utilize a new Intel instruction extension, I
noticed problems with the existing build script in regard to the
existing utilized extensions, either with USE_SSE or PORTABLE flags.
* PORTABLE=0 was interpreted the same as PORTABLE=1. Now empty and 0
mean the same. (I guess you were not supposed to set PORTABLE= if you
wanted non-portable--except that...)
* The Facebook build script extensions would set PORTABLE=1 even if
it's already set in a make var or environment. Now it does not override
a non-empty setting, so use PORTABLE=0 for fully optimized build,
overriding Facebook environment default.
* Put in an explanation of the USE_SSE flag where it's used by
build_detect_platform, and cleaned up some confusing/redundant
associated logic.
* If USE_SSE was set and expected intrinsics were not available,
build_detect_platform would exit early but the build would proceed with a
broken, incomplete configuration. Now the build recovers gracefully from the warning.
* If USE_SSE was set and expected intrinsics were not available,
build would still try to use flags like -msse4.2 etc. which could lead
to unexpected compilation failure or binary incompatibility. Now those
flags are not used if the warning is issued.
This should not break or change existing, valid build scripts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5800
Test Plan: manual case testing
Differential Revision: D17369543
Pulled By: pdillinger
fbshipit-source-id: 4ee244911680ae71144d272c40aceea548e3ce88
Summary:
Prefetch data for the following block to avoid cache misses when calculating the CRC.
I ran a performance test on a Kunpeng 920 server (ARMv8, 64 cores @ 2.6 GHz):
./db_bench --benchmarks=crc32c --block_size=500000000
before optimise : 587313.500 micros/op 1 ops/sec; 811.9 MB/s (500000000 per op)
after optimise : 289248.500 micros/op 3 ops/sec; 1648.5 MB/s (500000000 per op)
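The idea can be sketched portably. This hypothetical `Crc32cWithPrefetch` uses a slow bitwise CRC-32C rather than the hardware-accelerated ARM path the PR targets, and the 64-byte chunk and 256-byte prefetch distance are assumptions. `__builtin_prefetch` is a GCC/Clang builtin and only a hint, so it does not change the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (reflected Castagnoli polynomial 0x82F63B78). At the start
// of each 64-byte chunk, the data kPrefetchDistance bytes ahead is prefetched
// so it is more likely to be in cache when the loop reaches it.
uint32_t Crc32cWithPrefetch(const uint8_t* data, size_t n) {
  constexpr size_t kChunk = 64;              // assumed cache line size
  constexpr size_t kPrefetchDistance = 256;  // assumed look-ahead in bytes
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < n; ++i) {
    if (i % kChunk == 0 && i + kPrefetchDistance < n) {
#if defined(__GNUC__) || defined(__clang__)
      __builtin_prefetch(data + i + kPrefetchDistance);  // hint only
#endif
    }
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the low bit was set.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}
```

The bounds check keeps the prefetch address inside the buffer, so the sketch never forms a pointer past the end of the input.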
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5773
Differential Revision: D17347339
fbshipit-source-id: bfcd74f0f0eb4b322b959be68019ddcaae1e3341
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash
tests, resulting in assertion failures in Block. This patch disables
the hash index until we can pinpoint the root cause of these issues.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792
Test Plan:
Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2
(binary search and partitioned index).
Differential Revision: D17346777
Pulled By: ltamasi
fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551
Summary:
This is required to compile on Windows with Visual Studio 2015.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5786
Differential Revision: D17335994
fbshipit-source-id: 8f9568310bc6f697e312b5e24ad465e9084f0011
Summary:
The max batch size that we can write to the WAL is controlled in a static manner. If the leader's write is less than 128 KB, the batch size limit is the leader's write size + 128 KB; otherwise, the limit is 1 MB. Both values are statically defined.
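A minimal sketch of the rule as described, with the 1 MB limit turned into a parameter (assuming that is the direction of this change; `MaxBatchGroupBytes` and its parameter names are hypothetical). The 128 KB threshold is taken as 1/8 of the limit, which matches the quoted numbers for the 1 MB default.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on the bytes a write group may contain, given the leader's size.
size_t MaxBatchGroupBytes(size_t leader_bytes,
                          size_t max_group_bytes = 1 << 20) {
  const size_t small_write = max_group_bytes / 8;  // 128 KB for 1 MB
  if (leader_bytes < small_write) {
    // Small leader: presumably capped so the small write is not held up
    // waiting for a large group.
    return leader_bytes + small_write;
  }
  return max_group_bytes;
}
```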
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5759
Differential Revision: D17329298
fbshipit-source-id: a3d910629d8d8ca84ea39ad89c2b2d284571ded5