Summary:
We are seeing wrong results when the merging iterator's reseek avoidance feature is combined with a prefix extractor. Disable this optimization for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5815
Test Plan: Validated the same MyRocks case was fixed; run all existing tests.
Differential Revision: D17430776
fbshipit-source-id: aef664277ba0ab8a2e68331ff0db6ae682535371
Summary:
purge_queue_ may contain thousands of SST files, for example after manually compacting a range. If a full scan is triggered at the same time and the total number of SST files is large, RocksDB will be blocked at https://github.com/facebook/rocksdb/blob/master/db/db_impl_files.cc#L150 for several seconds. In our environment we have 140,000 SST files and the manual compaction deleted about 1000 of them, which blocked RocksDB for about 2 minutes.
Commandeering https://github.com/facebook/rocksdb/issues/5290.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5796
Differential Revision: D17357775
Pulled By: riversand963
fbshipit-source-id: 20eacca917355b8de975ccc7b1c9a3e7bd5b201a
Summary:
https://github.com/facebook/rocksdb/issues/5797 charges the block cache with the total of user-provided charge plus the metadata charge. It had a bug where in MaintainPoolSize the user-provided charge was used instead of the total charge. The patch fixes that.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5813
Differential Revision: D17412783
Pulled By: maysamyabandeh
fbshipit-source-id: 45c0ac9f1e2233760db5ccd61399605cd74edc87
Summary:
Reorder some code in DBIter::Seek() and DBIter::SeekForPrev().
The logic largely remains the same, except for a slight difference in how some stats are handled when valid_ = false, where they are not supposed to be used anyway.
Also remove prefix_start_key_, which sometimes points to a part of the seek target and sometimes to prefix_start_buf_, which is confusing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5794
Test Plan: Run all tests.
Differential Revision: D17375257
fbshipit-source-id: 7339a23898cecd3a8475bf72340fcd6f82b933c5
Summary:
Manual compaction may bring in very high load because sometimes the amount of data involved in a compaction can be large, which may affect online service. So it would be good if a running compaction that is making the server busy could be stopped immediately. In this implementation, the condition for stopping manual compaction is only checked in the slow process. We let deletion compactions and trivial moves go through.
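The cancellation idea above can be sketched as follows. This is a minimal illustration, assuming a shared atomic flag that only the slow, entry-by-entry path polls. `CompactionKind`, `RunCompaction` and the flag name are hypothetical and are not RocksDB's actual identifiers.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical compaction kinds, for illustration only.
enum class CompactionKind { kDeletion, kTrivialMove, kFullMerge };

// Returns the number of entries processed before finishing or being canceled.
// Only the slow path (kFullMerge) polls the cancel flag. Deletion compactions
// and trivial moves are cheap, so they are allowed to run to completion.
inline size_t RunCompaction(CompactionKind kind,
                            const std::vector<int>& entries,
                            const std::atomic<bool>& manual_compaction_canceled) {
  if (kind != CompactionKind::kFullMerge) {
    return entries.size();  // cheap path: never interrupted
  }
  size_t processed = 0;
  for (size_t i = 0; i < entries.size(); ++i) {
    if (manual_compaction_canceled.load(std::memory_order_acquire)) {
      break;  // stop as soon as the flag is observed
    }
    ++processed;  // stand-in for the real per-entry merge work
  }
  return processed;
}
```

Polling only in the slow loop keeps the check off the cheap paths, at the cost of not being able to interrupt them.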
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3971
Test Plan: add tests at more spots.
Differential Revision: D17369043
fbshipit-source-id: 575a624fb992ce0bb07d9443eb209e547740043c
Summary:
Update version of dependencies.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5777
Test Plan: make release
Differential Revision: D17269421
fbshipit-source-id: e76dbe5389e1d7f811739d3bc1e404b482dfce34
Summary:
Unity build fails because of name conflict of IsFileSectorAligned() after recent refactoring. Consolidate the function.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5812
Test Plan: make unity. At least the failure goes away. Also "make all", "make release" and see no regression in normal cases.
Differential Revision: D17411403
fbshipit-source-id: 09d5653471ae2c3a4d898e120a024f7dd08d9c9d
Summary:
Refactoring to consolidate implementation details of legacy
Bloom filters. This helps to organize and document some related,
obscure code.
Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to
compile and run unit tests for non-native cache line size. (Fixed a
related test failure in db_properties_test.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5784
Test Plan:
make check, including recently added Bloom schema unit tests
(in ./plain_table_db_test && ./bloom_test), and including with
TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the
schema tests with temporary fault injection into new implementations.
Some performance testing with modified unit tests suggests a small to moderate
improvement in speed.
Differential Revision: D17381384
Pulled By: pdillinger
fbshipit-source-id: ee42586da996798910fc45ac0b6289147f16d8df
Summary:
For our default block cache, each additional entry has extra memory overhead. It includes the LRUHandle (72 bytes currently) and the cache key (two varint64, file id and offset). The usage is not negligible. For example, for block_size=4k, the overhead accounts for an extra 2% memory usage for the cache. The patch charges the cache for the extra usage, reducing untracked memory usage outside the block cache. The feature is enabled by default and can be disabled by passing kDontChargeCacheMetadata to the cache constructor.
This PR builds on https://github.com/facebook/rocksdb/issues/4258
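As a rough check of the 2% figure, here is a small arithmetic sketch. It is not the actual LRUCache accounting. It assumes each varint64 takes at most 10 bytes, so the key is at most 20 bytes, and the helper names are hypothetical.

```cpp
#include <cstddef>

// Figure quoted in the summary above: LRUHandle is 72 bytes "currently".
constexpr size_t kLRUHandleSize = 72;
// Assumption: a varint64 occupies at most 10 bytes, so a cache key made of
// two varint64 values (file id and offset) is at most 20 bytes.
constexpr size_t kMaxCacheKeySize = 2 * 10;

// Total charge = user-provided charge + per-entry metadata charge.
constexpr size_t TotalCharge(size_t user_charge, size_t key_size) {
  return user_charge + kLRUHandleSize + key_size;
}

// Metadata overhead as a fraction of the user-provided charge.
constexpr double MetadataOverhead(size_t user_charge, size_t key_size) {
  return static_cast<double>(kLRUHandleSize + key_size) /
         static_cast<double>(user_charge);
}
```

With a 4096-byte block and a 20-byte key, the metadata adds 92 bytes, which is roughly 2.2% of the user-provided charge.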
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5797
Test Plan:
- Existing tests are updated to either disable the feature, when the test depends too much on the old way of accounting the usage, or increase the cache capacity to account for the additional metadata charge.
- The Usage tests in cache_test.cc are augmented to test the cache usage under kFullChargeCacheMetadata.
Differential Revision: D17396833
Pulled By: maysamyabandeh
fbshipit-source-id: 7684ccb9f8a40ca595e4f5efcdb03623afea0c6f
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 enabled partitioned indexes/filters in stress tests; however,
this causes assertion failures in BatchedOpsStressTest. This patch
disables them until we can root cause the failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5811
Test Plan: Ran the script and made sure it only uses the binary search index.
Differential Revision: D17399366
Pulled By: ltamasi
fbshipit-source-id: adb116e6297f9c6ccd7ac15b6a16c9aa91f21ac5
Summary:
This will allow us to fix history by having the code changes for PR#5784 properly attributed to it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5810
Differential Revision: D17400231
Pulled By: pdillinger
fbshipit-source-id: 2da8b1cdf2533cfedb35b5526eadefb38c291f09
Summary:
Several functions of UniversalCompactionPicker share most of the parameters. Move these functions to a class with those shared arguments as class members. Hopefully this will make code slightly easier to maintain.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5639
Test Plan: Run all existing tests.
Differential Revision: D16996403
fbshipit-source-id: fffafd1897ab132b420b1dec073542cffb5c44de
Summary:
file_reader_writer.h and .cc contain several file classes and helper functions, which makes them hard to navigate. Split them into multiple files and put those under file/
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803
Test Plan: Build whole project using make and cmake.
Differential Revision: D17374550
fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
Summary:
DynamicBloom unit test now tests non-sequential as well as
sequential keys in testing FP rates. Also now verifies larger structures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5805
Test Plan: thisisthetest
Differential Revision: D17398109
Pulled By: pdillinger
fbshipit-source-id: 374074206c76d242efa378afc27830448a0e892a
Summary:
1. Put the similar logic of adding a valid iterator to the heap and checking an invalid iterator's status code into the same helper functions.
2. Because of 1, in the direction-change case, move the places where we check status around a little so that the helper function can be called there too. The logic would only diverge in the case where the iterator is valid but its status is not OK, which is not expected to happen. Add an assertion for that.
3. Move the logic of changing direction from forward to backward into a separate function so that the unlikely code path is not in Prev().
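The helper described in item 1 might look roughly like the sketch below. `ChildIter`, `MinHeap` and the helper name are illustrative stand-ins and not the actual MergingIterator code. Keys are compared bytewise as an assumption.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a child iterator of the merging iterator.
struct ChildIter {
  bool valid;
  bool status_ok;
  std::string key;
};

// Orders the heap so that the smallest key is on top.
struct KeyGreater {
  bool operator()(const ChildIter* a, const ChildIter* b) const {
    return a->key > b->key;
  }
};

using MinHeap = std::priority_queue<const ChildIter*,
                                    std::vector<const ChildIter*>, KeyGreater>;

// If the child is valid, add it to the heap. Otherwise, record a non-OK
// status so the caller can surface it.
inline void AddToMinHeapOrCheckStatus(const ChildIter* child, MinHeap* heap,
                                      bool* overall_ok) {
  if (child->valid) {
    // A valid iterator with a non-OK status is not expected to happen.
    assert(child->status_ok);
    heap->push(child);
  } else if (!child->status_ok) {
    *overall_ok = false;
  }
}
```

Funneling both cases through one helper means the direction-change path and the normal path cannot drift apart in how they treat an invalid child.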
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5793
Test Plan: run all existing tests.
Differential Revision: D17374397
fbshipit-source-id: d595ffcf156095c4bd0f5532bacba854482a2332
Summary:
Currently IngestExternalFile() fails when its input files' ranges overlap. This condition doesn't need to hold for files that are to be ingested in L0, though.
This commit allows overlapping files and forces their target level to L0.
Additionally, ingest job's completion is logged to EventLogger, analogous to flush and compaction jobs.
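A minimal sketch of the level decision described above, assuming inclusive key ranges compared bytewise. `FileRange`, `RangesOverlap` and `ChooseTargetLevel` are hypothetical names, and the real code would use the column family's comparator.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of an input file's key range (inclusive bounds).
struct FileRange {
  std::string smallest;
  std::string largest;
};

// Returns true if any two input files have overlapping key ranges.
// After sorting by smallest key, checking adjacent pairs is sufficient
// as long as smallest <= largest holds for every file.
inline bool RangesOverlap(std::vector<FileRange> files) {
  std::sort(files.begin(), files.end(),
            [](const FileRange& a, const FileRange& b) {
              return a.smallest < b.smallest;
            });
  for (size_t i = 1; i < files.size(); ++i) {
    if (files[i].smallest <= files[i - 1].largest) {
      return true;
    }
  }
  return false;
}

// Overlapping inputs are forced to L0. Otherwise keep the level the caller
// would have picked (preferred_level is a stand-in for that choice).
inline int ChooseTargetLevel(const std::vector<FileRange>& files,
                             int preferred_level) {
  return RangesOverlap(files) ? 0 : preferred_level;
}
```

L0 is the natural fallback because it is the level where files are allowed to overlap each other.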
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5539
Differential Revision: D17370660
Pulled By: riversand963
fbshipit-source-id: 749a3899b17d1be267a5afd5b0a99d96b38ab2f3
Summary:
Move definition and implementation for ArenaWrappedDBIter into its own .h/.cc files. Also, change inlining of functions to better comply with the Google C++ style guide.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5801
Test Plan: make check
Differential Revision: D17371012
Pulled By: anand1976
fbshipit-source-id: c1361abc2851575111e357a63d88be3b3d6cb341
Summary:
In preparing to utilize a new Intel instruction extension, I
noticed problems with the existing build script in regard to the
existing utilized extensions, either with USE_SSE or PORTABLE flags.
* PORTABLE=0 was interpreted the same as PORTABLE=1. Now empty and 0
mean the same. (I guess you were not supposed to set PORTABLE= if you
wanted non-portable--except that...)
* The Facebook build script extensions would set PORTABLE=1 even if
it's already set in a make var or environment. Now it does not override
a non-empty setting, so use PORTABLE=0 for fully optimized build,
overriding Facebook environment default.
* Put in an explanation of the USE_SSE flag where it's used by
build_detect_platform, and cleaned up some confusing/redundant
associated logic.
* If USE_SSE was set and expected intrinsics were not available,
build_detect_platform would exit early but build would proceed with
broken, incomplete configuration. Now a warning is issued and the script recovers gracefully.
* If USE_SSE was set and expected intrinsics were not available,
build would still try to use flags like -msse4.2 etc. which could lead
to unexpected compilation failure or binary incompatibility. Now those
flags are not used if the warning is issued.
This should not break or change existing, valid build scripts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5800
Test Plan: manual case testing
Differential Revision: D17369543
Pulled By: pdillinger
fbshipit-source-id: 4ee244911680ae71144d272c40aceea548e3ce88
Summary:
Prefetch data for the following block to avoid cache misses when calculating the CRC.
I ran a performance test on a Kunpeng-920 server (ARMv8, 64 cores @ 2.6GHz):
./db_bench --benchmarks=crc32c --block_size=500000000
before optimise : 587313.500 micros/op 1 ops/sec; 811.9 MB/s (500000000 per op)
after optimise : 289248.500 micros/op 3 ops/sec; 1648.5 MB/s (500000000 per op)
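The general technique can be sketched as below. This is a portable software CRC32C with a prefetch hint issued a fixed distance ahead, not the ARM-specific code from this change. The 64-byte chunk size, the 1024-byte prefetch distance and the function names are assumptions, and `__builtin_prefetch` is only used when compiling with GCC or Clang.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC32C update (Castagnoli, reflected polynomial 0x82F63B78).
// Slow but simple; it stands in for the hardware-accelerated inner loop.
inline uint32_t Crc32cUpdate(uint32_t crc, const uint8_t* data, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc;
}

// Processes the input in 64-byte chunks and hints the cache about the chunk
// that lies kPrefetchDistance bytes ahead of the one being processed.
inline uint32_t Crc32cWithPrefetch(const uint8_t* data, size_t n) {
  constexpr size_t kChunk = 64;               // assumed cache line size
  constexpr size_t kPrefetchDistance = 1024;  // assumed look-ahead distance
  uint32_t crc = 0xFFFFFFFFu;
  size_t off = 0;
  while (off < n) {
    const size_t len = (n - off < kChunk) ? (n - off) : kChunk;
#if defined(__GNUC__) || defined(__clang__)
    if (off + kPrefetchDistance < n) {
      // Read prefetch with high temporal locality; purely a hint.
      __builtin_prefetch(data + off + kPrefetchDistance, 0, 3);
    }
#endif
    crc = Crc32cUpdate(crc, data + off, len);
    off += len;
  }
  return crc ^ 0xFFFFFFFFu;
}
```

The prefetch does not change the result, only when the data arrives in cache, so the checksum matches a plain single-pass computation.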
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5773
Differential Revision: D17347339
fbshipit-source-id: bfcd74f0f0eb4b322b959be68019ddcaae1e3341
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash
tests, resulting in assertion failures in Block. This patch disables
the hash index until we can pinpoint the root cause of these issues.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792
Test Plan:
Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2
(binary search and partitioned index).
Differential Revision: D17346777
Pulled By: ltamasi
fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551
Summary:
This is required to compile on Windows with Visual Studio 2015.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5786
Differential Revision: D17335994
fbshipit-source-id: 8f9568310bc6f697e312b5e24ad465e9084f0011
Summary:
The max batch size that we can write to the WAL is determined statically. If the leader write is smaller than 128 KB, the batch size limit is the leader write size + 128 KB; otherwise the limit is 1 MB. Both values are statically defined.
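The static rule described above can be written out as a short sketch. The constant and function names are hypothetical, and only the 128 KB and 1 MB figures come from the summary.

```cpp
#include <cstddef>

// Figures from the summary above; the names are illustrative.
constexpr size_t kSmallLeaderThreshold = 128 << 10;  // 128 KB
constexpr size_t kMaxBatchGroupSize = 1 << 20;       // 1 MB

// Returns the batch size limit for a write group led by a write of
// leader_size bytes.
constexpr size_t BatchGroupSizeLimit(size_t leader_size) {
  return leader_size < kSmallLeaderThreshold
             ? leader_size + kSmallLeaderThreshold
             : kMaxBatchGroupSize;
}
```

For example, a 1 KB leader write gives a limit of 129 KB, while a leader write of 128 KB or more gives the full 1 MB.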
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5759
Differential Revision: D17329298
fbshipit-source-id: a3d910629d8d8ca84ea39ad89c2b2d284571ded5
Summary:
Use delete to disable automatically generated methods instead of declaring them private, and group the constructors together for clarity. This modification causes an unused-field warning, so add the unused attribute to suppress it.
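A small sketch of the pattern, using a hypothetical class rather than any class touched by this change. It shows only the deleted copy operations grouped with the constructors, not the unused-field attribute.

```cpp
#include <type_traits>

// Hypothetical class for illustration only.
class NonCopyableHandle {
 public:
  NonCopyableHandle() = default;
  explicit NonCopyableHandle(int id) : id_(id) {}

  // Deleted in the public section, next to the constructors, instead of
  // being declared private and left unimplemented.
  NonCopyableHandle(const NonCopyableHandle&) = delete;
  NonCopyableHandle& operator=(const NonCopyableHandle&) = delete;

  int id() const { return id_; }

 private:
  int id_ = 0;
};

// Copying is rejected at compile time.
static_assert(!std::is_copy_constructible<NonCopyableHandle>::value,
              "copy construction must be disabled");
static_assert(!std::is_copy_assignable<NonCopyableHandle>::value,
              "copy assignment must be disabled");
```

With deleted functions, a misuse fails at compile time with a message naming the deleted function, rather than with an access or link error.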
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5009
Differential Revision: D17288733
fbshipit-source-id: 8a767ce096f185f1db01bd28fc88fef1cdd921f3
Summary:
With this change, cmake no longer regenerates the timestamp on subsequent builds, which was causing rebuilds of the lib.
This improves compile turnaround times if you include rocksdb as a compilable library, since as things stand it regenerates the timestamp .cc file each time you build, and thus recompiles and relinks the rocksdb library even though nothing in the source actually changed.
The original timestamp is recorded into `CMakeCache.txt` and will remain there until you flush this cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4799
Differential Revision: D17290040
fbshipit-source-id: 28357fef3422693c9c19e88fa2873c8db0f662ed
Summary:
- In `db_stress`, support choosing index type and whether to enable filter partitioning, and randomly set those options in crash test
- When partitioned filter is enabled by crash test, force partitioned index to also be enabled since it's a prerequisite
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4020
Test Plan:
currently this is blocked on fixing the bug that crash test caught:
```
$ TEST_TMPDIR=/data/compaction_bench python ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000
...
Verification failed for column family 0 key 937501: Value not found: NotFound:
Crash-recovery verification failed :(
```
Differential Revision: D8508683
Pulled By: maysamyabandeh
fbshipit-source-id: 0337e5d0558bcef26b1f3699f47265a2c1e99629
Summary:
On older macOS like 10.10 we saw the following compiler error:
```
/go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/env/env_posix.cc:845:19:
error: use of undeclared identifier 'CLOCK_THREAD_CPUTIME_ID'
clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
^
```
According to mac's `man clock_gettime`: "These functions first appeared in Mac
OSX 10.12". So we should not try to compile it on earlier versions.
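One way to express such a guard is sketched below. Whether the actual fix uses these exact macros is an assumption. `SKETCH_HAVE_THREAD_CPUTIME` and `ThreadCpuNanos` are hypothetical names, and the fallback of returning 0 is chosen only for illustration.

```cpp
#include <cstdint>
#include <time.h>
#if defined(__APPLE__)
#include <AvailabilityMacros.h>
#endif

// Only use CLOCK_THREAD_CPUTIME_ID when the deployment target provides it.
// On macOS, clock_gettime() first appeared in 10.12 (version code 101200).
#if defined(__APPLE__) && defined(MAC_OS_X_VERSION_MIN_REQUIRED) && \
    MAC_OS_X_VERSION_MIN_REQUIRED < 101200
#define SKETCH_HAVE_THREAD_CPUTIME 0
#elif defined(CLOCK_THREAD_CPUTIME_ID)
#define SKETCH_HAVE_THREAD_CPUTIME 1
#else
#define SKETCH_HAVE_THREAD_CPUTIME 0
#endif

// Returns the calling thread's CPU time in nanoseconds, or 0 when the clock
// is unavailable on this platform.
inline uint64_t ThreadCpuNanos() {
#if SKETCH_HAVE_THREAD_CPUTIME
  struct timespec ts;
  if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) == 0) {
    return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull +
           static_cast<uint64_t>(ts.tv_nsec);
  }
#endif
  return 0;
}
```

The check is made at compile time against the minimum deployment target, so the undeclared identifier is never seen by the compiler on older SDKs.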
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5570
Test Plan:
verified it compiles now on 10.10. Also did some investigation to
ensure it does not cause regression on macOS 10.12+, although I do not
have access to such an environment to really test.
Differential Revision: D17322629
Pulled By: riversand963
fbshipit-source-id: e0a412223854f826b4d83e6d15c3739ff4620d7d
Summary:
For the fillbatch benchmark, numEntries should be [num_] rather than [num_ / 1000], because numEntries is just the total number of entries we want to test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5198
Differential Revision: D17274664
Pulled By: anand1976
fbshipit-source-id: f96e952babdbac63fb99d14e1254d478a10437be
Summary:
i.e. if alive logfile is not being moved to archive while we are in GetSortedWalsOfType()
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5695
Differential Revision: D17279489
Pulled By: vjnadimpalli
fbshipit-source-id: 02bcf920a75b812edba8b87c6079b4e6fd5e683c
Summary:
Bug found by valgrind. New DynamicBloom wasn't allocating in
block sizes. New assertion added that probes starting in final word
would be in bounds.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5783
Test Plan: ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 valgrind --leak-check=full ./dynamic_bloom_test
Differential Revision: D17270623
Pulled By: pdillinger
fbshipit-source-id: 1e0407504b875133a771383cd488c70f91be2b87
Summary:
Check that we don't accidentally change the on-disk format of
existing Bloom filter implementations, including for various
CACHE_LINE_SIZE (by changing temporarily).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5778
Test Plan: thisisthetest
Differential Revision: D17269630
Pulled By: pdillinger
fbshipit-source-id: c77017662f010a77603b7d475892b1f0d5563d8b
Summary:
When building with clang 9, a warning is reported because the InternalDBStatsType type names shadow the ones for statistics. Rename them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5779
Test Plan: Build with clang 9 and see it passes.
Differential Revision: D17239378
fbshipit-source-id: af28fb42066c738cd1b841f9fe21ab4671dafd18
Summary:
Add the +crypto flag in CMakeLists when building for an ARMv8 CPU.
The function crc32c_arm64 uses HAVE_ARM64_CRYPTO to check whether it can enable ARM NEON instructions:
#ifdef HAVE_ARM64_CRYPTO
/* Crc32c Parallel computation
* Algorithm comes from Intel whitepaper:
* crc-iscsi-polynomial-crc32-instruction-paper
*
* Input data is divided into three equal-sized blocks
* Three parallel blocks (crc0, crc1, crc2) for 1024 Bytes
* One Block: 42(BLK_LENGTH) * 8(step length: crc32c_u64) bytes
*/
However, CMakeLists does not currently check for or pass the crypto flag.
I checked that the default Makefile has it:
ifeq (,$(shell $(CXX) -fsyntax-only -march=armv8-a+crc -xc /dev/null 2>&1))
CXXFLAGS += -march=armv8-a+crc+crypto
CFLAGS += -march=armv8-a+crc+crypto
ARMCRC_SOURCE=1
endif
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5750
Differential Revision: D17242027
fbshipit-source-id: 443c9b89755b4bc34e265205ab922db1b2e14bde
Summary:
ReadYourOwnWriteStress occasionally times out on some platforms. The patch splits it into three.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5776
Differential Revision: D17231743
Pulled By: maysamyabandeh
fbshipit-source-id: d42eeaf22f61a48d50f9c404d98b1081ae8dac94
Summary:
These uninitialized member variables can cause a key to not be pinned when it should be, causing erroneous behavior. For example ingesting a file with range deletion tombstones will yield an "external file have corrupted keys" on a Mac.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5720
Differential Revision: D17217673
fbshipit-source-id: cd7df7ce3ad9cf69c841c4d3dc6fd144eff9e212
Summary:
Fixes https://github.com/facebook/rocksdb/issues/5734. From reading the code, the assert doesn't quite make sense to me, since `dataSize` and `fileOffset` have no correlation. But my knowledge of `EncryptedEnv` is very limited.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5735
Test Plan:
run `ENCRYPTED_ENV=1 ./db_encryption_test`
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Differential Revision: D17133849
fbshipit-source-id: bb7262d308e5b2503c400b180edc252668df0ef0
Summary:
The `#include "core_local.h"` was pulling in libgcc's `posix_memalign()`
declaration. That declaration specifies `throw()` whereas musl libc's
declaration does not. This was leading to the following compiler error
when using musl libc:
```
In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/port/jemalloc_helper.h:26:0,
from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.h:11,
from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.cc:6:
/go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: error: declaration of 'int posix_memalign(void**, size_t, size_t) throw ()' has a different exception specifier
# define je_posix_memalign posix_memalign
^
/go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: note: from previous declaration 'int posix_memalign(void**, size_t, size_t)'
# define je_posix_memalign posix_memalign
^
/go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:202:38: note: in expansion of macro 'je_posix_memalign'
JEMALLOC_EXPORT int JEMALLOC_NOTHROW je_posix_memalign(void **memptr,
^~~~~~~~~~~~~~~~~
make[4]: *** [CMakeFiles/rocksdb.dir/util/jemalloc_nodump_allocator.cc.o] Error 1
```
Since `#include "core_local.h"` is not actually used, we can just remove
it. I verified that fixes the build.
There was a related PR here (https://github.com/facebook/rocksdb/issues/2188), although the problem description is
slightly different.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5583
Differential Revision: D16343227
fbshipit-source-id: 0386bc2b5fd55b2c3b5fba19382014efa52e44f8