Summary:
**Background** - runtime detection of certain x86 CPU features was added for optimizing CRC32c checksums, where performance is dramatically affected by the availability of certain CPU instructions and code using intrinsics for those instructions. And Java builds with native library try to be broadly compatible but performant.
What has changed is that CRC32c is no longer the most efficient cheecksum on contemporary x86_64 hardware, nor the default checksum. XXH3 is generally faster and not as dramatically impacted by the availability of certain CPU instructions. For example, on my Skylake system using db_bench (similar on an older Skylake system without AVX512):
PORTABLE=1 empty USE_SSE : xxh3->8 GB/s crc32c->0.8 GB/s (no SSE4.2 nor AVX2 instructions)
PORTABLE=1 USE_SSE=1 : xxh3->19 GB/s crc32c->16 GB/s (with SSE4.2 and AVX2)
PORTABLE=0 USE_SSE ignored: xxh3->28 GB/s crc32c->16 GB/s (also some AVX512)
Testing a ~10 year old system, with SSE4.2 but without AVX2, crc32c is a similar speed to the new systems but xxh3 is only about half that speed, also 8GB/s like the non-AVX2 compile above. Given that xxh3 has specific optimization for AVX2, I think we can infer that that crc32c is only fastest for that ~2008-2013 period when SSE4.2 was included but not AVX2. And given that xxh3 is only about 2x slower on these systems (not like >10x slower for unoptimized crc32c), I don't think we need to invest too much in optimally adapting to these old cases.
x86 hardware that doesn't support fast CRC32c is now extremely rare, so requiring a custom build to support such hardware is fine IMHO.
**This change** does two related things:
* Remove runtime CPU detection for optimizing CRC32c on x86. Maintaining this code is non-zero work, and compiling special code that doesn't work on the configured target instruction set for code generation is always dubious. (On the one hand we have to ensure the CRC32c code uses SSE4.2 but on the other hand we have to ensure nothing else does.)
* Detect CPU features in source code, not in build scripts. Although there are some hypothetical advantages to detectiong in build scripts (compiler generality), RocksDB supports at least three build systems: make, cmake, and buck. It's not practical to support feature detection on all three, and we have suffered from missed optimization opportunities by relying on missing or incomplete detection in cmake and buck. We also depend on some components like xxhash that do source code detection anyway.
**In more detail:**
* `HAVE_SSE42`, `HAVE_AVX2`, and `HAVE_PCLMUL` replaced by standard macros `__SSE4_2__`, `__AVX2__`, and `__PCLMUL__`.
* MSVC does not provide high fidelity defines for SSE, PCLMUL, or POPCNT, but we can infer those from `__AVX__` or `__AVX2__` in a compatibility header. In rare cases of false negative or false positive feature detection, a build engineer should be able to set defines to work around the issue.
* `__POPCNT__` is another standard define, but we happen to only need it on MSVC, where it is set by that compatibility header, or can be set by the build engineer.
* `PORTABLE` can be set to a CPU type, e.g. "haswell", to compile for that CPU type.
* `USE_SSE` is deprecated, now equivalent to PORTABLE=haswell, which roughly approximates its old behavior.
Notably, this change should enable more builds to use the AVX2-optimized Bloom filter implementation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11419
Test Plan:
existing tests, CI
Manual performance tests after the change match the before above (none expected with make build).
We also see AVX2 optimized Bloom filter code enabled when expected, by injecting a compiler error. (Performance difference is not big on my current CPU.)
Reviewed By: ajkr
Differential Revision: D45489041
Pulled By: pdillinger
fbshipit-source-id: 60ceb0dd2aa3b365c99ed08a8b2a087a9abb6a70
Summary:
We haven't been actively mantaining RocksDB LITE recently and the size must have been gone up significantly. We are removing the support.
Most of changes were done through following comments:
unifdef -m -UROCKSDB_LITE `git grep -l ROCKSDB_LITE | egrep '[.](cc|h)'`
by Peter Dillinger. Others changes were manually applied to build scripts, CircleCI manifests, ROCKSDB_LITE is used in an expression and file db_stress_test_base.cc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11147
Test Plan: See CI
Reviewed By: pdillinger
Differential Revision: D42796341
fbshipit-source-id: 4920e15fc2060c2cd2221330a6d0e5e65d4b7fe2
Summary:
The current integration with folly requires cherry-picking folly source files to include in RocksDB for external CI builds. Its not scaleable as we depend on more features in folly, such as coroutines. This PR adds a dependency from RocksDB to the folly library when ```USE_FOLLY``` or ```USE_COROUTINES``` are set. We build folly using the build scripts in ```third-party/folly```, relying on it to download and build its dependencies. A new ```Makefile``` target, ```build_folly```, is provided to make building folly easier.
A new option, ```USE_FOLLY_LITE``` is added to retain the old model of compiling selected folly sources with RocksDB. This might be useful for short-term development.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10103
Reviewed By: pdillinger
Differential Revision: D38426787
Pulled By: anand1976
fbshipit-source-id: 33bc84abd9fdc7e2567749f02aa1b2494eb62b2f
Summary:
Some C++ code changes between version 7.1.2 and 7.2.2 now seem to require at least macOS 10.13 (2017) to build successfully, previously we needed 10.12 (2016). I haven't been able to identify the exact commit.
**NOTE**: This needs to be merged to both `main` and `7.2.fb` branches.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9976
Reviewed By: jay-zhuang
Differential Revision: D36303226
Pulled By: ajkr
fbshipit-source-id: 589ce3ecf821db3402b0876e76d37b407896c945
Summary:
ROCKSDB_SUPPORT_THREAD_LOCAL definition has been removed.
`__thread`(#define) has been replaced with `thread_local`(C++ keyword) across the code base.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10015
Reviewed By: siying
Differential Revision: D36485491
Pulled By: pdillinger
fbshipit-source-id: 6522d212514ee190b90b4e2750c80c7e34013c78
Summary:
Improve memkind library detection in build_detect_platform:
- The current position of -lmemkind does not work with all versions of gcc
- LDFLAGS allows specifying non-standard library path through EXTRA_LDFLAGS
After the change, the options match TBB detection.
This is a follow-up to https://github.com/facebook/rocksdb/issues/6214.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9134
Reviewed By: ajkr, mrambacher
Differential Revision: D32192028
fbshipit-source-id: 115fafe8d93f1fe6aaf80afb32b2cb67aad074c7
Summary:
It's to support Meta's internal environment platform010. Gcc still doesn't work but USE_CLANG=1 should work.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9843
Test Plan: Try to make and ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 USE_CLANG=1 make
Reviewed By: pdillinger
Differential Revision: D35652507
fbshipit-source-id: a4a14b2fa4a2d6ca6fbf1b65060e81c39f079363
Summary:
Especially after updating to C++17, I don't see a compelling case for
*requiring* any folly components in RocksDB. I was able to purge the existing
hard dependencies, and it can be quite difficult to strip out non-trivial components
from folly for use in RocksDB. (The prospect of doing that on F14 has changed
my mind on the best approach here.)
But this change creates an optional integration where we can plug in
components from folly at compile time, starting here with F14FastMap to replace
std::unordered_map when possible (probably no public APIs for example). I have
replaced the biggest CPU users of std::unordered_map with compile-time
pluggable UnorderedMap which will use F14FastMap when USE_FOLLY is set.
USE_FOLLY is always set in the Meta-internal buck build, and a simulation of
that is in the Makefile for public CI testing. A full folly build is not needed, but
checking out the full folly repo is much simpler for getting the dependency,
and anything else we might want to optionally integrate in the future.
Some picky details:
* I don't think the distributed mutex stuff is actually used, so it was easy to remove.
* I implemented an alternative to `folly::constexpr_log2` (which is much easier
in C++17 than C++11) so that I could pull out the hard dependencies on
`ConstexprMath.h`
* I had to add noexcept move constructors/operators to some types to make
F14's complainUnlessNothrowMoveAndDestroy check happy, and I added a
macro to make that easier in some common cases.
* Updated Meta-internal buck build to use folly F14Map (always)
No updates to HISTORY.md nor INSTALL.md as this is not (yet?) considered a
production integration for open source users.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9546
Test Plan:
CircleCI tests updated so that a couple of them use folly.
Most internal unit & stress/crash tests updated to use Meta-internal latest folly.
(Note: they should probably use buck but they currently use Makefile.)
Example performance improvement: when filter partitions are pinned in cache,
they are tracked by PartitionedFilterBlockReader::filter_map_ and we can build
a test that exercises that heavily. Build DB with
```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -partition_index_and_filters
```
and test with (simultaneous runs with & without folly, ~20 times each to see
convergence)
```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench_folly -readonly -use_existing_db -benchmarks=readrandom -num=10000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -partition_index_and_filters -duration=40 -pin_l0_filter_and_index_blocks_in_cache
```
Average ops/s no folly: 26229.2
Average ops/s with folly: 26853.3 (+2.4%)
Reviewed By: ajkr
Differential Revision: D34181736
Pulled By: pdillinger
fbshipit-source-id: ffa6ad5104c2880321d8a1aa7187e00ab0d02e94
Summary:
Make `commit_prereq` work and a few other improvements:
* Remove gcc 481 and gcc5xx which are no longer supported
* Remove platform007 which is gone
* `make clean` work for both mac and linux
* `precommit_checker.py` to python3
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9797
Test Plan: `make commit_prereq`
Reviewed By: ajkr
Differential Revision: D35338536
Pulled By: jay-zhuang
fbshipit-source-id: 1e159962ab9d31c43c4b85de7d0f582d3e881ffe
Summary:
Fix g++ -march=native detection and reenable s390x in travis
This PR fixes s390x assembler messages:
```
Error: invalid switch -march=z14
Error: unrecognized option -march=z14
```
The s390x travis build was failing with gcc-7 because the assembler on
ubuntu 16.04 is too old to recognize the z14 model so it doesn't work
with -march=native on a z14 machine. It fixes the check for the
-march=native flag so that the assembler will get called and correctly
fail on ubuntu 16.04 which will cause the build to fall back to
-march=z196 which works.
The other changes are needed so builds work more consistently on
s390x:
1. Set make parallelism to 1 for s390x: The default was 4 previously
but I saw frequent internal compiler errors on travis probably due to
low resources. The `platform_dependent` job works more consistently
but is roughly 10 minutes slower although it varies.
2. Remove status_checked jobs, as we are relying on CircleCI for
these now and do not really need platform coverage on them.
Fixes https://github.com/facebook/rocksdb/issues/9524
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9631
Test Plan: CI
Reviewed By: ajkr
Differential Revision: D34553989
Pulled By: pdillinger
fbshipit-source-id: a6e3a7276446721c4c0bebc4ed217c2ca2b53f11
Summary:
Related to: https://github.com/facebook/rocksdb/pull/9215
* Adds build_detect_platform support for RISCV on Linux (at least on SiFive Unmatched platforms)
This still leaves some linking issues on RISCV remaining (e.g. when building `db_test`):
```
/usr/bin/ld: ./librocksdb_debug.a(memtable.o): in function `__gnu_cxx::new_allocator<char>::deallocate(char*, unsigned long)':
/usr/include/c++/10/ext/new_allocator.h:133: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(memtable.o): in function `std::__atomic_base<bool>::compare_exchange_weak(bool&, bool, std::memory_order, std::memory_order)':
/usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(memtable.o):/usr/include/c++/10/bits/atomic_base.h:464: more undefined references to `__atomic_compare_exchange_1' follow
/usr/bin/ld: ./librocksdb_debug.a(db_impl.o): in function `rocksdb::DBImpl::NewIteratorImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyData*, unsigned long, rocksdb::ReadCallback*, bool, bool)':
/home/adamretter/rocksdb/db/db_impl/db_impl.cc:3019: undefined reference to `__atomic_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(write_thread.o): in function `rocksdb::WriteThread::Writer::CreateMutex()':
/home/adamretter/rocksdb/./db/write_thread.h:205: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(write_thread.o): in function `rocksdb::WriteThread::SetState(rocksdb::WriteThread::Writer*, unsigned char)':
/home/adamretter/rocksdb/db/write_thread.cc:222: undefined reference to `__atomic_compare_exchange_1'
collect2: error: ld returned 1 exit status
make: *** [Makefile:1449: db_test] Error 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9366
Reviewed By: jay-zhuang
Differential Revision: D34377664
Pulled By: mrambacher
fbshipit-source-id: c86f9d0cd1cb0c18de72b06f1bf5847f23f51118
Summary:
For RocksJava 7 we will move from requiring Java 7 to Java 8.
* This simplifies the `Makefile` as we no longer need to deal with Java 7; so we no longer use `javah`.
* Added a java-version target which is invoked by the java target, and which exits if the version of java being used is not 8 or greater.
* Enforces java 8 as a minimum.
* Fixed CMake build.
* Fixed broken java event listener test, as the test was broken and the assertions in the callbacks were not causing assertions in the tests. The callbacks now queue up assertion errors for the main thread of the tests to check.
* Fixed C++ dangling pointers in the test code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9541
Reviewed By: pdillinger
Differential Revision: D34214929
Pulled By: jay-zhuang
fbshipit-source-id: fdff348758d0a23a742e83c87d5f54073ce16ca6
Summary:
* Fix LIB_MODE=shared for Meta-internal builds (use PIC libraries
appropriately)
* Fix gnu_parallel to recognize CircleCI and Travis builds as not
connected to a terminal (was previously relying on the
`| cat_ignore_eagain` stuff for Ubuntu 16). This problem could cause
timeouts that should be 10m to balloon to 5h.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9553
Test Plan: manual and CI
Reviewed By: jay-zhuang
Differential Revision: D34182886
Pulled By: pdillinger
fbshipit-source-id: e95fd8002d94c8dc414bae1975e4fd348589f2b5
Summary:
Drop support for some old compilers by requiring C++17 standard
(or higher). See https://github.com/facebook/rocksdb/issues/9388
First modification based on this is to remove some conditional compilation in slice.h (also
better for ODR)
Also in this PR:
* Fix some Makefile formatting that seems to affect ASSERT_STATUS_CHECKED config in
some cases
* Add c_test to NON_PARALLEL_TEST in Makefile
* Fix a clang-analyze reported "potential leak" in lru_cache_test
* Better "compatibility" definition of DEFINE_uint32 for old versions of gflags
* Fix a linking problem with shared libraries in Makefile (`./random_test: error while loading shared libraries: librocksdb.so.6.29: cannot open shared object file: No such file or directory`)
* Always set ROCKSDB_SUPPORT_THREAD_LOCAL and use thread_local (from C++11)
* TODO in later PR: clean up that obsolete flag
* Fix a cosmetic typo in c.h (https://github.com/facebook/rocksdb/issues/9488)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9481
Test Plan:
CircleCI config substantially updated.
* Upgrade to latest Ubuntu images for each release
* Generally prefer Ubuntu 20, but keep a couple Ubuntu 16 builds with oldest supported
compilers, to ensure compatibility
* Remove .circleci/cat_ignore_eagain except for Ubuntu 16 builds, because this is to work
around a kernel bug that should not affect anything but Ubuntu 16.
* Remove designated gcc-9 build, because the default linux build now uses GCC 9 from
Ubuntu 20.
* Add some `apt-key add` to fix some apt "couldn't be verified" errors
* Generally drop SKIP_LINK=1; work-around no longer needed
* Generally `add-apt-repository` before `apt-get update` as manual testing indicated the
reverse might not work.
Travis:
* Use gcc-7 by default (remove specific gcc-7 and gcc-4.8 builds)
* TODO in later PR: fix s390x "Assembler messages: Error: invalid switch -march=z14" failure
AppVeyor:
* Completely dropped because we are dropping VS2015 support and CircleCI covers
VS >= 2017
Also local testing with old gflags (out of necessity when using ROCKSDB_NO_FBCODE=1).
Reviewed By: mrambacher
Differential Revision: D33946377
Pulled By: pdillinger
fbshipit-source-id: ae077c823905b45370a26c0103ada119459da6c1
Summary:
This PR moves HDFS support from RocksDB repo to a separate repo. The new (temporary?) repo
in this PR serves as an example before we finalize the decision on where and who to host hdfs support. At this point,
people can start from the example repo and fork.
Java/JNI is not included yet, and needs to be done later if necessary.
The goal is to include this commit in RocksDB 7.0 release.
Reference:
https://github.com/ajkr/dedupfs by ajkr
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9170
Test Plan:
Follow the instructions in https://github.com/riversand963/rocksdb-hdfs-env/blob/master/README.md. Build and run db_bench and db_stress.
make check
Reviewed By: ajkr
Differential Revision: D33751662
Pulled By: riversand963
fbshipit-source-id: 22b4db7f31762ed417a20239f5a08dcd1696244f
Summary:
Closing https://github.com/facebook/rocksdb/issues/5954
fsync/fdatasync on Linux:
```
(fsync/fdatasync) includes writing through or flushing a disk cache if present.
```
However, on OS X and iOS:
```
(fsync) will flush all data from the host to the drive (i.e. the "permanent storage device"),
the drive itself may not physically write the data to the platters for quite some time and it
may be written in an out-of-order sequence.
```
Solution is to use `fcntl(F_FULLFSYNC)` on OS X so that we get the same
persistence guarantee.
According to OSX man page,
```
The F_FULLFSYNC fcntl asks the drive to flush **all** buffered data to permanent storage.
```
This suggests that it will be no faster than `fsync` on Linux, since Linux, according to its man page,
```
writing through or flushing a disk cache if present
```
It means Linux may not flush **all** data from disk cache.
This is similar to bug reports/fixes in:
- golang: https://github.com/golang/go/issues/26650
- leveldb: 296de8d5b8.
Not sure if we should fallback to fsync since we break persistence contract.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9356
Reviewed By: jay-zhuang
Differential Revision: D33417416
Pulled By: riversand963
fbshipit-source-id: 475548ff9c5eaccde325e0f6842694271cbc8cb7
Summary:
Replace `-o /dev/null` by `-o test.o` when testing for C++ features such as
-faligned-new otherwise tests will fail with some bugged binutils
(https://sourceware.org/bugzilla/show_bug.cgi?id=19526):
```
output/host/bin/xtensa-buildroot-linux-uclibc-g++ -faligned-new -x c++ - -o /dev/null <<EOF
struct alignas(1024) t {int a;};
int main() {}
EOF
/home/fabrice/buildroot/output/host/lib/gcc/xtensa-buildroot-linux-uclibc/8.3.0/../../../../xtensa-buildroot-linux-uclibc/bin/ld: final link failed: file truncated
```
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6479
Reviewed By: ajkr
Differential Revision: D33574136
Pulled By: riversand963
fbshipit-source-id: 12b48658b17e36013042c98219b89ddf71161d3c
Summary:
Move the 'macosx-version-min' arg to the front of PLATFORM_SHARED_LDFLAGS so that it doesn't get concatenated with the library name. Fixes https://github.com/facebook/rocksdb/issues/9146
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9149
Reviewed By: mrambacher
Differential Revision: D32396101
Pulled By: pdillinger
fbshipit-source-id: aefcf53384e64d399049f158779acc3a4e54a8fe
Summary:
This PR adds support for building on s390x including updating travis CI. It uses the previous work in https://github.com/facebook/rocksdb/pull/6168 and adds some more changes to get all current tests (make check and jni tests) to pass. The tests were run with snappy, lz4, bzip2 and zstd all compiled in.
There are a few pieces still needed to get the travis build working that I don't think I can do. adamretter is this something you could help with?
1. A prebuilt https://rocksdb-deps.s3-us-west-2.amazonaws.com/cmake/cmake-3.14.5-Linux-s390x.deb package
2. A https://hub.docker.com/r/evolvedbinary/rocksjava s390x image
Not sure if there is more required for travis. Happy to help in any way I can.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8962
Reviewed By: mrambacher
Differential Revision: D31802198
Pulled By: pdillinger
fbshipit-source-id: 683511466fa6b505f85ba5a9964a268c6151f0c2
Summary:
DistributedMutex hasn't been used in the code base and enabling
`USE_FOLLY_DISTRIBUTED_MUTEX` only runs the mutex tests from third-party
lib. So disabling it for now.
The implementation may also out of date, should re-sync with folly before
using.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8584
Test Plan: CI
Reviewed By: ajkr
Differential Revision: D29888960
Pulled By: jay-zhuang
fbshipit-source-id: 3e75f73386c6ed03efb96a1400258d602a724f17
Summary:
Add google benchmark for microbench.
Add ribbon_bench for benchmark ribbon filter vs. other filters.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8493
Test Plan:
added test to CI
To run the benchmark on devhost:
Install benchmark: `$ sudo dnf install google-benchmark-devel`
Build and run:
`$ ROCKSDB_NO_FBCODE=1 DEBUG_LEVEL=0 make microbench`
or with cmake:
`$ mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_BENCHMARK=1 && make microbench`
Reviewed By: pdillinger
Differential Revision: D29589649
Pulled By: jay-zhuang
fbshipit-source-id: 8fed13b562bef4472f161ecacec1ab6b18911dff
Summary:
platform007 being phased out and sometimes broken
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8389
Test Plan: `make V=1` to see which compiler is being used
Reviewed By: jay-zhuang
Differential Revision: D29067183
Pulled By: pdillinger
fbshipit-source-id: d1b07267cbc55baa9395f2f4fe3967cc6dad52f7
Summary:
By default, try to build with liburing. For make, if ROCKSDB_USE_IO_URING is not set, treat as 1, which means RocksDB will try to build with liburing. For cmake, add WITH_LIBURING to control it, with default on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8322
Test Plan: Build using cmake and make.
Reviewed By: anand1976
Differential Revision: D28586498
fbshipit-source-id: cfd39159ab697f4b93a9293a59c07f839b1e7ed5
Summary:
At least under MacOS, some things were excluded from the build (like Snappy) because the compilation flags were not passed in correctly. This PR does a few things:
- Passes the EXTRA_CXX/LDFLAGS into build_detect_platform. This means that if some tool (like TBB for example) is not installed in a standard place, it could still be detected by build_detect_platform. In this case, the developer would invoke: "EXTRA_CXXFLAGS=<path to TBB include> EXTRA_LDFLAGS=<path to TBB library> make", and the build script would find the tools in the extra location.
- Changes the compilation tests to use PLATFORM_CXXFLAGS. This change causes the EXTRA_FLAGS passed in to the script to be included in the compilation check. Additionally, flags set by the script itself (like --std=c++11) will be used during the checks.
Validated that the make_platform.mk file generated on Linux does not change with this change. On my MacOS machine, the SNAPPY libraries are now available (they were not before as they required --std=c++11 to build).
I also verified that I can build against TBB installed on my Mac by passing in the EXTRA CXX and LD FLAGS to the location in which TBB is installed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8111
Reviewed By: jay-zhuang
Differential Revision: D27353516
Pulled By: mrambacher
fbshipit-source-id: b6b378c96dbf678bab1479556dcbcb49c47e807d
Summary:
If the platform is ppc64 and the libc is not GNU libc, then we exclude the range_tree from compilation.
See https://jira.percona.com/browse/PS-7559
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8070
Reviewed By: jay-zhuang
Differential Revision: D27246004
Pulled By: mrambacher
fbshipit-source-id: 59d8433242ce7ce608988341becb4f83312445f5
Summary:
This disables Linux/amd64 builds in Travis for PRs, and adds a
gcc-10+c++20 build in CircleCI, which should fill out sufficient coverage
vs. what we had in Travis
Fixed a use of std::is_pod, which is deprecated in c++20
Fixed ++ on a volatile in db_repl_stress.cc, with bigger refactoring.
Although ++ on this volatile was probably ok with one thread writer and
one thread reader, the code was still overly complex. There was a
deadcode check for error
`if (replThread.no_read < dataPump.no_records)` which can be proven
never to happen based on the structure of the code. It infinite loops
instead for the case intended to be checked. I just simplified the code
for what should be the same checking power.
Also most configurations seem to be using make parallelism = 2 * vcores,
so fixing / using that.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7791
Test Plan:
CI
and `while ./db_repl_stress; do echo again; done` for a while
Reviewed By: siying
Differential Revision: D25669834
Pulled By: pdillinger
fbshipit-source-id: b2c688053d0b1d52c989903449d3cd27a04130d6
Summary:
Expands on https://github.com/facebook/rocksdb/pull/7016 so that when `PORTABLE=1` is set the dependencies for RocksJava static target will also be built with backwards compatibility for MacOS as far back as 10.12 (i.e. 2016).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7683
Reviewed By: ajkr
Differential Revision: D25034164
Pulled By: pdillinger
fbshipit-source-id: dc9e51828869ed9ec336a8a86683e4d0bfe04f27
Summary:
These new functions and 128-bit value bit operations are
expected to be used in a forthcoming Bloom filter alternative.
No functional changes to production code, just new code only called by
unit tests, cosmetic changes to existing headers, and fix an existing
function for a yet-unused template instantiation (BitsSetToOne on
something signed and smaller than 32 bits).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7338
Test Plan:
Unit tests included. Works with and without
TEST_UINT128_COMPAT=1 to check compatibility with and without
__uint128_t. Also added that parameter to the CircleCI build
build-linux-shared_lib-alt_namespace-status_checked.
Reviewed By: jay-zhuang
Differential Revision: D23494945
Pulled By: pdillinger
fbshipit-source-id: 5c0dc419100d9df5d4d9abb153b2855d5aea39e8
Summary:
We see some hosts failed to build platform009 with gcc. Revert the default to be platform007 if USE_CLANG is not specified.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7253
Test Plan: Build with both of USE_CLANG=1 set and not set and observe it builds successfully, and see the tool chain used.
Reviewed By: jay-zhuang
Differential Revision: D23110550
fbshipit-source-id: 25cb47923f7174b24debdad0cc8d90b07c4d5d09
Summary:
Upgrade tool chain to the latest. It is done mostly manually as build_tools/build_detect_platform fails to update many of them.
Try to fix a new clang analyze warning with the new tool chain.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7251
Test Plan: "make all", "USE_CLANG=1 make all"
Reviewed By: riversand963
Differential Revision: D23091090
fbshipit-source-id: 732e5a30137837431438f85f36296406b641f975
Summary:
`USE_LTO=1` in `make` commands now enables LTO. The archiver (`ar`) needed
to change in this PR to use a wrapper that enables the LTO plugin.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7181
Test Plan:
build a few ways
```
$ make clean && USE_LTO=1 make -j48 db_bench
$ make clean && USE_CLANG=1 USE_LTO=1 make -j48 db_bench
$ make clean && ROCKSDB_NO_FBCODE=1 USE_LTO=1 make -j48 db_bench
```
Reviewed By: cheng-chang
Differential Revision: D22784994
Pulled By: ajkr
fbshipit-source-id: 9c45333bd49bf4615aa04c85b7c6fd3925421152
Summary:
Change the linking of tests/tools to be against a library rather than a list of objects. This change substantially reduces the size of the objects produced.
peterd clean repo size: 264M
Before this change, with make all: 40G
After this change, with make all: 28G
With make LIB_MODE=shared all: 7.0G
The list of TESTS was changed from being hard-coded to generated from the test sources variable. Note that there are some test sources that are not built as tests (though the set of tests is identical to the previous version).
Added OBJ_DIR option to Makefile to allow objects to be placed in an alternative location. By default, OBJ_DIR is the same as before ("./").
This change is a precursor to being able to build/run the tests/tools linked against static libraries. Additionally, it should be possible to clean up and merge some of the rules for building tests and the like if so desired.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6660
Reviewed By: riversand963
Differential Revision: D22244463
Pulled By: pdillinger
fbshipit-source-id: db9c6341d81ed62c2270374f4ede02fb9604c754
Summary:
When `PORTABLE=1` is set, RocksDB will now be built with backwards compatibility for MacOS as far back as 10.12 (i.e. 2016).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7016
Reviewed By: ajkr
Differential Revision: D22211312
Pulled By: pdillinger
fbshipit-source-id: 7b0858d9b55d6265d3ea27bf5ea1673639b6538c
Summary:
Based on https://github.com/facebook/rocksdb/issues/6648 (CLA Signed), but heavily modified / extended:
* Implicit capture of this via [=] deprecated in C++20, and [=,this] not standard before C++20 -> now using explicit capture lists
* Implicit copy operator deprecated in gcc 9 -> add explicit '= default' definition
* std::random_shuffle deprecated in C++17 and removed in C++20 -> migrated to a replacement in RocksDB random.h API
* Add the ability to build with different std version though -DCMAKE_CXX_STANDARD=11/14/17/20 on the cmake command line
* Minimal rebuild flag of MSVC is deprecated and is forbidden with /std:c++latest (C++20)
* Added MSVC 2019 C++11 & MSVC 2019 C++20 in AppVeyor
* Added GCC 9 C++11 & GCC9 C++20 in Travis
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6697
Test Plan: make check and CI
Reviewed By: cheng-chang
Differential Revision: D21020318
Pulled By: pdillinger
fbshipit-source-id: 12311be5dbd8675a0e2c817f7ec50fa11c18ab91
Summary:
This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538
Test Plan:
crash_test
make check
Reviewed By: riversand963
Differential Revision: D20714347
Pulled By: anand1976
fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7
Summary:
New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.
**Performance**
We tested the new allocator using db_bench.
- For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
- The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
- We use a uniform distribution as a worst-case scenario.
The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.
![image](https://user-images.githubusercontent.com/26400080/71108594-42479100-2178-11ea-8231-8a775bbc92db.png)
**Changes**
- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add test for MemkindKmemAllocator
**Minimum Requirements**
- kernel 5.3.12
- ndctl v67 - https://github.com/pmem/ndctl
- memkind v1.10.0 - https://github.com/memkind/memkind
**Memory Configuration**
The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on[ memkind’s GitHub page](https://github.com/memkind/memkind) to set up NVDIMM memory accordingly.
Note on memory allocation with NVDIMM memory exposed as system memory.
- The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
- The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.
**Usage**
When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):
```
#include "rocksdb/cache.h"
#include "memory/memkind_kmem_allocator.h"
NewLRUCache(
capacity /*size_t*/,
6 /*cache_numshardbits*/,
false /*strict_capacity_limit*/,
false /*cache_high_pri_pool_ratio*/,
std::make_shared<MemkindKmemAllocator>());
```
Refer to [RocksDB’s block cache documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache) to assign the LRU cache as block cache for a database.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6214
Reviewed By: cheng-chang
Differential Revision: D19292435
fbshipit-source-id: 7202f47b769e7722b539c86c2ffd669f64d7b4e1
Summary:
In the `.travis.yml` file the `jdk: openjdk7` element is ignored when `language: cpp`. So whatever version of the JDK that was installed in the Travis container was used - typically JDK 11.
To ensure our RocksJava builds are working, we now instead install and use OpenJDK 8. Ideally we would use OpenJDK 7, as RocksJava supports Java 7, but many of the newer Travis containers don't support Java 7, so Java 8 is the next best thing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6512
Differential Revision: D20388296
Pulled By: pdillinger
fbshipit-source-id: 8bbe6b59b70cfab7fe81ff63867d907fefdd2df1
Summary:
Check for sys/auxv.h and getauxval before using them as they are not
always available (for example on uclibc)
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6359
Differential Revision: D20239797
fbshipit-source-id: 175a098094d81545628c2372e7c388e70a32fd48
Summary:
We realized bugs related to IO Uring. Turn it off by default.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6405
Test Plan: Manually run build_tools/build_detect_platform and observe outputs.
Differential Revision: D19862792
fbshipit-source-id: 5d5e8e2762997b72a145ae59389ef3d7e4ccd060
Summary:
Right now, PosixRandomAccessFile::MultiRead() executes read requests in parallel. In this PR, it leverages I/O Uring library to run it in parallel, even when page cache is enabled. This function will fall back if the kernel version doesn't support it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5881
Test Plan: Run the unit test on a kernel version supporting it and make sure all tests pass, and run a unit test on kernel version supporting it and see it pass. Before merging, will also run stress test and see it passes.
Differential Revision: D17742266
fbshipit-source-id: e05699c925ac04fdb42379456a4e23e4ebcb803a
Summary:
Adds an improved, replacement Bloom filter implementation (FastLocalBloom) for full and partitioned filters in the block-based table. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single filter.
Speed
The improved speed, at least on recent x86_64, comes from
* Using fastrange instead of modulo (%)
* Using our new hash function (XXH3 preview, added in a previous commit), which is much faster for large keys and only *slightly* slower on keys around 12 bytes if hashing the same size many thousands of times in a row.
* Optimizing the Bloom filter queries with AVX2 SIMD operations. (Added AVX2 to the USE_SSE=1 build.) Careful design was required to support (a) SIMD-optimized queries, (b) compatible non-SIMD code that's simple and efficient, (c) flexible choice of number of probes, and (d) essentially maximized accuracy for a cache-local Bloom filter. Probes are made eight at a time, so any number of probes up to 8 is the same speed, then up to 16, etc.
* Prefetching cache lines when building the filter. Although this optimization could be applied to the old structure as well, it seems to balance out the small added cost of accumulating 64 bit hashes for adding to the filter rather than 32 bit hashes.
Here's nominal speed data from filter_bench (200MB in filters, about 10k keys each, 10 bits filter data / key, 6 probes, avg key size 24 bytes, includes hashing time) on Skylake DE (relatively low clock speed):
$ ./filter_bench -quick -impl=2 -net_includes_hashing # New Bloom filter
Build avg ns/key: 47.7135
Mixed inside/outside queries...
Single filter net ns/op: 26.2825
Random filter net ns/op: 150.459
Average FP rate %: 0.954651
$ ./filter_bench -quick -impl=0 -net_includes_hashing # Old Bloom filter
Build avg ns/key: 47.2245
Mixed inside/outside queries...
Single filter net ns/op: 63.2978
Random filter net ns/op: 188.038
Average FP rate %: 1.13823
Similar build time but dramatically faster query times on hot data (63 ns to 26 ns), and somewhat faster on stale data (188 ns to 150 ns). Performance differences on batched and skewed query loads are between these extremes as expected.
The only other interesting thing about speed is "inside" (query key was added to filter) vs. "outside" (query key was not added to filter) query times. The non-SIMD implementations are substantially slower when most queries are "outside" vs. "inside". This goes against what one might expect or would have observed years ago, as "outside" queries only need about two probes on average, due to short-circuiting, while "inside" always have num_probes (say 6). The problem is probably the nastily unpredictable branch. The SIMD implementation has few branches (very predictable) and has pretty consistent running time regardless of query outcome.
Accuracy
The generally improved accuracy (re: Issue https://github.com/facebook/rocksdb/issues/5857) comes from a better design for probing indices
within a cache line (re: Issue https://github.com/facebook/rocksdb/issues/4120) and improved accuracy for millions of keys in a single filter from using a 64-bit hash function (XXH3p). Design details in code comments.
Accuracy data (generalizes, except old impl gets worse with millions of keys):
Memory bits per key: FP rate percent old impl -> FP rate percent new impl
6: 5.70953 -> 5.69888
8: 2.45766 -> 2.29709
10: 1.13977 -> 0.959254
12: 0.662498 -> 0.411593
16: 0.353023 -> 0.0873754
24: 0.261552 -> 0.0060971
50: 0.225453 -> ~0.00003 (less than 1 in a million queries are FP)
Fixes https://github.com/facebook/rocksdb/issues/5857
Fixes https://github.com/facebook/rocksdb/issues/4120
Unlike the old implementation, this implementation has a fixed cache line size (64 bytes). At 10 bits per key, the accuracy of this new implementation is very close to the old implementation with 128-byte cache line size. If there's sufficient demand, this implementation could be generalized.
Compatibility
Although old releases would see the new structure as corrupt filter data and read the table as if there's no filter, we've decided only to enable the new Bloom filter with new format_version=5. This provides a smooth path for automatic adoption over time, with an option for early opt-in.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6007
Test Plan: filter_bench has been used thoroughly to validate speed, accuracy, and correctness. Unit tests have been carefully updated to exercise new and old implementations, as well as the logic to select an implementation based on context (format_version).
Differential Revision: D18294749
Pulled By: pdillinger
fbshipit-source-id: d44c9db3696e4d0a17caaec47075b7755c262c5f
Summary:
- Updated our included xxhash implementation to version 0.7.2 (== the latest dev version as of 2019-10-09).
- Using XXH_NAMESPACE (like other fb projects) to avoid potential name collisions.
- Added fastrange64, and unit tests for it and fastrange32. These are faster alternatives to hash % range.
- Use preview version of XXH3 instead of MurmurHash64A for NPHash64
-- Had to update cache_test to increase probability of passing for any given hash function.
- Use fastrange64 instead of % with uses of NPHash64
-- Had to fix WritePreparedTransactionTest.CommitOfDelayedPrepared to avoid deadlock apparently caused by new hash collision.
- Set default seed for NPHash64 because specifying a seed rarely makes sense for it.
- Removed unnecessary include xxhash.h in a popular .h file
- Rename preview version of XXH3 to XXH3p for clarity and to ease backward compatibility in case final version of XXH3 is integrated.
Relying on existing unit tests for NPHash64-related changes. Each new implementation of fastrange64 passed unit tests when manipulating my local build to select it. I haven't done any integration performance tests, but I consider the improved performance of the pieces being swapped in to be well established.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5909
Differential Revision: D18125196
Pulled By: pdillinger
fbshipit-source-id: f6bf83d49d20cbb2549926adf454fd035f0ecc0d
Summary:
In preparing to utilize a new Intel instruction extension, I
noticed problems with the existing build script in regard to the
existing utilized extensions, either with USE_SSE or PORTABLE flags.
* PORTABLE=0 was interpreted the same as PORTABLE=1. Now empty and 0
mean the same. (I guess you were not supposed to set PORTABLE= if you
wanted non-portable--except that...)
* The Facebook build script extensions would set PORTABLE=1 even if
it's already set in a make var or environment. Now it does not override
a non-empty setting, so use PORTABLE=0 for fully optimized build,
overriding Facebook environment default.
* Put in an explanation of the USE_SSE flag where it's used by
build_detect_platform, and cleaned up some confusing/redundant
associated logic.
* If USE_SSE was set and expected intrinsics were not available,
build_detect_platform would exit early but build would proceed with
broken, incomplete configuration. Now warning is gracefully recovered.
* If USE_SSE was set and expected intrinsics were not available,
build would still try to use flags like -msse4.2 etc. which could lead
to unexpected compilation failure or binary incompatibility. Now those
flags are not used if the warning is issued.
This should not break or change existing, valid build scripts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5800
Test Plan: manual case testing
Differential Revision: D17369543
Pulled By: pdillinger
fbshipit-source-id: 4ee244911680ae71144d272c40aceea548e3ce88
Summary:
This ports `folly::DistributedMutex` into RocksDB. The PR includes everything else needed to compile and use DistributedMutex as a component within folly. Most files are unchanged except for some portability stuff and includes.
For now, I've put this under `rocksdb/third-party`, but if there is a better folder to put this under, let me know. I also am not sure how or where to put unit tests for third-party stuff like this. It seems like gtest is included already, but I need to link with it from another third-party folder.
This also includes some other common components from folly
- folly/Optional
- folly/ScopeGuard (In particular `SCOPE_EXIT`)
- folly/synchronization/ParkingLot (A portable futex-like interface)
- folly/synchronization/AtomicNotification (The standard C++ interface for futexes)
- folly/Indestructible (For singletons that don't get destroyed without allocations)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5642
Differential Revision: D16544439
fbshipit-source-id: 179b98b5dcddc3075926d31a30f92fd064245731