Summary:
When paranoid_checks is on, DBImpl::CheckConsistency() iterates over all sst files and calls Env::GetFileSize() for each of them. As far as I could understand, this is pretty arbitrary and doesn't affect correctness - if filesystem doesn't corrupt fsynced files, the file sizes will always match; if it does, it may as well corrupt contents as well as sizes, and rocksdb doesn't check contents on open.
If there are thousands of sst files, getting all their sizes takes a while. If, on top of that, Env is overridden to use some remote storage instead of local filesystem, it can be *really* slow and overload the remote storage service. This PR adds an option to not do GetFileSize(); instead it does GetChildren() for parent directory to check that all the expected sst files are at least present, but doesn't check their sizes.
We can't just disable paranoid_checks instead because paranoid_checks do a few other important things: make the DB read-only on write errors, print error messages on read errors, etc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6353
Test Plan: ran the added sanity check unit test. Will try it out in a LogDevice test cluster where the GetFileSize() calls are causing a lot of trouble.
Differential Revision: D19656425
Pulled By: al13n321
fbshipit-source-id: c2c421b367633033760d1f56747bad206d1fbf82
Summary:
This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy.
**NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into.
Previously when implementing a comparator in Java the developer had a choice of subclassing either `DirectComparator` or `Comparator` which would use direct and non-direct byte-buffers resepectively (via `DirectSlice` and `Slice`).
In this redesign there we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`.
In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`.
---
[JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java.
With these changes, when reusing buffers and guarding access to them via mutexes the slowdown is approximately the same. However, these changes offer a new facility to not reuse mutextes, which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425).
These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical.
---
These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x.
```
ComparatorBenchmarks.put native_bytewise thrpt 25 124483.795 ± 2032.443 ops/s
ComparatorBenchmarks.put native_reverse_bytewise thrpt 25 114414.536 ± 3486.156 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 17228.250 ± 1288.546 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 16035.865 ± 1248.099 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_thread-local thrpt 25 21571.500 ± 871.521 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_adaptive-mutex thrpt 25 23613.773 ± 8465.660 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 16768.172 ± 5618.489 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_thread-local thrpt 25 23921.164 ± 8734.742 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_no-reuse thrpt 25 17899.684 ± 839.679 ops/s
ComparatorBenchmarks.put java_bytewise_direct_no-reuse thrpt 25 22148.316 ± 1215.527 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 11311.126 ± 820.602 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 11421.311 ± 807.210 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_thread-local thrpt 25 11554.005 ± 960.556 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_adaptive-mutex thrpt 25 22960.523 ± 1673.421 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 18293.317 ± 1434.601 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_thread-local thrpt 25 24479.361 ± 2157.306 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_no-reuse thrpt 25 7942.286 ± 626.170 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_no-reuse thrpt 25 11781.955 ± 1019.843 ops/s
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252
Differential Revision: D19331064
Pulled By: pdillinger
fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738
Summary:
Add unordered_write option api and related ut to rocksjava
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5839
Differential Revision: D17604446
Pulled By: maysamyabandeh
fbshipit-source-id: c6b07e85ca9d5e3a92973ddb6ab2bc079e53c9c1
Summary:
This is my latest round of changes to add missing items to RocksJava. More to come in future PRs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4833
Differential Revision: D14152266
Pulled By: sagar0
fbshipit-source-id: d6cff67e26da06c131491b5cf6911a8cd0db0775
Summary:
When adding CompactionFilter and CompactionFilterFactory settings to the Java layer, ColumnFamilyOptions was modified directly instead of ColumnFamilyOptionsInterface. This meant that the old-stye Options monolith was left behind.
This patch fixes that, by:
- promoting the CompactionFilter + CompactionFilterFactory setters from ColumnFamilyOptions -> ColumnFamilyOptionsInterface
- adding getters in ColumnFamilyOptionsInterface
- implementing setters in Options
- implementing getters in both ColumnFamilyOptions and Options
- adding testcases
- reusing a test CompactionFilterFactory by moving it to a common location
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3461
Differential Revision: D13278788
Pulled By: sagar0
fbshipit-source-id: 72602c6eb97dc80734e718abb5e2e9958d3c753b
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638
Differential Revision: D12934992
Pulled By: sagar0
fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
Summary:
1. `WriteBufferManager` should have a reference alive in Java side through `Options`/`DBOptions` otherwise, if it's GC'ed at java side, native side can seg fault.
2. native method `setWriteBufferManager()` in `DBOptions.java` doesn't have it's jni method invocation in rocksdbjni which is added in this PR
3. `DBOptionsTest.java` is referencing object of `Options`. Instead it should be testing against `DBOptions`. Seems like a copy paste error.
4. Add a getter for WriteBufferManager.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4579
Differential Revision: D10561150
Pulled By: sagar0
fbshipit-source-id: 139a15c7f051a9f77b4200215b88267b48fbc487
Summary:
Allow rocks java to explicitly create WriteBufferManager by plumbing it to the native code through JNI.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4492
Differential Revision: D10428506
Pulled By: sagar0
fbshipit-source-id: cd9dd8c2ef745a0303416b44e2080547bdcca1fd
Summary:
…ression
For `CompressionType` we have options `compression` and `bottommost_compression`. Thus, to make the compression options consitent with the compression type when bottommost_compression is enabled, we add the bottommost_compression_opts
Closes https://github.com/facebook/rocksdb/pull/3985
Reviewed By: riversand963
Differential Revision: D8385911
Pulled By: zhichao-cao
fbshipit-source-id: 07bc533dd61bcf1cef5927d8d62901c13d38d5fc
Summary:
This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662
Differential Revision: D7426121
Pulled By: Dayvedde
fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
Summary:
This is an abstraction for working with custom Comparators implemented in native C++ code from Java. Native code must directly extend `rocksdb::Comparator`. When the native code comparator is compiled into the RocksDB codebase, you can then create a Java Class, and JNI stub to wrap it.
Useful if the C++/JNI barrier overhead is too much for your applications comparator performance.
An example is provided in `java/rocksjni/native_comparator_wrapper_test.cc` and `java/src/main/java/org/rocksdb/NativeComparatorWrapperTest.java`.
Closes https://github.com/facebook/rocksdb/pull/3334
Differential Revision: D7172605
Pulled By: miasantreble
fbshipit-source-id: e24b7eb267a3bcb6afa214e0379a1d5e8a2ceabe
Summary:
Add Java-side copy constructors for:
- Options
- DBOptions
- ColumnFamilyOptions
- WriteOptions
along with unit tests to assert the copy worked.
NOTE: Unit tests are failing in travis but it looks like a global timeout issue. These tests pass.
Closes https://github.com/facebook/rocksdb/pull/3450
Differential Revision: D6874425
Pulled By: sagar0
fbshipit-source-id: 5bde68ea5b5225e071faea2628bf8bbf10bd65ab
Summary:
This PR also includes some cleanup, bugfixes and refactoring of the Java API. However these are really pre-cursors on the road to CompactionFilterFactory support.
Closes https://github.com/facebook/rocksdb/pull/1241
Differential Revision: D6012778
Pulled By: sagar0
fbshipit-source-id: 0774465940ee99001a78906e4fed4ef57068ad5c
Summary:
This option was introduced in the C++ API in RocksDB 5.6 in bb01c1880c . Now, exposing it through RocksJava API.
Closes https://github.com/facebook/rocksdb/pull/2908
Differential Revision: D5864224
Pulled By: sagar0
fbshipit-source-id: 140aa55dcf74b14e4d11219d996735c7fdddf513
Summary:
Plumbed ReadOptions::iterate_upper_bound through JNI.
Made the following design choices:
* Used Slice instead of AbstractSlice due to the anticipated usecase (key / key prefix). Can change this if anyone disagrees.
* Used Slice instead of raw byte[] which seemed cleaner but necessitated the package-private handle-based Slice constructor. Followed WriteBatch as an example.
* We need a copy constructor for ReadOptions, as we create one base ReadOptions for a particular usecase and clone -> change the iterate_upper_bound on each slice operation. Shallow copy seemed cleanest.
* Hold a reference to the upper bound slice on ReadOptions, in contrast to Snapshot.
Signed a Facebook CLA this morning.
Closes https://github.com/facebook/rocksdb/pull/2872
Differential Revision: D5824446
Pulled By: sagar0
fbshipit-source-id: 74fc51313a10a81ecd348625e2a50ca5b7766888
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
Closes https://github.com/facebook/rocksdb/pull/2117
Differential Revision: D4860912
Pulled By: lightmark
fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
Summary:
This adds almost all missing options to RocksJava
Closes https://github.com/facebook/rocksdb/pull/2039
Differential Revision: D4779991
Pulled By: siying
fbshipit-source-id: 4a1bf28
Summary:
I have manually audited the entire RocksJava code base.
Sorry for the large pull-request, I have broken it down into many small atomic commits though.
My initial intention was to fix the warnings that appear when running RocksJava on Java 8 with `-Xcheck:jni`, for example when running `make jtest` you would see many errors similar to:
```
WARNING in native method: JNI call made without checking exceptions when required to from CallObjectMethod
WARNING in native method: JNI call made without checking exceptions when required to from CallVoidMethod
WARNING in native method: JNI call made without checking exceptions when required to from CallStaticVoidMethod
...
```
A few of those warnings still remain, however they seem to come directly from the JVM and are not directly related to RocksJava; I am in contact with the OpenJDK hostpot-dev mailing list about these - http://mail.openjdk.java.net/pipermail/hotspot-dev/2017-February/025981.html.
As a result of fixing these, I realised we were not r
Closes https://github.com/facebook/rocksdb/pull/1890
Differential Revision: D4591758
Pulled By: siying
fbshipit-source-id: 7f7fdf4
Summary:
…action
The two options, min_partial_merge_operands and verify_checksums_in_compaction, are not seldom used. Remove them to reduce the total number of options. Also remove them from Java and C interface.
Closes https://github.com/facebook/rocksdb/pull/1902
Differential Revision: D4601219
Pulled By: siying
fbshipit-source-id: aad4cb2
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859
Differential Revision: D4541292
Pulled By: sagar0
fbshipit-source-id: 5b3a6ca
Summary:
I am not sure if this is the best way to fix this?
Closes https://github.com/facebook/rocksdb/pull/1452
Differential Revision: D4109338
Pulled By: yiwu-arbug
fbshipit-source-id: ca40809
Summary:
- Deprecated RateLimiterConfig and GenericRateLimiterConfig
- Introduced RateLimiter
It is now possible to use all C++ related methods also in RocksJava.
A noteable method is setBytesPerSecond which can change the allowed
number of bytes per second at runtime.
Test Plan:
make rocksdbjava
make jtest
Reviewers: adamretter, yhchiang, ankgup87
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D35715
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.
Test Plan: Add two new unit tests. Run all existing tests, including jtest.
Reviewers: yhchiang, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59829
* [refactor] Split Java ColumnFamilyOptions into mutable and immutable and implement any missing immutable options
* [feature] Implement RocksDB#setOptions
Summary: std::make_unique is not standard and not always available, remove it
Test Plan: Run "make clean jclean rocksdbjava jtest -j8" on my mac
Reviewers: yhchiang, yiwu, sdong, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61143
* fixes 1220: rocksjni build fails on Windows due to variable-size array declaration
using new keyword to create variable-size arrays in order to satisfy most of the compilers
* fixes 1220: rocksjni build fails on Windows due to variable-size array declaration
using unique_ptr keyword to create variable-size arrays in order to satisfy most of the compilers
Summary: filter_deltes is not a frequently used feature. Remove it.
Test Plan: Run all test suites.
Reviewers: igor, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59427
Summary:
memtable_prefix_bloom_probes is not a critical option. Remove it to reduce number of options.
It's easier for users to make mistakes with memtable_prefix_bloom_bits, turn it to memtable_prefix_bloom_bits_ratio
Test Plan: Run all existing tests
Reviewers: yhchiang, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59199
Summary:
Fixed RocksJava test failure of shouldSetTestCappedPrefixExtractor
by adding the missing native implementation of
useCappedPrefixExtractor.
Test Plan:
make jclean
make rocksdbjava -j32
make jtest
Reviewers: igor, anthony, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43551