Summary:
This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy.
**NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into.
Previously, when implementing a comparator in Java, the developer had a choice of subclassing either `DirectComparator` or `Comparator`, which would use direct and non-direct byte buffers respectively (via `DirectSlice` and `Slice`).
In this redesign we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`.
In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`.
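As a rough illustration of what a `ByteBuffer`-based comparison can look like, here is a minimal standalone sketch of a bytewise comparison. It deliberately does not depend on any RocksJava class: the class name `ByteBufferBytewiseSketch` and the helper `buffer` are hypothetical, and the actual comparator base class and method signature in RocksJava are not shown here.

```java
import java.nio.ByteBuffer;

// Hypothetical standalone sketch; not part of the RocksJava API.
class ByteBufferBytewiseSketch {
  // Compares the remaining bytes of a and b lexicographically as unsigned values.
  // Absolute gets are used, so neither buffer's position is modified.
  static int compare(final ByteBuffer a, final ByteBuffer b) {
    final int minLen = Math.min(a.remaining(), b.remaining());
    for (int i = 0; i < minLen; i++) {
      final int x = a.get(a.position() + i) & 0xFF;
      final int y = b.get(b.position() + i) & 0xFF;
      if (x != y) {
        return x < y ? -1 : 1;
      }
    }
    // All shared bytes are equal: the shorter key sorts first.
    return Integer.compare(a.remaining(), b.remaining());
  }

  // Builds a heap (non-direct) or direct buffer holding the given bytes,
  // to show that the same comparison logic works for both kinds.
  static ByteBuffer buffer(final boolean direct, final byte... bytes) {
    final ByteBuffer buf = direct
        ? ByteBuffer.allocateDirect(bytes.length)
        : ByteBuffer.allocate(bytes.length);
    buf.put(bytes);
    buf.flip();
    return buf;
  }
}
```

The comparison only reads through the `ByteBuffer` interface, which is why the choice between direct and non-direct buffers can be left to an option rather than to the class hierarchy.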
---
[JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java.
With these changes, when reusing buffers and guarding access to them via mutexes, the slowdown is approximately the same. However, these changes offer a new facility to not reuse buffers (and so avoid the mutexes), which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces the slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425).
These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical.
---
These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x.
```
ComparatorBenchmarks.put native_bytewise thrpt 25 124483.795 ± 2032.443 ops/s
ComparatorBenchmarks.put native_reverse_bytewise thrpt 25 114414.536 ± 3486.156 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 17228.250 ± 1288.546 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 16035.865 ± 1248.099 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_thread-local thrpt 25 21571.500 ± 871.521 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_adaptive-mutex thrpt 25 23613.773 ± 8465.660 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 16768.172 ± 5618.489 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_thread-local thrpt 25 23921.164 ± 8734.742 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_no-reuse thrpt 25 17899.684 ± 839.679 ops/s
ComparatorBenchmarks.put java_bytewise_direct_no-reuse thrpt 25 22148.316 ± 1215.527 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 11311.126 ± 820.602 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 11421.311 ± 807.210 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_thread-local thrpt 25 11554.005 ± 960.556 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_adaptive-mutex thrpt 25 22960.523 ± 1673.421 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 18293.317 ± 1434.601 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_thread-local thrpt 25 24479.361 ± 2157.306 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_no-reuse thrpt 25 7942.286 ± 626.170 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_no-reuse thrpt 25 11781.955 ± 1019.843 ops/s
```
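For reference, the slowdown factors quoted above can be derived from the table by dividing the native throughput by the Java throughput. The sketch below does that for two rows; the class and method names are hypothetical, and the error bars in the table are ignored.

```java
// Hypothetical helper for reading the benchmark table; not part of RocksJava.
class SlowdownSketch {
  // Slowdown factor of a Java comparator relative to the native one,
  // computed from two throughput figures in ops/s.
  static double slowdown(final double nativeOpsPerSec, final double javaOpsPerSec) {
    return nativeOpsPerSec / javaOpsPerSec;
  }

  public static void main(final String[] args) {
    final double nativeBytewise = 124483.795;        // native_bytewise
    final double javaNonDirectAdaptive = 17228.250;  // java_bytewise_non-direct_reused-64_adaptive-mutex
    final double javaDirectThreadLocal = 23921.164;  // java_bytewise_direct_reused-64_thread-local

    System.out.printf("non-direct, adaptive mutex: %.1fx%n",
        slowdown(nativeBytewise, javaNonDirectAdaptive));
    System.out.printf("direct, thread-local: %.1fx%n",
        slowdown(nativeBytewise, javaDirectThreadLocal));
  }
}
```

With the figures above this works out to roughly 7.2x for the non-direct adaptive-mutex row and roughly 5.2x for the direct thread-local row.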
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252
Differential Revision: D19331064
Pulled By: pdillinger
fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738
Summary:
There are no API changes ;-)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6218
Differential Revision: D19200373
Pulled By: pdillinger
fbshipit-source-id: 58d34b01ea53b75a1eccbd72f8b14d6256a7380f
Summary:
Should fix Travis build error that randomly showed up upon
using Java 13 version of javadoc.
AdvancedColumnFamilyOptionsInterface.java:257: error:
unexpected heading used: <H2>, compared to implicit preceding heading: <H3>
According to this reference https://bugs.openjdk.java.net/browse/JDK-8220379
it should work to start at h4, but that didn't work, so avoiding
headings should be fine.
Also fix Java EnvironmentTest for JDK13.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6208
Test Plan: Travis run on PR (don't have Java 13 handy)
Differential Revision: D19163105
Pulled By: pdillinger
fbshipit-source-id: 4a9419cbe7ef780fba771b8a1508e1ea80d17b3e
Summary:
Add the jni library for musl-libc, specifically for incorporating into Alpine based docker images. The classifier is `musl64`.
I have signed the CLA electronically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3143
Differential Revision: D18719372
fbshipit-source-id: 6189d149310b6436d6def7d808566b0234b23313
Summary:
There's no technological impediment to allowing the Bloom
filter bits/key to be non-integer (fractional/decimal) values, and it
provides finer control over the memory vs. accuracy trade-off. This is
especially handy in using the format_version=5 Bloom filter in place
of the old one, because bits_per_key=9.55 provides the same accuracy as
the old bits_per_key=10.
This change not only requires refining the logic for choosing the best
num_probes for a given bits/key setting, it revealed a flaw in that logic.
As bits/key gets higher, the best num_probes for a cache-local Bloom
filter is closer to bpk / 2 than to bpk * 0.69, the best choice for a
standard Bloom filter. For example, at 16 bits per key, the best
num_probes is 9 (FP rate = 0.0843%) not 11 (FP rate = 0.0884%).
This change fixes and refines that logic (for the format_version=5
Bloom filter only, just in case) based on empirical tests to find
accuracy inflection points between each num_probes.
Although bits_per_key is now specified as a double, the new Bloom
filter converts/rounds this to "millibits / key" for predictable/precise
internal computations. Just in case of unforeseen compatibility
issues, we round to the nearest whole number bits / key for the
legacy Bloom filter, so as not to unlock new behaviors for it.
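A minimal sketch of the two rounding behaviours described above, written in Java for illustration only. The class and method names are hypothetical, the exact rounding rule used inside RocksDB is an assumption, and the empirically tuned num_probes selection for the format_version=5 filter is not reproduced; only the textbook rule for a standard Bloom filter is shown.

```java
// Hypothetical illustration; the real logic lives in RocksDB's C++ filter code.
class BloomBitsPerKeySketch {
  // New filter: keep fractional precision as an integer number of
  // millibits per key (assumed here to be round-to-nearest).
  static int millibitsPerKey(final double bitsPerKey) {
    return (int) Math.round(bitsPerKey * 1000.0);
  }

  // Legacy filter: round to the nearest whole number of bits per key.
  static int legacyWholeBitsPerKey(final double bitsPerKey) {
    return (int) Math.round(bitsPerKey);
  }

  // Textbook choice for a standard Bloom filter:
  // num_probes is about bits/key * ln(2), i.e. roughly bpk * 0.69.
  static int standardNumProbes(final double bitsPerKey) {
    return Math.max(1, (int) Math.round(bitsPerKey * Math.log(2.0)));
  }
}
```

For example, `standardNumProbes(16.0)` yields 11, which is the value the summary above reports as worse than 9 probes for the cache-local filter at 16 bits per key.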
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6092
Test Plan: unit tests included
Differential Revision: D18711313
Pulled By: pdillinger
fbshipit-source-id: 1aa73295f152a995328cb846ef9157ae8a05522a
Summary:
Add the unordered_write option API and related unit tests to RocksJava.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5839
Differential Revision: D17604446
Pulled By: maysamyabandeh
fbshipit-source-id: c6b07e85ca9d5e3a92973ddb6ab2bc079e53c9c1
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830
Test Plan: Run all existing tests.
Differential Revision: D17488031
fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
Summary:
Some recent commits might not have passed through the formatter. I formatted the 45 most recent commits. The script hangs for more commits, so I stopped there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5827
Test Plan: Run all existing tests.
Differential Revision: D17483727
fbshipit-source-id: af23113ee63015d8a43d89a3bc2c1056189afe8f
Summary:
The actual default write buffer size in `rocksdb/include/rocksdb/options.h` is 64 MB; we should correct this value in the Javadoc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5670
Differential Revision: D16668815
Pulled By: maysamyabandeh
fbshipit-source-id: cc3a981c9f1c2cd4a8392b0ed5f1fd0a2d729afb
Summary:
If read_options.snapshot is not set, ::Get will take the last sequence number after taking a super-version and use that as the sequence number. Theoretically, max_evicted_seq_ could advance past this sequence number. This could lead ::IsInSnapshot, which is invoked by the ReadCallback, to notice the absence of the snapshot. In this case, the ReadCallback should have passed a non-null value to snap_released so that it could be set by ::IsInSnapshot. The patch does that, and adds a unit test to verify it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5664
Differential Revision: D16614033
Pulled By: maysamyabandeh
fbshipit-source-id: 06fb3fd4aacd75806ed1a1acec7961f5d02486f2
Summary:
As [BlockBasedTableConfig setBlockCacheSize()](1966a7c055/java/src/main/java/org/rocksdb/BlockBasedTableConfig.java (L728)) says, if cacheSize is non-positive, then the cache will not be used. However, when we configure a negative number or 0, there is an unexpected result: the block cache becomes 8 MB.
- Allow 0 as a valid size. When block cache size is 0, an 8MB block cache is created, as it is the default C++ API behavior. Also updated the comment.
- Set no_block_cache true if negative value is passed to block cache size, and no block cache will be created.
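The resulting behaviour can be summarised by the following sketch. It is a standalone illustration, not the actual RocksJava code: the class, method and constant names are hypothetical, and treating 8 MB as 8 * 1024 * 1024 bytes is an assumption.

```java
// Hypothetical illustration of the size semantics described above.
class BlockCacheSizeSketch {
  // Assumed byte value of the 8 MB default mentioned in the summary.
  static final long DEFAULT_BLOCK_CACHE_BYTES = 8L * 1024 * 1024;

  // Returns the effective block cache capacity in bytes,
  // or -1 to signal that no block cache is created at all.
  static long effectiveBlockCacheBytes(final long requestedBytes) {
    if (requestedBytes < 0) {
      return -1; // corresponds to no_block_cache = true
    }
    if (requestedBytes == 0) {
      return DEFAULT_BLOCK_CACHE_BYTES; // default C++ API behaviour
    }
    return requestedBytes;
  }
}
```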
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5465
Differential Revision: D15968788
Pulled By: sagar0
fbshipit-source-id: ee02d6e95841c9e2c316a64bfdf192d46ff5638a
Summary:
Make the generics of the Options interfaces more strict so they are usable in a Kotlin Multiplatform expect/actual typealias implementation without causing a Violation of Finite Bound Restriction.
This fix would enable the creation of a generic Kotlin multiplatform library by just typealiasing the JVM implementation to the current Java implementation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5461
Differential Revision: D15903288
Pulled By: sagar0
fbshipit-source-id: 75e83fdf5d2fcede40744a17e767563d6a4b0696
Summary:
I would like to be able to read out the current Filter that has been set (or not) for a BlockBasedTableConfig. Added one public method to BlockBasedTableConfig:
    public Filter filterPolicy() {
      return filterPolicy;
    }
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5186
Differential Revision: D14921415
Pulled By: siying
fbshipit-source-id: 2a63c8685480197862b49fc48916c757cd6daf95
Summary:
BackupEngine relies on write-ahead logs to back up the memtable. Disabling write-ahead logs
can result in backups failing to preserve unflushed keys. This PR updates the documentation to specify this behavior, and suggests always flushing the memtable when write-ahead logs are disabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5071
Differential Revision: D14524124
Pulled By: miasantreble
fbshipit-source-id: 635f855f8a42ad60273b5efd226139b511e3e5d5
Summary:
Disabling `org.rocksdb.RocksDBTest.getApproximateSizes` test as it is frequently crashing on travis (#5020). It will be re-enabled once the root-cause is found and fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5035
Differential Revision: D14294736
Pulled By: sagar0
fbshipit-source-id: e28bff0d143a58ad6c82991fec3d4cf8c0209995
Summary:
This reverts commit ee1818081f.
We are not ready to deprecate this feature. Revert it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5034
Differential Revision: D14287246
Pulled By: siying
fbshipit-source-id: e4beafdeaee1c94364fdaa6ba198218d158339f7
Summary:
`DefaultEnvTest.incBackgroundThreadsIfNeeded` jtest should assert that the number of threads is greater than or equal to the minimum number of threads.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5021
Differential Revision: D14268311
Pulled By: sagar0
fbshipit-source-id: 01fb32b5b3ce636451d162fa1a2bbc5bd1974682
Summary:
This is my latest round of changes to add missing items to RocksJava. More to come in future PRs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4833
Differential Revision: D14152266
Pulled By: sagar0
fbshipit-source-id: d6cff67e26da06c131491b5cf6911a8cd0db0775
Summary:
The info log header feature never worked well, because log level Header was not
translated to a Logger::LogHeader() call. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4980
Differential Revision: D14087283
Pulled By: siying
fbshipit-source-id: 7e7d03ce35fa8d13d4ee549f46f7326f7bc0006d
Summary:
We introduced ttl option in CompactionOptionsFIFO when ttl-based file
deletion (compaction) was supported only as part of FIFO Compaction. But
with the extension of ttl semantics even to Level compaction,
CompactionOptionsFIFO.ttl can now be deprecated. Instead we will start
using ColumnFamilyOptions.ttl for FIFO compaction as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4965
Differential Revision: D14072960
Pulled By: sagar0
fbshipit-source-id: c98cc2ae695a28136295787cd88d36a220fc219e
Summary:
Store_index_in_file is a less useful feature. To simplify code maintenance, we are dropping the feature.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4914
Differential Revision: D13791883
Pulled By: siying
fbshipit-source-id: d187c5d662584866103e4b77d09dfb925509ae2e
Summary:
Expose the common stats min, max, count, and sum via the statistics JNI. These stats are not fully exposed on the Java side as is, but are available on the native side.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4742
Differential Revision: D13403766
Pulled By: ajkr
fbshipit-source-id: 5b70f7bd3fb7490aab73dcbd09f13490fce5c773
Summary:
Updating the `HistogramType.java` and `TickerType.java` to expose and correct metrics for statistics callbacks.
Moved `NO_ITERATOR_CREATED` to the proper stat name and deprecated `NO_ITERATORS`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4733
Differential Revision: D13466936
Pulled By: sagar0
fbshipit-source-id: a58d1edcc07c7b68c3525b1aa05828212c89c6c7
Summary:
This PR fixes #4721. When an exception is caught and rethrown as a different exception, the original exception should be attached as the cause of the new exception. This bug in RocksDB was swallowing the underlying exception from `NativeLibraryLoader` and throwing the following exception:
```
...
Caused by: java.lang.RuntimeException: Unable to load the RocksDB shared libraryjava.nio.channels.ClosedByInterruptException
at org.rocksdb.RocksDB.loadLibrary(RocksDB.java:67)
at org.rocksdb.RocksDB.<clinit>(RocksDB.java:35)
... 73 more
```
The fix is simple and self-explanatory.
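The general pattern is sketched below in standalone Java. The class and method names are hypothetical and the failure is simulated; this is not the actual `RocksDB.loadLibrary` code, only an illustration of passing the original exception as the cause.

```java
import java.io.IOException;

// Hypothetical illustration of exception chaining; not the RocksJava source.
class ExceptionCauseSketch {
  // Stands in for a loader step that can fail with an I/O error.
  static void loadFromJar() throws IOException {
    throw new IOException("simulated failure while extracting the library");
  }

  static void loadLibrary() {
    try {
      loadFromJar();
    } catch (final IOException e) {
      // Pass e as the cause so that its type, message and stack trace
      // are preserved in the "Caused by:" chain of the new exception.
      throw new RuntimeException("Unable to load the RocksDB shared library", e);
    }
  }
}
```

Callers can then inspect `getCause()` or read the full chain in the stack trace, instead of seeing only the wrapper's message.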
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4728
Differential Revision: D13418371
Pulled By: sagar0
fbshipit-source-id: d76c25af2a83a0f8ba62cc8d7b721bfddc85fdf1
Summary:
Fixes some RocksJava regressions recently introduced, whereby RocksJava would not build on JDK 7.
These should have been visible on Travis-CI!
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4768
Differential Revision: D13418173
Pulled By: sagar0
fbshipit-source-id: 57bf223188887f84d9e072031af2e0d2c8a69c30
Summary:
When adding CompactionFilter and CompactionFilterFactory settings to the Java layer, ColumnFamilyOptions was modified directly instead of ColumnFamilyOptionsInterface. This meant that the old-style Options monolith was left behind.
This patch fixes that, by:
- promoting the CompactionFilter + CompactionFilterFactory setters from ColumnFamilyOptions -> ColumnFamilyOptionsInterface
- adding getters in ColumnFamilyOptionsInterface
- implementing setters in Options
- implementing getters in both ColumnFamilyOptions and Options
- adding testcases
- reusing a test CompactionFilterFactory by moving it to a common location
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3461
Differential Revision: D13278788
Pulled By: sagar0
fbshipit-source-id: 72602c6eb97dc80734e718abb5e2e9958d3c753b
Summary:
Compile logs have a bit of noise due to missing javadoc annotations. Updating the docs to reduce it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4764
Differential Revision: D13400193
Pulled By: sagar0
fbshipit-source-id: 65c7efb70747cc3bb35a336a6881ea6536ae5ff4
Summary:
Transaction::GetForUpdate is extended with a do_validate parameter with a default value of true. If false, it skips validating the snapshot (if there is any) before doing the read. After the read it also returns the latest value (expects the ReadOptions::snapshot to be nullptr). This allows RocksDB applications to use GetForUpdate similarly to how InnoDB does. Similarly, ::Merge, ::Put, ::Delete, and ::SingleDelete are extended with assume_exclusive_tracked with a default value of false. If true, it indicates that the call is assumed to be after a ::GetForUpdate(do_validate=false).
The Java APIs are accordingly updated.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4680
Differential Revision: D13068508
Pulled By: maysamyabandeh
fbshipit-source-id: f0b59db28f7f6a078b60844d902057140765e67d
Summary:
The current implementation of `current_over_upper_bound_` fails to take into consideration that keys might be invalid in either the base iterator or the delta iterator. Calling key() in such a scenario will lead to assertion failures and runtime errors.
This PR addresses the bug by adding a check for valid keys before calling `IsOverUpperBound()`. It also adds test coverage for iterate_upper_bound usage in BaseDeltaIterator.
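The shape of the guard can be sketched as follows. This is a hypothetical Java illustration, not the C++ `BaseDeltaIterator` code: the `KeyIterator` interface, the `at` helper and the method names are invented for the example, and the upper bound is assumed to be exclusive.

```java
// Hypothetical illustration of checking validity before touching key().
class UpperBoundGuardSketch {
  interface KeyIterator {
    boolean isValid();
    byte[] key(); // only legal to call while isValid() is true
  }

  // Builds a valid iterator positioned at key, or an invalid one when key is null.
  static KeyIterator at(final byte[] key) {
    return new KeyIterator() {
      @Override public boolean isValid() { return key != null; }
      @Override public byte[] key() {
        if (key == null) {
          throw new IllegalStateException("key() called on invalid iterator");
        }
        return key;
      }
    };
  }

  // Unsigned lexicographic comparison of two byte arrays.
  static int compareUnsigned(final byte[] a, final byte[] b) {
    final int minLen = Math.min(a.length, b.length);
    for (int i = 0; i < minLen; i++) {
      final int x = a[i] & 0xFF;
      final int y = b[i] & 0xFF;
      if (x != y) {
        return x < y ? -1 : 1;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  // The validity check comes first, so key() is never called on an invalid iterator.
  static boolean isOverUpperBound(final KeyIterator it, final byte[] upperBound) {
    if (upperBound == null || !it.isValid()) {
      return false;
    }
    return compareUnsigned(it.key(), upperBound) >= 0; // bound assumed exclusive
  }
}
```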
Also recommit https://github.com/facebook/rocksdb/pull/4656 (It was reverted earlier due to bugs)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4702
Differential Revision: D13146643
Pulled By: miasantreble
fbshipit-source-id: 6d136929da12d0f2e2a5cea474a8038ec5cdf1d0
Summary:
Make CompactionOptionsFIFO's ttl and allow_compaction options available in RocksJava.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4609
Differential Revision: D12849503
Pulled By: sagar0
fbshipit-source-id: 47baa97918d252370f234c36c1af15ff2dad7658
Summary:
Currently, the transaction iterator does not apply `ReadOptions.iterate_upper_bound` when iterating. This PR attempts to fix the problem by having `BaseDeltaIterator` enforce the upper bound check when the iterator state is changed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4656
Differential Revision: D13039257
Pulled By: miasantreble
fbshipit-source-id: 909eb9f6b4597a4d80418fb139f32ec82c6ec1d1
Summary:
Currently, `Statistics` can record a tick via `recordTick()`, whose second parameter is a `uint64_t`.
That means a ticker can only increase.
If we want to reduce a ticker, we have to use a workaround like `RecordTick(statistics_, NO_ITERATORS, uint64_t(-1));`.
That's kind of a hack.
So this PR divides `NO_ITERATORS` into two counters, `NO_ITERATOR_CREATED` and `NO_ITERATOR_DELETE`, making the counters increase only.
Fixes #3013.
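The idea of replacing one up-and-down counter with two increase-only counters can be sketched in standalone Java as below. The class and method names are hypothetical and this is not the RocksDB `Statistics` implementation; the number of live iterators is simply derived as the difference of the two counters.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of two monotonically increasing counters.
class IteratorCountersSketch {
  private final AtomicLong created = new AtomicLong();
  private final AtomicLong deleted = new AtomicLong();

  // Each event only ever increments its own counter.
  void onIteratorCreated() { created.incrementAndGet(); }
  void onIteratorDeleted() { deleted.incrementAndGet(); }

  long created() { return created.get(); }
  long deleted() { return deleted.get(); }

  // The old "current number of iterators" value is derived, not stored,
  // so no counter ever needs to be decremented.
  long live() { return created.get() - deleted.get(); }
}
```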
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4498
Differential Revision: D10395010
Pulled By: sagar0
fbshipit-source-id: cfb523b22a37411c794b4e9da090f1ae30293db2