Summary:
The existing multiGet() in Java calls multi_get_helper(), which in turn calls the `std::vector<Status>`-returning DB::MultiGet(). That overload does not take advantage of io_uring.
This change adds another JNI-level method, multiGetDirect(), that runs a parallel code path through the `void`-returning DB::MultiGet() and uses ByteBuffers at the JNI level. In addition to using the io_uring path, this code internally returns pinned slices that we can copy directly into our direct ByteBuffers, which should reduce the overall number of copies on the path to and from Java. Some JMH benchmark runs (100k keys, 1000 keys per multiGet) suggest about a 20% performance improvement for value sizes above 1k; for small value sizes performance is slightly reduced, because the JNI methods carry a little more overhead.
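The core idea of copying a value that RocksDB has pinned straight into a caller-supplied direct buffer, while reporting the value's full size, can be modelled in plain Java. This is a minimal sketch under stated assumptions: `copyPinnedValue()`, `multiGetInto()` and `NOT_FOUND` are hypothetical names, the "pinned" values are simulated with read-only heap `ByteBuffer`s rather than native `PinnableSlice` memory, and the sketch does not show the actual signature of `multiGetDirect()`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "copy pinned values into direct buffers".
// None of these names are the RocksJava or RocksDB C++ API.
public class DirectMultiGetSketch {
  static final int NOT_FOUND = -1;

  // Copies a pinned value (modelled as a read-only ByteBuffer) into a
  // caller-owned buffer with a single copy and returns the full value size,
  // so the caller can detect truncation (size > capacity) and retry.
  static int copyPinnedValue(ByteBuffer pinned, ByteBuffer dest) {
    int valueSize = pinned.remaining();
    dest.clear();
    int toCopy = Math.min(valueSize, dest.remaining());
    ByteBuffer window = pinned.duplicate(); // leaves the pinned buffer's position alone
    window.limit(window.position() + toCopy);
    dest.put(window);
    dest.flip(); // expose exactly the copied bytes to the reader
    return valueSize;
  }

  // Batched lookup: one call resolves every key and fills one buffer per key.
  static int[] multiGetInto(Map<String, ByteBuffer> pinnedStore,
                            List<String> keys, List<ByteBuffer> values) {
    int[] sizes = new int[keys.size()];
    for (int i = 0; i < keys.size(); i++) {
      ByteBuffer pinned = pinnedStore.get(keys.get(i));
      if (pinned == null) {
        values.get(i).clear().limit(0); // empty buffer for a missing key
        sizes[i] = NOT_FOUND;
      } else {
        sizes[i] = copyPinnedValue(pinned, values.get(i));
      }
    }
    return sizes;
  }

  static ByteBuffer readOnly(String s) {
    return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)).asReadOnlyBuffer();
  }

  public static void main(String[] args) {
    Map<String, ByteBuffer> store = new HashMap<>();
    store.put("a", readOnly("apple"));
    store.put("b", readOnly("banana-split")); // 12 bytes, larger than the 8-byte buffer
    List<ByteBuffer> values = List.of(ByteBuffer.allocateDirect(8),
        ByteBuffer.allocateDirect(8), ByteBuffer.allocateDirect(8));
    int[] sizes = multiGetInto(store, List.of("a", "b", "missing"), values);
    for (int i = 0; i < sizes.length; i++) {
      System.out.println("key " + i + ": size=" + sizes[i]
          + ", copied=" + values.get(i).remaining());
    }
  }
}
```

Reporting the full size even when the buffer is too small (key `b` above) lets a caller detect truncation and retry with a larger buffer; whether the real JNI method signals truncation this way is an assumption of this sketch.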
Closes https://github.com/facebook/rocksdb/issues/8407
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9224
Reviewed By: mrambacher
Differential Revision: D32951754
Pulled By: jay-zhuang
fbshipit-source-id: 1f70df7334be2b6c42a9c8f92725f67c71631690
Summary:
This is the start of some JMH microbenchmarks for RocksJava.
Such benchmarks can help us decide on performance improvements of the Java API.
At the moment, I have only added benchmarks for various Comparator options, as that is one of the first areas where I want to improve performance. I plan to expand this to many more tests.
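The Java comparator variants benchmarked here have to reproduce the same ordering as the native ones, which for bytewise ordering is an unsigned, lexicographic byte comparison (what `memcmp` gives in C++). A minimal standalone sketch of that ordering over `ByteBuffer`s is below; `BytewiseOrder` and its methods are hypothetical names, and this is not RocksJava's own comparator class.

```java
import java.nio.ByteBuffer;

// Standalone sketch of the orderings behind the "bytewise" and
// "reverse_bytewise" benchmark variants. It does not extend any RocksJava class.
public final class BytewiseOrder {
  private BytewiseOrder() {}

  // Unsigned lexicographic comparison of the remaining bytes of a and b;
  // a key that is a strict prefix of the other sorts first.
  static int compareBytewise(ByteBuffer a, ByteBuffer b) {
    int n = Math.min(a.remaining(), b.remaining());
    for (int i = 0; i < n; i++) {
      // Absolute get() does not move the buffers' positions.
      int x = a.get(a.position() + i) & 0xFF; // mask: Java bytes are signed
      int y = b.get(b.position() + i) & 0xFF;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return Integer.compare(a.remaining(), b.remaining());
  }

  // Reverse bytewise: the same comparison with the arguments swapped.
  static int compareReverseBytewise(ByteBuffer a, ByteBuffer b) {
    return compareBytewise(b, a);
  }

  public static void main(String[] args) {
    ByteBuffer low = ByteBuffer.wrap(new byte[] {0x01, 0x02});
    ByteBuffer high = ByteBuffer.wrap(new byte[] {(byte) 0x80});
    System.out.println(compareBytewise(low, high));        // negative: 0x01 < 0x80 unsigned
    System.out.println(compareReverseBytewise(low, high)); // positive
    System.out.println(low.compareTo(high));               // positive: signed comparison
  }
}
```

The `& 0xFF` mask matters because Java's `byte` is signed: `ByteBuffer.compareTo` compares elements as if by `Byte.compare`, so it orders `0x80` before `0x01` and cannot stand in for a bytewise comparator.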
Details of how to compile and run the benchmarks are in the `README.md`.
A run of these on a 3.5 GHz Xeon KVM guest (4 vCPUs, QEMU Virtual CPU version 2.5+, 8 GB RAM) running Ubuntu 18.04, OpenJDK 1.8.0_232, and gcc 8.3.0 produced the following:
```
# Run complete. Total time: 01:43:17
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark                 (comparatorName)                           Mode  Cnt       Score      Error  Units
ComparatorBenchmarks.put  native_bytewise                           thrpt   25  122373.920 ± 2200.538  ops/s
ComparatorBenchmarks.put  java_bytewise_adaptive_mutex              thrpt   25   17388.201 ± 1444.006  ops/s
ComparatorBenchmarks.put  java_bytewise_non-adaptive_mutex          thrpt   25   16887.150 ± 1632.204  ops/s
ComparatorBenchmarks.put  java_direct_bytewise_adaptive_mutex       thrpt   25   15644.572 ± 1791.189  ops/s
ComparatorBenchmarks.put  java_direct_bytewise_non-adaptive_mutex   thrpt   25   14869.601 ± 2252.135  ops/s
ComparatorBenchmarks.put  native_reverse_bytewise                   thrpt   25  116528.735 ± 4168.797  ops/s
ComparatorBenchmarks.put  java_reverse_bytewise_adaptive_mutex      thrpt   25   10651.975 ±  545.998  ops/s
ComparatorBenchmarks.put  java_reverse_bytewise_non-adaptive_mutex  thrpt   25   10514.224 ±  930.069  ops/s
```
This indicates a large throughput gap between comparators implemented natively (C++) and those implemented in Java: roughly 7x to 8x for bytewise ordering and roughly 11x for reverse bytewise ordering. Let's see if we can't improve on that in the near future...
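The size of the gap can be recomputed from the mean scores in the JMH table by dividing each native score by the score of the Java variant with the same ordering. A short sketch that ignores the ± error column; `ComparatorRatios` is a hypothetical helper, not part of the benchmark suite. With these numbers the bytewise variants come out at about 7.0x to 8.2x and the reverse bytewise variants at about 10.9x to 11.1x.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Native-to-Java throughput ratios from the mean JMH scores (ops/s);
// the error margins are ignored, so treat these as rough figures.
public class ComparatorRatios {
  static double ratio(double nativeScore, double javaScore) {
    return nativeScore / javaScore;
  }

  public static void main(String[] args) {
    double nativeBytewise = 122373.920;
    double nativeReverse = 116528.735;

    // Each Java variant is paired with the native comparator of the same ordering.
    Map<String, Double> ratios = new LinkedHashMap<>();
    ratios.put("java_bytewise_adaptive_mutex", ratio(nativeBytewise, 17388.201));
    ratios.put("java_bytewise_non-adaptive_mutex", ratio(nativeBytewise, 16887.150));
    ratios.put("java_direct_bytewise_adaptive_mutex", ratio(nativeBytewise, 15644.572));
    ratios.put("java_direct_bytewise_non-adaptive_mutex", ratio(nativeBytewise, 14869.601));
    ratios.put("java_reverse_bytewise_adaptive_mutex", ratio(nativeReverse, 10651.975));
    ratios.put("java_reverse_bytewise_non-adaptive_mutex", ratio(nativeReverse, 10514.224));

    ratios.forEach((name, r) -> System.out.printf("%-42s %5.2fx%n", name, r));
  }
}
```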
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6241
Differential Revision: D19290410
Pulled By: pdillinger
fbshipit-source-id: 25d44bf3a31de265502ed0c5d8a28cf4c7cb9c0b