Summary:
I previously didn't notice that the DB mutex was being held during
block cache entry stat scans, probably because I primarily checked for
read performance regressions, since reads are the operations that
exercise the block cache and are traditionally latency-sensitive.
This change does some refactoring to avoid holding the DB mutex and to
avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats").
Some tests have to be updated because now the stats collector is
populated in the Cache aggressively on DB startup rather than lazily.
(I hope to clean up some of this added complexity in the future.)
This change also ensures proper treatment of need_out_of_mutex for
non-int DB properties.
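For context, a minimal RocksJava sketch of the user-facing call whose behavior changes here (the path and options are illustrative, not from this PR):
```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class CfStatsExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/cfstats-example")) {
      // After this change, fetching "rocksdb.cfstats" no longer triggers
      // (or waits on) a block cache entry stat scan under the DB mutex.
      final String cfStats = db.getProperty("rocksdb.cfstats");
      System.out.println(cfStats);
    }
  }
}
```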
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538
Test Plan:
Added unit test logic that uses sync points to fail if the DB mutex
is held during a scan, covering the various ways that a scan might be
triggered.
Performance test - the known impact of holding the DB mutex is on
TransactionDB, and the easiest way to see the impact is to hack the
scan code to almost always miss and take an artificially long time
scanning. Here I've injected an unconditional 5s sleep at the call to
ApplyToAllEntries.
Before (hacked):
$ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 433.219 micros/op 2308 ops/sec; 0.1 MB/s ( transactions:78999 aborts:0)
rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856
$ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 448.802 micros/op 2228 ops/sec; 0.1 MB/s ( transactions:75999 aborts:0)
rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323
Notice the 5s P100 write time.
After (hacked):
$ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 303.645 micros/op 3293 ops/sec; 0.1 MB/s ( transactions:98999 aborts:0)
rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407
$ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 310.383 micros/op 3221 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0)
rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918
P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code:
$ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 311.365 micros/op 3211 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0)
rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767
$ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 308.395 micros/op 3242 ops/sec; 0.1 MB/s ( transactions:97999 aborts:0)
rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832
No substantial difference.
Reviewed By: siying
Differential Revision: D29738847
Pulled By: pdillinger
fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b
Summary:
2 new statistics counters are added to RocksDB: `MEMTABLE_PAYLOAD_BYTES_AT_FLUSH` and `MEMTABLE_GARBAGE_BYTES_AT_FLUSH`. The former tracks how many raw bytes of useful data are present in the memtable at flush time, whereas the latter tracks how many of these raw bytes are considered garbage, meaning that they ended up not being included in the SSTables resulting from the flush operation.
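As a sketch of how these counters could be read from the Java API (assuming the new tickers are mirrored in the Java `TickerType` enum; path and keys illustrative):
```java
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.Statistics;
import org.rocksdb.TickerType;

public class MemtableGarbageExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Statistics stats = new Statistics();
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setStatistics(stats);
         final RocksDB db = RocksDB.open(options, "/tmp/memtable-garbage-example");
         final FlushOptions fo = new FlushOptions().setWaitForFlush(true)) {
      db.put("k".getBytes(), "v1".getBytes());
      db.put("k".getBytes(), "v2".getBytes()); // the first value becomes garbage
      db.flush(fo);
      // useful payload vs. bytes dropped at flush time
      System.out.println("payload=" + stats.getTickerCount(TickerType.MEMTABLE_PAYLOAD_BYTES_AT_FLUSH));
      System.out.println("garbage=" + stats.getTickerCount(TickerType.MEMTABLE_GARBAGE_BYTES_AT_FLUSH));
    }
  }
}
```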
**Unit test**: run `make db_flush_test -j$(nproc); ./db_flush_test` to run the unit test.
This executable includes three tests that check support and correct stat calculation for workloads with inserts, deletes, and DeleteRanges. The parameters are set such that the workloads are performed on a single memtable, and a single SSTable is created as a result of the flush operation. The flush operation is called manually in the test file. The tests verify that the values of these two statistics counters can be exactly predicted, showing that we have a full understanding of the underlying operations.
**Performance testing**:
`./db_bench -statistics -benchmarks=fillrandom -num=10000000` repeated 10 times.
Timing was done using the `date` command in a bash script.
_Results_:
Original Rocksdb fork: mean 66.6 sec, std 1.18 sec.
This feature branch: mean 67.4 sec, std 1.35 sec.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8411
Reviewed By: akankshamahajan15
Differential Revision: D29150629
Pulled By: bjlemaire
fbshipit-source-id: 7b3c2e86d50c6aa34fa50fd134282eacb543a5b1
Summary:
In a distributed environment, a file `rename()` operation can succeed on server (remote)
side, but the client can somehow return non-ok status to RocksDB. Possible reasons include
network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which
can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a
new MANIFEST. We currently always delete the new MANIFEST if an error occurs.
This is problematic in a distributed world. If the server side successfully updates the CURRENT
file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail.
As a fix, we can track the execution result of IO operations on the new MANIFEST.
- If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original
MANIFEST. Therefore, it is safe to remove the new MANIFEST.
- If IO operations on the new MANIFEST all succeed, but somehow we end up in the clean up
code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local
POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the
new MANIFEST.) Therefore, we keep the new MANIFEST.
- Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT.
- If process reopens the db immediately after the failure, then the CURRENT file can point
to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can
succeed and ignore the other.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192
Test Plan: make check
Reviewed By: zhichao-cao
Differential Revision: D27804648
Pulled By: riversand963
fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4
Summary:
Added `TableProperties::{fast,slow}_compression_estimated_data_size`.
These properties are present in block-based tables when
`ColumnFamilyOptions::sample_for_compression > 0` and the necessary
compression library is supported when the file is generated. They
contain estimates of what `TableProperties::data_size` would be if the
"fast"/"slow" compression library had been used instead. One
limitation is we do not record exactly which "fast" (LZ4 or Snappy)
or "slow" (ZSTD or Zlib) compression library produced the result.
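A sketch of how the estimates could be read from RocksJava, assuming `setSampleForCompression` and the two `TableProperties` getters are exposed in the Java bindings (those names are my assumption, not confirmed by this PR; path and keys illustrative):
```java
import java.util.Map;
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.TableProperties;

public class CompressionEstimateExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // sample_for_compression must be > 0 for the estimates to be recorded
    try (final Options options = new Options()
             .setCreateIfMissing(true)
             .setSampleForCompression(1);
         final RocksDB db = RocksDB.open(options, "/tmp/compression-estimate");
         final FlushOptions fo = new FlushOptions().setWaitForFlush(true)) {
      db.put("key".getBytes(), "value".getBytes());
      db.flush(fo);
      for (final Map.Entry<String, TableProperties> e :
               db.getPropertiesOfAllTables().entrySet()) {
        final TableProperties p = e.getValue();
        System.out.println(e.getKey()
            + " data_size=" + p.getDataSize()
            + " fast_estimate=" + p.getFastCompressionEstimatedDataSize()
            + " slow_estimate=" + p.getSlowCompressionEstimatedDataSize());
      }
    }
  }
}
```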
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8139
Test Plan:
- new unit test
- ran `db_bench` with `sample_for_compression=1`; verified the `data_size` property matches the `{slow,fast}_compression_estimated_data_size` when the same compression type is used for the output file compression and the sampled compression
Reviewed By: riversand963
Differential Revision: D27454338
Pulled By: ajkr
fbshipit-source-id: 9529293de93ddac7f03b2e149d746e9f634abac4
Summary:
The method should return two `long` values instead of an array.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8098
Reviewed By: mrambacher
Differential Revision: D27308741
Pulled By: jay-zhuang
fbshipit-source-id: 44beea2bd28cf6779b048bebc98f2426fe95e25c
Summary:
This is a small fix for what I think is a typo in two comments in `DBOptionsInterface.java`. If it was not an error, feel free to close.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8086
Reviewed By: ajkr
Differential Revision: D27260488
Pulled By: mrambacher
fbshipit-source-id: 469daadaf6039d5b5187132b8e0c7c3672842f21
Summary:
Add statistics and info log for the error handler: counters for bg error, bg io error, bg retryable io error, auto resume, auto resume total retry, and auto resume success; plus a histogram for the auto resume retry count in each recovery call.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8050
Test Plan: make check and add test to error_handler_fs_test
Reviewed By: anand1976
Differential Revision: D26990565
Pulled By: zhichao-cao
fbshipit-source-id: 49f71e8ea4e9db8b189943976404205b56ab883f
Summary:
Support `getUsage` and `getPinnedUsage` in the Java API for `Cache`.
Also fix a bug in LRUCacheTest.java where the `highPriPoolRatio` was not valid (it was set to 5, presumably meant to be 0.05).
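A minimal sketch of the new accessors (path and capacity illustrative):
```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class CacheUsageExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final LRUCache cache = new LRUCache(64 * 1024 * 1024)) {
      final BlockBasedTableConfig tableConfig =
          new BlockBasedTableConfig().setBlockCache(cache);
      try (final Options options = new Options()
               .setCreateIfMissing(true)
               .setTableFormatConfig(tableConfig);
           final RocksDB db = RocksDB.open(options, "/tmp/cache-usage-example")) {
        db.put("k".getBytes(), "v".getBytes());
        db.get("k".getBytes());
        // new in this PR: memory charged to the cache, and the pinned portion
        System.out.println("usage=" + cache.getUsage());
        System.out.println("pinnedUsage=" + cache.getPinnedUsage());
      }
    }
  }
}
```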
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7925
Reviewed By: mrambacher
Differential Revision: D26900241
Pulled By: ajkr
fbshipit-source-id: 735d1e40a16fa8919c89c7c7154ba7f81208ec33
Summary:
Fixes 3 minor Javadoc copy-paste errors in the `RocksDB#newIterator()` and `Transaction#getIterator()` variants that take a column family handle but whose documentation talks about iterating over "the database" or "the default column family".
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8034
Reviewed By: jay-zhuang
Differential Revision: D26877667
Pulled By: mrambacher
fbshipit-source-id: 95dd95b667c496e389f221acc9a91b340e4b63bf
Summary:
The variable `byteCompressionType` is only assigned values of primitive type and is never 'null', but it is declared with the boxed type 'Byte'.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7981
Reviewed By: ajkr
Differential Revision: D26546600
Pulled By: jay-zhuang
fbshipit-source-id: 07b579cdfcfc2262a448ca3626e216416fd05892
Summary:
We haven't seen any production issues with the new Bloom filter, and
it's now more than a year old (added in 6.6.0).
Updated check_format_compatible.sh and HISTORY.md
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8017
Test Plan: tests updated (or prior bugs fixed)
Reviewed By: ajkr
Differential Revision: D26762197
Pulled By: pdillinger
fbshipit-source-id: 0e755c46b443087c1544da0fd545beb9c403d1c2
Summary:
This request adds support for using DirectSlice for the ReadOptions lower/upper bounds.
To make this more efficient, setLength has been added to DirectSlice, so the length used by a slice backed by a direct buffer can be updated in place. This is also needed because, when an iterator is created, it keeps a pointer to the original slice, so setting a new slice in the options does not help (the existing one must be reused). With this approach the slice can be modified at any time while operating with the iterator.
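A sketch of the intended usage, reusing one DirectSlice as the upper bound and resizing it in place via the new setLength (keys and path illustrative):
```java
import java.nio.ByteBuffer;
import org.rocksdb.DirectSlice;
import org.rocksdb.Options;
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;

public class DirectSliceBoundExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    final ByteBuffer boundBuf = ByteBuffer.allocateDirect(16);
    boundBuf.put("key5".getBytes());
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/direct-slice-example");
         final DirectSlice upperBound = new DirectSlice(boundBuf, 4);
         final ReadOptions readOptions =
             new ReadOptions().setIterateUpperBound(upperBound)) {
      db.put("key1".getBytes(), "v".getBytes());
      db.put("key4".getBytes(), "v".getBytes());
      db.put("key6".getBytes(), "v".getBytes());
      try (final RocksIterator it = db.newIterator(readOptions)) {
        for (it.seekToFirst(); it.isValid(); it.next()) {
          // sees key1 and key4 (keys < "key5")
        }
        // Rewrite the bound's bytes and update its length in place; the
        // iterator keeps its pointer to the same slice, so the new bound
        // takes effect without rebuilding the ReadOptions.
        boundBuf.clear();
        boundBuf.put("key3".getBytes());
        upperBound.setLength(4);
        for (it.seekToFirst(); it.isValid(); it.next()) {
          // now sees only key1 (keys < "key3")
        }
      }
    }
  }
}
```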
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7132
Reviewed By: zhichao-cao
Differential Revision: D25840092
Pulled By: jay-zhuang
fbshipit-source-id: 760167baf61568c9a35138145c4bf9b06824cb71
Summary:
Classes ColumnFamilyHandle and CapturingWriteBatchHandler.Event have
byte array fields as part of their identity, but they do not use the
arrays' content to compute the instance's hash; instead they rely on the
arrays' identity, causing equal instances to have different hash codes.
The PR addresses it by using the arrays' content to compute the hash,
like the equals method does.
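The fix follows the standard Java equals/hashCode contract; a generic sketch of the pattern (the class and field here are illustrative, not the actual RocksJava code):
```java
import java.util.Arrays;

final class Event {
  private final byte[] columnFamilyName;

  Event(final byte[] columnFamilyName) {
    this.columnFamilyName = columnFamilyName;
  }

  @Override
  public boolean equals(final Object o) {
    if (this == o) return true;
    if (!(o instanceof Event)) return false;
    // equality is based on the array's content...
    return Arrays.equals(columnFamilyName, ((Event) o).columnFamilyName);
  }

  @Override
  public int hashCode() {
    // ...so the hash must be too; Arrays.hashCode uses content,
    // whereas columnFamilyName.hashCode() would use identity
    return Arrays.hashCode(columnFamilyName);
  }
}
```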
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7860
Reviewed By: jay-zhuang
Differential Revision: D25901327
Pulled By: akankshamahajan15
fbshipit-source-id: 347e7b3d2ba7befe7faa956b033e6421b9d0c235
Summary:
Consider the following sequence of events:
1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST.
2. Syncing MANIFEST failed and db crashed.
3. Db tried to recover with this MANIFEST. However, no entry about the newly flushed SST was found in the MANIFEST. Therefore, RocksDB replayed the WAL and tried to flush to an SST file reusing the same file number N. This failed because the file system does not support overwrite. The db then deleted this file.
4. Db crashed again.
5. Db tried to recover. When the db read the MANIFEST, there was an entry referencing N.sst. This probably happened because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.
It is possible that the N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in steps 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of the MANIFEST was appended and visible after step 5.
After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621
Test Plan:
1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery.
2. a new unit test is added in db_basic_test to simulate step 3.
Reviewed By: riversand963
Differential Revision: D24668144
Pulled By: cheng-chang
fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
Summary:
The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608
Test Plan: watch the existing Java tests pass
Reviewed By: zhichao-cao
Differential Revision: D24611621
Pulled By: cheng-chang
fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d
Summary:
- Takes the burden off developer to close ColumnFamilyHandle instances before closing RocksDB instance
- The change is backward-compatible
----
Previously the pattern for working with Column Families was:
```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {
  // list of column family descriptors, first entry must always be default column family
  final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
      new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
      new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
  );

  // a list which will hold the handles for the column families once the db is opened
  final List<ColumnFamilyHandle> columnFamilyHandleList = new ArrayList<>();

  try (final DBOptions options = new DBOptions()
           .setCreateIfMissing(true)
           .setCreateMissingColumnFamilies(true);
       final RocksDB db = RocksDB.open(options,
           "path/to/db", cfDescriptors, columnFamilyHandleList)) {
    try {
      // do something
    } finally {
      // NOTE user must explicitly free the column family handles before freeing the db
      for (final ColumnFamilyHandle columnFamilyHandle : columnFamilyHandleList) {
        columnFamilyHandle.close();
      }
    }
  } // frees the db and the db options
} // frees the column family options
```
With the changes in this PR, the Java user no longer has to worry about manually closing the Column Families, which allows them to write simpler symmetrical create/free oriented code like this:
```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {
  // list of column family descriptors, first entry must always be default column family
  final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
      new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
      new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
  );

  // a list which will hold the handles for the column families once the db is opened
  final List<ColumnFamilyHandle> columnFamilyHandleList = new ArrayList<>();

  try (final DBOptions options = new DBOptions()
           .setCreateIfMissing(true)
           .setCreateMissingColumnFamilies(true);
       final RocksDB db = RocksDB.open(options,
           "path/to/db", cfDescriptors, columnFamilyHandleList)) {
    // do something
  } // frees the column family handles, then frees the db and the db options
} // frees the column family options
```
**NOTE**: The changes in this PR are backwards API compatible, which means existing code using the original approach will also continue to function correctly.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7428
Reviewed By: cheng-chang
Differential Revision: D24063348
Pulled By: jay-zhuang
fbshipit-source-id: 648d7526669923128c863ead94516bf4d50ac658
Summary:
Allows adding event listeners in RocksJava.
* Adds listeners getter and setter in `Options` and `DBOptions` classes.
* Adds `EventListener` Java interface and base class for implementing custom event listener callbacks - `AbstractEventListener`, which has an underlying native callback class implementing C++ `EventListener` class.
* The `AbstractEventListener` class has a mechanism for selectively enabling its callback methods, in order to avoid invoking a Java method that is not implemented. This decreases the performance cost when only a subset of the event listener callback methods is needed - the JNI code for the remaining "no-op" callbacks is not executed (a sketch follows this list).
* The code is covered by unit tests in `EventListenerTest.java`, there are also tests added for setting/getting listeners field in `OptionsTest.java` and `DBOptionsTest.java`.
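A sketch of wiring up a listener with selective enabling, as mentioned above (path and the choice of callback are illustrative):
```java
import java.util.Collections;
import org.rocksdb.AbstractEventListener;
import org.rocksdb.FlushJobInfo;
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class ListenerExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // only the ON_FLUSH_COMPLETED JNI callback is enabled, so other
    // events never cross into Java
    final AbstractEventListener listener = new AbstractEventListener(
        AbstractEventListener.EnabledEventCallback.ON_FLUSH_COMPLETED) {
      @Override
      public void onFlushCompleted(final RocksDB db, final FlushJobInfo info) {
        System.out.println("flushed: " + info.getFilePath());
      }
    };
    try (final Options options = new Options()
             .setCreateIfMissing(true)
             .setListeners(Collections.singletonList(listener));
         final RocksDB db = RocksDB.open(options, "/tmp/listener-example");
         final FlushOptions fo = new FlushOptions().setWaitForFlush(true)) {
      db.put("k".getBytes(), "v".getBytes());
      db.flush(fo);
    }
  }
}
```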
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7425
Reviewed By: pdillinger
Differential Revision: D24063390
Pulled By: jay-zhuang
fbshipit-source-id: 508c359538983d6b765e70d9989c351794a944ee
Summary:
Add the following stats for MultiGet, as histograms, to get more insight into MultiGet (a sketch of reading them follows the list).
1. Number of index and filter blocks read from file as part of MultiGet
request per level.
2. Number of data blocks read from file per level.
3. Number of SST files loaded from file system per level.
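A sketch of reading one of these histograms from Java, assuming the new histogram names are mirrored in the Java `HistogramType` enum (path and keys illustrative):
```java
import java.util.Arrays;
import org.rocksdb.HistogramData;
import org.rocksdb.HistogramType;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.Statistics;

public class MultiGetStatsExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Statistics stats = new Statistics();
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setStatistics(stats);
         final RocksDB db = RocksDB.open(options, "/tmp/multiget-stats")) {
      db.put("a".getBytes(), "1".getBytes());
      db.put("b".getBytes(), "2".getBytes());
      db.multiGetAsList(Arrays.asList("a".getBytes(), "b".getBytes()));
      // e.g. SST files touched per level during reads
      final HistogramData h =
          stats.getHistogramData(HistogramType.NUM_SST_READ_PER_LEVEL);
      System.out.println("median=" + h.getMedian() + " max=" + h.getMax());
    }
  }
}
```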
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7366
Reviewed By: anand1976
Differential Revision: D24127040
Pulled By: akankshamahajan15
fbshipit-source-id: e63a003056b833729b277edc0639c08fb432756b
Summary:
as title
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7347
Test Plan: unit tests included
Reviewed By: jay-zhuang
Differential Revision: D23592552
Pulled By: pdillinger
fbshipit-source-id: 1c3571b6f42bfd0cfd723ff49d01fbc02a1be45b
Summary:
Previously RocksJava limited the format_version to 4. However, the C++ API is now at 5, and this will likely increase again in the future. The Java API now allows any positive integer, and an exception is raised from JNI if the format_version is out of bounds.
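For illustration, the option is set via BlockBasedTableConfig; an out-of-range value now surfaces as an exception from JNI (the version number and path below are just examples):
```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class FormatVersionExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // any positive integer is accepted by the Java layer; an
    // out-of-range value is reported when the DB is opened
    final BlockBasedTableConfig tableConfig =
        new BlockBasedTableConfig().setFormatVersion(5);
    try (final Options options = new Options()
             .setCreateIfMissing(true)
             .setTableFormatConfig(tableConfig);
         final RocksDB db = RocksDB.open(options, "/tmp/format-version-example")) {
      db.put("k".getBytes(), "v".getBytes());
    }
  }
}
```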
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7242
Reviewed By: cheng-chang
Differential Revision: D23077941
Pulled By: pdillinger
fbshipit-source-id: ee69f7203448acddc41c6d86b470ed987d3d366d
Summary:
The PR fixes a Java test for Merge operator `uint64add`.
The current implementation uses the wrong byte order for long serialization, but the test fails to catch this error because the merge sum is lower than `256`.
The PR makes this test case more representative (i.e. it fails with the wrong byte order) and changes the byte order to little endian.
Some background: RocksDB uses LittleEndian byte order for integer serialization across all platforms. `MergeTest` uses `ByteBuffer` that defaults to BigEndian byte order.
This test case might probably be used as a sample of `MergeOperator` usage in Java.
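A sketch of the corrected pattern, serializing merge operands little-endian to match RocksDB's integer encoding ("uint64add" is the built-in operator name; the path is illustrative):
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class Uint64AddExample {
  static byte[] longToLittleEndian(final long v) {
    // ByteBuffer defaults to BIG_ENDIAN; RocksDB expects LITTLE_ENDIAN
    return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
  }

  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options()
             .setCreateIfMissing(true)
             .setMergeOperatorName("uint64add");
         final RocksDB db = RocksDB.open(options, "/tmp/uint64add-example")) {
      db.merge("counter".getBytes(), longToLittleEndian(300));
      db.merge("counter".getBytes(), longToLittleEndian(300));
      final long sum = ByteBuffer.wrap(db.get("counter".getBytes()))
          .order(ByteOrder.LITTLE_ENDIAN).getLong();
      System.out.println(sum); // 600 -- a sum > 256 exposes byte-order bugs
    }
  }
}
```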
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7243
Reviewed By: ajkr
Differential Revision: D23079593
Pulled By: pdillinger
fbshipit-source-id: 82e8e166901d66733e96a0116f88d0ec4761ddf1
Summary:
Adds compaction statistics (total bytes read and written) for compactions that occur for delete-triggered, periodic, and TTL compaction reasons.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7165
Test Plan:
TTL and periodic compaction can be checked by running db_bench with the options activated:
./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -periodic_compaction_seconds=1
./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -fifo_compaction_ttl=1
Setting the time to one second causes non-zero bytes read/written for those compaction reasons. Disabling them or setting them to times longer than the test run length causes the stats to return to zero as expected.
Delete-triggered compaction counting is tested in DBTablePropertiesTest.DeletionTriggeredCompactionMarking
Reviewed By: ajkr
Differential Revision: D22693050
Pulled By: akabcenell
fbshipit-source-id: d15cef4d94576f703015c8942d5f0d492f69401d
Summary:
An SST Partitioner interface that allows splitting SST files during compactions.
It basically instructs compaction to create a new file when needed. When using well-defined prefixes and a prefix-based way of defining tables, it is good to also define partitioning, so that promoting an SST file does not cover a huge key space on the next level (in the worst case, the complete key space).
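The interface itself is C++; for flavor, a fixed-prefix partitioner can also be configured from Java in later releases. A sketch assuming the `SstPartitionerFixedPrefixFactory` binding, which is not part of this PR (path and keys illustrative):
```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.SstPartitionerFixedPrefixFactory;

public class PartitionerExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    // cut a new output file whenever the first 4 key bytes change,
    // so one SST never spans multiple prefixes
    try (final SstPartitionerFixedPrefixFactory factory =
             new SstPartitionerFixedPrefixFactory(4);
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setSstPartitionerFactory(factory);
         final RocksDB db = RocksDB.open(options, "/tmp/partitioner-example")) {
      db.put("userA/k1".getBytes(), "v".getBytes());
      db.put("userB/k1".getBytes(), "v".getBytes());
    }
  }
}
```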
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6957
Reviewed By: ajkr
Differential Revision: D22461239
fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e
Summary:
The methods in convenience.h are used to compare/convert objects to/from strings. There is a mishmash of parameters in use here with more needed in the future. This PR replaces those parameters with a single structure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389
Reviewed By: siying
Differential Revision: D21163707
Pulled By: zhichao-cao
fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd
Summary:
This PR exposes the `Iterator::Refresh` method to the Java API by adding it on the `RocksIteratorInterface` interface. There are three concrete implementations: `RocksIterator`, `SstFileReaderIterator`, and `WBWIRocksIterator`. For the first two cases, the JNI side simply delegates to the underlying `Iterator::Refresh` method; in the last case, as it doesn't share an ancestor, and per the discussion in https://github.com/facebook/rocksdb/issues/3465, a `Status::NotSupported` exception is thrown.
As the last PR had no activity in a while, I'm opening a new one - I'm completely fine with merging the previous PR if it gets completed before this is reviewed.
Let me know if there's anything missing or anything else I can do 👍
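A sketch of the new method in the supported case (path and keys illustrative):
```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;

public class RefreshExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/refresh-example");
         final RocksIterator it = db.newIterator()) {
      db.put("k1".getBytes(), "v1".getBytes());
      // pick up writes that happened after the iterator was created,
      // without allocating a new iterator
      it.refresh();
      for (it.seekToFirst(); it.isValid(); it.next()) {
        System.out.println(new String(it.key()));
      }
    }
  }
}
```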
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6573
Reviewed By: cheng-chang
Differential Revision: D20604666
Pulled By: pdillinger
fbshipit-source-id: 4de17df1180c3b87b76cfdd77b674b81fc0563f7
Summary:
This change fixes a crash happening in the getApproximateSizes JNI implementation. It also re-enables a Java test that was crashing, most likely because of this bug.
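For reference, a sketch of the Java call whose JNI implementation is fixed here (range keys and path illustrative):
```java
import java.util.Arrays;
import org.rocksdb.Options;
import org.rocksdb.Range;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.SizeApproximationFlag;
import org.rocksdb.Slice;

public class ApproxSizesExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/approx-sizes-example")) {
      db.put("a".getBytes(), "1".getBytes());
      db.put("z".getBytes(), "2".getBytes());
      // approximate on-disk and in-memory size of the key range [a, z)
      final long[] sizes = db.getApproximateSizes(
          Arrays.asList(new Range(new Slice("a"), new Slice("z"))),
          SizeApproximationFlag.INCLUDE_FILES,
          SizeApproximationFlag.INCLUDE_MEMTABLES);
      System.out.println("approx bytes: " + sizes[0]);
    }
  }
}
```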
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6652
Reviewed By: cheng-chang
Differential Revision: D20874865
Pulled By: pdillinger
fbshipit-source-id: da95516f15e5df2efe1a4e5690a2ce172cb53f87
Summary:
Adding a Java API for rocksdb::CancelAllBackgroundWork() so that the user can call this (when required) before closing the DB. This is to **prevent the crashes when manual compaction is running and the user decides to close the DB**.
Calling CancelAllBackgroundWork() seems to be the recommended way to make sure that it's safe to close the DB (according to RocksDB FAQ: https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ#basic-readwrite).
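A sketch of the shutdown sequence this enables (path illustrative):
```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class SafeCloseExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/safe-close-example")) {
      db.put("k".getBytes(), "v".getBytes());
      // wait for background jobs (e.g. a manual compaction) to drain
      // before the try-with-resources block closes the DB
      db.cancelAllBackgroundWork(true);
    }
  }
}
```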
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6657
Reviewed By: cheng-chang
Differential Revision: D20896395
Pulled By: pdillinger
fbshipit-source-id: 8a8208c10093db09bd35db9af362211897870d96
Summary:
In most places in the code the variable names are spelled correctly as
COMMITTED, but in a couple of places they are not. This fixes them and ensures the
variable is spelled COMMITTED everywhere.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6481
Differential Revision: D20306776
Pulled By: pdillinger
fbshipit-source-id: b6c1bfe41db559b4bc6955c530934460c07f7022
Summary:
An assertion error was thrown at assert(isOwningHandle()) in ColumnFamilyHandle.getDescriptor(),
because the default CF does not own its handle: [RocksDB.getDefaultColumnFamily()](3a408eeae9/java/src/main/java/org/rocksdb/RocksDB.java (L3702)) calls cfHandle.disOwnNativeHandle().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6006
Differential Revision: D19031448
fbshipit-source-id: 2420c45e835bda0e552e919b1b63708472b91538
Summary:
It is very useful to support direct ByteBuffers in Java. They allow zero memory copies, and some serializers use them directly, so one does not need to create a byte[] array.
This change also contains some fixes for the Windows JNI build.
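A sketch of the zero-copy usage this enables, using the ByteBuffer-based put/get added here (path and keys illustrative; buffer positions follow normal ByteBuffer semantics):
```java
import java.nio.ByteBuffer;
import org.rocksdb.Options;
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteOptions;

public class DirectBufferExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/direct-buffer-example");
         final WriteOptions wo = new WriteOptions();
         final ReadOptions ro = new ReadOptions()) {
      final ByteBuffer key = ByteBuffer.allocateDirect(16);
      final ByteBuffer value = ByteBuffer.allocateDirect(16);
      key.put("k1".getBytes()).flip();
      value.put("v1".getBytes()).flip();
      db.put(wo, key, value); // no byte[] copies on the Java side

      key.rewind(); // reuse the key buffer for the read
      final ByteBuffer out = ByteBuffer.allocateDirect(16);
      final int valueSize = db.get(ro, key, out);
      System.out.println("value bytes: " + valueSize);
    }
  }
}
```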
Pull Request resolved: https://github.com/facebook/rocksdb/pull/2283
Differential Revision: D19834971
Pulled By: pdillinger
fbshipit-source-id: 44173aa02afc9836c5498c592fd1ea95b6086e8e
Summary:
This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy.
**NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into.
Previously when implementing a comparator in Java the developer had a choice of subclassing either `DirectComparator` or `Comparator`, which would use direct and non-direct byte-buffers respectively (via `DirectSlice` and `Slice`).
In this redesign we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`.
In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`.
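A sketch of the redesigned API (the exact ComparatorOptions setter names are my reading of the current RocksJava and should be treated as assumptions; the ordering and path are illustrative):
```java
import java.nio.ByteBuffer;
import org.rocksdb.AbstractComparator;
import org.rocksdb.ComparatorOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class ComparatorExample {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final ComparatorOptions copt = new ComparatorOptions()
             .setUseDirectBuffer(true)     // direct vs non-direct ByteBuffers
             .setMaxReusedBufferSize(64)   // > 0 enables buffer reuse
             .setUseAdaptiveMutex(false);  // mutex type guarding reused buffers
         final AbstractComparator cmp = new AbstractComparator(copt) {
           @Override
           public String name() {
             return "example.bytewise";
           }

           @Override
           public int compare(final ByteBuffer a, final ByteBuffer b) {
             return a.compareTo(b); // signed-byte lexicographic order, for illustration
           }
         };
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setComparator(cmp);
         final RocksDB db = RocksDB.open(options, "/tmp/comparator-example")) {
      db.put("k".getBytes(), "v".getBytes());
    }
  }
}
```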
---
[JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java.
With these changes, when reusing buffers and guarding access to them via mutexes the slowdown is approximately the same. However, these changes offer a new facility to not reuse buffers, which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces the slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425).
These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical.
---
These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x.
```
ComparatorBenchmarks.put native_bytewise thrpt 25 124483.795 ± 2032.443 ops/s
ComparatorBenchmarks.put native_reverse_bytewise thrpt 25 114414.536 ± 3486.156 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 17228.250 ± 1288.546 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 16035.865 ± 1248.099 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_thread-local thrpt 25 21571.500 ± 871.521 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_adaptive-mutex thrpt 25 23613.773 ± 8465.660 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 16768.172 ± 5618.489 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_thread-local thrpt 25 23921.164 ± 8734.742 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_no-reuse thrpt 25 17899.684 ± 839.679 ops/s
ComparatorBenchmarks.put java_bytewise_direct_no-reuse thrpt 25 22148.316 ± 1215.527 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 11311.126 ± 820.602 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 11421.311 ± 807.210 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_thread-local thrpt 25 11554.005 ± 960.556 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_adaptive-mutex thrpt 25 22960.523 ± 1673.421 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 18293.317 ± 1434.601 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_thread-local thrpt 25 24479.361 ± 2157.306 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_no-reuse thrpt 25 7942.286 ± 626.170 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_no-reuse thrpt 25 11781.955 ± 1019.843 ops/s
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252
Differential Revision: D19331064
Pulled By: pdillinger
fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738
Summary:
There are no API changes ;-)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6218
Differential Revision: D19200373
Pulled By: pdillinger
fbshipit-source-id: 58d34b01ea53b75a1eccbd72f8b14d6256a7380f