Summary:
Haven't seen any production issues with new Bloom filter and
it's now > 1 year old (added in 6.6.0).
Updated check_format_compatible.sh and HISTORY.md
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8017
Test Plan: tests updated (or prior bugs fixed)
Reviewed By: ajkr
Differential Revision: D26762197
Pulled By: pdillinger
fbshipit-source-id: 0e755c46b443087c1544da0fd545beb9c403d1c2
Summary:
For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file.
However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage.
Related changes include:
- Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks
- Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary
- Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970
Test Plan:
- updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level
- looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set.
Reviewed By: pdillinger
Differential Revision: D26467994
Pulled By: ajkr
fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465
Summary:
this is a trivial PR for rocksdb java samples, I think it is a typo about write options. to do sync write, WAL should not be disabled
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7894
Reviewed By: jay-zhuang
Differential Revision: D26047128
Pulled By: mrambacher
fbshipit-source-id: a06ce54cb61af0d3f2578a709c34a0b1ccecb0b2
Summary:
This request is adding support for using DirectSlice in ReadOptions lower/upper bounds.
To be more efficient I have added setLength to DirectSlice so I can just update the length to be used by slice from direct buffer. It is also needed, because when one creates iterator it keep pointer to original slice so setting new slice in options does not help (it needs to reuse existing one). Using this approach one can modify the slice any time during operations with iterator.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7132
Reviewed By: zhichao-cao
Differential Revision: D25840092
Pulled By: jay-zhuang
fbshipit-source-id: 760167baf61568c9a35138145c4bf9b06824cb71
Summary:
Fix ColumnFamilyOptionsTest.cfPaths and OptionsTest.cfPaths in 6.15 branch (and probably other branches including master)
has_exception variable was not initialized which was causing test failures and incorrect behavior on s390 platform (and maybe others as variable content is undefined).
adamretter please take a look.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7853
Reviewed By: akankshamahajan15
Differential Revision: D25901639
Pulled By: jay-zhuang
fbshipit-source-id: 151b5db27b495fc6d8ed54c0eccbde2508215ac5
Summary:
Classes ColumnFamilyHandle and CapturingWriteBatchHandler.Event have
byte array fields as part of their identity, but they do not use the
arrays' content to compute the instance's hash, and instead rely on the
arrays' identity, causing instances to have different hashcodes
although they are equal.
The PR addresses it by using the arrays' content to compute the hash,
like the equals method does.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7860
Reviewed By: jay-zhuang
Differential Revision: D25901327
Pulled By: akankshamahajan15
fbshipit-source-id: 347e7b3d2ba7befe7faa956b033e6421b9d0c235
Summary:
* Fixes a Java test compilation issue on macOS
* Cleans up CircleCI RocksDBJava build config
* Adds CircleCI for RocksDBJava on MacOS
* Ensures backwards compatibility with older macOS via CircleCI
* Fixes RocksJava static builds ordering
* Adds missing RocksJava static builds to CircleCI for Mac and Linux
* Improves parallelism in RocksJava builds
* Reduces the size of the machines used for RocksJava CircleCI as they don't need to be so large (Saves credits)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7769
Reviewed By: akankshamahajan15
Differential Revision: D25601293
Pulled By: pdillinger
fbshipit-source-id: 0a0bb9906f65438fe143487d78e37e1947364d08
Summary:
Consider the following sequence of events:
1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST.
2. Syncing MANIFEST failed and db crashed.
3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file.
4. Db crashed again.
5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.
It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5.
After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621
Test Plan:
1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery.
2. a new unit test is added in db_basic_test to simulate step 3.
Reviewed By: riversand963
Differential Revision: D24668144
Pulled By: cheng-chang
fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
Summary:
The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608
Test Plan: watch the existing java tests to pass
Reviewed By: zhichao-cao
Differential Revision: D24611621
Pulled By: cheng-chang
fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d
Summary:
Fixes Issue https://github.com/facebook/rocksdb/issues/7497
When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()`
Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later.
Tests:
- make check
- some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test
- some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test
Example of new status returns:
- Key too small - `Corrupted Key: Internal Key too small. Size=5`
- Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3`
- Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515
Reviewed By: ajkr
Differential Revision: D24240264
Pulled By: ramvadiv
fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7
Summary:
- Takes the burden off developer to close ColumnFamilyHandle instances before closing RocksDB instance
- The change is backward-compatible
----
Previously the pattern for working with Column Families was:
```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {
// list of column family descriptors, first entry must always be default column family
final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
);
// a list which will hold the handles for the column families once the db is opened
final List<ColumnFamilyHandle> columnFamilyHandleList =
new ArrayList<>();
try (final DBOptions options = new DBOptions()
.setCreateIfMissing(true)
.setCreateMissingColumnFamilies(true);
final RocksDB db = RocksDB.open(options,
"path/to/do", cfDescriptors,
columnFamilyHandleList)) {
try {
// do something
} finally {
// NOTE user must explicitly frees the column family handles before freeing the db
for (final ColumnFamilyHandle columnFamilyHandle :
columnFamilyHandleList) {
columnFamilyHandle.close();
}
} // frees the column family options
}
} // frees the db and the db options
```
With the changes in this PR, the Java user no longer has to worry about manually closing the Column Families, which allows them to write simpler symmetrical create/free oriented code like this:
```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {
// list of column family descriptors, first entry must always be default column family
final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
);
// a list which will hold the handles for the column families once the db is opened
final List<ColumnFamilyHandle> columnFamilyHandleList =
new ArrayList<>();
try (final DBOptions options = new DBOptions()
.setCreateIfMissing(true)
.setCreateMissingColumnFamilies(true);
final RocksDB db = RocksDB.open(options,
"path/to/do", cfDescriptors,
columnFamilyHandleList)) {
// do something
} // frees the column family options, then frees the db and the db options
}
}
```
**NOTE**: The changes in this PR are backwards API compatible, which means existing code using the original approach will also continue to function correctly.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7428
Reviewed By: cheng-chang
Differential Revision: D24063348
Pulled By: jay-zhuang
fbshipit-source-id: 648d7526669923128c863ead94516bf4d50ac658
Summary:
Allows adding event listeners in RocksJava.
* Adds listeners getter and setter in `Options` and `DBOptions` classes.
* Adds `EventListener` Java interface and base class for implementing custom event listener callbacks - `AbstractEventListener`, which has an underlying native callback class implementing C++ `EventListener` class.
* `AbstractEventListener` class has mechanism for selectively enabling its callback methods in order to prevent invoking Java method if it is not implemented. This decreases performance cost in case only subset of event listener callback methods is needed - the JNI code for remaining "no-op" callbacks is not executed.
* The code is covered by unit tests in `EventListenerTest.java`, there are also tests added for setting/getting listeners field in `OptionsTest.java` and `DBOptionsTest.java`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7425
Reviewed By: pdillinger
Differential Revision: D24063390
Pulled By: jay-zhuang
fbshipit-source-id: 508c359538983d6b765e70d9989c351794a944ee
Summary:
Add following stats for MultiGet in Histogram to get more insight on MultiGet.
1. Number of index and filter blocks read from file as part of MultiGet
request per level.
2. Number of data blocks read from file per level.
3. Number of SST files loaded from file system per level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7366
Reviewed By: anand1976
Differential Revision: D24127040
Pulled By: akankshamahajan15
fbshipit-source-id: e63a003056b833729b277edc0639c08fb432756b
Summary:
as title
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7347
Test Plan: unit tests included
Reviewed By: jay-zhuang
Differential Revision: D23592552
Pulled By: pdillinger
fbshipit-source-id: 1c3571b6f42bfd0cfd723ff49d01fbc02a1be45b
Summary:
Previously RocksJava limited the format_version to 4. However, the C++ API is now at 5, and this will likely increase again in future. The Java API now allows any positive integer, and an exception is raised from JNI if the format_version is out-of-bounds.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7242
Reviewed By: cheng-chang
Differential Revision: D23077941
Pulled By: pdillinger
fbshipit-source-id: ee69f7203448acddc41c6d86b470ed987d3d366d
Summary:
The PR fixes a Java test for Merge operator `uint64add`.
The current implementation uses wrong byte order for long serialization, but fails to catch this error because the merge sum is lower than `256`.
The PR makes this test case more representative (i.e. it fails with wrong byte order) and changes the byte order to little endian.
Some background: RocksDB uses LittleEndian byte order for integer serialization across all platforms. `MergeTest` uses `ByteBuffer` that defaults to BigEndian byte order.
This test case might probably be used as a sample of `MergeOperator` usage in Java.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7243
Reviewed By: ajkr
Differential Revision: D23079593
Pulled By: pdillinger
fbshipit-source-id: 82e8e166901d66733e96a0116f88d0ec4761ddf1
Summary:
Improvements to the RocksJava release process:
* Generates the Maven artifact version number as part of the release step
* Also generates appropriate checksum files to speed the deploy and publish step
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7219
Reviewed By: ltamasi
Differential Revision: D22983481
Pulled By: pdillinger
fbshipit-source-id: 7b8ffaf46471cd3cda181eb830c962b317d2e688
Summary:
Adds compaction statistics (total bytes read and written) for compactions that occur for delete-triggered, periodic, and TTL compaction reasons.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7165
Test Plan:
TTL and periodic can be checked by runnning db_bench with the options activated:
/db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -periodic_compaction_seconds=1
./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -fifo_compaction_ttl=1
Setting the time to one second causes non-zero bytes read/written for those compaction reasons. Disabling them or setting them to times longer than the test run length causes the stats to return to zero as expected.
Delete-triggered compaction counting is tested in DBTablePropertiesTest.DeletionTriggeredCompactionMarking
Reviewed By: ajkr
Differential Revision: D22693050
Pulled By: akabcenell
fbshipit-source-id: d15cef4d94576f703015c8942d5f0d492f69401d
Summary:
SST Partitioner interface that allows to split SST files during compactions.
It basically instruct compaction to create a new file when needed. When one is using well defined prefixes and prefixed way of defining tables it is good to define also partitioning so that promotion of some SST file does not cover huge key space on next level (worst case complete space).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6957
Reviewed By: ajkr
Differential Revision: D22461239
fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e
Summary:
The methods in convenience.h are used to compare/convert objects to/from strings. There is a mishmash of parameters in use here with more needed in the future. This PR replaces those parameters with a single structure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389
Reviewed By: siying
Differential Revision: D21163707
Pulled By: zhichao-cao
fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd
Summary:
This PR exposes the `Iterator::Refresh` method to the Java API by adding it on the `RocksIteratorInterface` interface. There are three concrete implementations: `RocksIterator`, `SstFileReaderIterator`, and `WBWIRocksIterator`. For the first two cases, the JNI side simply delegates to the underlying `Iterator::Refresh` method; in the last case, as it doesn't share an ancestor, and per the discussion in https://github.com/facebook/rocksdb/issues/3465, a `Status::NotSupported` exception is thrown.
As the last PR had no activity in a while, I'm opening a new one - I'm completely fine with merging the previous PR if it gets completed before this is reviewed.
Let me know if there's anything missing or anything else I can do 👍
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6573
Reviewed By: cheng-chang
Differential Revision: D20604666
Pulled By: pdillinger
fbshipit-source-id: 4de17df1180c3b87b76cfdd77b674b81fc0563f7
Summary:
This change is fixing a crash happening in getApproximateSizes JNI implementation. It also reenables Java test that was crashing most likelly because if this bug.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6652
Reviewed By: cheng-chang
Differential Revision: D20874865
Pulled By: pdillinger
fbshipit-source-id: da95516f15e5df2efe1a4e5690a2ce172cb53f87
Summary:
Adding a Java API for rocksdb::CancelAllBackgroundWork() so that the user can call this (when required) before closing the DB. This is to **prevent the crashes when manual compaction is running and the user decides to close the DB**.
Calling CancelAllBackgroundWork() seems to be the recommended way to make sure that it's safe to close the DB (according to RocksDB FAQ: https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ#basic-readwrite).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6657
Reviewed By: cheng-chang
Differential Revision: D20896395
Pulled By: pdillinger
fbshipit-source-id: 8a8208c10093db09bd35db9af362211897870d96
Summary:
After we had a lot of failures with maven.org downloads, we
wanted an alternative location for downloading binary dependencies.
Hosting them through github would have been good in terms of
organizational and network dependencies, but that approach seems to be
awkward (fake releases, so would need a 'rocksdb-deps' repo) and
strangely complicated for Facebook policy on open source repositories.
This commit moves the downloads (that are not officially hosted by
others on github) from my personal rocksdb fork to an S3 bucket owned
by the Facebook RocksDB AWS account. Facebook employees can access
this through an internal tool, and we should be able to grant permission
to outside collaborators.
Assuming this works out, I will back-port to older branches to stabilize
their CI testing as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6526
Test Plan: CI
Differential Revision: D20430130
Pulled By: pdillinger
fbshipit-source-id: df52394a65e0a57942db3039bdaade8a4d520cb2
Summary:
In the `.travis.yml` file the `jdk: openjdk7` element is ignored when `language: cpp`. So whatever version of the JDK that was installed in the Travis container was used - typically JDK 11.
To ensure our RocksJava builds are working, we now instead install and use OpenJDK 8. Ideally we would use OpenJDK 7, as RocksJava supports Java 7, but many of the newer Travis containers don't support Java 7, so Java 8 is the next best thing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6512
Differential Revision: D20388296
Pulled By: pdillinger
fbshipit-source-id: 8bbe6b59b70cfab7fe81ff63867d907fefdd2df1
Summary:
This helps to diagnose errors in the CMake build where it tries to retrieve dependencies.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6511
Differential Revision: D20387392
Pulled By: pdillinger
fbshipit-source-id: 7028dfd62704bcc747f39ff864ea9c9bf51cd1be
Summary:
In most places in the code the variable names are spelled correctly as
COMMITTED but in a couple places not. This fixes them and ensures the
variable is always called COMMITTED everywhere.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6481
Differential Revision: D20306776
Pulled By: pdillinger
fbshipit-source-id: b6c1bfe41db559b4bc6955c530934460c07f7022
Summary:
Threw assert error at assert(isOwningHandle()) in ColumnFamilyHandle.getDescriptor(),
because default CF don't own a handle, due to [RocksDB.getDefaultColumnFamily()](3a408eeae9/java/src/main/java/org/rocksdb/RocksDB.java (L3702)) called cfHandle.disOwnNativeHandle().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6006
Differential Revision: D19031448
fbshipit-source-id: 2420c45e835bda0e552e919b1b63708472b91538
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
Differential Revision: D19977691
fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
Summary:
It is very useful to support direct ByteBuffers in Java. It allows to have zero memory copy and some serializers are using that directly so one do not need to create byte[] array for it.
This change also contains some fixes for Windows JNI build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/2283
Differential Revision: D19834971
Pulled By: pdillinger
fbshipit-source-id: 44173aa02afc9836c5498c592fd1ea95b6086e8e