rocksdb

Commit Graph

Author	SHA1	Message	Date
myasuka	dc00e4b120	Introduce allowStall option for write buffer manager constructor (#9076 ) Summary: https://github.com/facebook/rocksdb/pull/7898 enable write buffer manager to stall write when memory_usage exceeds buffer_size, this is really useful for container running case to limit the memory usage. However, this feature is not visiable for rocksJava yet. This PR targets to introduce this feature for rocksJava. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9076 Reviewed By: akankshamahajan15 Differential Revision: D31931092 Pulled By: anand1976 fbshipit-source-id: 5531c16a87598663a02368c07b5e13a503164578	4 years ago
Jonathan Albrecht	e970248602	Add support for building on s390x platform (#8962 ) Summary: This PR adds support for building on s390x including updating travis CI. It uses the previous work in https://github.com/facebook/rocksdb/pull/6168 and adds some more changes to get all current tests (make check and jni tests) to pass. The tests were run with snappy, lz4, bzip2 and zstd all compiled in. There are a few pieces still needed to get the travis build working that I don't think I can do. adamretter is this something you could help with? 1. A prebuilt https://rocksdb-deps.s3-us-west-2.amazonaws.com/cmake/cmake-3.14.5-Linux-s390x.deb package 2. A https://hub.docker.com/r/evolvedbinary/rocksjava s390x image Not sure if there is more required for travis. Happy to help in any way I can. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8962 Reviewed By: mrambacher Differential Revision: D31802198 Pulled By: pdillinger fbshipit-source-id: 683511466fa6b505f85ba5a9964a268c6151f0c2	4 years ago
Alan Paxton	8d615a2b1d	New-style blob option bindings, Java option getter and improve/fix option parsing (#8999 ) Summary: Implementation of https://github.com/facebook/rocksdb/issues/8221, plus/including extension of Java options API to allow the get() of options from RocksDB. The extension allows more comprehensive testing of options at the Java side, by validating that the options are set at the C++ side. Variations on methods: MutableColumnFamilyOptions.MutableColumnFamilyOptionsBuilder getOptions() MutableDBOptions.MutableDBOptionsBuilder getDBOptions() retrieve the options via RocksDB C++ interfaces, and parse the resulting string into one of the Java-style option objects. This necessitated generalising the parsing of option strings in Java, which now parses the full range of option strings returned by the C++ interface, rather than a useful subset. This necessitates the list-separator being changed to :(colon) from , (comma). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8999 Reviewed By: jay-zhuang Differential Revision: D31655487 Pulled By: ltamasi fbshipit-source-id: c38e98145c81c61dc38238b0df580db176ce4efd	4 years ago
Alan Paxton	86cf7266c3	keyMayExist() supports ByteBuffer (#9013 ) Summary: closes https://github.com/facebook/rocksdb/issues/7917 Implemented ByteBuffer API variants of Java keyMayExist() uniformly with and without column families, read options and return data values. Implemented 2 supporting C++ JNI methods. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9013 Reviewed By: mrambacher Differential Revision: D31665989 Pulled By: jay-zhuang fbshipit-source-id: 8adc1730217dba38d6fa7b31d788650a33e28af1	4 years ago
Alan Paxton	f5526af8ed	Fix multiget throwing NPE for num of keys > 70k (#9012 ) Summary: closes https://github.com/facebook/rocksdb/issues/8039 Unnecessary use of multiple local JNI references at the same time, 1 per key, was limiting the size of the key array. The local references don't need to be held simultaneously, so if we rearrange the code we can make it work for bigger key arrays. Incidentally, make errors throw helpful exception messages rather than returning a null pointer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9012 Reviewed By: mrambacher Differential Revision: D31580862 Pulled By: jay-zhuang fbshipit-source-id: ce05831d52ede332e1b20e74d2dc621d219b9616	4 years ago
Jay Zhuang	6b34eb0ebc	Add remote compaction read/write bytes statistics (#8939 ) Summary: Add basic read/write bytes statistics on the primary side: `REMOTE_COMPACT_READ_BYTES` `REMOTE_COMPACT_WRITE_BYTES` Fixed existing statistics missing some IO for remote compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8939 Test Plan: CI Reviewed By: ajkr Differential Revision: D31074672 Pulled By: jay-zhuang fbshipit-source-id: c57afdba369990185008ffaec7e3fe7c62e8902f	4 years ago
Peter Dillinger	cb5b851ff8	Add (& fix) some simple source code checks (#8821 ) Summary: * Don't hardcode namespace rocksdb (use ROCKSDB_NAMESPACE) * Don't #include <rocksdb/...> (use double quotes) * Support putting NOCOMMIT (any case) in source code that should not be committed/pushed in current state. These will be run with `make check` and in GitHub actions Pull Request resolved: https://github.com/facebook/rocksdb/pull/8821 Test Plan: existing tests, manually try out new checks Reviewed By: zhichao-cao Differential Revision: D30791726 Pulled By: pdillinger fbshipit-source-id: 399c883f312be24d9e55c58951d4013e18429d92	4 years ago
Andrew Kryczka	9308ff366c	Bytes read/written stats for `CreateNewBackup*()` (#8819 ) Summary: Gets `Statistics` from the options associated with the `DB` undergoing backup, and populates new ticker stats with the thread-local `IOContext` read/write counters for the threads doing backup work. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8819 Reviewed By: pdillinger Differential Revision: D30779238 Pulled By: ajkr fbshipit-source-id: 75ccafc355f90906df5cf80367f7245b985772d8	4 years ago
Andrew Kryczka	941543721d	Bytes read stat for `VerifyChecksum()` and `VerifyFileChecksums()` APIs (#8741 ) Summary: - Clarified some comments on compatibility for adding new ticker stats - Added read I/O stats for `VerifyChecksum()` and `VerifyFileChecksums()` APIs Pull Request resolved: https://github.com/facebook/rocksdb/pull/8741 Test Plan: new unit test Reviewed By: zhichao-cao Differential Revision: D30708578 Pulled By: ajkr fbshipit-source-id: d06b961f7e199ae92c266b683e39870aa8f63449	4 years ago
Peter Dillinger	c9cd5d25a8	Remove some unneeded code (#8736 ) Summary: * FullKey and ParseFullKey appear to serve no purpose in the public API (or anything else) so removed. Only use in one test updated. * NumberToString serves no purpose vs. ToString so removed, numerous calls updated * Remove unnecessary forward declarations in metadata.h by re-arranging class definitions. * Remove some unneeded semicolons Pull Request resolved: https://github.com/facebook/rocksdb/pull/8736 Test Plan: existing tests Reviewed By: mrambacher Differential Revision: D30700039 Pulled By: pdillinger fbshipit-source-id: 1e436a576f511a6ed8b4d97af7cc8216bc729af2	4 years ago
anand76	add68bd28a	Add a stat to count secondary cache hits (#8666 ) Summary: Add a stat for secondary cache hits. The ```Cache::Lookup``` API had an unused ```stats``` parameter. This PR uses that to pass the pointer to a ```Statistics``` object that ```LRUCache``` uses to record the stat. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8666 Test Plan: Update a unit test in lru_cache_test Reviewed By: zhichao-cao Differential Revision: D30353816 Pulled By: anand1976 fbshipit-source-id: 2046f78b460428877a26ffdd2bb914ae47dfbe77	4 years ago
sdong	e7c24168d8	Move old files to warm tier in FIFO compactions (#8310 ) Summary: Some FIFO users want to keep the data for longer, but the old data is rarely accessed. This feature allows users to configure FIFO compaction so that data older than a threshold is moved to a warm storage tier. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8310 Test Plan: Add several unit tests. Reviewed By: ajkr Differential Revision: D28493792 fbshipit-source-id: c14824ea634814dee5278b449ab5c98b6e0b5501	4 years ago
Brendan MacDonell	8ca081780b	Correct javadoc for Env#setBackgroundThreads(int) (#8576 ) Summary: By default, the low priority pool is not the flush pool, so calling `Env#setBackgroundThreads` without providing a priority will not do what the caller expected. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8576 Reviewed By: ajkr Differential Revision: D29925154 Pulled By: mrambacher fbshipit-source-id: cd7211fc374e7d9929a9b88ea0a5ba8134b76099	4 years ago
Mikhail Golubev	8f52972cf9	Allow to use a string as a delimiter in StringAppendOperator (#8536 ) Summary: An arbitrary string can be used as a delimiter in StringAppend merge operator flavor. In particular, it allows using an empty string, combining binary values for the same key byte-to-byte one next to another. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8536 Reviewed By: mrambacher Differential Revision: D29962120 Pulled By: zhichao-cao fbshipit-source-id: 4ef5d846a47835cf428a11200409e30e2dbffc4f	4 years ago
Anatolii Zhmaiev	9ddb55a8f6	Add periodic_compaction_seconds option to RocksJava (#8579 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/8578 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8579 Reviewed By: ajkr Differential Revision: D29895081 Pulled By: mrambacher fbshipit-source-id: 3e4120e26a3e8252f8301d657c0aaa0b8550cddf	4 years ago
Peter Dillinger	df5dc73bec	Don't hold DB mutex for block cache entry stat scans (#8538 ) Summary: I previously didn't notice the DB mutex was being held during block cache entry stat scans, probably because I primarily checked for read performance regressions, because they require the block cache and are traditionally latency-sensitive. This change does some refactoring to avoid holding DB mutex and to avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats"). Some tests have to be updated because now the stats collector is populated in the Cache aggressively on DB startup rather than lazily. (I hope to clean up some of this added complexity in the future.) This change also ensures proper treatment of need_out_of_mutex for non-int DB properties. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538 Test Plan: Added unit test logic that uses sync points to fail if the DB mutex is held during a scan, covering the various ways that a scan might be triggered. Performance test - the known impact to holding the DB mutex is on TransactionDB, and the easiest way to see the impact is to hack the scan code to almost always miss and take an artificially long time scanning. Here I've injected an unconditional 5s sleep at the call to ApplyToAllEntries. Before (hacked): $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 433.219 micros/op 2308 ops/sec; 0.1 MB/s ( transactions:78999 aborts:0) rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856 $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 448.802 micros/op 2228 ops/sec; 0.1 MB/s ( transactions:75999 aborts:0) rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323 Notice the 5s P100 write time. After (hacked): $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 303.645 micros/op 3293 ops/sec; 0.1 MB/s ( transactions:98999 aborts:0) rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407 $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 310.383 micros/op 3221 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0) rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918 P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code: $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 311.365 micros/op 3211 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0) rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767 $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 308.395 micros/op 3242 ops/sec; 0.1 MB/s ( transactions:97999 aborts:0) rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832 No substantial difference. Reviewed By: siying Differential Revision: D29738847 Pulled By: pdillinger fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b	4 years ago
longlijian	4e4ec16957	Replace the namespace "rocksdb" to "ROCKSDB_NAMESPACE" (#8531 ) Summary: For more detail can reference the https://github.com/facebook/rocksdb/issues/6433 (https://github.com/facebook/rocksdb/pull/6433) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8531 Reviewed By: siying Differential Revision: D29717057 Pulled By: ajkr fbshipit-source-id: 3ccad9501e5612590e54a7cf8c447118f323c7f4	4 years ago
Baptiste Lemaire	e817bc9628	Added memtable garbage statistics (#8411 ) Summary: Summary: 2 new statistics counters are added to RocksDB: `MEMTABLE_PAYLOAD_BYTES_AT_FLUSH` and `MEMTABLE_GARBAGE_BYTES_AT_FLUSH`. The former tracks how many raw bytes of useful data are present on the memtable at flush time, whereas the latter is tracks how many of these raw bytes are considered garbage, meaning that they ended up not being imported on the SSTables resulting from the flush operations. Unit test: run `make db_flush_test -j$(nproc); ./db_flush_test` to run the unit test. This executable includes 3 tests, that test support and correct stat calculations for workloads with inserts, deletes, and DeleteRanges. The parameters are set such that the workloads are performed on a single memtable, and a single SSTable is created as a result of the flush operation. The flush operation is manually called in the test file. The tests verify that the values of these 2 statistics counters introduced in this PR can be exactly predicted, showing that we have a full understanding of the underlying operations. Performance testing: `./db_bench -statistics -benchmarks=fillrandom -num=10000000` repeated 10 times. Timing done using "date" function in a bash script. _Results_: Original Rocksdb fork: mean 66.6 sec, std 1.18 sec. This feature branch: mean 67.4 sec, std 1.35 sec. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8411 Reviewed By: akankshamahajan15 Differential Revision: D29150629 Pulled By: bjlemaire fbshipit-source-id: 7b3c2e86d50c6aa34fa50fd134282eacb543a5b1	4 years ago
Sidi Mohamed EL AATIFI	298edae941	Fix a typo in Javadoc (#8394 ) Summary: iterateLowerBound Slice representing the lower bound Pull Request resolved: https://github.com/facebook/rocksdb/pull/8394 Reviewed By: ajkr Differential Revision: D29085721 Pulled By: jay-zhuang fbshipit-source-id: a154375879395c48e9bd3794d296e70316894056	4 years ago
mrambacher	8948dc8524	Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262 ) Summary: The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions. This change cleans that up by introducing an ImmutableOptions struct. Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form). Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR. All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes. Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262 Reviewed By: pdillinger Differential Revision: D28226540 Pulled By: mrambacher fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf	5 years ago
Adam Retter	69c986825e	Fix javadoc for keyMayExist (#8232 ) Summary: Closes https://github.com/facebook/rocksdb/issues/6985 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8232 Reviewed By: jay-zhuang Differential Revision: D27999779 Pulled By: mrambacher fbshipit-source-id: a37c88d93bde2692b8be9e46e673dda7bea701b2	5 years ago
Yanqin Jin	a376c22066	Handle rename() failure in non-local FS (#8192 ) Summary: In a distributed environment, a file `rename()` operation can succeed on server (remote) side, but the client can somehow return non-ok status to RocksDB. Possible reasons include network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a new MANIFEST. We currently always delete the new MANIFEST if an error occurs. This is problematic in distributed world. If the server-side successfully updates the CURRENT file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail. As a fix, we can track the execution result of IO operations on the new MANIFEST. - If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original MANIFEST. Therefore, it is safe to remove the new MANIFEST. - If IO operations on the new MANIFEST all succeed, but somehow we end up in the clean up code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the new MANIFEST.) Therefore, we keep the new MANIFEST. - Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT. - If process reopens the db immediately after the failure, then the CURRENT file can point to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can succeed and ignore the other. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192 Test Plan: make check Reviewed By: zhichao-cao Differential Revision: D27804648 Pulled By: riversand963 fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4	5 years ago
Andrew Kryczka	1ba2b8a568	Add sample_for_compression results to table properties (#8139 ) Summary: Added `TableProperties::{fast,slow}_compression_estimated_data_size`. These properties are present in block-based tables when `ColumnFamilyOptions::sample_for_compression > 0` and the necessary compression library is supported when the file is generated. They contain estimates of what `TableProperties::data_size` would be if the "fast"/"slow" compression library had been used instead. One limitation is we do not record exactly which "fast" (ZSTD or Zlib) or "slow" (LZ4 or Snappy) compression library produced the result. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8139 Test Plan: - new unit test - ran `db_bench` with `sample_for_compression=1`; verified the `data_size` property matches the `{slow,fast}_compression_estimated_data_size` when the same compression type is used for the output file compression and the sampled compression Reviewed By: riversand963 Differential Revision: D27454338 Pulled By: ajkr fbshipit-source-id: 9529293de93ddac7f03b2e149d746e9f634abac4	5 years ago
Jay Zhuang	a781b103da	Fix getApproximateMemTableStats() return type (#8098 ) Summary: Which should return 2 long instead of an array. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8098 Reviewed By: mrambacher Differential Revision: D27308741 Pulled By: jay-zhuang fbshipit-source-id: 44beea2bd28cf6779b048bebc98f2426fe95e25c	5 years ago
Vlad Artamonov	4a6bc47b2e	Fix possible mistype in a comment (#8086 ) Summary: This is a small fix to what I think is a mistype in two comments in `DBOptionsInterface.java`. If it was not an error, feel free to close. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8086 Reviewed By: ajkr Differential Revision: D27260488 Pulled By: mrambacher fbshipit-source-id: 469daadaf6039d5b5187132b8e0c7c3672842f21	5 years ago
Zhichao Cao	08ec5e7321	Add the statistics and info log for Error handler (#8050 ) Summary: Add statistics and info log for error handler: counters for bg error, bg io error, bg retryable io error, auto resume, auto resume total retry, and auto resume sucess; Histogram for auto resume retry count in each recovery call. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8050 Test Plan: make check and add test to error_handler_fs_test Reviewed By: anand1976 Differential Revision: D26990565 Pulled By: zhichao-cao fbshipit-source-id: 49f71e8ea4e9db8b189943976404205b56ab883f	5 years ago
Xiaopeng Zhang	c603f2f898	support getUsage and getPinnedUsage in JavaAPI for Cache (#7925 ) Summary: support getUsage and getPinnedUsage in JavaAPI for Cache also fix a typo in LRUCacheTest.java that the highPriPoolRatio is not valid(set 5, I guess it means 0.05) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7925 Reviewed By: mrambacher Differential Revision: D26900241 Pulled By: ajkr fbshipit-source-id: 735d1e40a16fa8919c89c7c7154ba7f81208ec33	5 years ago
stefan-zobel	8d9088464b	Java-API: Fix minor Javadoc copy-paste errors (#8034 ) Summary: Fixes 3 minor Javadoc copy-paste errors in the `RocksDB#newIterator()` and `Transaction#getIterator()` variants that take a column family handle but are talking about iterating over "the database" or "the default column family". Pull Request resolved: https://github.com/facebook/rocksdb/pull/8034 Reviewed By: jay-zhuang Differential Revision: D26877667 Pulled By: mrambacher fbshipit-source-id: 95dd95b667c496e389f221acc9a91b340e4b63bf	5 years ago
stefan-zobel	cc34da75b5	Java-API: byteCompressionType should be declared as primitive type byte (#7981 ) Summary: The variable `byteCompressionType` is only assigned values of primitive type and is never 'null', but it is declared with the boxed type 'Byte'. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7981 Reviewed By: ajkr Differential Revision: D26546600 Pulled By: jay-zhuang fbshipit-source-id: 07b579cdfcfc2262a448ca3626e216416fd05892	5 years ago
Peter Dillinger	0028e3398b	Make format_version=5 new default (#8017 ) Summary: Haven't seen any production issues with new Bloom filter and it's now > 1 year old (added in 6.6.0). Updated check_format_compatible.sh and HISTORY.md Pull Request resolved: https://github.com/facebook/rocksdb/pull/8017 Test Plan: tests updated (or prior bugs fixed) Reviewed By: ajkr Differential Revision: D26762197 Pulled By: pdillinger fbshipit-source-id: 0e755c46b443087c1544da0fd545beb9c403d1c2	5 years ago
stefan-zobel	430842f948	Java-API: Missing space in string literal (#7982 ) Summary: `TtlDB.open()`: missing space after 'column' `AdvancedColumnFamilyOptionsInterface.setLevelCompactionDynamicLevelBytes()`: missing space after 'cause' Pull Request resolved: https://github.com/facebook/rocksdb/pull/7982 Reviewed By: ajkr Differential Revision: D26546632 Pulled By: jay-zhuang fbshipit-source-id: 885dedcaa2200842764fbac9ce3766d54e1c8914	5 years ago
Andrew Kryczka	d904233d2f	Limit buffering for collecting samples for compression dictionary (#7970 ) Summary: For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file. However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage. Related changes include: - Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks - Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary - Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970 Test Plan: - updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level - looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set. Reviewed By: pdillinger Differential Revision: D26467994 Pulled By: ajkr fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465	5 years ago
stefan-zobel	251143f8fb	rocksdbjni: Possible NPE in RocksDB.setOptions #7869 (#7909 ) Summary: Fix for https://github.com/facebook/rocksdb/issues/7869 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7909 Reviewed By: akankshamahajan15 Differential Revision: D26181440 Pulled By: ajkr fbshipit-source-id: f323aec9d91e177fa873599b99801b391cf094b1	5 years ago
Xiaopeng Zhang	bf6795aea0	fix java sample typo and replace deprecated code with latest (#7906 ) Summary: 1. replace deprecated code in sample java with latest api 2. fix optimistictransaction sample code typo Pull Request resolved: https://github.com/facebook/rocksdb/pull/7906 Reviewed By: ajkr Differential Revision: D26127429 Pulled By: jay-zhuang fbshipit-source-id: f015ad1435f565cffb8798a4fb5afc44c72d73d7	5 years ago
Xiaopeng Zhang	36963dc2ca	fix write option typo in java samples (#7894 ) Summary: this is a trivial PR for rocksdb java samples, I think it is a typo about write options. to do sync write, WAL should not be disabled Pull Request resolved: https://github.com/facebook/rocksdb/pull/7894 Reviewed By: jay-zhuang Differential Revision: D26047128 Pulled By: mrambacher fbshipit-source-id: a06ce54cb61af0d3f2578a709c34a0b1ccecb0b2	5 years ago
Tomas Kolda	d76a8eeef7	Fixing Windows build using CMake (#7854 ) Summary: Builds were not producing Windows binaries properly in 6.15 branch: ``` 00:00:46.413 Tests run: 11, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.183 sec <<< FAILURE! - in org.rocksdb.EventListenerTest 00:00:46.414 testAllCallbacksInvocation(org.rocksdb.EventListenerTest) Time elapsed: 0.012 sec <<< ERROR! 00:00:46.414 java.lang.UnsatisfiedLinkError: org.rocksdb.test.TestableEventListener.invokeAllCallbacks(J)V 00:00:46.414 at org.rocksdb.test.TestableEventListener.invokeAllCallbacks(Native Method) 00:00:46.414 at org.rocksdb.test.TestableEventListener.invokeAllCallbacks(TestableEventListener.java:19) 00:00:46.414 at org.rocksdb.EventListenerTest.testAllCallbacksInvocation(EventListenerTest.java:436) ``` ``` 00:00:41.497 "D:\j\workspace\RocksDB_Build_Windows\build\java\rocksdbjni_headers.vcxproj" (default target) (3) -> 00:00:41.497 (CustomBuild target) -> 00:00:41.497 CUSTOMBUILD : error : Could not find class file for 'org.rocksdb.TestableEventListener'. [D:\j\workspace\RocksDB_Build_Windows\build\java\rocksdbjni_headers.vcxproj] ``` Also failed on Linux as library was not initialized yet: ``` 00:01:25.103 Running org.rocksdb.NativeComparatorWrapperTest 00:01:25.133 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.006 sec <<< FAILURE! - in org.rocksdb.NativeComparatorWrapperTest 00:01:25.133 rountrip(org.rocksdb.NativeComparatorWrapperTest) Time elapsed: 0.002 sec <<< ERROR! 00:01:25.133 java.lang.UnsatisfiedLinkError: org.rocksdb.NativeComparatorWrapperTest$NativeStringComparatorWrapper.newStringComparator()J 00:01:25.133 at org.rocksdb.NativeComparatorWrapperTest$NativeStringComparatorWrapper.newStringComparator(Native Method) 00:01:25.133 at org.rocksdb.NativeComparatorWrapperTest$NativeStringComparatorWrapper.initializeNative(NativeComparatorWrapperTest.java:87) 00:01:25.133 at org.rocksdb.RocksCallbackObject.<init>(RocksCallbackObject.java:28) 00:01:25.133 at org.rocksdb.AbstractComparator.<init>(AbstractComparator.java:20) 00:01:25.133 at org.rocksdb.NativeComparatorWrapper.<init>(NativeComparatorWrapper.java:16) 00:01:25.133 at org.rocksdb.NativeComparatorWrapperTest$NativeStringComparatorWrapper.<init>(NativeComparatorWrapperTest.java:82) 00:01:25.133 at org.rocksdb.NativeComparatorWrapperTest.rountrip(NativeComparatorWrapperTest.java:30) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7854 Reviewed By: jay-zhuang Differential Revision: D25873378 Pulled By: ajkr fbshipit-source-id: 88afb08bfd30edff31f17da063e636df0769cbfe	5 years ago
Tomas Kolda	1001bc01c9	Read Options to support direct slice (#7132 ) Summary: This request is adding support for using DirectSlice in ReadOptions lower/upper bounds. To be more efficient I have added setLength to DirectSlice so I can just update the length to be used by slice from direct buffer. It is also needed, because when one creates iterator it keep pointer to original slice so setting new slice in options does not help (it needs to reuse existing one). Using this approach one can modify the slice any time during operations with iterator. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7132 Reviewed By: zhichao-cao Differential Revision: D25840092 Pulled By: jay-zhuang fbshipit-source-id: 760167baf61568c9a35138145c4bf9b06824cb71	5 years ago
Tomas Kolda	ac956f2bea	S390 Linux is failing tests ColumnFamilyOptionsTest.cfPaths (#7853 ) Summary: Fix ColumnFamilyOptionsTest.cfPaths and OptionsTest.cfPaths in 6.15 branch (and probably other branches including master) has_exception variable was not initialized which was causing test failures and incorrect behavior on s390 platform (and maybe others as variable content is undefined). adamretter please take a look. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7853 Reviewed By: akankshamahajan15 Differential Revision: D25901639 Pulled By: jay-zhuang fbshipit-source-id: 151b5db27b495fc6d8ed54c0eccbde2508215ac5	5 years ago
Adam Retter	3e6ee9f82e	Update the versions of the test dependencies used for RocksJava (#7805 ) Summary: Update the versions of the dependencies used for testing RocksJava. pdillinger Please can you add the following to your S3 bucket: 1. https://repo1.maven.org/maven2/junit/junit/4.13.1/junit-4.13.1.jar 2. https://repo1.maven.org/maven2/org/hamcrest/hamcrest/2.2/hamcrest-2.2.jar 3. https://repo1.maven.org/maven2/cglib/cglib/3.3.0/cglib-3.3.0.jar 4. https://repo1.maven.org/maven2/org/assertj/assertj-core/2.9.0/assertj-core-2.9.0.jar Thanks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7805 Reviewed By: akankshamahajan15 Differential Revision: D25906134 Pulled By: jay-zhuang fbshipit-source-id: 1c6c7d461a73abaff1796bb31f0ad90dcbdef1a0	5 years ago
Laurent Goujon	0426d4a4ee	Fix Java hashCode implementation (#7860 ) Summary: Classes ColumnFamilyHandle and CapturingWriteBatchHandler.Event have byte array fields as part of their identity, but they do not use the arrays' content to compute the instance's hash, and instead rely on the arrays' identity, causing instances to have different hashcodes although they are equal. The PR addresses it by using the arrays' content to compute the hash, like the equals method does. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7860 Reviewed By: jay-zhuang Differential Revision: D25901327 Pulled By: akankshamahajan15 fbshipit-source-id: 347e7b3d2ba7befe7faa956b033e6421b9d0c235	5 years ago
Adam Retter	62afa968c2	Fix various small build issues, Java API naming (#7776 ) Summary: * Compatibility with older GCC. * Compatibility with older jemalloc libraries. * Remove Docker warning when building i686 binaries. * Fix case inconsistency in Java API naming (potential update to HISTORY.md deferred) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7776 Reviewed By: akankshamahajan15 Differential Revision: D25607235 Pulled By: pdillinger fbshipit-source-id: 7ab0fb7fa7a34e97ed0bec991f5081acb095777d	5 years ago
Adam Retter	29d12748b0	Fix failing RocksJava test compilation and add CI (#7769 ) Summary: * Fixes a Java test compilation issue on macOS * Cleans up CircleCI RocksDBJava build config * Adds CircleCI for RocksDBJava on MacOS * Ensures backwards compatibility with older macOS via CircleCI * Fixes RocksJava static builds ordering * Adds missing RocksJava static builds to CircleCI for Mac and Linux * Improves parallelism in RocksJava builds * Reduces the size of the machines used for RocksJava CircleCI as they don't need to be so large (Saves credits) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7769 Reviewed By: akankshamahajan15 Differential Revision: D25601293 Pulled By: pdillinger fbshipit-source-id: 0a0bb9906f65438fe143487d78e37e1947364d08	5 years ago
Cheng Chang	5e794b0841	Fix a recovery corner case (#7621 ) Summary: Consider the following sequence of events: 1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST. 2. Syncing MANIFEST failed and db crashed. 3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file. 4. Db crashed again. 5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed. It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5. After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621 Test Plan: 1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery. 2. a new unit test is added in db_basic_test to simulate step 3. Reviewed By: riversand963 Differential Revision: D24668144 Pulled By: cheng-chang fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b	5 years ago
cheng-chang	1f627210ca	Simplify a test case in Java ReadOnlyTest (#7608 ) Summary: The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608 Test Plan: watch the existing java tests to pass Reviewed By: zhichao-cao Differential Revision: D24611621 Pulled By: cheng-chang fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d	5 years ago
Jermy Li	99a0305bb8	java: correct method name RocksDB.GetColumnFamilyMetaData() (#7606 ) Summary: update GetColumnFamilyMetaData() to getColumnFamilyMetaData() Pull Request resolved: https://github.com/facebook/rocksdb/pull/7606 Reviewed By: zhichao-cao Differential Revision: D24610298 Pulled By: cheng-chang fbshipit-source-id: d24f9b65478da1456f50747637dc95688af874de	5 years ago
Ramkumar Vadivelu	9a690a74e1	In ParseInternalKey(), include corrupt key info in Status (#7515 ) Summary: Fixes Issue https://github.com/facebook/rocksdb/issues/7497 When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()` Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later. Tests: - make check - some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test - some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test Example of new status returns: - Key too small - `Corrupted Key: Internal Key too small. Size=5` - Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3` - Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515 Reviewed By: ajkr Differential Revision: D24240264 Pulled By: ramvadiv fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7	5 years ago
Adam Retter	ccbf468cb1	Small JNI improvements (#7371 ) Summary: * Avoid some unnecessary array copy operations on read/write * Remove some duplicated code * Don't leak arrays on some exceptions * Fixed some doc comments Pull Request resolved: https://github.com/facebook/rocksdb/pull/7371 Reviewed By: jay-zhuang Differential Revision: D24312932 Pulled By: pdillinger fbshipit-source-id: 422fe6b98bbdb922a148922ac0d2d965c715176e	5 years ago
Tomasz Posluszny	05fba96927	Make RocksDB instance responsible for closing associated ColumnFamilyHandle instances (#7428 ) Summary: - Takes the burden off developer to close ColumnFamilyHandle instances before closing RocksDB instance - The change is backward-compatible ---- Previously the pattern for working with Column Families was: ```java try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) { // list of column family descriptors, first entry must always be default column family final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList( new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts), new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts) ); // a list which will hold the handles for the column families once the db is opened final List<ColumnFamilyHandle> columnFamilyHandleList = new ArrayList<>(); try (final DBOptions options = new DBOptions() .setCreateIfMissing(true) .setCreateMissingColumnFamilies(true); final RocksDB db = RocksDB.open(options, "path/to/do", cfDescriptors, columnFamilyHandleList)) { try { // do something } finally { // NOTE user must explicitly frees the column family handles before freeing the db for (final ColumnFamilyHandle columnFamilyHandle : columnFamilyHandleList) { columnFamilyHandle.close(); } } // frees the column family options } } // frees the db and the db options ``` With the changes in this PR, the Java user no longer has to worry about manually closing the Column Families, which allows them to write simpler symmetrical create/free oriented code like this: ```java try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) { // list of column family descriptors, first entry must always be default column family final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList( new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts), new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts) ); // a list which will hold the handles for the column families once the db is opened final List<ColumnFamilyHandle> columnFamilyHandleList = new ArrayList<>(); try (final DBOptions options = new DBOptions() .setCreateIfMissing(true) .setCreateMissingColumnFamilies(true); final RocksDB db = RocksDB.open(options, "path/to/do", cfDescriptors, columnFamilyHandleList)) { // do something } // frees the column family options, then frees the db and the db options } } ``` NOTE: The changes in this PR are backwards API compatible, which means existing code using the original approach will also continue to function correctly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7428 Reviewed By: cheng-chang Differential Revision: D24063348 Pulled By: jay-zhuang fbshipit-source-id: 648d7526669923128c863ead94516bf4d50ac658	5 years ago
Tomasz Posluszny	6528ecc800	Add event listeners to RocksJava (#7425 ) Summary: Allows adding event listeners in RocksJava. * Adds listeners getter and setter in `Options` and `DBOptions` classes. * Adds `EventListener` Java interface and base class for implementing custom event listener callbacks - `AbstractEventListener`, which has an underlying native callback class implementing C++ `EventListener` class. * `AbstractEventListener` class has mechanism for selectively enabling its callback methods in order to prevent invoking Java method if it is not implemented. This decreases performance cost in case only subset of event listener callback methods is needed - the JNI code for remaining "no-op" callbacks is not executed. * The code is covered by unit tests in `EventListenerTest.java`, there are also tests added for setting/getting listeners field in `OptionsTest.java` and `DBOptionsTest.java`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7425 Reviewed By: pdillinger Differential Revision: D24063390 Pulled By: jay-zhuang fbshipit-source-id: 508c359538983d6b765e70d9989c351794a944ee	5 years ago
Akanksha Mahajan	38d0a365e3	Add Stats for MultiGet (#7366 ) Summary: Add following stats for MultiGet in Histogram to get more insight on MultiGet. 1. Number of index and filter blocks read from file as part of MultiGet request per level. 2. Number of data blocks read from file per level. 3. Number of SST files loaded from file system per level. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7366 Reviewed By: anand1976 Differential Revision: D24127040 Pulled By: akankshamahajan15 fbshipit-source-id: e63a003056b833729b277edc0639c08fb432756b	5 years ago

1 2 3 4 5 ...

794 Commits (5bf9a7d5ee367b542626f9e58041886e25d1650b)