rocksdb

Commit Graph

Author	SHA1	Message	Date
Hui Xiao	09882a52d6	Prepare for deprecation of Options::access_hint_on_compaction_start (#11658 ) Summary: Context/Summary: After https://github.com/facebook/rocksdb/pull/11631, file hint is not longer needed for compaction read. Therefore we can deprecate `Options::access_hint_on_compaction_start`. As this is a public API change, we should first mark the relevant APIs (including the Java's) deprecated and remove it in next major release 9.0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11658 Test Plan: No code change Reviewed By: ajkr Differential Revision: D47997856 Pulled By: hx235 fbshipit-source-id: 16e015ae7728c224b1caef73143aa9915668f4ac	2 years ago
Vardhan	87a21d08fe	Add an option to trigger flush when the number of range deletions reach a threshold (#11358 ) Summary: Add a mutable column family option `memtable_max_range_deletions`. When non-zero, RocksDB will try to flush the current memtable after it has at least `memtable_max_range_deletions` range deletions. Java API is added and crash test is updated accordingly to randomly enable this option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11358 Test Plan: * New unit test: `DBRangeDelTest.MemtableMaxRangeDeletions` * Ran crash test `python3 ./tools/db_crashtest.py whitebox --simple --memtable_max_range_deletions=20` and saw logs showing flushed memtables usually with 20 range deletions. Reviewed By: ajkr Differential Revision: D46582680 Pulled By: cbi42 fbshipit-source-id: f23d6fa8d8264ecf0a18d55c113ba03f5e2504da	2 years ago
Yu Zhang	c24ef26ca7	Support switching on / off UDT together with in-Memtable-only feature (#11623 ) Summary: Add support to allow enabling / disabling user-defined timestamps feature for an existing column family in combination with the in-Memtable only feature. To do this, this PR includes: 1) Log the `persist_user_defined_timestamps` option per column family in Manifest to facilitate detecting an attempt to enable / disable UDT. This entry is enforced to be logged in the same VersionEdit as the user comparator name entry. 2) User-defined timestamps related options are validated when re-opening a column family, including user comparator name and the `persist_user_defined_timestamps` flag. These type of settings and settings change are considered valid: a) no user comparator change and no effective `persist_user_defined_timestamp` flag change. b) switch user comparator to enable UDT provided the immediately effective `persist_user_defined_timestamps` flag is false. c) switch user comparator to disable UDT provided that the before-change `persist_user_defined_timestamps` is already false. 3) when an attempt to enable UDT is detected, we mark all its existing SST files as "having no UDT" by marking its `FileMetaData.user_defined_timestamps_persisted` flag to false and handle their file boundaries `FileMetaData.smallest`, `FileMetaData.largest` by padding a min timestamp. 4) while enabling / disabling UDT feature, timestamp size inconsistency in existing WAL logs are handled to make it compatible with the running user comparator. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11623 Test Plan: ``` make all check ./db_with_timestamp_basic_test --gtest-filter="EnableDisableUDT" ./db_wal_test --gtest_filter="EnableDisableUDT" ``` Reviewed By: ltamasi Differential Revision: D47636862 Pulled By: jowlyzhang fbshipit-source-id: dcd19f67292da3c3cc9584c09ad00331c9ab9322	2 years ago
Changyu Bi	bc04ec85db	Make option `level_compaction_dynamic_level_bytes` true by default (#11525 ) Summary: after https://github.com/facebook/rocksdb/issues/11321 and https://github.com/facebook/rocksdb/issues/11340 (both included in RocksDB v8.2), migration from `level_compaction_dynamic_level_bytes=false` to `level_compaction_dynamic_level_bytes=true` is automatic by RocksDB and requires no manual compaction from user. Making the option true by default as it has several advantages: 1. better space amplification guarantee (a more stable LSM shape). 2. compaction is more adaptive to write traffic. 3. automatic draining of unneeded levels. Wiki is updated with more detail: https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#option-level_compaction_dynamic_level_bytes-and-levels-target-size. The PR mostly contains fixes for unit tests as they assumed `level_compaction_dynamic_level_bytes=false`. Most notable change is commit `f742be330c` and `b1928e42b3` which override the default option in DBTestBase to still set `level_compaction_dynamic_level_bytes=false` by default. This helps to reduce the change needed for unit tests. I think this default option override in unit tests is okay since the behavior of `level_compaction_dynamic_level_bytes=true` is tested by explicitly setting this option. Also, `level_compaction_dynamic_level_bytes=false` may be more desired in unit tests as it makes it easier to create a desired LSM shape. Comment for option `level_compaction_dynamic_level_bytes` is updated to reflect this change and change made in https://github.com/facebook/rocksdb/issues/10057. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11525 Test Plan: `make -j32 J=32 check` several times to try to catch flaky tests due to this option change. Reviewed By: ajkr Differential Revision: D46654256 Pulled By: cbi42 fbshipit-source-id: 6b5827dae124f6f1fdc8cca2ac6f6fcd878830e1	2 years ago
Alan Paxton	93e0715fad	Implement missing compactrangeoptions from Java API (#10880 ) Summary: Add the following missing options to `src/main/java/org/rocksdb/CompactRangeOptions.java` and in `java/rocksjni/options.cc` in RocksJava. For the descriptions and API see the C++ file `include/rocksdb/options.h`, specifically the struct `CompactRangeOptions` * full_history_ts_low * canceled We changed the handle to return an object (of class `Java_org_rocksdb_CompactRangeOptions`) containing a `ROCKSDB_NAMESPACE::CompactRangeOptions` at (almost certainly) 0-offset, rather than a raw `ROCKSDB_NAMESPACE::CompactRangeOptions`. The `Java_org_rocksdb_CompactRangeOptions` contains as supplementary fields objects (std::string, std::atomic<bool>) which are passed as pointers to the `ROCKSDB_NAMESPACE::CompactRangeOptions` and which must therefore live for as long as the `ROCKSDB_NAMESPACE::CompactRangeOptions`. By placing them in a `Java_org_rocksdb_CompactRangeOptions` we achieve this. Because the field offset of the `ROCKSDB_NAMESPACE::CompactRangeOptions` member is (very probably) 0, casting the handle to ROCKSDB_NAMESPACE::CompactRangeOptions works (i.e. old methods didn’t have to be changed), but really that’s a minefield and the correct answer is to cast to the correct type (Java_org_rocksdb_CompactRangeOptions) and then use the ROCKSDB_NAMESPACE::CompactRangeOptions field in that. So the get/set methods for existing parameters have this change. Testing ------- We added unit tests for getting and setting the newly implemented fields to `CompactRangeOptionsTest` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10880 Reviewed By: ajkr Differential Revision: D41482476 Pulled By: anand1976 fbshipit-source-id: c70795e790436fb3544655920adf6fca62ed34e2	2 years ago
Hui Xiao	50046869a4	Add `rocksdb.file.read.db.open.micros` (#11455 ) Summary: Context/Summary: `rocksdb.file.read.db.open.micros` is left out in https://github.com/facebook/rocksdb/pull/11288 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11455 Test Plan: - db bench Setup: `./db_bench -db=/dev/shm/testdb/ -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3` Run: `./db_bench --bloom_bits=3 --use_existing_db=1 --seed=1682546046158958 --partition_index_and_filters=1 --statistics=1 -db=/dev/shm/testdb/ -benchmarks=readrandom -key_size=3200 -value_size=512 -num=0 -write_buffer_size=6550000 -disable_auto_compactions=false -target_file_size_base=6550000 -compression_type=none -file_checksum=1 -cache_size=1` ``` rocksdb.sst.read.micros P50 : 3.979798 P95 : 9.738420 P99 : 19.566667 P100 : 39.000000 COUNT : 2360 SUM : 12148 rocksdb.file.read.flush.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0 rocksdb.file.read.compaction.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0 rocksdb.file.read.db.open.micros P50 : 3.979798 P95 : 9.738420 P99 : 19.566667 P100 : 39.000000 COUNT : 2360 SUM : 12148 ``` Reviewed By: ajkr Differential Revision: D45951934 Pulled By: hx235 fbshipit-source-id: 6c88639dc1b10d98ecccc963ce32a8800495f55b	2 years ago
Alan Paxton	e110d713e0	Minimal RocksJava compliance with Java 8 language level (EB 1046) (#10951 ) Summary: Apply a small (and automatic) set of IntelliJ Java inspections/repairs to the Java interface to RocksDB Java and its tests. Partly enabled by the fact that we now (from RocksDB7) require java 8. Explicit <p> in empty lines in javadoc comments. Parameters and variables made final where possible. Anonymous subclasses converted lambdas. Some tests which previously used other assertion models were converted to assertj, e.g. (assertThat(actual).isEqualTo(expected) In a very few cases tests were found to be inoperative or broken, and were repaired. No problems with actual RocksDB behaviour were observed. This PR is intended to replace https://github.com/facebook/rocksdb/pull/9618 - that PR was not merged, and attempts to rebase it have yielded a questionable looking diff, so we choose to go back to square 1 here, and implement a conservative set of changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10951 Reviewed By: anand1976 Differential Revision: D45057849 Pulled By: ajkr fbshipit-source-id: e4ea46bfc80518ae86f37702b03ca9352bc11c3d	2 years ago
Peter Dillinger	206fdea3d9	Change internal headers with duplicate names (#11408 ) Summary: In IDE navigation I find it annoying that there are two statistics.h files (etc.) and often land on the wrong one. Here I migrate several headers to use the blah.h <- blah_impl.h <- blah.cc idiom. Although clang-format wants "blah.h" to be the top include for "blah.cc", I think overall this is an improvement. No public API changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11408 Test Plan: existing tests Reviewed By: ltamasi Differential Revision: D45456696 Pulled By: pdillinger fbshipit-source-id: 809d931253f3272c908cf5facf7e1d32fc507373	2 years ago
Andrew Kryczka	113f3250f1	Add block checksum mismatch ticker stat (#11438 ) Summary: Added a ticker stat, `BLOCK_CHECKSUM_MISMATCH_COUNT`, to count how many block checksum verifications detected a mismatch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11438 Test Plan: new unit test Reviewed By: pdillinger Differential Revision: D45788179 Pulled By: ajkr fbshipit-source-id: e2b44eba7c23b3e110ebe69eaa78a710dec2590f	2 years ago
Changyu Bi	8827cd0618	Support compacting files to different temperatures in FIFO compaction (#11428 ) Summary: - Add a new option `CompactionOptionsFIFO::file_temperature_age_thresholds` that allows user to specify age thresholds for compacting files to different temperatures. File temperature can be used to store files in different storage media. The new options allows specifying multiple temperature-age pairs. The option uses struct for a temperature-age pair to use the existing parsing functionality to make the option dynamically settable. - Deprecate the old option `age_for_warm` that was added for a similar purpose. - Compaction score calculation logic is updated to check if a file needs to be compacted to change its temperature. - Some refactoring is done in `FIFOCompactionPicker::PickTemperatureChangeCompaction`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11428 Test Plan: adapted unit tests that were for `age_for_warm` to this new option. Reviewed By: ajkr Differential Revision: D45611412 Pulled By: cbi42 fbshipit-source-id: 2dc384841f61cc04abb9681e31aa2de0f0b06106	2 years ago
Hui Xiao	151242ce46	Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288 ) Summary: Context: The existing stat rocksdb.sst.read.micros does not reflect each of compaction and flush cases but aggregate them, which is not so helpful for us to understand IO read behavior of each of them. Summary - Update `StopWatch` and `RandomAccessFileReader` to record `rocksdb.sst.read.micros` and `rocksdb.file.{flush/compaction}.read.micros` - Fixed the default histogram in `RandomAccessFileReader` - New field `ReadOptions/IOOptions::io_activity`; Pass `ReadOptions` through paths under db open, flush and compaction to where we can prepare `IOOptions` and pass it to `RandomAccessFileReader` - Use `thread_status_util` for assertion in `DbStressFSWrapper` for continuous testing on we are passing correct `io_activity` under db open, flush and compaction Pull Request resolved: https://github.com/facebook/rocksdb/pull/11288 Test Plan: - Stress test - Db bench 1: rocksdb.sst.read.micros COUNT ≈ sum of rocksdb.file.read.flush.micros's and rocksdb.file.read.compaction.micros's. (without blob) - May not be exactly the same due to `HistogramStat::Add` only guarantees atomic not accuracy across threads. ``` ./db_bench -db=/dev/shm/testdb/ -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3 (-use_plain_table=1 -prefix_size=10) ``` ``` // BlockBasedTable rocksdb.sst.read.micros P50 : 2.009374 P95 : 4.968548 P99 : 8.110362 P100 : 43.000000 COUNT : 40456 SUM : 114805 rocksdb.file.read.flush.micros P50 : 1.871841 P95 : 3.872407 P99 : 5.540541 P100 : 43.000000 COUNT : 2250 SUM : 6116 rocksdb.file.read.compaction.micros P50 : 2.023109 P95 : 5.029149 P99 : 8.196910 P100 : 26.000000 COUNT : 38206 SUM : 108689 // PlainTable Does not apply ``` - Db bench 2: performance Read SETUP: db with 900 files ``` ./db_bench -db=/dev/shm/testdb/ -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -disable_auto_compactions=true -target_file_size_base=655 -compression_type=none ```run till convergence ``` ./db_bench -seed=1678564177044286 -use_existing_db=true -db=/dev/shm/testdb -benchmarks=readrandom[-X60] -statistics=true -num=1000000 -disable_auto_compactions=true -compression_type=none -bloom_bits=3 ``` Pre-change `readrandom [AVG 60 runs] : 21568 (± 248) ops/sec` Post-change (no regression, -0.3%) `readrandom [AVG 60 runs] : 21486 (± 236) ops/sec` Compaction/Flushrun till convergence ``` ./db_bench -db=/dev/shm/testdb2/ -seed=1678564177044286 -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -disable_auto_compactions=false -target_file_size_base=655 -compression_type=none rocksdb.sst.read.micros COUNT : 33820 rocksdb.sst.read.flush.micros COUNT : 1800 rocksdb.sst.read.compaction.micros COUNT : 32020 ``` Pre-change `fillseq [AVG 46 runs] : 1391 (± 214) ops/sec; 0.7 (± 0.1) MB/sec` Post-change (no regression, ~-0.4%) `fillseq [AVG 46 runs] : 1385 (± 216) ops/sec; 0.7 (± 0.1) MB/sec` Reviewed By: ajkr Differential Revision: D44007011 Pulled By: hx235 fbshipit-source-id: a54c89e4846dfc9a135389edf3f3eedfea257132	3 years ago
Andrew Kryczka	bd80433c73	Set -source 8 in CMAKE_JAVA_COMPILE_FLAGS (#11385 ) Summary: build-windows-vs2022 jobs (e.g., https://app.circleci.com/pipelines/github/facebook/rocksdb/26641/workflows/7d1c58b8-7dd6-4dd6-a222-ecdfb0892c3b/jobs/593583) began failing with: ``` (CustomBuild target) -> CUSTOMBUILD : error : Source option 7 is no longer supported. Use 8 or later. [C:\Users\circleci.PACKER-64370BA5\project\build\java\rocksdbjni_classes.vcxproj] ``` So, this PR tries setting `-source 8` instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11385 Reviewed By: ltamasi Differential Revision: D45058172 Pulled By: ajkr fbshipit-source-id: b0daa1ea6f576c8417add40bd6c92710d329c44d	3 years ago
Hui Xiao	cb58477185	New stat rocksdb.{cf\|db}-write-stall-stats exposed in a structural way (#11300 ) Summary: Context/Summary: Users are interested in figuring out what has caused write stall. - Refactor write stall related stats from property `kCFStats` into its own db property `rocksdb.cf-write-stall-stats` as a map or string. For now, this only contains count of different combination of (CF-scope `WriteStallCause`) + (`WriteStallCondition`) - Add new `WriteStallCause::kWriteBufferManagerLimit` to reflect write stall caused by write buffer manager - Add new `rocksdb.db-write-stall-stats`. For now, this only contains `WriteStallCause::kWriteBufferManagerLimit` + `WriteStallCondition::kStopped` - Expose functions in new class `WriteStallStatsMapKeys` for examining the above two properties returned as map - Misc: rename/comment some write stall InternalStats for clarity Pull Request resolved: https://github.com/facebook/rocksdb/pull/11300 Test Plan: - New UT - Stress test `python3 tools/db_crashtest.py blackbox --simple --get_property_one_in=1` - Perf test: Both converge very slowly at similar rates but post-change has higher average ops/sec than pre-change even though they are run at the same time. ``` ./db_bench -seed=1679014417652004 -db=/dev/shm/testdb/ -statistics=false -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=100000 -db_write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3 ``` pre-change: ``` fillseq [AVG 15 runs] : 1176 (± 732) ops/sec; 0.6 (± 0.4) MB/sec fillseq : 1052.671 micros/op 949 ops/sec 105.267 seconds 100000 operations; 0.5 MB/s fillseq [AVG 16 runs] : 1162 (± 685) ops/sec; 0.6 (± 0.4) MB/sec fillseq : 1387.330 micros/op 720 ops/sec 138.733 seconds 100000 operations; 0.4 MB/s fillseq [AVG 17 runs] : 1136 (± 646) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1232.011 micros/op 811 ops/sec 123.201 seconds 100000 operations; 0.4 MB/s fillseq [AVG 18 runs] : 1118 (± 610) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1282.567 micros/op 779 ops/sec 128.257 seconds 100000 operations; 0.4 MB/s fillseq [AVG 19 runs] : 1100 (± 578) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1914.336 micros/op 522 ops/sec 191.434 seconds 100000 operations; 0.3 MB/s fillseq [AVG 20 runs] : 1071 (± 551) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1227.510 micros/op 814 ops/sec 122.751 seconds 100000 operations; 0.4 MB/s fillseq [AVG 21 runs] : 1059 (± 525) ops/sec; 0.5 (± 0.3) MB/sec ``` post-change: ``` fillseq [AVG 15 runs] : 1226 (± 732) ops/sec; 0.6 (± 0.4) MB/sec fillseq : 1323.825 micros/op 755 ops/sec 132.383 seconds 100000 operations; 0.4 MB/s fillseq [AVG 16 runs] : 1196 (± 687) ops/sec; 0.6 (± 0.4) MB/sec fillseq : 1223.905 micros/op 817 ops/sec 122.391 seconds 100000 operations; 0.4 MB/s fillseq [AVG 17 runs] : 1174 (± 647) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1168.996 micros/op 855 ops/sec 116.900 seconds 100000 operations; 0.4 MB/s fillseq [AVG 18 runs] : 1156 (± 611) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1348.729 micros/op 741 ops/sec 134.873 seconds 100000 operations; 0.4 MB/s fillseq [AVG 19 runs] : 1134 (± 579) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1196.887 micros/op 835 ops/sec 119.689 seconds 100000 operations; 0.4 MB/s fillseq [AVG 20 runs] : 1119 (± 550) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1193.697 micros/op 837 ops/sec 119.370 seconds 100000 operations; 0.4 MB/s fillseq [AVG 21 runs] : 1106 (± 524) ops/sec; 0.6 (± 0.3) MB/sec ``` Reviewed By: ajkr Differential Revision: D44159541 Pulled By: hx235 fbshipit-source-id: 8d29efb70001fdc52d34535eeb3364fc3e71e40b	3 years ago
Hui Xiao	bab5f9a6f2	Add new stat rocksdb.table.open.prefetch.tail.read.bytes, rocksdb.table.open.prefetch.tail.{miss\|hit} (#11265 ) Summary: Context/Summary: We are adding new stats to measure behavior of prefetched tail size and look up into this buffer The stat collection is done in FilePrefetchBuffer but only for prefetched tail buffer during table open for now using FilePrefetchBuffer enum. It's cleaner than the alternative of implementing in upper-level call places of FilePrefetchBuffer for table open. It also has the benefit of extensible to other types of FilePrefetchBuffer if needed. See db bench for perf regression concern. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11265 Test Plan: - Piggyback on existing test - rocksdb.table.open.prefetch.tail.miss is harder to UT so I manually set prefetch tail read bytes to be small and run db bench. ``` ./db_bench -db=/tmp/testdb -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3 -use_direct_reads=true ``` ``` rocksdb.table.open.prefetch.tail.read.bytes P50 : 4096.000000 P95 : 4096.000000 P99 : 4096.000000 P100 : 4096.000000 COUNT : 225 SUM : 921600 rocksdb.table.open.prefetch.tail.miss COUNT : 91 rocksdb.table.open.prefetch.tail.hit COUNT : 1034 ``` - No perf regression observed in db_bench SETUP command: create same db with ~900 files for pre-change/post-change. ``` ./db_bench -db=/tmp/testdb -benchmarks="fillseq" -key_size=32 -value_size=512 -num=500000 -write_buffer_size=655360 -disable_auto_compactions=true -target_file_size_base=16777216 -compression_type=none ``` TEST command 60 runs or til convergence: as suggested by anand1976 and akankshamahajan15, vary `seek_nexts` and `async_io` in testing. ``` ./db_bench -use_existing_db=true -db=/tmp/testdb -statistics=false -cache_size=0 -cache_index_and_filter_blocks=false -benchmarks=seekrandom[-X60] -num=50000 -seek_nexts={10, 500, 1000} -async_io={0\|1} -use_direct_reads=true ``` async io = 0, direct io read = true \| seek_nexts = 10, 30 runs \| seek_nexts = 500, 12 runs \| seek_nexts = 1000, 6 runs -- \| -- \| -- \| -- pre-post change \| 4776 (± 28) ops/sec; 24.8 (± 0.1) MB/sec \| 288 (± 1) ops/sec; 74.8 (± 0.4) MB/sec \| 145 (± 4) ops/sec; 75.6 (± 2.2) MB/sec post-change \| 4790 (± 32) ops/sec; 24.9 (± 0.2) MB/sec \| 288 (± 3) ops/sec; 74.7 (± 0.8) MB/sec \| 143 (± 3) ops/sec; 74.5 (± 1.6) MB/sec async io = 1, direct io read = true \| seek_nexts = 10, 54 runs \| seek_nexts = 500, 6 runs \| seek_nexts = 1000, 4 runs -- \| -- \| -- \| -- pre-post change \| 3350 (± 36) ops/sec; 17.4 (± 0.2) MB/sec \| 264 (± 0) ops/sec; 68.7 (± 0.2) MB/sec \| 138 (± 1) ops/sec; 71.8 (± 1.0) MB/sec post-change \| 3358 (± 27) ops/sec; 17.4 (± 0.1) MB/sec \| 263 (± 2) ops/sec; 68.3 (± 0.8) MB/sec \| 139 (± 1) ops/sec; 72.6 (± 0.6) MB/sec Reviewed By: ajkr Differential Revision: D43781467 Pulled By: hx235 fbshipit-source-id: a706a18472a8edb2b952bac3af40eec803537f2a	3 years ago
Alan Paxton	fbd603d04a	Reverse wrong order of parameter names for Java WriteBatchWithIndex#iteratorWithBase (#11280 ) Summary: Fix for https://github.com/facebook/rocksdb/issues/11008 `Java_org_rocksdb_WriteBatchWithIndex_iteratorWithBase` takes parameters `(… jlong jwbwi_handle, jlong jcf_handle, jlong jbase_iterator_handle, jlong jread_opts_handle)` while `WriteBatchWithIndex.java` declares `private native long iteratorWithBase(final long handle, final long baseIteratorHandle, final long cfHandle, final long readOptionsHandle)`. Luckily the only call to `iteratorWithBase` passes the parameters in the correct order for the implementation `(… cfHandle, baseIteratorHandle …)` This type checks because the types are the same (long words). The code is currently used correctly, it is just extremely misleading. Swap the names of the 2 parameters in the Java method so that the correct usage is clear. There already exist test methods which call the API correctly and only succeed because of that. These continue to work. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11280 Reviewed By: cbi42 Differential Revision: D43874798 Pulled By: ajkr fbshipit-source-id: b59bc930bf579f4e0804f0effd4fb17f4225d60c	3 years ago
anand76	cf09917c18	Add filter/index/data secondary cache hits stats (#11246 ) Summary: Add more stats for better visibility into the usefulness of the secondary cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11246 Test Plan: Add a new unit test Reviewed By: akankshamahajan15 Differential Revision: D43521364 Pulled By: anand1976 fbshipit-source-id: a92f04884e738a9bf40ad4047acaaaea343838a7	3 years ago
Alan Paxton	d47126875b	Fix Java API ComparatorOptions use after delete error (#11176 ) Summary: The problem ------------- ComparatorOptions is AutoCloseable. AbstractComparator does not hold a reference to its ComparatorOptions, but the native C++ ComparatorJniCallback holds a reference to the ComparatorOptions’ native C++ options structure. This gets deleted when the ComparatorOptions is closed, either explicitly, or as part of try-with-resources. Later, the deleted C++ options structure gets used by the callback and the comparator options are effectively random. The original bug report https://github.com/facebook/rocksdb/issues/8715 was caused by a GC-initiated finalization closing the still-in-use ComparatorOptions. As of 7.0, finalization of RocksDB objects no longer closes them, which worked round the reported bug, but still left ComparatorOptions with a potentially broken lifetime. In any case, we encourage API clients to use the try-with-resources model, and so we need it to work. And if they don't use it, they leak resources. The solution ------------- The solution implemented here is to make a copy of the native C++ options object into the ComparatorJniCallback, rather than a reference. Then the deletion of the native object held by ComparatorOptions is correctly deleted when its scope is closed in try/finally. Testing ------- We added a regression unit test based on the original test for the reported ticket. This checkin closes https://github.com/facebook/rocksdb/issues/8715 We expect that there are more instances of "lifecycle" bugs in the Java API. They are a major source of support time/cost, and we note that they could be addressed as a whole using the model proposed/prototyped in https://github.com/facebook/rocksdb/pull/10736 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11176 Reviewed By: cbi42 Differential Revision: D43160885 Pulled By: pdillinger fbshipit-source-id: 60b54215a02ad9abb17363319650328c00a9ad62	3 years ago
Peter Dillinger	3cacd4b4ec	Put Cache and CacheWrapper in new public header (#11192 ) Summary: The definition of the Cache class should not be needed by the vast majority of RocksDB users, so I think it is just distracting to include it in cache.h, which is primarily needed for configuring and creating caches. This change moves the class to a new header advanced_cache.h. It is just cut-and-paste except for modifying the class API comment. In general, operations on shared_ptr<Cache> should continue to work when only a forward declaration of Cache is available, as long as all the Cache instances provided are already shared_ptr. See https://stackoverflow.com/a/17650101/454544 Also, the most common way to customize a Cache is by wrapping an existing implementation, so it makes sense to provide CacheWrapper in the public API. This was a cut-and-paste job except removing the implementation of Name() so that derived classes must provide it. Intended follow-up: consolidate Release() into one function to reduce customization bugs / confusion Pull Request resolved: https://github.com/facebook/rocksdb/pull/11192 Test Plan: `make check` Reviewed By: anand1976 Differential Revision: D43055487 Pulled By: pdillinger fbshipit-source-id: 7b05492df35e0f30b581b4c24c579bc275b6d110	3 years ago
Hui Xiao	6650ca244e	Remove a couple deprecated convenience.h APIs (#11120 ) Summary: Context/Summary: As instructed by convenience.h comments, a few deprecated APIs are removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11120 Test Plan: - make check & CI - eyeball check on test semantics. Reviewed By: pdillinger Differential Revision: D42937507 Pulled By: hx235 fbshipit-source-id: a9e4709387da01b1d0e9148c2e210f02e9746ee1	3 years ago
Symious	68fa90ca43	Add kForceOptimized option to jni (#11181 ) Summary: Currently the option of "KForceOptimized" is not included in CompactRangeOptions.BottommostLevelCompaction. This PR is to add this option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11181 Reviewed By: ajkr Differential Revision: D43056453 Pulled By: cbi42 fbshipit-source-id: 22fd53f980ab1a86c61dd42e948902542065128f	3 years ago
Peter Dillinger	0cf1008fe3	Deprecate write_global_seqno and default to false (#11179 ) Summary: This option has long been intended to be set to false by default and deprecated. It might never be practical to completely remove the feature, so that we can continue to test for backward compatibility by keeping the ability to generate DBs in the old way. Also improved API comments. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11179 Test Plan: existing tests (with one tiny update) Reviewed By: hx235 Differential Revision: D42973927 Pulled By: pdillinger fbshipit-source-id: e9bc161cb933266e094aea2dff8cc03753c39dab	3 years ago
sdong	4720ba4391	Remove RocksDB LITE (#11147 ) Summary: We haven't been actively mantaining RocksDB LITE recently and the size must have been gone up significantly. We are removing the support. Most of changes were done through following comments: unifdef -m -UROCKSDB_LITE `git grep -l ROCKSDB_LITE \| egrep '[.](cc\|h)'` by Peter Dillinger. Others changes were manually applied to build scripts, CircleCI manifests, ROCKSDB_LITE is used in an expression and file db_stress_test_base.cc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11147 Test Plan: See CI Reviewed By: pdillinger Differential Revision: D42796341 fbshipit-source-id: 4920e15fc2060c2cd2221330a6d0e5e65d4b7fe2	3 years ago
Yu Zhang	6943ff6e50	Remove deprecated util functions in options_util.h (#11126 ) Summary: Remove the util functions in options_util.h that have previously been marked deprecated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11126 Test Plan: `make check` Reviewed By: ltamasi Differential Revision: D42757496 Pulled By: jowlyzhang fbshipit-source-id: 2a138a3c207d0e0e0bbb4d99548cf2cadb44bcfb	3 years ago
sdong	e808858ae0	Remove Stats related to compressed block cache (#11135 ) Summary: Since compressed block cache is removed, those stats are not needed. They are removed in different PR in case there is a problem with it. The stats are removed in the same way in https://github.com/facebook/rocksdb/pull/11131/ . HISTORY.md was already updated by mistake, and it would be correct after merging this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11135 Test Plan: Watch CI Reviewed By: ltamasi Differential Revision: D42757616 fbshipit-source-id: bd7cb782585c8535ce5784295225c376f3011f35	3 years ago
Levi Tamasi	6da2e20df3	Remove more obsolete statistics (#11131 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11131 Test Plan: `make check` Reviewed By: pdillinger Differential Revision: D42753997 Pulled By: ltamasi fbshipit-source-id: ce8b84c1e55374257e93ed74fd255c9b759723ce	3 years ago
Levi Tamasi	99e559533d	Remove some deprecated/obsolete statistics from the API (#11123 ) Summary: These tickers/histograms have been obsolete (and not populated) for a long time. The patch removes them from the API completely. Note that this means that the numeric values of the remaining tickers change in the C++ code as they get shifted up. This should be OK: the values of some existing tickers have changed many times over the years as items have been added in the middle. (In contrast, the convention in the Java bindings is to keep the ids, which are not guaranteed to be the same as the ids on the C++ side, the same across releases.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/11123 Test Plan: `make check` Reviewed By: akankshamahajan15 Differential Revision: D42727793 Pulled By: ltamasi fbshipit-source-id: e058a155a20b05b45f53e67ee380aece1b43b6c5	3 years ago
sdong	2800aa069a	Remove compressed block cache (#11117 ) Summary: Compressed block cache is replaced by compressed secondary cache. Remove the feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11117 Test Plan: See CI passes Reviewed By: pdillinger Differential Revision: D42700164 fbshipit-source-id: 6cbb24e460da29311150865f60ecb98637f9f67d	3 years ago
Hui Xiao	9502856edd	Add missing range conflict check between file ingestion and RefitLevel() (#10988 ) Summary: Context: File ingestion never checks whether the key range it acts on overlaps with an ongoing RefitLevel() (used in `CompactRange()` with `change_level=true`). That's because RefitLevel() doesn't register and make its key range known to file ingestion. Though it checks overlapping with other compactions by https://github.com/facebook/rocksdb/blob/7.8.fb/db/external_sst_file_ingestion_job.cc#L998. RefitLevel() (used in `CompactRange()` with `change_level=true`) doesn't check whether the key range it acts on overlaps with an ongoing file ingestion. That's because file ingestion does not register and make its key range known to other compactions. - Note that non-refitlevel-compaction (e.g, manual compaction w/o RefitLevel() or general compaction) also does not check key range overlap with ongoing file ingestion for the same reason. - But it's fine. Credited to cbi42's discovery, `WaitForIngestFile` was called by background and foreground compactions. They were introduced in `0f88160f67`, `5c64fb67d2` and `87dfc1d23e`. - Regardless, this PR registers file ingestion like a compaction is a general approach that will also add range conflict check between file ingestion and non-refitlevel-compaction, though it has not been the issue motivated this PR. Above are bugs resulting in two bad consequences: - If file ingestion and RefitLevel() creates files in the same level, then range-overlapped files will be created at that level and caught as corruption by `force_consistency_checks=true` - If file ingestion and RefitLevel() creates file in different levels, then with one further compaction on the ingested file, it can result in two same keys both with seqno 0 in two different levels. Then with iterator's [optimization](`c62f322169/db/db_iter.cc (L342-L343)`) that assumes no two same keys both with seqno 0, it will either break this assertion in debug build or, even worst, return value of this same key for the key after it, which is the wrong value to return, in release build. Therefore we decide to introduce range conflict check for file ingestion and RefitLevel() inspired from the existing range conflict check among compactions. Summary: - Treat file ingestion job and RefitLevel() as `Compaction` of new compaction reasons: `CompactionReason::kExternalSstIngestion` and `CompactionReason::kRefitLevel` and register/unregister them. File ingestion is treated as compaction from L0 to different levels and RefitLevel() as compaction from source level to target level. - Check for `RangeOverlapWithCompaction` with other ongoing compactions, `RegisterCompaction()` on this "compaction" before changing the LSM state in `VersionStorageInfo`, and `UnregisterCompaction()` after changing. - Replace scattered fixes (`0f88160f67`, `5c64fb67d2` and `87dfc1d23e`.) that prevents overlapping between file ingestion and non-refit-level compaction with this fix cuz those practices are easy to overlook. - Misc: logic cleanup, see PR comments Pull Request resolved: https://github.com/facebook/rocksdb/pull/10988 Test Plan: - New unit test `DBCompactionTestWithOngoingFileIngestionParam*` that failed pre-fix and passed afterwards. - Made compatible with existing tests, see PR comments - make check - [Ongoing] Stress test rehearsal with normal value and aggressive CI value https://github.com/facebook/rocksdb/pull/10761 Reviewed By: cbi42 Differential Revision: D41535685 Pulled By: hx235 fbshipit-source-id: 549833a577ba1496d20a870583d4caa737da1258	3 years ago
Alan Paxton	f8969ad7d4	Improve Java API get() performance by reducing copies (#10970 ) Summary: Performance improvements for `get()` paths in the RocksJava API (JNI). Document describing the performance results. Replace uses of the legacy `DB::Get()` method wrapper returning data in a `std::string` with direct calls to `DB::Get()` passing a pinnable slice to receive this data. Copying from a pinned slice direct to the destination java byte array, without going via an intervening std::string, is a major performance gain for this code path. Note that this gain only comes where `DB::Get()` is able to return a pinned buffer; where it has to copy into the buffer owned by the slice, there is still the intervening copy and no performance gain. It may be possible to address this case too, but it is not trivial. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10970 Reviewed By: pdillinger Differential Revision: D42125567 Pulled By: ajkr fbshipit-source-id: b7a4df7523b0420cadb1e9b6c7da3ec030a8da34	3 years ago
Alan Paxton	6a8920f988	JNI native memory leak - release array elements (#10981 ) Summary: Closes https://github.com/facebook/rocksdb/issues/10980 Reproduced as per the suggestion in the ticket, and `$ jcmd <PID> VM.native_memory \| grep Internal` reports that we are no longer leaking internal memory with the suggested fix. I did the repro in `MultiGetTest.java` which I have optimized imports on. It did not seem helpful to leave the test code around as it would be onerous to build a memory leak reproducer, and regression seems a remote possibility. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10981 Reviewed By: riversand963 Differential Revision: D41498748 Pulled By: ajkr fbshipit-source-id: 8c6dd0d608172879c8bda479c7c9c05c12d34e70	3 years ago
Hui Xiao	2f76ab150d	Fix missing WAL in new manifest by rolling over the WAL deletion record from prev manifest (#10892 ) Summary: Context `Options::track_and_verify_wals_in_manifest = true` verifies each of the WALs tracked in manifest indeed presents in the WAL folder. If not, a corruption "Missing WAL with log number" will be thrown. `DB::SyncWAL()` called at a specific timing (i.e, at the `TEST_SYNC_POINT("FindObsoleteFiles::PostMutexUnlock")`) can record in a new manifest the WAL addition of a WAL file that already had a WAL deletion recorded in the previous manifest. And the WAL deletion record is not rollover-ed to the new manifest. So the new manifest creates the illusion of such WAL never gets deleted and should presents at db re/open. - Such WAL deletion record can be caused by flushing the memtable associated with that WAL and such WAL deletion can actually happen in` PurgeObsoleteFiles()`. As a consequence, upon `DB::Reopen()`, this WAL file can be deleted while manifest still has its WAL addition record , which causes a false alarm of corruption "Missing WAL with log number" to be thrown. Summary This PR fixes this false alarm by rolling over the WAL deletion record from prev manifest to the new manifest by adding the WAL deletion record to the new manifest. Test - Make check - Added new unit test `TEST_F(DBWALTest, FixSyncWalOnObseletedWalWithNewManifestCausingMissingWAL)` that failed before the fix and passed after - [Ongoing]CI stress test + aggressive value as in https://github.com/facebook/rocksdb/pull/10761 , which is how this false alarm was first surfaced, to confirm such false alarm disappears - [Ongoing]Regular CI stress test to confirm such fix didn't harm anything Pull Request resolved: https://github.com/facebook/rocksdb/pull/10892 Reviewed By: ajkr Differential Revision: D40778965 Pulled By: hx235 fbshipit-source-id: a512364bfdeb0b1a55c171890e60d856c528f37f	3 years ago
Alan Paxton	ae115eff8f	improve copying of Env in Options (#10666 ) Summary: Closes https://github.com/facebook/rocksdb/issues/9909 - Constructing an Options from a DBOptions should use the Env from the DBOptions - DBOptions should be constructed with the default Env as the env_, rather than null. Why ever not ? Pull Request resolved: https://github.com/facebook/rocksdb/pull/10666 Reviewed By: riversand963 Differential Revision: D40515418 Pulled By: ajkr fbshipit-source-id: 4122ba3f537660720262694c21ab4bfb13b6f8de	3 years ago
anand76	ecba6a320e	Add some async read stats (#10947 ) Summary: Add stats for time spent in the ReadAsync call, and async read errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10947 Test Plan: Run db_bench and look at stats Reviewed By: akankshamahajan15 Differential Revision: D41236637 Pulled By: anand1976 fbshipit-source-id: 70539b69a28491d57acead449436a761f7108acf	3 years ago
Adam Retter	781a387488	Improve musl libc detection and provide an option for the user to override (#10889 ) Summary: The user may override the detection of whether to use GNU libc (the default) or musl libc by setting the environment variable: `ROCKSDB_MUSL_LIBC=true`. Builds upon and supersedes: https://github.com/facebook/rocksdb/pull/9977 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10889 Reviewed By: akankshamahajan15 Differential Revision: D40788431 Pulled By: ajkr fbshipit-source-id: ef594d973fc14cbadf28bfb38434231a18a2107c	3 years ago
Levi Tamasi	ea1982d010	Add missing copyright headers to a couple of Java test files (#10900 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10900 Reviewed By: akankshamahajan15 Differential Revision: D40825886 Pulled By: ltamasi fbshipit-source-id: e60f74aa8a622c3c71e1fee420fd586728fb2b7b	3 years ago
Alan Paxton	17553bdd5e	RocksJava API - fix Transaction.multiGet() size limit, remove bogus EnsureLocalCapacity() calls (#10674 ) Summary: Resolves see https://github.com/facebook/rocksdb/issues/9006 Fixes 2 related issues with JNI local references in the RocksJava API. 1. Some instances of RocksJava API JNI code appear to have misunderstood the reason for `JNIEnv->EnsureLocalCapacity()` and are carrying out bogus checks which happen to fail with some larger parameter values (many column families in a single call, very long key names or values). Remove these checks and add some regression tests for the previous failures. 2. The helper for Transaction multiGet operations (`multiGet()`, `multiGetForUpdate()`,...) is limited in the number of keys it can `get()` for because it requires a corresponding number of live local references. Refactor the helper slightly, copying out the key contents within a loop so that the references don't have to exist at the same time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10674 Reviewed By: ajkr Differential Revision: D40515361 Pulled By: jay-zhuang fbshipit-source-id: f1be0126181a698b3ad27c0945a39c54d950aa25	3 years ago
Brendan MacDonell	5f915b447d	Fix ChecksumType::kXXH3 in the Java API (#10862 ) Summary: While PR#9749 nominally added support for XXH3 in the Java API, it did not update the `toCppChecksumType` method. As a result, setting the checksum type to XXH3 actually set it to CRC32c instead. This commit adds the missing entry to portal.h, and also updates the tests so that they verify the options passed to RocksDB, instead of simply checking that the getter returns the value set by the setter. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10862 Reviewed By: pdillinger Differential Revision: D40665031 Pulled By: ajkr fbshipit-source-id: 2834419b3361a4bac47db3b858951fb451b5bdc8	3 years ago
sdong	2a551976f4	Run format check for .h and .cc files under java/ (#10851 ) Summary: Run format check for .h and .cc files to clean the format Pull Request resolved: https://github.com/facebook/rocksdb/pull/10851 Test Plan: Watch CI tests to pass Reviewed By: ajkr Differential Revision: D40649723 fbshipit-source-id: 62d32cead0b3b8e6540e86d25451bd72642109eb	3 years ago
Anatol Pomozov	4a83b16ce3	Use grep instead of obsolete egrep (#10701 ) Summary: It fixes "egrep: warning: egrep is obsolescent; using grep -E" warning at the systems with newer gnu grep. https://www.phoronix.com/news/GNU-Grep-3.8-Stop-egrep-fgrep Pull Request resolved: https://github.com/facebook/rocksdb/pull/10701 Reviewed By: ajkr Differential Revision: D39737908 Pulled By: cbi42 fbshipit-source-id: 13f3ccfc1a37d18541d156a6b4c2ba24f6c66589	3 years ago
anand76	72a3fb3424	Update statistics for async scan readaheads (#10585 ) Summary: Imported a fix to "rocksdb.prefetched.bytes.discarded" stat from https://github.com/facebook/rocksdb/issues/10561, and added a new stat "rocksdb.async.prefetch.abort.micros" to measure time spent waiting for async reads to abort. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10585 Reviewed By: akankshamahajan15 Differential Revision: D39067000 Pulled By: anand1976 fbshipit-source-id: d7cda71abb48017239bd5fd832345a16c7024faf	3 years ago
Levi Tamasi	64e74723f7	Use the default metadata charge policy when creating an LRU cache via the Java API (#10577 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10577 Reviewed By: akankshamahajan15 Differential Revision: D39035884 Pulled By: ltamasi fbshipit-source-id: 48f116f8ca172b7eb5eb3651f39ddb891a7ffade	3 years ago
Brendan MacDonell	418b36a9bc	Support CompactionPri::kRoundRobin in RocksJava (#10572 ) Summary: Pretty trivial — this PR just adds the new compaction priority to the Java API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10572 Reviewed By: hx235 Differential Revision: D39006523 Pulled By: ajkr fbshipit-source-id: ea8d665817e7b05826c397afa41c3abcda81484e	3 years ago
Brendan MacDonell	9f290a5d15	Update the javadoc for setforceConsistencyChecks (#10574 ) Summary: As of v6.14 (released in 2020), force_consistency_checks is enabled by default. However, the Java documentation does not seem to have been updated to reflect the change at the time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10574 Reviewed By: hx235 Differential Revision: D39006566 Pulled By: ajkr fbshipit-source-id: c7b029484d62deaa1f260ec55084049fe39eb84a	3 years ago
Gang Liao	275cd80cdb	Add a blob-specific cache priority (#10461 ) Summary: RocksDB's `Cache` abstraction currently supports two priority levels for items: high (used for frequently accessed/highly valuable SST metablocks like index/filter blocks) and low (used for SST data blocks). Blobs are typically lower-value targets for caching than data blocks, since 1) with BlobDB, data blocks containing blob references conceptually form an index structure which has to be consulted before we can read the blob value, and 2) cached blobs represent only a single key-value, while cached data blocks generally contain multiple KVs. Since we would like to make it possible to use the same backing cache for the block cache and the blob cache, it would make sense to add a new, lower-than-low cache priority level (bottom level) for blobs so data blocks are prioritized over them. This task is a part of https://github.com/facebook/rocksdb/issues/10156 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10461 Reviewed By: siying Differential Revision: D38672823 Pulled By: ltamasi fbshipit-source-id: 90cf7362036563d79891f47be2cc24b827482743	3 years ago
sdong	9277569ba3	Add some missing headers (#10519 ) Summary: Some files miss headers. Also some headers are irregular. Fix them to make an internal checkup tool happy. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10519 Reviewed By: jay-zhuang Differential Revision: D38603291 fbshipit-source-id: 13b1bbd6d48f5ee15ba20da67544396de48238f1	3 years ago
Peter Dillinger	65036e4217	Revert "Add a blob-specific cache priority (#10309 )" (#10434 ) Summary: This reverts commit `8d178090be` because of a clear performance regression seen in internal dashboard https://fburl.com/unidash/tpz75iee Pull Request resolved: https://github.com/facebook/rocksdb/pull/10434 Reviewed By: ltamasi Differential Revision: D38256373 Pulled By: pdillinger fbshipit-source-id: 134aa00f50dd7b1bbe037c227884a351342ec44b	3 years ago
Gang Liao	8d178090be	Add a blob-specific cache priority (#10309 ) Summary: RocksDB's `Cache` abstraction currently supports two priority levels for items: high (used for frequently accessed/highly valuable SST metablocks like index/filter blocks) and low (used for SST data blocks). Blobs are typically lower-value targets for caching than data blocks, since 1) with BlobDB, data blocks containing blob references conceptually form an index structure which has to be consulted before we can read the blob value, and 2) cached blobs represent only a single key-value, while cached data blocks generally contain multiple KVs. Since we would like to make it possible to use the same backing cache for the block cache and the blob cache, it would make sense to add a new, lower-than-low cache priority level (bottom level) for blobs so data blocks are prioritized over them. This task is a part of https://github.com/facebook/rocksdb/issues/10156 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10309 Reviewed By: ltamasi Differential Revision: D38211655 Pulled By: gangliao fbshipit-source-id: 65ef33337db4d85277cc6f9782d67c421ad71dd5	3 years ago
Alan Paxton	8db8b98f98	Transaction.prepare should be public (#10412 ) Summary: The absence of a public modifier appears to be an omission. prepare() is necessary for the TM to participate as a peer in a distributed transaction. Also add basic “yes it does work in java” tests. Resolves https://github.com/facebook/rocksdb/issues/10283 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10412 Reviewed By: ajkr Differential Revision: D38135513 Pulled By: riversand963 fbshipit-source-id: ff52b96bc7218bc3bf12845dee49f5d8edf0e297	3 years ago
Gang Liao	ec4ebeff30	Support prepopulating/warming the blob cache (#10298 ) Summary: Many workloads have temporal locality, where recently written items are read back in a short period of time. When using remote file systems, this is inefficient since it involves network traffic and higher latencies. Because of this, we would like to support prepopulating the blob cache during flush. This task is a part of https://github.com/facebook/rocksdb/issues/10156 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10298 Reviewed By: ltamasi Differential Revision: D37908743 Pulled By: gangliao fbshipit-source-id: 9feaed234bc719d38f0c02975c1ad19fa4bb37d1	3 years ago
Guido Tagliavini Ponce	7e1b417824	Revert NewClockCache signature (#10358 ) Summary: This complements https://github.com/facebook/rocksdb/issues/10351. This PR reverts NewClockCache's signature to an older version, expected by the users of the old (buggy) ClockCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10358 Test Plan: ``make -j24 check`` and re-run the pre-release tests. Reviewed By: siying Differential Revision: D37832601 Pulled By: guidotag fbshipit-source-id: 32a91d3da4119be187935003b7b897272ceb1950	3 years ago

1 2 3 4 5 ...

914 Commits (d5fb4a8763e06e47cc0847aac612bb7205efe745)