Summary:
**Context/Summary**
As titled. The benefit of doing so is to explicitly call readahead() instead of relying page cache behavior for compaction read when we know that we most likely need readahead as compaction read is sequential read .
**Test**
Extended the existing UT to cover compaction read case
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11631
Reviewed By: ajkr
Differential Revision: D47681437
Pulled By: hx235
fbshipit-source-id: 78792f64985c4dc44aa8f2a9c41ab3e8bbc0bc90
Summary:
both options `ttl` and `periodic_compaction_seconds` have the same meaning for FIFO compaction, which is redundant and can be confusing to use. For example, setting TTL to 0 does not disable TTL: user needs to also set periodic_compaction_seconds to 0. Another example is that dynamically setting `periodic_compaction_seconds` (surprisingly) has no effect on TTL compaction. This is because FIFO compaction picker internally only looks at value of `ttl`. The value of `ttl` is in `SanitizeOptions()` which take into account the value of `periodic_compaction_seconds`, but dynamically setting an option does not invoke this method.
This PR clarifies the usage of both options for FIFO compaction: only `ttl` should be used, `periodic_compaction_seconds` will not have any effect on FIFO compaction.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11550
Test Plan:
- updated existing unit test `DBOptionsTest.SanitizeFIFOPeriodicCompaction`
- checked existing values of both options in feature matrix: https://fburl.com/daiquery/xxd0gs9w. All current uses cases either have `periodic_compaction_seconds = 0` or have `periodic_compaction_seconds > ttl`, so should not cause change of behavior.
Reviewed By: ajkr
Differential Revision: D46902959
Pulled By: cbi42
fbshipit-source-id: a9ede235b276783b4906aaec443551fa62ceff4c
Summary:
after https://github.com/facebook/rocksdb/issues/11321 and https://github.com/facebook/rocksdb/issues/11340 (both included in RocksDB v8.2), migration from `level_compaction_dynamic_level_bytes=false` to `level_compaction_dynamic_level_bytes=true` is automatic by RocksDB and requires no manual compaction from user. Making the option true by default as it has several advantages: 1. better space amplification guarantee (a more stable LSM shape). 2. compaction is more adaptive to write traffic. 3. automatic draining of unneeded levels. Wiki is updated with more detail: https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#option-level_compaction_dynamic_level_bytes-and-levels-target-size.
The PR mostly contains fixes for unit tests as they assumed `level_compaction_dynamic_level_bytes=false`. Most notable change is commit f742be330c and b1928e42b3 which override the default option in DBTestBase to still set `level_compaction_dynamic_level_bytes=false` by default. This helps to reduce the change needed for unit tests. I think this default option override in unit tests is okay since the behavior of `level_compaction_dynamic_level_bytes=true` is tested by explicitly setting this option. Also, `level_compaction_dynamic_level_bytes=false` may be more desired in unit tests as it makes it easier to create a desired LSM shape.
Comment for option `level_compaction_dynamic_level_bytes` is updated to reflect this change and change made in https://github.com/facebook/rocksdb/issues/10057.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11525
Test Plan: `make -j32 J=32 check` several times to try to catch flaky tests due to this option change.
Reviewed By: ajkr
Differential Revision: D46654256
Pulled By: cbi42
fbshipit-source-id: 6b5827dae124f6f1fdc8cca2ac6f6fcd878830e1
Summary:
when a DB is configured with `allow_ingest_behind = true`, the last level should be reserved for ingested files and these files should not be included in any compaction. Currently, a major compaction can compact these files to smaller levels. This can cause future files to be rejected for ingest behind (see `ExternalSstFileIngestionJob::CheckLevelForIngestedBehindFile()`). This PR fixes the issue such that files in the last level is not included in any compaction.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11489
Test Plan: * Updated unit test `ExternalSSTFileTest.IngestBehind` to test that last level is not included in manual and auto-compaction.
Reviewed By: ajkr
Differential Revision: D46455711
Pulled By: cbi42
fbshipit-source-id: 5e2142c2a709ef932ad797897795021c06c4ac8c
Summary:
Similar to point tombstones, we can drop a range tombstone during compaction when we know its range does not exist in any higher level. This PR adds this optimization. Some existing test in db_range_del_test is fixed to work under this optimization.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11459
Test Plan:
* Add unit test `DBRangeDelTest, NonBottommostCompactionDropRangetombstone`.
* Ran crash test that issues range deletion for a few hours: `python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --delrangepercent=10 --writepercent=31 --readpercent=40`
Reviewed By: ajkr
Differential Revision: D46007904
Pulled By: cbi42
fbshipit-source-id: 3f37205b6778b7d55ed106369ca41b0632a6d0fd
Summary:
currently for leveled compaction, the max output level of a call to `CompactRange()` is pre-computed before compacting each level. This max output level is the max level whose key range overlaps with the manual compaction key range. However, during manual compaction, files in the max output level may be compacted down further by some background compaction. When this background compaction is a trivial move, there is a race condition and the manual compaction may not be able to compact all keys in the specified key range. This PR updates `CompactRange()` to always compact to the bottommost level to make this race condition more unlikely (it can still happen, see more in comment here: 796f58f42a/db/db_impl/db_impl_compaction_flush.cc (L1180C29-L1184)).
This PR also changes the behavior of CompactRange() when `bottommost_level_compaction=kIfHaveCompactionFilter` (the default option). The old behavior is that, if a compaction filter is provided, CompactRange() always does an intra-level compaction at the final output level for all files in the manual compaction key range. The only exception when `first_overlapped_level = 0` and `max_overlapped_level = 0`. It’s awkward to maintain the same behavior after this PR since we do not compute max_overlapped_level anymore. So the new behavior is similar to kForceOptimized: always does intra-level compaction at the bottommost level, but not including new files generated during this manual compaction.
Several unit tests are updated to work with this new manual compaction behavior.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11468
Test Plan: Add new unit tests `DBCompactionTest.ManualCompactionCompactAllKeysInRange*`
Reviewed By: ajkr
Differential Revision: D46079619
Pulled By: cbi42
fbshipit-source-id: 19d844ba4ec8dc1a0b8af5d2f36ff15820c6e76f
Summary:
See new NOTE in HISTORY.md and unreleased_history/README.txt
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11481
Test Plan: some manual testing on my CentOS 8 system
Reviewed By: jaykorean
Differential Revision: D46233342
Pulled By: pdillinger
fbshipit-source-id: daf59cf3dc907f450b469090dcc481a30a7d7c0d