rocksdb

Commit Graph

Author	SHA1	Message	Date
Changyu Bi	e95cc1217d	`CompactRange()` always compacts to bottommost level for leveled compaction (#11468 ) Summary: currently for leveled compaction, the max output level of a call to `CompactRange()` is pre-computed before compacting each level. This max output level is the max level whose key range overlaps with the manual compaction key range. However, during manual compaction, files in the max output level may be compacted down further by some background compaction. When this background compaction is a trivial move, there is a race condition and the manual compaction may not be able to compact all keys in the specified key range. This PR updates `CompactRange()` to always compact to the bottommost level to make this race condition more unlikely (it can still happen, see more in comment here: `796f58f42a/db/db_impl/db_impl_compaction_flush.cc (L1180C29-L1184)`). This PR also changes the behavior of CompactRange() when `bottommost_level_compaction=kIfHaveCompactionFilter` (the default option). The old behavior is that, if a compaction filter is provided, CompactRange() always does an intra-level compaction at the final output level for all files in the manual compaction key range. The only exception when `first_overlapped_level = 0` and `max_overlapped_level = 0`. It’s awkward to maintain the same behavior after this PR since we do not compute max_overlapped_level anymore. So the new behavior is similar to kForceOptimized: always does intra-level compaction at the bottommost level, but not including new files generated during this manual compaction. Several unit tests are updated to work with this new manual compaction behavior. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11468 Test Plan: Add new unit tests `DBCompactionTest.ManualCompactionCompactAllKeysInRange*` Reviewed By: ajkr Differential Revision: D46079619 Pulled By: cbi42 fbshipit-source-id: 19d844ba4ec8dc1a0b8af5d2f36ff15820c6e76f	1 year ago
Jay Huh	81aeb15988	Add WaitForCompact with WaitForCompactOptions to public API (#11436 ) Summary: Context: This is the first PR for WaitForCompact() Implementation with WaitForCompactOptions. In this PR, we are introducing `Status WaitForCompact(const WaitForCompactOptions& wait_for_compact_options)` in the public API. This currently utilizes the existing internal `WaitForCompact()` implementation (with default abort_on_pause = false). `abort_on_pause` has been moved to `WaitForCompactOptions&`. In the later PRs, we will introduce the following two options in `WaitForCompactOptions` 1. `bool flush = false` by default - If true, flush before waiting for compactions to finish. Must be set to true to ensure no immediate compactions (except perhaps periodic compactions) after closing and re-opening the DB. 2. `bool close_db = false` by default - If true, will also close the DB upon compactions finishing. 1. struct `WaitForCompactOptions` added to options.h and `abort_on_pause` in the internal API moved to the option struct. 2. `Status WaitForCompact(const WaitForCompactOptions& wait_for_compact_options)` introduced in `db.h` 3. Changed the internal WaitForCompact() to `WaitForCompact(const WaitForCompactOptions& wait_for_compact_options)` and checks for the `abort_on_pause` inside the option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11436 Test Plan: Following tests added - `DBCompactionTest::WaitForCompactWaitsOnCompactionToFinish` - `DBCompactionTest::WaitForCompactAbortOnPauseAborted` - `DBCompactionTest::WaitForCompactContinueAfterPauseNotAborted` - `DBCompactionTest::WaitForCompactShutdownWhileWaiting` - `TransactionTest::WaitForCompactAbortOnPause` NOTE: `TransactionTest::WaitForCompactAbortOnPause` was added to use `StackableDB` to ensure the wrapper function is in place. Reviewed By: pdillinger Differential Revision: D45799659 Pulled By: jaykorean fbshipit-source-id: b5b58f95957f2ab47d1221dee32a61d6cdc4685b	1 year ago
Jay Huh	586d78b31e	Remove wait_unscheduled from waitForCompact internal API (#11443 ) Summary: Context: In pull request https://github.com/facebook/rocksdb/issues/11436, we are introducing a new public API `waitForCompact(const WaitForCompactOptions& wait_for_compact_options)`. This API invokes the internal implementation `waitForCompact(bool wait_unscheduled=false)`. The unscheduled parameter indicates the compactions that are not yet scheduled but are required to process items in the queue. In certain cases, we are unable to wait for compactions, such as during a shutdown or when background jobs are paused. It is important to return the appropriate status in these scenarios. For all other cases, we should wait for all compaction and flush jobs, including the unscheduled ones. The primary purpose of this new API is to wait until the system has resolved its compaction debt. Currently, the usage of `wait_unscheduled` is limited to test code. This pull request eliminates the usage of wait_unscheduled. The internal `waitForCompact()` API now waits for unscheduled compactions unless the db is undergoing a shutdown. In the event of a shutdown, the API returns `Status::ShutdownInProgress()`. Additionally, a new parameter, `abort_on_pause`, has been introduced with a default value of `false`. This parameter addresses the possibility of waiting indefinitely for unscheduled jobs if `PauseBackgroundWork()` was called before `waitForCompact()` is invoked. By setting `abort_on_pause` to `true`, the API will immediately return `Status::Aborted`. Furthermore, all tests that previously called `waitForCompact(true)` have been fixed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11443 Test Plan: Existing tests that involve a shutdown in progress: - DBCompactionTest::CompactRangeShutdownWhileDelayed - DBTestWithParam::PreShutdownMultipleCompaction - DBTestWithParam::PreShutdownCompactionMiddle Reviewed By: pdillinger Differential Revision: D45923426 Pulled By: jaykorean fbshipit-source-id: 7dc93fe6a6841a7d9d2d72866fa647090dba8eae	2 years ago
sdong	4720ba4391	Remove RocksDB LITE (#11147 ) Summary: We haven't been actively mantaining RocksDB LITE recently and the size must have been gone up significantly. We are removing the support. Most of changes were done through following comments: unifdef -m -UROCKSDB_LITE `git grep -l ROCKSDB_LITE \| egrep '[.](cc\|h)'` by Peter Dillinger. Others changes were manually applied to build scripts, CircleCI manifests, ROCKSDB_LITE is used in an expression and file db_stress_test_base.cc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11147 Test Plan: See CI Reviewed By: pdillinger Differential Revision: D42796341 fbshipit-source-id: 4920e15fc2060c2cd2221330a6d0e5e65d4b7fe2	2 years ago
Jay Zhuang	1078d860a9	Add an unittest for Periodic compaction conflict with ongoing compaction (#10908 ) Summary: Add a tiered storage migration test which would conflict with an ongoing penultimate level compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10908 Test Plan: Test only change Reviewed By: anand1976 Differential Revision: D40864509 Pulled By: ajkr fbshipit-source-id: e316e849a01a6c71a41be130101f909b6c0498cb	2 years ago
Jay Zhuang	c401f285c3	Add option `preserve_internal_time_seconds` to preserve the time info (#10747 ) Summary: Add option `preserve_internal_time_seconds` to preserve the internal time information. It's mostly for the migration of the existing data to tiered storage ( `preclude_last_level_data_seconds`). When the tiering feature is just enabled, the existing data won't have the time information to decide if it's hot or cold. Enabling this feature will start collect and preserve the time information for the new data. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10747 Reviewed By: siying Differential Revision: D39910141 Pulled By: siying fbshipit-source-id: 25c21638e37b1a7c44006f636b7d714fe7242138	2 years ago
Jay Zhuang	d9e71fb2c5	Fix periodic_task unable to re-register the same task type (#10379 ) Summary: Timer has a limitation that it cannot re-register a task with the same name, because the cancel only mark the task as invalid and wait for the Timer thread to clean it up later, before the task is cleaned up, the same task name cannot be added. Which makes the task option update likely to fail, which basically cancel and re-register the same task name. Change the periodic task name to a random unique id and store it in periodic_task_scheduler. Also refactor the `periodic_work` to `periodic_task` to make each job function as a `task`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10379 Test Plan: unittests Reviewed By: ajkr Differential Revision: D38000615 Pulled By: jay-zhuang fbshipit-source-id: e4135f9422e3b53aaec8eda54f4e18ce633a279e	2 years ago
Jay Zhuang	3f763763aa	Change `bottommost_temperture` to `last_level_temperture` (#10471 ) Summary: Change tiered compaction feature from `bottommost_temperture` to `last_level_temperture`. The old option is kept for migration purpose only, which is behaving the same as `last_level_temperture` and it will be removed in the next release. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10471 Test Plan: CI Reviewed By: siying Differential Revision: D38450621 Pulled By: jay-zhuang fbshipit-source-id: cc1cdf8bad409376fec0152abc0a64fb72a91527	2 years ago
Jay Zhuang	18a61a1734	Fix seqno->time worker not scheduled with multi DB instances (#10383 ) Summary: `PeriodicWorkScheduler` is a global singleton, which were used to store per-instance setting `record_seqno_time_cadence_`. Move that to db_impl.h which is per-instance. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10383 Reviewed By: siying Differential Revision: D37928009 Pulled By: jay-zhuang fbshipit-source-id: e517754f4a9db98798ac04f72033d4b517f734e9	2 years ago
Jay Zhuang	faa0f9723c	Tiered compaction: integrate Seqno time mapping with per key placement (#10370 ) Summary: Using the Sequence number to time mapping to decide if a key is hot or not in compaction and place it in the corresponding level. Note: the feature is not complete, level compaction will run indefinitely until all penultimate level data is cold and small enough to not trigger compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10370 Test Plan: CI * Run basic db_bench for universal compaction manually Reviewed By: siying Differential Revision: D37892338 Pulled By: jay-zhuang fbshipit-source-id: 792bbd91b1ccc2f62b5d14c53118434bcaac4bbe	2 years ago
Jay Zhuang	a3acf2ef87	Add seqno to time mapping (#10338 ) Summary: Which will be used for tiered storage to preclude hot data from compacting to the cold tier (the last level). Internally, adding seqno to time mapping. A periodic_task is scheduled to record the current_seqno -> current_time in certain cadence. When memtable flush, the mapping informaiton is stored in sstable property. During compaction, the mapping information are merged and get the approximate time of sequence number, which is used to determine if a key is recently inserted or not and preclude it from the last level if it's recently inserted (within the `preclude_last_level_data_seconds`). Pull Request resolved: https://github.com/facebook/rocksdb/pull/10338 Test Plan: CI Reviewed By: siying Differential Revision: D37810187 Pulled By: jay-zhuang fbshipit-source-id: 6953be7a18a99de8b1cb3b162d712f79c2b4899f	2 years ago

11 Commits (b14c0b0602341c15b161bd1163857b0724b7bf89)