Summary:
- Switched all instances of SetMinPossibleForUserKey and SetMaxPossibleForUserKey in accordance to InternalKeyComparator's comparison logic
Closes https://github.com/facebook/rocksdb/pull/2868
Differential Revision: D5804152
Pulled By: axxufb
fbshipit-source-id: 80be35e04f2e8abc35cc64abe1fecb03af24e183
Summary:
When we had a single thread pool for compactions, a thread could be busy for a long time (minutes) executing a compaction involving the bottom level. In multi-instance setups, the entire thread pool could be consumed by such bottom-level compactions. Then, top-level compactions (e.g., a few L0 files) would be blocked for a long time ("head-of-line blocking"). Such top-level compactions are critical to prevent compaction stalls as they can quickly reduce number of L0 files / sorted runs.
This diff introduces a bottom-priority queue for universal compactions including the bottom level. This alleviates the head-of-line blocking situation for fast, top-level compactions.
- Added `Env::Priority::BOTTOM` thread pool. This feature is only enabled if user explicitly configures it to have a positive number of threads.
- Changed `ThreadPoolImpl`'s default thread limit from one to zero. This change is invisible to users as we call `IncBackgroundThreadsIfNeeded` on the low-pri/high-pri pools during `DB::Open` with values of at least one. It is necessary, though, for bottom-pri to start with zero threads so the feature is disabled by default.
- Separated `ManualCompaction` into two parts in `PrepickedCompaction`. `PrepickedCompaction` is used for any compaction that's picked outside of its execution thread, either manual or automatic.
- Forward universal compactions involving last level to the bottom pool (worker thread's entry point is `BGWorkBottomCompaction`).
- Track `bg_bottom_compaction_scheduled_` so we can wait for bottom-level compactions to finish. We don't count them against the background jobs limits. So users of this feature will get an extra compaction for free.
Closes https://github.com/facebook/rocksdb/pull/2580
Differential Revision: D5422916
Pulled By: ajkr
fbshipit-source-id: a74bd11f1ea4933df3739b16808bb21fcd512333
Summary:
This reverts the previous commit 1d7048c598, which broke the build.
Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627
Differential Revision: D5476473
Pulled By: sagar0
fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary:
Some users want to prevent rocksdb from entering read-only mode in certain error cases. This diff gives them a callback, `OnBackgroundError`, that they can use to achieve it.
- call `OnBackgroundError` every time we consider setting `bg_error_`. Use its result to assign `bg_error_` but not to change the function's return status.
- classified calls using `BackgroundErrorReason` to give the callback some info about where the error happened
- renamed `ParanoidCheck` to something more specific so we can provide a clear `BackgroundErrorReason`
- unit tests for the most common cases: flush or compaction errors
Closes https://github.com/facebook/rocksdb/pull/2477
Differential Revision: D5300190
Pulled By: ajkr
fbshipit-source-id: a0ea4564249719b83428e3f4c6ca2c49e366e9b3
Summary:
DBImpl's instance variables should only be accessed with mutex held. I moved an assert later to uphold this rule.
DBTest.LastWriteBufferDelay test was sporadically failing TSAN because it tried to flush around the same time the db was destroyed, so the variable was accessed simultaneously by two threads.
Closes https://github.com/facebook/rocksdb/pull/2471
Differential Revision: D5286857
Pulled By: ajkr
fbshipit-source-id: 435abd84efa601f667c254e320b0bb5a434b971f
Summary:
- `max_background_flushes` and `max_background_compactions` are still supported for backwards compatibility
- `base_background_compactions` is completely deprecated. Now we just throttle to one background compaction when there's no pressure.
- `max_background_jobs` is added to automatically partition the concurrent background jobs into flushes vs compactions. Currently it's very simple as we just allocate one-fourth of the jobs to flushes, and the remaining can be used for compactions.
- The test cases that set `base_background_compactions > 1` needed to be updated. I just grab the pressure token such that the desired number of compactions can be scheduled.
Closes https://github.com/facebook/rocksdb/pull/2205
Differential Revision: D4937461
Pulled By: ajkr
fbshipit-source-id: df52cbbd497e13bbc9a60560a5ac2a2526b3f1f9
Summary:
Previously users could set `max_background_flushes=0` to force rocksdb to use a single thread pool for both background flushes and compactions. That'll no longer be possible since I'm going to deprecate `max_background_flushes` and `max_background_compactions` in favor of a single option. This diff introduces a new way to force a single thread pool: when high-pri pool has zero threads, all background jobs will be submitted to low-pri pool.
Note the majority of the code change is adding `Env::GetBackgroundThreads()`, which is necessary to check whether the user has provided a zero-sized thread pool.
Closes https://github.com/facebook/rocksdb/pull/2204
Differential Revision: D4936256
Pulled By: ajkr
fbshipit-source-id: 929a07a0c0705f7766f5339cd013ff74e90d6e01
Summary:
First cut for early review; there are few conceptual points to answer and some code structure issues.
For conceptual points -
- restriction-wise, we're going to disallow ingest_behind if (use_seqno_zero_out=true || disable_auto_compaction=false), the user is responsible to properly open and close DB with required params
- we wanted to ingest into reserved bottom most level. Should we fail fast if bottom level isn't empty, or should we attempt to ingest if file fits there key-ranges-wise?
- Modifying AssignLevelForIngestedFile seems the place we we'd handle that.
On code structure - going to refactor GenerateAndAddExternalFile call in the test class to allow passing instance of IngestionOptions, that's just going to incur lots of changes at callsites.
Closes https://github.com/facebook/rocksdb/pull/2144
Differential Revision: D4873732
Pulled By: lightmark
fbshipit-source-id: 81cb698106b68ef8797f564453651d50900e153a
Summary:
A data race between a manual and an auto compaction can cause a scheduled automatic compaction to be cancelled and never rescheduled again. This may cause a condition of hanging forever. Fix this by always making sure the cancelled compaction is put back to the compaction queue.
Closes https://github.com/facebook/rocksdb/pull/2238
Differential Revision: D4984591
Pulled By: siying
fbshipit-source-id: 3ab153886403c7b991896dcb2158b96cac12f227
Summary:
1. Move universal compaction picker to separate files compaction_picker_universal.cc and compaction_picker_universal.h.
2. Rename some functions to make the code easier to understand.
3. Move leveled compaction picking code to a dedicated class, so that we we don't need to pass some common variable around when calling functions. It also allowed us to break down LevelCompactionPicker::PickCompaction() to smaller functions.
Closes https://github.com/facebook/rocksdb/pull/2100
Differential Revision: D4845948
Pulled By: siying
fbshipit-source-id: efa0ab4
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090
Differential Revision: D4833681
Pulled By: siying
fbshipit-source-id: 2fd8bef
Summary:
db_impl.cc is too large to manage. Divide db_impl.cc into db/db_impl.cc, db/db_impl_compaction_flush.cc, db/db_impl_files.cc, db/db_impl_open.cc and db/db_impl_write.cc.
Closes https://github.com/facebook/rocksdb/pull/2095
Differential Revision: D4838188
Pulled By: siying
fbshipit-source-id: c5f3059