Some API clarification for manual compaction and listeners (#8330)

Summary:
Avoid people hitting bugs

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8330

Test Plan: comments only

Reviewed By: siying

Differential Revision: D28683157

Pulled By: pdillinger

fbshipit-source-id: 2b34d3efb5e2fa34bea93d54c940cbd425212d25
Peter Dillinger 4 years ago committed by Facebook GitHub Bot
parent a607b88240
commit 956ce9bde2
HISTORY.md                  |  1
include/rocksdb/db.h        | 19
include/rocksdb/listener.h  | 19

@@ -12,6 +12,7 @@
 ### Behavior Changes
 * Due to the fix of false-postive alert of "SST file is ahead of WAL", all the CFs with no SST file (CF empty) will bypass the consistency check. We fixed a false-positive, but introduced a very rare true-negative which will be triggered in the following conditions: A CF with some delete operations in the last a few queries which will result in an empty CF (those are flushed to SST file and a compaction triggered which combines this file and all other SST files and generates an empty CF, or there is another reason to write a manifest entry for this CF after a flush that generates no SST file from an empty CF). The deletion entries are logged in a WAL and this WAL was corrupted, while the CF's log number points to the next WAL (due to the flush). Therefore, the DB can only recover to the point without these trailing deletions and cause the inconsistent DB status.
+* Added API comments clarifying safe usage of Disable/EnableManualCompaction and EventListener callbacks for compaction.
 ### New Features
 * Add new option allow_stall passed during instance creation of WriteBufferManager. When allow_stall is set, WriteBufferManager will stall all writers shared across multiple DBs and columns if memory usage goes beyond specified WriteBufferManager::buffer_size (soft limit). Stall will be cleared when memory is freed after flush and memory usage goes down below buffer_size.

@@ -1096,6 +1096,8 @@ class DB {
 // and the data is rearranged to reduce the cost of operations
 // needed to access the data. This operation should typically only
 // be invoked by users who understand the underlying implementation.
+// This call blocks until the operation completes successfully, fails,
+// or is aborted (Status::Incomplete). See DisableManualCompaction.
 //
 // begin==nullptr is treated as a key before all keys in the database.
 // end==nullptr is treated as a key after all keys in the database.
@@ -1150,9 +1152,9 @@ class DB {
 const std::unordered_map<std::string, std::string>& new_options) = 0;
 // CompactFiles() inputs a list of files specified by file numbers and
-// compacts them to the specified level. Note that the behavior is different
-// from CompactRange() in that CompactFiles() performs the compaction job
-// using the CURRENT thread.
+// compacts them to the specified level. A small difference compared to
+// CompactRange() is that CompactFiles() performs the compaction job
+// using the CURRENT thread, so is not considered a "background" job.
 //
 // @see GetDataBaseMetaData
 // @see GetColumnFamilyMetaData
@@ -1195,12 +1197,15 @@ class DB {
 const std::vector<ColumnFamilyHandle*>& column_family_handles) = 0;
 // After this function call, CompactRange() or CompactFiles() will not
-// run compactions and fail. The function will wait for all outstanding
-// manual compactions to finish before returning
+// run compactions and fail. Calling this function will tell outstanding
+// manual compactions to abort and will wait for them to finish or abort
+// before returning.
 virtual void DisableManualCompaction() = 0;
 // Re-enable CompactRange() and ComapctFiles() that are disabled by
-// DisableManualCompaction(). In debug mode, it might hit assertion if
-// no DisableManualCompaction() was previously called.
+// DisableManualCompaction(). This function must be called as many times
+// as DisableManualCompaction() has been called in order to re-enable
+// manual compactions, and must not be called more times than
+// DisableManualCompaction() has been called.
 virtual void EnableManualCompaction() = 0;
 // Number of levels used for this DB.
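
The pairing rule in the new EnableManualCompaction() comment (one Enable per Disable, never more) lends itself to a scope guard. The following is only a sketch of that idea, not something this commit adds: `ManualCompactionPause` is a hypothetical helper, and `FakeDB` is a hypothetical stand-in that mimics just the two `DB` methods named in the header so the example compiles without RocksDB.

```cpp
#include <cassert>

// Hypothetical stand-in for rocksdb::DB. It models only the counting
// behavior described in the header comments; it is not RocksDB code.
struct FakeDB {
  int disable_depth = 0;

  void DisableManualCompaction() { ++disable_depth; }

  void EnableManualCompaction() {
    // Mirrors the documented rule: never call Enable more times than Disable.
    assert(disable_depth > 0);
    --disable_depth;
  }

  // Manual compactions are allowed again only once every Disable is matched.
  bool ManualCompactionAllowed() const { return disable_depth == 0; }
};

// Hypothetical RAII helper (not part of RocksDB): the constructor disables
// manual compaction and the destructor issues exactly one matching Enable.
template <typename DBType>
class ManualCompactionPause {
 public:
  explicit ManualCompactionPause(DBType* db) : db_(db) {
    db_->DisableManualCompaction();
  }
  ~ManualCompactionPause() { db_->EnableManualCompaction(); }

  // Copying would produce an extra Enable, so it is forbidden.
  ManualCompactionPause(const ManualCompactionPause&) = delete;
  ManualCompactionPause& operator=(const ManualCompactionPause&) = delete;

 private:
  DBType* db_;
};
```

With a guard like this, nested pauses stay balanced on every exit path, including early returns. Whether the same helper fits a real `rocksdb::DB*` depends on the caller's threading, which this sketch does not address.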

@@ -333,13 +333,18 @@ struct ExternalFileIngestionInfo {
 // be used as a building block for developing custom features such as
 // stats-collector or external compaction algorithm.
 //
-// Note that callback functions should not run for an extended period of
-// time before the function returns, otherwise RocksDB may be blocked.
-// For example, it is not suggested to do DB::CompactFiles() (as it may
-// run for a long while) or issue many of DB::Put() (as Put may be blocked
-// in certain cases) in the same thread in the EventListener callback.
-// However, doing DB::CompactFiles() and DB::Put() in another thread is
-// considered safe.
+// IMPORTANT
+// Because compaction is needed to resolve a "writes stopped" condition,
+// calling or waiting for any blocking DB write function (no_slowdown=false)
+// from a compaction-related listener callback can hang RocksDB. For DB
+// writes from a callback we recommend a WriteBatch and no_slowdown=true,
+// because the WriteBatch can accumulate writes for later in case DB::Write
+// returns Status::Incomplete. Similarly, calling CompactRange or similar
+// could hang by waiting for a background worker that is occupied until the
+// callback returns.
+//
+// Otherwise, callback functions should not run for an extended period of
+// time before the function returns, because this will slow RocksDB.
 //
 // [Threading] All EventListener callback will be called using the
 // actual thread that involves in that specific event. For example, it
