diff --git a/doc/index.html b/doc/index.html
index e9a41922b..088a73722 100644
--- a/doc/index.html
+++ b/doc/index.html
@@ -387,7 +387,8 @@ of point reads of small values may wish to switch to a smaller block
size if performance measurements indicate an improvement. There isn't
much benefit in using blocks smaller than one kilobyte, or larger than
a few megabytes. Also note that compression will be more effective
-with larger block sizes.
+with larger block sizes. To change the block size, set
+Options::block_size
.
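As a minimal sketch of the setting just mentioned: `MakeOptions` below is a hypothetical helper (not a RocksDB API), and the 4KB value is purely illustrative, not a recommendation; in newer RocksDB versions this knob lives on `BlockBasedTableOptions` rather than `Options`.

```cpp
#include "rocksdb/options.h"

// Hypothetical helper: configure a smaller block size for a workload
// dominated by point reads of small values. 4KB is illustrative only;
// choose a value based on your own performance measurements.
rocksdb::Options MakeOptions() {
  rocksdb::Options options;
  options.block_size = 4 * 1024;  // default is larger; bigger blocks compress better
  return options;
}
```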
@@ -434,7 +435,7 @@ filesystem and each file stores a sequence of compressed blocks. If
options.block_cache is non-NULL, it is used to cache frequently
used uncompressed block contents. If options.block_cache_compressed
is non-NULL, it is used to cache frequently used compressed blocks. Compressed
cache is an alternative to OS cache, which also caches compressed blocks. If
-compressed cache is used, you should disable OS cache by setting
+compressed cache is used, you should disable the OS cache by setting
options.allow_os_buffer
to false.
@@ -588,7 +589,7 @@ Here we give overview of the options that impact behavior of Compactions:
Options::compaction_style
- RocksDB currently supports two
-compaction algorithms - Compaction style and Level style. This option switches
+compaction algorithms - Universal style and Level style. This option switches
between the two. Can be kCompactionStyleUniversal or kCompactionStyleLevel.
If this is kCompactionStyleUniversal, then you can configure universal style
parameters with Options::compaction_options_universal
.
@@ -608,16 +609,126 @@ key-value during background compaction.
Other options impacting performance of compactions and when they get triggered
-are: access_hint_on_compaction_start
,
-level0_file_num_compaction_trigger
,
-max_mem_compaction_level
, target_file_size_base
,
-target_file_size_multiplier
,
-expanded_compaction_factor
, source_compaction_factor
,
-max_grandparent_overlap_factor
,
-disable_seek_compaction
, max_background_compactions
.
+are:
+
+
Options::access_hint_on_compaction_start
- Specify the file access
+pattern once a compaction is started. It will be applied to all input files of a compaction. Default: NORMAL
++
Options::level0_file_num_compaction_trigger
- Number of files to trigger level-0 compaction.
+A negative value means that level-0 compaction will not be triggered by number of files at all.
++
Options::max_mem_compaction_level
- Maximum level to which a new compacted memtable is pushed if it
+does not create overlap. We try to push to level 2 to avoid the relatively expensive level 0=>1 compactions and to avoid some
+expensive manifest file operations. We do not push all the way to the largest level since that can generate a lot of wasted disk
+space if the same key space is being repeatedly overwritten.
++
Options::target_file_size_base
and Options::target_file_size_multiplier
-
+Target file size for compaction. target_file_size_base is the per-file size for level-1.
+The target file size for level L can be calculated as target_file_size_base * (target_file_size_multiplier ^ (L-1)).
+For example, if target_file_size_base is 2MB and target_file_size_multiplier is 10, then each file on level-1 will
+be 2MB, each file on level-2 will be 20MB, and each file on level-3 will be 200MB. The default target_file_size_base is 2MB
+and the default target_file_size_multiplier is 1.
++
Options::expanded_compaction_factor
- Maximum number of bytes in all compacted files. We avoid expanding
+the lower level file set of a compaction if it would make the total compaction cover more than
+(expanded_compaction_factor * targetFileSizeLevel()) many bytes.
++
Options::source_compaction_factor
- Maximum number of bytes in all source files to be compacted in a
+single compaction run. We avoid picking too many files in the source level so that the total source bytes
+of the compaction do not exceed (source_compaction_factor * targetFileSizeLevel()) bytes.
+Default: 1, i.e. pick maxfilesize amount of data as the source of a compaction.
++
Options::max_grandparent_overlap_factor
- Controls the maximum bytes of overlap with the grandparent level (i.e., level+2) before we
+stop building a single file in a level->level+1 compaction.
++
Options::disable_seek_compaction
- Disable compaction triggered by seek.
+With a bloom filter and fast storage, a miss on one level is very cheap if the file handle is cached in the table cache
+(which is true if max_open_files is large).
++
Options::max_background_compactions
- Maximum number of concurrent background jobs, submitted to
+the default LOW priority thread pool
+
You can learn more about all of those options in rocksdb/options.h
+
+If you're using Universal style compaction, there is an object CompactionOptionsUniversal
+that holds all the different options for that compaction style. The exact definition is in
+rocksdb/universal_compaction.h
and you can set it in Options::compaction_options_universal
.
+Here we give a short overview of the options in CompactionOptionsUniversal
:
+
+
CompactionOptionsUniversal::size_ratio
- Percentage flexibility while comparing file size. If the candidate file(s)
+ size is 1% smaller than the next file's size, then include next file into
+ this candidate set. Default: 1
++
CompactionOptionsUniversal::min_merge_width
- The minimum number of files in a single compaction run. Default: 2
++
CompactionOptionsUniversal::max_merge_width
- The maximum number of files in a single compaction run. Default: UINT_MAX
++
CompactionOptionsUniversal::max_size_amplification_percent
- The size amplification is defined as the amount (in percentage) of
+additional storage needed to store a single byte of data in the database. For example, a size amplification of 2% means that a database that
+contains 100 bytes of user-data may occupy up to 102 bytes of physical storage. By this definition, a fully compacted database has
+a size amplification of 0%. RocksDB uses the following heuristic to calculate size amplification: it assumes that all files excluding
+the earliest file contribute to the size amplification. Default: 200, which means that a 100 byte database could require up to
+300 bytes of storage.
++
CompactionOptionsUniversal::compression_size_percent
- If this option is set to -1 (the default value), all the output files
+will follow the specified compression type. If this option is not negative, we will try to make sure the compressed
+size is just above this value. In normal cases, at least this percentage
+of data will be compressed.
+When we are compacting to a new file, here is the criterion for whether
+it needs to be compressed: assume the list of files sorted
+by generation time is [ A1...An B1...Bm C1...Ct ],
+where A1 is the newest and Ct is the oldest, and we are going to compact
+B1...Bm. We calculate the total size of all the files as total_size, as
+well as the total size of C1...Ct as total_C; the compaction output file
+will be compressed iff total_C / total_size < this percentage.
++
CompactionOptionsUniversal::stop_style
- The algorithm used to stop picking files into a single compaction run.
+Can be kCompactionStopStyleSimilarSize (pick files of similar size) or kCompactionStopStyleTotalSize (total size of picked files > next file).
+Default: kCompactionStopStyleTotalSize
+
+A thread pool is associated with an Env environment object. The client has to create a thread pool by setting the number of background
+threads using method Env::SetBackgroundThreads()
defined in rocksdb/env.h
.
+We use the thread pool for compactions and memtable flushes.
+Since memtable flushes are on the critical write path (a stalled memtable flush stalls writes, increasing p99 latency), we suggest
+having two thread pools - with priorities HIGH and LOW. Memtable flushes can be set up to be scheduled on the HIGH thread pool.
+There are two options available for configuration of background compactions and flushes:
+
+
Options::max_background_compactions
- Maximum number of concurrent background jobs,
+submitted to the default LOW priority thread pool
++
Options::max_background_flushes
- Maximum number of concurrent background memtable flush jobs, submitted to
+the HIGH priority thread pool. By default, all background jobs (major compaction and memtable flush) go
+to the LOW priority pool. If this option is set to a positive number, memtable flush jobs will be submitted to the HIGH priority pool.
+This option is important when the same Env is shared by multiple db instances. Without a separate pool, long running major compaction jobs could
+potentially block memtable flush jobs of other db instances, leading to unnecessary Put stalls.
++
+ #include "rocksdb/env.h"
+ #include "rocksdb/db.h"
+
+ auto env = rocksdb::Env::Default();
+ env->SetBackgroundThreads(2, rocksdb::Env::LOW);
+ env->SetBackgroundThreads(1, rocksdb::Env::HIGH);
+ rocksdb::DB* db;
+ rocksdb::Options options;
+ options.env = env;
+ options.max_background_compactions = 2;
+ options.max_background_flushes = 1;
+ rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/testdb", &db);
+ assert(status.ok());
+ ...
The GetApproximateSizes
method can be used to get the approximate