From 6277e28039c9ab0de27eaa550f2c1979836770dc Mon Sep 17 00:00:00 2001
From: sdong
Date: Thu, 30 Apr 2020 15:19:31 -0700
Subject: [PATCH] Flag CompressionOptions::parallel_threads to be experimental (#6781)

Summary:
The feature of CompressionOptions::parallel_threads is still not yet mature. Mention it to be experimental in the comments for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6781

Reviewed By: pdillinger

Differential Revision: D21330678

fbshipit-source-id: d7dd7d099fb002a5c6a5d8da689ce5ee08a9eb13
---
 HISTORY.md                         |  2 +-
 include/rocksdb/advanced_options.h | 11 +++++------
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/HISTORY.md b/HISTORY.md
index bbfbbb6c5..c9ea5355b 100644
--- a/HISTORY.md
+++ b/HISTORY.md
@@ -16,7 +16,7 @@
 * Add IsDirectory to Env and FS to indicate if a path is a directory.
 
 ### New Features
-* Added support for pipelined & parallel compression optimization for `BlockBasedTableBuilder`. This optimization makes block building, block compression and block appending a pipeline, and uses multiple threads to accelerate block compression. Users can set `CompressionOptions::parallel_threads` greater than 1 to enable compression parallelism.
+* Added support for pipelined & parallel compression optimization for `BlockBasedTableBuilder`. This optimization makes block building, block compression and block appending a pipeline, and uses multiple threads to accelerate block compression. Users can set `CompressionOptions::parallel_threads` greater than 1 to enable compression parallelism. This feature is experimental for now.
 * Provide an allocator for memkind to be used with block cache. This is to work with memory technologies (Intel DCPMM is one such technology currently available) that require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size) beyond what is achievable with DRAM.
 * Option `max_background_flushes` can be set dynamically using DB::SetDBOptions().
 * Added functionality in sst_dump tool to check the compressed file size for different compression levels and print the time spent on compressing files with each compression type. Added arguments `--compression_level_from` and `--compression_level_to` to report size of all compression levels and one compression_type must be specified with it so that it will report compressed sizes of one compression type with different levels.
diff --git a/include/rocksdb/advanced_options.h b/include/rocksdb/advanced_options.h
index ac4d677fe..574e9390a 100644
--- a/include/rocksdb/advanced_options.h
+++ b/include/rocksdb/advanced_options.h
@@ -119,15 +119,14 @@ struct CompressionOptions {
 
   // Number of threads for parallel compression.
   // Parallel compression is enabled only if threads > 1.
+  // THE FEATURE IS STILL EXPERIMENTAL
   //
   // This option is valid only when BlockBasedTable is used.
   //
-  // When parallel compression is enabled, SST size estimation becomes less
-  // accurate, because block building and compression are pipelined, and there
-  // might be inflight blocks being compressed and not finally written, when
-  // current SST size is fetched. This brings inflation of final output file
-  // size.
-  // To be more accurate, this inflation is also estimated by using historical
+  // When parallel compression is enabled, SST size file sizes might be
+  // more inflated compared to the target size, because more data of unknown
+  // compressed size is in flight when compression is parallelized. To be
+  // reasonably accurate, this inflation is also estimated by using historical
   // compression ratio and current bytes inflight.
   //
   // Default: 1.
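
The new comment says the size inflation is estimated from the historical compression ratio and the bytes currently in flight. A minimal, self-contained sketch of that idea follows. `InflightSizeEstimator` and all of its members are hypothetical names chosen for illustration, and the fallback ratio of 1.0 before any block has finished is an assumption; this is not RocksDB's actual implementation.

```cpp
#include <cstdint>

// Hypothetical illustration only; not taken from RocksDB.
// Tracks blocks handed to compression threads and blocks already written,
// and estimates the final file size from both.
struct InflightSizeEstimator {
  uint64_t raw_bytes_done = 0;         // uncompressed size of written blocks
  uint64_t compressed_bytes_done = 0;  // compressed size of written blocks
  uint64_t raw_bytes_inflight = 0;     // uncompressed size still being compressed

  // A block of `raw` uncompressed bytes was queued for compression.
  void BlockSubmitted(uint64_t raw) { raw_bytes_inflight += raw; }

  // A queued block finished compressing and was appended to the file.
  void BlockWritten(uint64_t raw, uint64_t compressed) {
    raw_bytes_inflight -= raw;
    raw_bytes_done += raw;
    compressed_bytes_done += compressed;
  }

  // Compression ratio observed so far (compressed / raw).
  // Assumption: with no history yet, treat data as incompressible (1.0).
  double HistoricalRatio() const {
    if (raw_bytes_done == 0) return 1.0;
    return static_cast<double>(compressed_bytes_done) /
           static_cast<double>(raw_bytes_done);
  }

  // Bytes already written plus the projected compressed size of the
  // bytes that are still in flight.
  uint64_t EstimatedFileSize() const {
    double projected =
        static_cast<double>(raw_bytes_inflight) * HistoricalRatio();
    return compressed_bytes_done + static_cast<uint64_t>(projected);
  }
};
```

With more compression threads, more raw bytes sit in `raw_bytes_inflight` at any moment, so a larger share of the estimate depends on the projected ratio rather than on measured sizes. That is one way to read the comment's warning that output files may overshoot the target size.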