Reduce scope of compression dictionary to single SST (#4952)

Summary:
Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.

So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:

- The `BlockBasedTableBuilder` has a new state when dictionary compression is in use: `kBuffered`. In that state it accumulates uncompressed data in memory whenever `Add` is called.
- Once it has buffered more than the target file size in bytes, or when `BlockBasedTableBuilder::Finish` is called, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
- Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples, which was not realistic.
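
The state transitions described in the list above can be sketched roughly as follows. This is a simplified, hypothetical stand-in: `SketchBuilder`, `AddBlock`, and the counters are illustrative names, not RocksDB APIs. It assumes buffering is chosen only for bottommost-level files with `max_dict_bytes > 0`, and that the builder leaves `kBuffered` once the buffered bytes exceed the target file size or `Finish` is called. Dictionary training is reduced to a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the builder's states; not RocksDB code.
class SketchBuilder {
 public:
  enum class State { kBuffered, kUnbuffered, kClosed };

  SketchBuilder(bool is_bottommost_level, uint64_t max_dict_bytes,
                uint64_t target_file_size)
      : state_(is_bottommost_level && max_dict_bytes > 0 ? State::kBuffered
                                                         : State::kUnbuffered),
        target_file_size_(target_file_size) {}

  // Called each time an uncompressed data block fills up.
  void AddBlock(std::string block) {
    assert(state_ != State::kClosed);
    if (state_ == State::kBuffered) {
      buffered_bytes_ += block.size();
      buffered_.push_back(std::move(block));
      if (buffered_bytes_ > target_file_size_) {
        EnterUnbuffered();
      }
    } else {
      WriteBlock(block);
    }
  }

  void Finish() {
    assert(state_ != State::kClosed);
    if (state_ == State::kBuffered) {
      EnterUnbuffered();  // small files never hit the size trigger
    }
    state_ = State::kClosed;
  }

  State state() const { return state_; }
  size_t blocks_written() const { return blocks_written_; }
  size_t blocks_buffered() const { return buffered_.size(); }

 private:
  void EnterUnbuffered() {
    // A real builder would sample the buffered blocks and train (or
    // assemble) the dictionary here, before compressing anything.
    for (const std::string& block : buffered_) {
      WriteBlock(block);
    }
    buffered_.clear();
    state_ = State::kUnbuffered;
  }

  // Stand-in for "compress and write out"; only counts blocks.
  void WriteBlock(const std::string& /*block*/) { ++blocks_written_; }

  State state_;
  uint64_t target_file_size_;
  uint64_t buffered_bytes_ = 0;
  size_t blocks_written_ = 0;
  std::vector<std::string> buffered_;
};
```

With a 100-byte target, two 60-byte blocks are enough to trigger the transition; every later block is written out immediately.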
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952

Differential Revision: D13967980

Pulled By: ajkr

fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
Andrew Kryczka 6 years ago committed by Facebook Github Bot
parent 79496d71ed
commit 62f70f6d14
 HISTORY.md                           |   1
 db/builder.cc                        |  14
 db/builder.h                         |   6
 db/compaction_job.cc                 | 113
 db/db_block_cache_test.cc            |   2
 db/db_test2.cc                       |  91
 table/block_based_table_builder.cc   | 241
 table/block_based_table_builder.h    |  14
 table/block_based_table_factory.cc   |   5
 table/data_block_hash_index_test.cc  |   1
 table/sst_file_writer.cc             |   3
 table/table_builder.h                |  15
 table/table_reader_bench.cc          |   3
 table/table_test.cc                  |  30
 tools/sst_dump_test.cc               |   3
 tools/sst_dump_tool.cc               |   8
 util/compression.h                   |   6

--- a/HISTORY.md
+++ b/HISTORY.md
@@ -10,6 +10,7 @@
 * Add a place holder in manifest which indicate a record from future that can be safely ignored.
 * Add support for trace sampling.
 * Enable properties block checksum verification for block-based tables.
+* For all users of dictionary compression, we now generate a separate dictionary for compressing each bottom-level SST file. Previously we reused a single dictionary for a whole compaction to bottom level. The new approach achieves better compression ratios; however, it uses more memory and CPU for buffering/sampling data blocks and training dictionaries.
 ### Public API Change
 * Disallow CompactionFilter::IgnoreSnapshots() = false, because it is not very useful and the behavior is confusing. The filter will filter everything if there is no snapshot declared by the time the compaction starts. However, users can define a snapshot after the compaction starts and before it finishes and this new snapshot won't be repeatable, because after the compaction finishes, some keys may be dropped.
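
The sampling rule from the commit message (whole uncompressed data blocks, with only the final sample allowed to be partial) might look like the sketch below. `CollectSamples` and `Samples` are hypothetical names, and taking blocks in order is an assumption made for brevity; how the real builder picks which buffered blocks to sample is not shown in this part of the diff. The byte budget is assumed to be `zstd_max_train_bytes` when ZSTD training is enabled and `max_dict_bytes` otherwise.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Concatenated sample bytes plus the length of each sample, which is the
// shape dictionary trainers typically expect. Hypothetical type.
struct Samples {
  std::string data;
  std::vector<size_t> lens;
};

// Takes whole blocks until the byte budget is reached; the last sample is
// truncated so the total never exceeds the budget.
Samples CollectSamples(const std::vector<std::string>& blocks, size_t budget) {
  Samples out;
  for (const std::string& block : blocks) {
    if (out.data.size() >= budget) {
      break;
    }
    const size_t take = std::min(block.size(), budget - out.data.size());
    if (take == 0) {
      continue;  // skip empty blocks
    }
    out.data.append(block, 0, take);
    out.lens.push_back(take);
  }
  return out;
}
```

For three 4-byte blocks and a 10-byte budget, this yields two whole samples and one 2-byte partial sample.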

--- a/db/builder.cc
+++ b/db/builder.cc
@@ -48,17 +48,18 @@ TableBuilder* NewTableBuilder(
 uint32_t column_family_id, const std::string& column_family_name,
 WritableFileWriter* file, const CompressionType compression_type,
 const CompressionOptions& compression_opts, int level,
-const std::string* compression_dict, const bool skip_filters,
-const uint64_t creation_time, const uint64_t oldest_key_time) {
+const bool skip_filters, const uint64_t creation_time,
+const uint64_t oldest_key_time, const bool is_bottommost_level,
+const uint64_t target_file_size) {
 assert((column_family_id ==
 TablePropertiesCollectorFactory::Context::kUnknownColumnFamily) ==
 column_family_name.empty());
 return ioptions.table_factory->NewTableBuilder(
 TableBuilderOptions(ioptions, moptions, internal_comparator,
 int_tbl_prop_collector_factories, compression_type,
-compression_opts, compression_dict, skip_filters,
-column_family_name, level, creation_time,
-oldest_key_time),
+compression_opts, skip_filters, column_family_name,
+level, creation_time, oldest_key_time,
+is_bottommost_level, target_file_size),
 column_family_id, file);
 }
@@ -128,8 +129,7 @@ Status BuildTable(
 ioptions, mutable_cf_options, internal_comparator,
 int_tbl_prop_collector_factories, column_family_id,
 column_family_name, file_writer.get(), compression, compression_opts,
-level, nullptr /* compression_dict */, false /* skip_filters */,
-creation_time, oldest_key_time);
+level, false /* skip_filters */, creation_time, oldest_key_time);
 }
 MergeHelper merge(env, internal_comparator.user_comparator(),

--- a/db/builder.h
+++ b/db/builder.h
@@ -40,8 +40,6 @@ class InternalStats;
 // @param column_family_name Name of the column family that is also identified
 // by column_family_id, or empty string if unknown. It must outlive the
 // TableBuilder returned by this function.
-// @param compression_dict Data for presetting the compression library's
-// dictionary, or nullptr.
 TableBuilder* NewTableBuilder(
 const ImmutableCFOptions& options, const MutableCFOptions& moptions,
 const InternalKeyComparator& internal_comparator,
@@ -50,9 +48,9 @@ TableBuilder* NewTableBuilder(
 uint32_t column_family_id, const std::string& column_family_name,
 WritableFileWriter* file, const CompressionType compression_type,
 const CompressionOptions& compression_opts, int level,
-const std::string* compression_dict = nullptr,
 const bool skip_filters = false, const uint64_t creation_time = 0,
-const uint64_t oldest_key_time = 0);
+const uint64_t oldest_key_time = 0, const bool is_bottommost_level = false,
+const uint64_t target_file_size = 0);
 // Build a Table file from the contents of *iter. The generated file
 // will be named according to number specified in meta. On success, the rest of

--- a/db/compaction_job.cc
+++ b/db/compaction_job.cc
@@ -157,7 +157,6 @@ struct CompactionJob::SubcompactionState {
 uint64_t overlapped_bytes = 0;
 // A flag determine whether the key has been seen in ShouldStopBefore()
 bool seen_key = false;
-std::string compression_dict;
 SubcompactionState(Compaction* c, Slice* _start, Slice* _end,
 uint64_t size = 0)
@@ -173,8 +172,7 @@ struct CompactionJob::SubcompactionState {
 approx_size(size),
 grandparent_index(0),
 overlapped_bytes(0),
-seen_key(false),
-compression_dict() {
+seen_key(false) {
 assert(compaction != nullptr);
 }
@@ -197,7 +195,6 @@ struct CompactionJob::SubcompactionState {
 grandparent_index = std::move(o.grandparent_index);
 overlapped_bytes = std::move(o.overlapped_bytes);
 seen_key = std::move(o.seen_key);
-compression_dict = std::move(o.compression_dict);
 return *this;
 }
@@ -865,42 +862,6 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
 prev_cpu_read_nanos = IOSTATS(cpu_read_nanos);
 }
-const MutableCFOptions* mutable_cf_options =
-sub_compact->compaction->mutable_cf_options();
-// To build compression dictionary, we sample the first output file, assuming
-// it'll reach the maximum length. We optionally pass these samples through
-// zstd's dictionary trainer, or just use them directly. Then, the dictionary
-// is used for compressing subsequent output files in the same subcompaction.
-const bool kUseZstdTrainer =
-sub_compact->compaction->output_compression_opts().zstd_max_train_bytes >
-0;
-const size_t kSampleBytes =
-kUseZstdTrainer
-? sub_compact->compaction->output_compression_opts()
-.zstd_max_train_bytes
-: sub_compact->compaction->output_compression_opts().max_dict_bytes;
-const int kSampleLenShift = 6; // 2^6 = 64-byte samples
-std::set<size_t> sample_begin_offsets;
-if (bottommost_level_ && kSampleBytes > 0) {
-const size_t kMaxSamples = kSampleBytes >> kSampleLenShift;
-const size_t kOutFileLen =
-static_cast<size_t>(MaxFileSizeForLevel(*mutable_cf_options,
-compact_->compaction->output_level(),
-cfd->ioptions()->compaction_style,
-compact_->compaction->GetInputBaseLevel(),
-cfd->ioptions()->level_compaction_dynamic_level_bytes));
-if (kOutFileLen != port::kMaxSizet) {
-const size_t kOutFileNumSamples = kOutFileLen >> kSampleLenShift;
-Random64 generator{versions_->NewFileNumber()};
-for (size_t i = 0; i < kMaxSamples; ++i) {
-sample_begin_offsets.insert(
-static_cast<size_t>(generator.Uniform(kOutFileNumSamples))
-<< kSampleLenShift);
-}
-}
-}
 MergeHelper merge(
 env_, cfd->user_comparator(), cfd->ioptions()->merge_operator,
 compaction_filter, db_options_.info_log.get(),
@@ -938,12 +899,6 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
 sub_compact->current_output_file_size);
 }
 const auto& c_iter_stats = c_iter->iter_stats();
-auto sample_begin_offset_iter = sample_begin_offsets.cbegin();
-// data_begin_offset and dict_sample_data are only valid while generating
-// dictionary from the first output file.
-size_t data_begin_offset = 0;
-std::string dict_sample_data;
-dict_sample_data.reserve(kSampleBytes);
 while (status.ok() && !cfd->IsDropped() && c_iter->Valid()) {
 // Invariant: c_iter.status() is guaranteed to be OK if c_iter->Valid()
@@ -979,55 +934,6 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
 key, c_iter->ikey().sequence);
 sub_compact->num_output_records++;
-if (sub_compact->outputs.size() == 1) { // first output file
-// Check if this key/value overlaps any sample intervals; if so, appends
-// overlapping portions to the dictionary.
-for (const auto& data_elmt : {key, value}) {
-size_t data_end_offset = data_begin_offset + data_elmt.size();
-while (sample_begin_offset_iter != sample_begin_offsets.cend() &&
-*sample_begin_offset_iter < data_end_offset) {
-size_t sample_end_offset =
-*sample_begin_offset_iter + (1 << kSampleLenShift);
-// Invariant: Because we advance sample iterator while processing the
-// data_elmt containing the sample's last byte, the current sample
-// cannot end before the current data_elmt.
-assert(data_begin_offset < sample_end_offset);
-size_t data_elmt_copy_offset, data_elmt_copy_len;
-if (*sample_begin_offset_iter <= data_begin_offset) {
-// The sample starts before data_elmt starts, so take bytes starting
-// at the beginning of data_elmt.
-data_elmt_copy_offset = 0;
-} else {
-// data_elmt starts before the sample starts, so take bytes starting
-// at the below offset into data_elmt.
-data_elmt_copy_offset =
-*sample_begin_offset_iter - data_begin_offset;
-}
-if (sample_end_offset <= data_end_offset) {
-// The sample ends before data_elmt ends, so take as many bytes as
-// needed.
-data_elmt_copy_len =
-sample_end_offset - (data_begin_offset + data_elmt_copy_offset);
-} else {
-// data_elmt ends before the sample ends, so take all remaining
-// bytes in data_elmt.
-data_elmt_copy_len =
-data_end_offset - (data_begin_offset + data_elmt_copy_offset);
-}
-dict_sample_data.append(&data_elmt.data()[data_elmt_copy_offset],
-data_elmt_copy_len);
-if (sample_end_offset > data_end_offset) {
-// Didn't finish sample. Try to finish it with the next data_elmt.
-break;
-}
-// Next sample may require bytes from same data_elmt.
-sample_begin_offset_iter++;
-}
-data_begin_offset = data_end_offset;
-}
-}
 // Close output file if it is big enough. Two possibilities determine it's
 // time to close it: (1) the current key should be this file's last key, (2)
 // the next key should not be in this file.
@@ -1069,18 +975,6 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
 &range_del_out_stats, next_key);
 RecordDroppedKeys(range_del_out_stats,
 &sub_compact->compaction_job_stats);
-if (sub_compact->outputs.size() == 1) {
-// Use samples from first output file to create dictionary for
-// compression of subsequent files.
-if (kUseZstdTrainer) {
-sub_compact->compression_dict = ZSTD_TrainDictionary(
-dict_sample_data, kSampleLenShift,
-sub_compact->compaction->output_compression_opts()
-.max_dict_bytes);
-} else {
-sub_compact->compression_dict = std::move(dict_sample_data);
-}
-}
 }
 }
@@ -1606,8 +1500,9 @@ Status CompactionJob::OpenCompactionOutputFile(
 cfd->GetID(), cfd->GetName(), sub_compact->outfile.get(),
 sub_compact->compaction->output_compression(),
 sub_compact->compaction->output_compression_opts(),
-sub_compact->compaction->output_level(), &sub_compact->compression_dict,
-skip_filters, output_file_creation_time));
+sub_compact->compaction->output_level(), skip_filters,
+output_file_creation_time, 0 /* oldest_key_time */, bottommost_level_,
+sub_compact->compaction->max_output_file_size()));
 LogFlush(db_options_.info_log);
 return s;
 }
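
For context, the per-compaction sampling removed from `db/compaction_job.cc` above picked random 64-byte windows across the expected length of the first output file. A rough standalone sketch of that offset selection is shown here. `PickSampleOffsets` is a hypothetical name, and `std::mt19937_64` with `std::uniform_int_distribution` stands in for RocksDB's `Random64`, so the exact offsets will differ from what the original code produced.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>

// Returns the start offsets of 64-byte sample windows, aligned to 64 bytes.
// Duplicates collapse in the set, so fewer than the maximum number of
// samples may be returned.
std::set<size_t> PickSampleOffsets(size_t sample_bytes, size_t out_file_len,
                                   uint64_t seed) {
  const int kSampleLenShift = 6;  // 2^6 = 64-byte samples
  const size_t max_samples = sample_bytes >> kSampleLenShift;
  const size_t file_num_samples = out_file_len >> kSampleLenShift;
  std::set<size_t> offsets;
  if (file_num_samples == 0) {
    return offsets;  // file too short to hold a single window
  }
  std::mt19937_64 gen(seed);
  std::uniform_int_distribution<size_t> dist(0, file_num_samples - 1);
  for (size_t i = 0; i < max_samples; ++i) {
    offsets.insert(dist(gen) << kSampleLenShift);
  }
  return offsets;
}
```

Because the windows are placed by byte offset, a sample can start or end in the middle of a key or value, which is the mismatch with real compression units that the commit message calls out.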

--- a/db/db_block_cache_test.cc
+++ b/db/db_block_cache_test.cc
@@ -633,7 +633,7 @@ TEST_F(DBBlockCacheTest, CompressedCache) {
 TEST_F(DBBlockCacheTest, CacheCompressionDict) {
 const int kNumFiles = 4;
-const int kNumEntriesPerFile = 32;
+const int kNumEntriesPerFile = 128;
 const int kNumBytesPerEntry = 1024;
 // Try all the available libraries that support dictionary compression

--- a/db/db_test2.cc
+++ b/db/db_test2.cc
@@ -1039,17 +1039,17 @@ TEST_F(DBTest2, WalFilterTestWithColumnFamilies) {
 }
 TEST_F(DBTest2, PresetCompressionDict) {
+// Verifies that compression ratio improves when dictionary is enabled, and
+// improves even further when the dictionary is trained by ZSTD.
 const size_t kBlockSizeBytes = 4 << 10;
 const size_t kL0FileBytes = 128 << 10;
 const size_t kApproxPerBlockOverheadBytes = 50;
 const int kNumL0Files = 5;
-const int kZstdTrainFactor = 16;
 Options options;
 options.env = CurrentOptions().env; // Make sure to use any custom env that the test is configured with.
 options.allow_concurrent_memtable_write = false;
 options.arena_block_size = kBlockSizeBytes;
-options.compaction_style = kCompactionStyleUniversal;
 options.create_if_missing = true;
 options.disable_auto_compactions = true;
 options.level0_file_num_compaction_trigger = kNumL0Files;
@@ -1091,16 +1091,15 @@ TEST_F(DBTest2, PresetCompressionDict) {
 options.compression_opts.zstd_max_train_bytes = 0;
 break;
 case 1:
-options.compression_opts.max_dict_bytes = kBlockSizeBytes;
+options.compression_opts.max_dict_bytes = 4 * kBlockSizeBytes;
 options.compression_opts.zstd_max_train_bytes = 0;
 break;
 case 2:
 if (compression_type != kZSTD) {
 continue;
 }
-options.compression_opts.max_dict_bytes = kBlockSizeBytes;
-options.compression_opts.zstd_max_train_bytes =
-kZstdTrainFactor * kBlockSizeBytes;
+options.compression_opts.max_dict_bytes = 4 * kBlockSizeBytes;
+options.compression_opts.zstd_max_train_bytes = kL0FileBytes;
 break;
 default:
 assert(false);
@@ -1110,20 +1109,24 @@ TEST_F(DBTest2, PresetCompressionDict) {
 options.table_factory.reset(NewBlockBasedTableFactory(table_options));
 CreateAndReopenWithCF({"pikachu"}, options);
 Random rnd(301);
-std::string seq_data =
+std::string seq_datas[10];
+for (int j = 0; j < 10; ++j) {
+seq_datas[j] =
 RandomString(&rnd, kBlockSizeBytes - kApproxPerBlockOverheadBytes);
+}
 ASSERT_EQ(0, NumTableFilesAtLevel(0, 1));
 for (int j = 0; j < kNumL0Files; ++j) {
 for (size_t k = 0; k < kL0FileBytes / kBlockSizeBytes + 1; ++k) {
-ASSERT_OK(Put(1, Key(static_cast<int>(
-j * (kL0FileBytes / kBlockSizeBytes) + k)),
-seq_data));
+auto key_num = j * (kL0FileBytes / kBlockSizeBytes) + k;
+ASSERT_OK(Put(1, Key(static_cast<int>(key_num)),
+seq_datas[(key_num / 10) % 10]));
 }
 dbfull()->TEST_WaitForFlushMemTable(handles_[1]);
 ASSERT_EQ(j + 1, NumTableFilesAtLevel(0, 1));
 }
-db_->CompactRange(CompactRangeOptions(), handles_[1], nullptr, nullptr);
+dbfull()->TEST_CompactRange(0, nullptr, nullptr, handles_[1],
+true /* disallow_trivial_move */);
 ASSERT_EQ(0, NumTableFilesAtLevel(0, 1));
 ASSERT_GT(NumTableFilesAtLevel(1, 1), 0);
@@ -1138,7 +1141,7 @@ TEST_F(DBTest2, PresetCompressionDict) {
 for (size_t j = 0; j < kNumL0Files * (kL0FileBytes / kBlockSizeBytes);
 j++) {
-ASSERT_EQ(seq_data, Get(1, Key(static_cast<int>(j))));
+ASSERT_EQ(seq_datas[(j / 10) % 10], Get(1, Key(static_cast<int>(j))));
 }
 if (i) {
 ASSERT_GT(prev_out_bytes, out_bytes);
@@ -1149,6 +1152,70 @@ TEST_F(DBTest2, PresetCompressionDict) {
 }
 }
+TEST_F(DBTest2, PresetCompressionDictLocality) {
+if (!ZSTD_Supported()) {
+return;
+}
+// Verifies that compression dictionary is generated from local data. The
+// verification simply checks all output SSTs have different compression
+// dictionaries. We do not verify effectiveness as that'd likely be flaky in
+// the future.
+const int kNumEntriesPerFile = 1 << 10; // 1KB
+const int kNumBytesPerEntry = 1 << 10; // 1KB
+const int kNumFiles = 4;
+Options options = CurrentOptions();
+options.compression = kZSTD;
+options.compression_opts.max_dict_bytes = 1 << 14; // 16KB
+options.compression_opts.zstd_max_train_bytes = 1 << 18; // 256KB
+options.statistics = rocksdb::CreateDBStatistics();
+options.target_file_size_base = kNumEntriesPerFile * kNumBytesPerEntry;
+BlockBasedTableOptions table_options;
+table_options.cache_index_and_filter_blocks = true;
+options.table_factory.reset(new BlockBasedTableFactory(table_options));
+Reopen(options);
+Random rnd(301);
+for (int i = 0; i < kNumFiles; ++i) {
+for (int j = 0; j < kNumEntriesPerFile; ++j) {
+ASSERT_OK(Put(Key(i * kNumEntriesPerFile + j),
+RandomString(&rnd, kNumBytesPerEntry)));
+}
+ASSERT_OK(Flush());
+MoveFilesToLevel(1);
+ASSERT_EQ(NumTableFilesAtLevel(1), i + 1);
+}
+// Store all the dictionaries generated during a full compaction.
+std::vector<std::string> compression_dicts;
+rocksdb::SyncPoint::GetInstance()->SetCallBack(
+"BlockBasedTableBuilder::WriteCompressionDictBlock:RawDict",
+[&](void* arg) {
+compression_dicts.emplace_back(static_cast<Slice*>(arg)->ToString());
+});
+rocksdb::SyncPoint::GetInstance()->EnableProcessing();
+CompactRangeOptions compact_range_opts;
+compact_range_opts.bottommost_level_compaction =
+BottommostLevelCompaction::kForce;
+ASSERT_OK(db_->CompactRange(compact_range_opts, nullptr, nullptr));
+// Dictionary compression should not be so good as to compress four totally
+// random files into one. If it does then there's probably something wrong
+// with the test.
+ASSERT_GT(NumTableFilesAtLevel(1), 1);
+// Furthermore, there should be one compression dictionary generated per file.
+// And they should all be different from each other.
+ASSERT_EQ(NumTableFilesAtLevel(1),
+static_cast<int>(compression_dicts.size()));
+for (size_t i = 1; i < compression_dicts.size(); ++i) {
+std::string& a = compression_dicts[i - 1];
+std::string& b = compression_dicts[i];
+size_t alen = a.size();
+size_t blen = b.size();
+ASSERT_TRUE(alen != blen || memcmp(a.data(), b.data(), alen) != 0);
+}
+}
 class CompactionCompressionListener : public EventListener {
 public:
 explicit CompactionCompressionListener(Options* db_options)
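
The final loop of `PresetCompressionDictLocality` above compares each captured dictionary only with the one captured just before it. A minimal standalone equivalent, using a hypothetical helper name and `std::string` equality in place of the length check plus `memcmp`, could look like this:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if no two neighbouring dictionaries are byte-identical.
// Like the test, this checks adjacent pairs only, not all pairs.
bool AdjacentDictsDiffer(const std::vector<std::string>& dicts) {
  for (size_t i = 1; i < dicts.size(); ++i) {
    if (dicts[i - 1] == dicts[i]) {
      return false;
    }
  }
  return true;
}
```

Note that a sequence such as `{"a", "b", "a"}` passes this check even though the first and last entries match, so it is weaker than a strict all-pairs comparison.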

--- a/table/block_based_table_builder.cc
+++ b/table/block_based_table_builder.cc
@@ -254,6 +254,13 @@ struct BlockBasedTableBuilder::Rep {
 Status status;
 size_t alignment;
 BlockBuilder data_block;
+// Buffers uncompressed data blocks and keys to replay later. Needed when
+// compression dictionary is enabled so we can finalize the dictionary before
+// compressing any data blocks.
+// TODO(ajkr): ideally we don't buffer all keys and all uncompressed data
+// blocks as it's redundant, but it's easier to implement for now.
+std::vector<std::pair<std::string, std::vector<std::string>>>
+data_block_and_keys_buffers;
 BlockBuilder range_del_block;
 InternalKeySliceTransform internal_prefix_transform;
@@ -263,13 +270,40 @@ struct BlockBasedTableBuilder::Rep {
 std::string last_key;
 CompressionType compression_type;
 CompressionOptions compression_opts;
-CompressionDict compression_dict;
+std::unique_ptr<CompressionDict> compression_dict;
 CompressionContext compression_ctx;
 std::unique_ptr<UncompressionContext> verify_ctx;
-UncompressionDict verify_dict;
+std::unique_ptr<UncompressionDict> verify_dict;
+size_t data_begin_offset = 0;
 TableProperties props;
-bool closed = false; // Either Finish() or Abandon() has been called.
+// States of the builder.
+//
+// - `kBuffered`: This is the initial state where zero or more data blocks are
+// accumulated uncompressed in-memory. From this state, call
+// `EnterUnbuffered()` to finalize the compression dictionary if enabled,
+// compress/write out any buffered blocks, and proceed to the `kUnbuffered`
+// state.
+//
+// - `kUnbuffered`: This is the state when compression dictionary is finalized
+// either because it wasn't enabled in the first place or it's been created
+// from sampling previously buffered data. In this state, blocks are simply
+// compressed/written out as they fill up. From this state, call `Finish()`
+// to complete the file (write meta-blocks, etc.), or `Abandon()` to delete
+// the partially created file.
+//
+// - `kClosed`: This indicates either `Finish()` or `Abandon()` has been
+// called, so the table builder is no longer usable. We must be in this
+// state by the time the destructor runs.
+enum class State {
+kBuffered,
+kUnbuffered,
+kClosed,
+};
+State state;
 const bool use_delta_encoding_for_index_values;
 std::unique_ptr<FilterBlockBuilder> filter_builder;
 char compressed_cache_key_prefix[BlockBasedTable::kMaxCacheKeyPrefixSize];
@@ -283,6 +317,8 @@ struct BlockBasedTableBuilder::Rep {
 const std::string& column_family_name;
 uint64_t creation_time = 0;
 uint64_t oldest_key_time = 0;
+const bool is_bottommost_level;
+const uint64_t target_file_size;
 std::vector<std::unique_ptr<IntTblPropCollector>> table_properties_collectors;
@@ -293,10 +329,10 @@ struct BlockBasedTableBuilder::Rep {
 int_tbl_prop_collector_factories,
 uint32_t _column_family_id, WritableFileWriter* f,
 const CompressionType _compression_type,
-const CompressionOptions& _compression_opts,
-const std::string* _compression_dict, const bool skip_filters,
+const CompressionOptions& _compression_opts, const bool skip_filters,
 const std::string& _column_family_name, const uint64_t _creation_time,
-const uint64_t _oldest_key_time)
+const uint64_t _oldest_key_time, const bool _is_bottommost_level,
+const uint64_t _target_file_size)
 : ioptions(_ioptions),
 moptions(_moptions),
 table_options(table_opt),
@@ -317,14 +353,12 @@ struct BlockBasedTableBuilder::Rep {
 internal_prefix_transform(_moptions.prefix_extractor.get()),
 compression_type(_compression_type),
 compression_opts(_compression_opts),
-compression_dict(
-_compression_dict == nullptr ? Slice() : Slice(*_compression_dict),
-_compression_type, _compression_opts.level),
+compression_dict(),
 compression_ctx(_compression_type),
-verify_dict(
-_compression_dict == nullptr ? std::string() : *_compression_dict,
-_compression_type == kZSTD ||
-_compression_type == kZSTDNotFinalCompression),
+verify_dict(),
+state((_is_bottommost_level && _compression_opts.max_dict_bytes > 0)
+? State::kBuffered
+: State::kUnbuffered),
 use_delta_encoding_for_index_values(table_opt.format_version >= 4 &&
 !table_opt.block_align),
 compressed_cache_key_prefix_size(0),
@@ -334,7 +368,9 @@ struct BlockBasedTableBuilder::Rep {
 column_family_id(_column_family_id),
 column_family_name(_column_family_name),
 creation_time(_creation_time),
-oldest_key_time(_oldest_key_time) {
+oldest_key_time(_oldest_key_time),
+is_bottommost_level(_is_bottommost_level),
+target_file_size(_target_file_size) {
 if (table_options.index_type ==
 BlockBasedTableOptions::kTwoLevelIndexSearch) {
 p_index_builder_ = PartitionedIndexBuilder::CreateIndexBuilder(
@@ -383,10 +419,10 @@ BlockBasedTableBuilder::BlockBasedTableBuilder(
 int_tbl_prop_collector_factories,
 uint32_t column_family_id, WritableFileWriter* file,
 const CompressionType compression_type,
-const CompressionOptions& compression_opts,
-const std::string* compression_dict, const bool skip_filters,
+const CompressionOptions& compression_opts, const bool skip_filters,
 const std::string& column_family_name, const uint64_t creation_time,
-const uint64_t oldest_key_time) {
+const uint64_t oldest_key_time, const bool is_bottommost_level,
+const uint64_t target_file_size) {
 BlockBasedTableOptions sanitized_table_options(table_options);
 if (sanitized_table_options.format_version == 0 &&
 sanitized_table_options.checksum != kCRC32c) {
@@ -399,11 +435,11 @@ BlockBasedTableBuilder::BlockBasedTableBuilder(
 sanitized_table_options.format_version = 1;
 }
-rep_ =
-new Rep(ioptions, moptions, sanitized_table_options, internal_comparator,
-int_tbl_prop_collector_factories, column_family_id, file,
-compression_type, compression_opts, compression_dict, skip_filters,
-column_family_name, creation_time, oldest_key_time);
+rep_ = new Rep(ioptions, moptions, sanitized_table_options,
+internal_comparator, int_tbl_prop_collector_factories,
+column_family_id, file, compression_type, compression_opts,
+skip_filters, column_family_name, creation_time,
+oldest_key_time, is_bottommost_level, target_file_size);
 if (rep_->filter_builder != nullptr) {
 rep_->filter_builder->StartBlock(0);
@@ -417,13 +453,14 @@ BlockBasedTableBuilder::BlockBasedTableBuilder(
 }
 BlockBasedTableBuilder::~BlockBasedTableBuilder() {
-assert(rep_->closed); // Catch errors where caller forgot to call Finish()
+// Catch errors where caller forgot to call Finish()
+assert(rep_->state == Rep::State::kClosed);
 delete rep_;
 }
 void BlockBasedTableBuilder::Add(const Slice& key, const Slice& value) {
 Rep* r = rep_;
-assert(!r->closed);
+assert(rep_->state != Rep::State::kClosed);
 if (!ok()) return;
 ValueType value_type = ExtractValueType(key);
 if (IsValueType(value_type)) {
@@ -438,6 +475,11 @@ void BlockBasedTableBuilder::Add(const Slice& key, const Slice& value) {
 assert(!r->data_block.empty());
 Flush();
+if (r->state == Rep::State::kBuffered &&
+r->data_begin_offset > r->target_file_size) {
+EnterUnbuffered();
+}
 // Add item to index block.
 // We do not emit the index entry for a block until we have seen the
 // first key for the next data block. This allows us to use shorter
@@ -446,20 +488,29 @@ void BlockBasedTableBuilder::Add(const Slice& key, const Slice& value) {
 // "the r" as the key for the index block entry since it is >= all
 // entries in the first block and < all entries in subsequent
 // blocks.
-if (ok()) {
+if (ok() && r->state == Rep::State::kUnbuffered) {
 r->index_builder->AddIndexEntry(&r->last_key, &key, r->pending_handle);
 }
 }
 // Note: PartitionedFilterBlockBuilder requires key being added to filter
 // builder after being added to index builder.
if (r->filter_builder != nullptr) { if (r->state == Rep::State::kUnbuffered && r->filter_builder != nullptr) {
r->filter_builder->Add(ExtractUserKey(key)); r->filter_builder->Add(ExtractUserKey(key));
} }
r->last_key.assign(key.data(), key.size()); r->last_key.assign(key.data(), key.size());
r->data_block.Add(key, value); r->data_block.Add(key, value);
if (r->state == Rep::State::kBuffered) {
// Buffer keys to be replayed during `Finish()` once compression
// dictionary has been finalized.
if (r->data_block_and_keys_buffers.empty() || should_flush) {
r->data_block_and_keys_buffers.emplace_back();
}
r->data_block_and_keys_buffers.back().second.emplace_back(key.ToString());
} else {
r->index_builder->OnKeyAdded(key); r->index_builder->OnKeyAdded(key);
}
NotifyCollectTableCollectorsOnAdd(key, value, r->offset, NotifyCollectTableCollectorsOnAdd(key, value, r->offset,
r->table_properties_collectors, r->table_properties_collectors,
r->ioptions.info_log); r->ioptions.info_log);
@ -488,15 +539,10 @@ void BlockBasedTableBuilder::Add(const Slice& key, const Slice& value) {
void BlockBasedTableBuilder::Flush() { void BlockBasedTableBuilder::Flush() {
Rep* r = rep_; Rep* r = rep_;
assert(!r->closed); assert(rep_->state != Rep::State::kClosed);
if (!ok()) return; if (!ok()) return;
if (r->data_block.empty()) return; if (r->data_block.empty()) return;
WriteBlock(&r->data_block, &r->pending_handle, true /* is_data_block */); WriteBlock(&r->data_block, &r->pending_handle, true /* is_data_block */);
if (r->filter_builder != nullptr) {
r->filter_builder->StartBlock(r->offset);
}
r->props.data_size = r->offset;
++r->props.num_data_blocks;
} }
void BlockBasedTableBuilder::WriteBlock(BlockBuilder* block, void BlockBasedTableBuilder::WriteBlock(BlockBuilder* block,
@ -523,11 +569,24 @@ void BlockBasedTableBuilder::WriteBlock(const Slice& raw_block_contents,
StopWatchNano timer(r->ioptions.env, StopWatchNano timer(r->ioptions.env,
ShouldReportDetailedTime(r->ioptions.env, r->ioptions.statistics)); ShouldReportDetailedTime(r->ioptions.env, r->ioptions.statistics));
if (r->state == Rep::State::kBuffered) {
assert(is_data_block);
assert(!r->data_block_and_keys_buffers.empty());
r->data_block_and_keys_buffers.back().first = raw_block_contents.ToString();
r->data_begin_offset += r->data_block_and_keys_buffers.back().first.size();
return;
}
if (raw_block_contents.size() < kCompressionSizeLimit) { if (raw_block_contents.size() < kCompressionSizeLimit) {
CompressionInfo compression_info( const CompressionDict* compression_dict;
r->compression_opts, r->compression_ctx, if (!is_data_block || r->compression_dict == nullptr) {
is_data_block ? r->compression_dict : CompressionDict::GetEmptyDict(), compression_dict = &CompressionDict::GetEmptyDict();
r->compression_type); } else {
compression_dict = r->compression_dict.get();
}
assert(compression_dict != nullptr);
CompressionInfo compression_info(r->compression_opts, r->compression_ctx,
*compression_dict, r->compression_type);
block_contents = block_contents =
CompressBlock(raw_block_contents, compression_info, &type, CompressBlock(raw_block_contents, compression_info, &type,
r->table_options.format_version, &r->compressed_output); r->table_options.format_version, &r->compressed_output);
@ -537,10 +596,15 @@ void BlockBasedTableBuilder::WriteBlock(const Slice& raw_block_contents,
// compressed data and compare to the input. // compressed data and compare to the input.
if (type != kNoCompression && r->table_options.verify_compression) { if (type != kNoCompression && r->table_options.verify_compression) {
// Retrieve the uncompressed contents into a new buffer // Retrieve the uncompressed contents into a new buffer
const UncompressionDict* verify_dict;
if (!is_data_block || r->verify_dict == nullptr) {
verify_dict = &UncompressionDict::GetEmptyDict();
} else {
verify_dict = r->verify_dict.get();
}
assert(verify_dict != nullptr);
BlockContents contents; BlockContents contents;
UncompressionInfo uncompression_info( UncompressionInfo uncompression_info(*r->verify_ctx, *verify_dict,
*r->verify_ctx,
is_data_block ? r->verify_dict : UncompressionDict::GetEmptyDict(),
r->compression_type); r->compression_type);
Status stat = UncompressBlockContentsForCompressionType( Status stat = UncompressBlockContentsForCompressionType(
uncompression_info, block_contents.data(), block_contents.size(), uncompression_info, block_contents.data(), block_contents.size(),
@ -587,6 +651,13 @@ void BlockBasedTableBuilder::WriteBlock(const Slice& raw_block_contents,
WriteRawBlock(block_contents, type, handle, is_data_block); WriteRawBlock(block_contents, type, handle, is_data_block);
r->compressed_output.clear(); r->compressed_output.clear();
if (is_data_block) {
if (r->filter_builder != nullptr) {
r->filter_builder->StartBlock(r->offset);
}
r->props.data_size = r->offset;
++r->props.num_data_blocks;
}
} }
void BlockBasedTableBuilder::WriteRawBlock(const Slice& block_contents, void BlockBasedTableBuilder::WriteRawBlock(const Slice& block_contents,
@ -873,11 +944,18 @@ void BlockBasedTableBuilder::WritePropertiesBlock(
void BlockBasedTableBuilder::WriteCompressionDictBlock( void BlockBasedTableBuilder::WriteCompressionDictBlock(
MetaIndexBuilder* meta_index_builder) { MetaIndexBuilder* meta_index_builder) {
if (rep_->compression_dict.GetRawDict().size()) { if (rep_->compression_dict != nullptr &&
rep_->compression_dict->GetRawDict().size()) {
BlockHandle compression_dict_block_handle; BlockHandle compression_dict_block_handle;
if (ok()) { if (ok()) {
WriteRawBlock(rep_->compression_dict.GetRawDict(), kNoCompression, WriteRawBlock(rep_->compression_dict->GetRawDict(), kNoCompression,
&compression_dict_block_handle); &compression_dict_block_handle);
#ifndef NDEBUG
Slice compression_dict = rep_->compression_dict->GetRawDict();
TEST_SYNC_POINT_CALLBACK(
"BlockBasedTableBuilder::WriteCompressionDictBlock:RawDict",
&compression_dict);
#endif // NDEBUG
} }
if (ok()) { if (ok()) {
meta_index_builder->Add(kCompressionDictBlock, meta_index_builder->Add(kCompressionDictBlock,
@ -925,13 +1003,77 @@ void BlockBasedTableBuilder::WriteFooter(BlockHandle& metaindex_block_handle,
} }
} }
void BlockBasedTableBuilder::EnterUnbuffered() {
Rep* r = rep_;
assert(r->state == Rep::State::kBuffered);
r->state = Rep::State::kUnbuffered;
const size_t kSampleBytes = r->compression_opts.zstd_max_train_bytes > 0
? r->compression_opts.zstd_max_train_bytes
: r->compression_opts.max_dict_bytes;
Random64 generator{r->creation_time};
std::string compression_dict_samples;
std::vector<size_t> compression_dict_sample_lens;
if (!r->data_block_and_keys_buffers.empty()) {
while (compression_dict_samples.size() < kSampleBytes) {
size_t rand_idx =
generator.Uniform(r->data_block_and_keys_buffers.size());
size_t copy_len =
std::min(kSampleBytes - compression_dict_samples.size(),
r->data_block_and_keys_buffers[rand_idx].first.size());
compression_dict_samples.append(
r->data_block_and_keys_buffers[rand_idx].first, 0, copy_len);
compression_dict_sample_lens.emplace_back(copy_len);
}
}
// final data block flushed, now we can generate dictionary from the samples.
// OK if compression_dict_samples is empty, we'll just get empty dictionary.
std::string dict;
if (r->compression_opts.zstd_max_train_bytes > 0) {
dict = ZSTD_TrainDictionary(compression_dict_samples,
compression_dict_sample_lens,
r->compression_opts.max_dict_bytes);
} else {
dict = std::move(compression_dict_samples);
}
r->compression_dict.reset(new CompressionDict(dict, r->compression_type,
r->compression_opts.level));
r->verify_dict.reset(new UncompressionDict(
dict, r->compression_type == kZSTD ||
r->compression_type == kZSTDNotFinalCompression));
for (size_t i = 0; ok() && i < r->data_block_and_keys_buffers.size(); ++i) {
const auto& data_block = r->data_block_and_keys_buffers[i].first;
auto& keys = r->data_block_and_keys_buffers[i].second;
assert(!data_block.empty());
assert(!keys.empty());
for (const auto& key : keys) {
if (r->filter_builder != nullptr) {
r->filter_builder->Add(ExtractUserKey(key));
}
r->index_builder->OnKeyAdded(key);
}
WriteBlock(Slice(data_block), &r->pending_handle, true /* is_data_block */);
if (ok() && i + 1 < r->data_block_and_keys_buffers.size()) {
Slice first_key_in_next_block =
r->data_block_and_keys_buffers[i + 1].second.front();
Slice* first_key_in_next_block_ptr = &first_key_in_next_block;
r->index_builder->AddIndexEntry(&keys.back(), first_key_in_next_block_ptr,
r->pending_handle);
}
}
r->data_block_and_keys_buffers.clear();
}
Status BlockBasedTableBuilder::Finish() { Status BlockBasedTableBuilder::Finish() {
Rep* r = rep_; Rep* r = rep_;
assert(r->state != Rep::State::kClosed);
bool empty_data_block = r->data_block.empty(); bool empty_data_block = r->data_block.empty();
Flush(); Flush();
assert(!r->closed); if (r->state == Rep::State::kBuffered) {
r->closed = true; EnterUnbuffered();
}
// To make sure properties block is able to keep the accurate size of index // To make sure properties block is able to keep the accurate size of index
// block, we will finish writing all index entries first. // block, we will finish writing all index entries first.
if (ok() && !empty_data_block) { if (ok() && !empty_data_block) {
@ -962,23 +1104,20 @@ Status BlockBasedTableBuilder::Finish() {
if (ok()) { if (ok()) {
WriteFooter(metaindex_block_handle, index_block_handle); WriteFooter(metaindex_block_handle, index_block_handle);
} }
r->state = Rep::State::kClosed;
return r->status; return r->status;
} }
void BlockBasedTableBuilder::Abandon() { void BlockBasedTableBuilder::Abandon() {
Rep* r = rep_; assert(rep_->state != Rep::State::kClosed);
assert(!r->closed); rep_->state = Rep::State::kClosed;
r->closed = true;
} }
uint64_t BlockBasedTableBuilder::NumEntries() const { uint64_t BlockBasedTableBuilder::NumEntries() const {
return rep_->props.num_entries; return rep_->props.num_entries;
} }
uint64_t BlockBasedTableBuilder::FileSize() const { uint64_t BlockBasedTableBuilder::FileSize() const { return rep_->offset; }
return rep_->offset;
}
bool BlockBasedTableBuilder::NeedCompact() const { bool BlockBasedTableBuilder::NeedCompact() const {
for (const auto& collector : rep_->table_properties_collectors) { for (const auto& collector : rep_->table_properties_collectors) {
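
Review note: the sampling loop in `EnterUnbuffered()` above can be read in isolation. The following is a minimal sketch of that loop, not the RocksDB implementation. It assumes `std::mt19937_64` as a stand-in for `Random64`, and the names `DictSamples` and `SampleBlocks` are hypothetical. As in the diff, whole blocks are drawn with replacement and only the final sample may be truncated so the total never exceeds the byte budget.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical container for the sampler's output; not a RocksDB type.
struct DictSamples {
  std::string data;             // concatenated sample bytes
  std::vector<size_t> lengths;  // length of each sample within `data`
};

// Draws buffered blocks uniformly at random (with replacement) until
// `sample_bytes` bytes are collected. Only the last sample may be partial.
DictSamples SampleBlocks(const std::vector<std::string>& blocks,
                         size_t sample_bytes, uint64_t seed) {
  DictSamples out;
  // Nothing to sample from: return empty samples, which the diff treats as
  // "empty dictionary". Also guards the loop below against never terminating.
  const bool any_non_empty =
      std::any_of(blocks.begin(), blocks.end(),
                  [](const std::string& b) { return !b.empty(); });
  if (!any_non_empty) return out;

  std::mt19937_64 gen(seed);  // assumption: stand-in for Random64
  std::uniform_int_distribution<size_t> pick(0, blocks.size() - 1);
  while (out.data.size() < sample_bytes) {
    const std::string& block = blocks[pick(gen)];
    if (block.empty()) continue;  // the diff asserts blocks are non-empty
    const size_t copy_len =
        std::min(sample_bytes - out.data.size(), block.size());
    out.data.append(block, 0, copy_len);
    out.lengths.push_back(copy_len);
  }
  return out;
}
```

In the diff, the resulting samples are either passed to `ZSTD_TrainDictionary` (when `zstd_max_train_bytes > 0`) or used directly as the raw dictionary.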

@ -37,8 +37,6 @@ class BlockBasedTableBuilder : public TableBuilder {
// Create a builder that will store the contents of the table it is // Create a builder that will store the contents of the table it is
// building in *file. Does not close the file. It is up to the // building in *file. Does not close the file. It is up to the
// caller to close the file after calling Finish(). // caller to close the file after calling Finish().
// @param compression_dict Data for presetting the compression library's
// dictionary, or nullptr.
BlockBasedTableBuilder( BlockBasedTableBuilder(
const ImmutableCFOptions& ioptions, const MutableCFOptions& moptions, const ImmutableCFOptions& ioptions, const MutableCFOptions& moptions,
const BlockBasedTableOptions& table_options, const BlockBasedTableOptions& table_options,
@ -47,10 +45,11 @@ class BlockBasedTableBuilder : public TableBuilder {
int_tbl_prop_collector_factories, int_tbl_prop_collector_factories,
uint32_t column_family_id, WritableFileWriter* file, uint32_t column_family_id, WritableFileWriter* file,
const CompressionType compression_type, const CompressionType compression_type,
const CompressionOptions& compression_opts, const CompressionOptions& compression_opts, const bool skip_filters,
const std::string* compression_dict, const bool skip_filters,
const std::string& column_family_name, const uint64_t creation_time = 0, const std::string& column_family_name, const uint64_t creation_time = 0,
const uint64_t oldest_key_time = 0); const uint64_t oldest_key_time = 0,
const bool is_bottommost_level = false,
const uint64_t target_file_size = 0);
// REQUIRES: Either Finish() or Abandon() has been called. // REQUIRES: Either Finish() or Abandon() has been called.
~BlockBasedTableBuilder(); ~BlockBasedTableBuilder();
@ -94,6 +93,11 @@ class BlockBasedTableBuilder : public TableBuilder {
private: private:
bool ok() const { return status().ok(); } bool ok() const { return status().ok(); }
// Transition state from buffered to unbuffered. See `Rep::State` API comment
// for details of the states.
// REQUIRES: `rep_->state == kBuffered`
void EnterUnbuffered();
// Call block's Finish() method // Call block's Finish() method
// and then write the compressed block contents to file. // and then write the compressed block contents to file.
void WriteBlock(BlockBuilder* block, BlockHandle* handle, bool is_data_block); void WriteBlock(BlockBuilder* block, BlockHandle* handle, bool is_data_block);

@ -215,11 +215,12 @@ TableBuilder* BlockBasedTableFactory::NewTableBuilder(
table_builder_options.int_tbl_prop_collector_factories, column_family_id, table_builder_options.int_tbl_prop_collector_factories, column_family_id,
file, table_builder_options.compression_type, file, table_builder_options.compression_type,
table_builder_options.compression_opts, table_builder_options.compression_opts,
table_builder_options.compression_dict,
table_builder_options.skip_filters, table_builder_options.skip_filters,
table_builder_options.column_family_name, table_builder_options.column_family_name,
table_builder_options.creation_time, table_builder_options.creation_time,
table_builder_options.oldest_key_time); table_builder_options.oldest_key_time,
table_builder_options.is_bottommost_level,
table_builder_options.target_file_size);
return table_builder; return table_builder;
} }

@ -559,7 +559,6 @@ void TestBoundary(InternalKey& ik1, std::string& v1, InternalKey& ik2,
TableBuilderOptions(ioptions, moptions, internal_comparator, TableBuilderOptions(ioptions, moptions, internal_comparator,
&int_tbl_prop_collector_factories, &int_tbl_prop_collector_factories,
options.compression, CompressionOptions(), options.compression, CompressionOptions(),
nullptr /* compression_dict */,
false /* skip_filters */, column_family_name, level_), false /* skip_filters */, column_family_name, level_),
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
file_writer.get())); file_writer.get()));

@ -235,8 +235,7 @@ Status SstFileWriter::Open(const std::string& file_path) {
TableBuilderOptions table_builder_options( TableBuilderOptions table_builder_options(
r->ioptions, r->mutable_cf_options, r->internal_comparator, r->ioptions, r->mutable_cf_options, r->internal_comparator,
&int_tbl_prop_collector_factories, compression_type, compression_opts, &int_tbl_prop_collector_factories, compression_type, compression_opts,
nullptr /* compression_dict */, r->skip_filters, r->column_family_name, r->skip_filters, r->column_family_name, unknown_level);
unknown_level);
r->file_writer.reset(new WritableFileWriter( r->file_writer.reset(new WritableFileWriter(
std::move(sst_file), file_path, r->env_options, r->ioptions.env, std::move(sst_file), file_path, r->env_options, r->ioptions.env,
nullptr /* stats */, r->ioptions.listeners)); nullptr /* stats */, r->ioptions.listeners));

@ -74,22 +74,23 @@ struct TableBuilderOptions {
const std::vector<std::unique_ptr<IntTblPropCollectorFactory>>* const std::vector<std::unique_ptr<IntTblPropCollectorFactory>>*
_int_tbl_prop_collector_factories, _int_tbl_prop_collector_factories,
CompressionType _compression_type, CompressionType _compression_type,
const CompressionOptions& _compression_opts, const CompressionOptions& _compression_opts, bool _skip_filters,
const std::string* _compression_dict, bool _skip_filters,
const std::string& _column_family_name, int _level, const std::string& _column_family_name, int _level,
const uint64_t _creation_time = 0, const int64_t _oldest_key_time = 0) const uint64_t _creation_time = 0, const int64_t _oldest_key_time = 0,
bool _is_bottommost_level = false, const uint64_t _target_file_size = 0)
: ioptions(_ioptions), : ioptions(_ioptions),
moptions(_moptions), moptions(_moptions),
internal_comparator(_internal_comparator), internal_comparator(_internal_comparator),
int_tbl_prop_collector_factories(_int_tbl_prop_collector_factories), int_tbl_prop_collector_factories(_int_tbl_prop_collector_factories),
compression_type(_compression_type), compression_type(_compression_type),
compression_opts(_compression_opts), compression_opts(_compression_opts),
compression_dict(_compression_dict),
skip_filters(_skip_filters), skip_filters(_skip_filters),
column_family_name(_column_family_name), column_family_name(_column_family_name),
level(_level), level(_level),
creation_time(_creation_time), creation_time(_creation_time),
oldest_key_time(_oldest_key_time) {} oldest_key_time(_oldest_key_time),
is_bottommost_level(_is_bottommost_level),
target_file_size(_target_file_size) {}
const ImmutableCFOptions& ioptions; const ImmutableCFOptions& ioptions;
const MutableCFOptions& moptions; const MutableCFOptions& moptions;
const InternalKeyComparator& internal_comparator; const InternalKeyComparator& internal_comparator;
@ -97,13 +98,13 @@ struct TableBuilderOptions {
int_tbl_prop_collector_factories; int_tbl_prop_collector_factories;
CompressionType compression_type; CompressionType compression_type;
const CompressionOptions& compression_opts; const CompressionOptions& compression_opts;
// Data for presetting the compression library's dictionary, or nullptr.
const std::string* compression_dict;
bool skip_filters; // only used by BlockBasedTableBuilder bool skip_filters; // only used by BlockBasedTableBuilder
const std::string& column_family_name; const std::string& column_family_name;
int level; // what level this table/file is on, -1 for "not set, don't know" int level; // what level this table/file is on, -1 for "not set, don't know"
const uint64_t creation_time; const uint64_t creation_time;
const int64_t oldest_key_time; const int64_t oldest_key_time;
const bool is_bottommost_level;
const uint64_t target_file_size;
}; };
// TableBuilder provides the interface used to build a Table // TableBuilder provides the interface used to build a Table

@ -101,8 +101,7 @@ void TableReaderBenchmark(Options& opts, EnvOptions& env_options,
TableBuilderOptions( TableBuilderOptions(
ioptions, moptions, ikc, &int_tbl_prop_collector_factories, ioptions, moptions, ikc, &int_tbl_prop_collector_factories,
CompressionType::kNoCompression, CompressionOptions(), CompressionType::kNoCompression, CompressionOptions(),
nullptr /* compression_dict */, false /* skip_filters */, false /* skip_filters */, kDefaultColumnFamilyName, unknown_level),
kDefaultColumnFamilyName, unknown_level),
0 /* column_family_id */, file_writer.get()); 0 /* column_family_id */, file_writer.get());
} else { } else {
s = DB::Open(opts, dbname, &db); s = DB::Open(opts, dbname, &db);

@ -329,11 +329,11 @@ class TableConstructor: public Constructor {
int_tbl_prop_collector_factories; int_tbl_prop_collector_factories;
std::string column_family_name; std::string column_family_name;
builder.reset(ioptions.table_factory->NewTableBuilder( builder.reset(ioptions.table_factory->NewTableBuilder(
TableBuilderOptions( TableBuilderOptions(ioptions, moptions, internal_comparator,
ioptions, moptions, internal_comparator, &int_tbl_prop_collector_factories,
&int_tbl_prop_collector_factories, options.compression, options.compression, CompressionOptions(),
CompressionOptions(), nullptr /* compression_dict */, false /* skip_filters */, column_family_name,
false /* skip_filters */, column_family_name, level_), level_),
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
file_writer_.get())); file_writer_.get()));
@ -2640,10 +2640,10 @@ TEST_F(PlainTableTest, BasicPlainTableProperties) {
std::string column_family_name; std::string column_family_name;
int unknown_level = -1; int unknown_level = -1;
std::unique_ptr<TableBuilder> builder(factory.NewTableBuilder( std::unique_ptr<TableBuilder> builder(factory.NewTableBuilder(
TableBuilderOptions( TableBuilderOptions(ioptions, moptions, ikc,
ioptions, moptions, ikc, &int_tbl_prop_collector_factories, &int_tbl_prop_collector_factories, kNoCompression,
kNoCompression, CompressionOptions(), nullptr /* compression_dict */, CompressionOptions(), false /* skip_filters */,
false /* skip_filters */, column_family_name, unknown_level), column_family_name, unknown_level),
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
file_writer.get())); file_writer.get()));
@ -3269,8 +3269,8 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) {
std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder( std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, ikc,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), nullptr /* compression_dict */, CompressionOptions(), false /* skip_filters */,
false /* skip_filters */, column_family_name, -1), column_family_name, -1),
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
file_writer.get())); file_writer.get()));
@ -3449,8 +3449,8 @@ TEST_P(BlockBasedTableTest, BlockAlignTest) {
std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder( std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, ikc,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), nullptr /* compression_dict */, CompressionOptions(), false /* skip_filters */,
false /* skip_filters */, column_family_name, -1), column_family_name, -1),
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
file_writer.get())); file_writer.get()));
@ -3542,8 +3542,8 @@ TEST_P(BlockBasedTableTest, PropertiesBlockRestartPointTest) {
std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder( std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, ikc,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), nullptr /* compression_dict */, CompressionOptions(), false /* skip_filters */,
false /* skip_filters */, column_family_name, -1), column_family_name, -1),
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
file_writer.get())); file_writer.get()));

@ -60,8 +60,7 @@ void createSST(const Options& opts, const std::string& file_name) {
TableBuilderOptions( TableBuilderOptions(
imoptions, moptions, ikc, &int_tbl_prop_collector_factories, imoptions, moptions, ikc, &int_tbl_prop_collector_factories,
CompressionType::kNoCompression, CompressionOptions(), CompressionType::kNoCompression, CompressionOptions(),
nullptr /* compression_dict */, false /* skip_filters */, false /* skip_filters */, column_family_name, unknown_level),
column_family_name, unknown_level),
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
file_writer.get())); file_writer.get()));

@ -214,10 +214,10 @@ int SstFileDumper::ShowAllCompressionSizes(
CompressionOptions compress_opt; CompressionOptions compress_opt;
std::string column_family_name; std::string column_family_name;
int unknown_level = -1; int unknown_level = -1;
TableBuilderOptions tb_opts( TableBuilderOptions tb_opts(imoptions, moptions, ikc,
imoptions, moptions, ikc, &block_based_table_factories, i.first, &block_based_table_factories, i.first,
compress_opt, nullptr /* compression_dict */, compress_opt, false /* skip_filters */,
false /* skip_filters */, column_family_name, unknown_level); column_family_name, unknown_level);
uint64_t file_size = CalculateCompressedTableSize(tb_opts, block_size); uint64_t file_size = CalculateCompressedTableSize(tb_opts, block_size);
fprintf(stdout, "Compression: %s", i.second); fprintf(stdout, "Compression: %s", i.second);
fprintf(stdout, " Size: %" PRIu64 "\n", file_size); fprintf(stdout, " Size: %" PRIu64 "\n", file_size);

@ -158,13 +158,13 @@ struct CompressionDict {
#if ZSTD_VERSION_NUMBER >= 700 #if ZSTD_VERSION_NUMBER >= 700
ZSTD_CDict* zstd_cdict_ = nullptr; ZSTD_CDict* zstd_cdict_ = nullptr;
#endif // ZSTD_VERSION_NUMBER >= 700 #endif // ZSTD_VERSION_NUMBER >= 700
Slice dict_; std::string dict_;
public: public:
#if ZSTD_VERSION_NUMBER >= 700 #if ZSTD_VERSION_NUMBER >= 700
CompressionDict(Slice dict, CompressionType type, int level) { CompressionDict(std::string dict, CompressionType type, int level) {
#else // ZSTD_VERSION_NUMBER >= 700 #else // ZSTD_VERSION_NUMBER >= 700
CompressionDict(Slice dict, CompressionType /*type*/, int /*level*/) { CompressionDict(std::string dict, CompressionType /*type*/, int /*level*/) {
#endif // ZSTD_VERSION_NUMBER >= 700 #endif // ZSTD_VERSION_NUMBER >= 700
dict_ = std::move(dict); dict_ = std::move(dict);
#if ZSTD_VERSION_NUMBER >= 700 #if ZSTD_VERSION_NUMBER >= 700
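
Review note: this hunk changes `dict_` from `Slice` to `std::string`. One plausible motivation, which the diff does not state, is that `EnterUnbuffered()` now builds the dictionary in a local `std::string`, so a non-owning view kept inside the object would outlive its buffer. A minimal sketch of the owning pattern, with hypothetical `OwningDict` and `MakeDict` names standing in for the real types:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a dictionary wrapper that owns its bytes.
class OwningDict {
 public:
  // Taking the string by value and moving it in gives the object its own copy.
  explicit OwningDict(std::string dict) : dict_(std::move(dict)) {}
  const std::string& GetRawDict() const { return dict_; }

 private:
  std::string dict_;
};

// The local buffer dies at the end of this function, but the returned object
// stays valid because it holds its own copy of the bytes.
std::unique_ptr<OwningDict> MakeDict() {
  std::string local = "sampled-bytes";
  return std::unique_ptr<OwningDict>(new OwningDict(local));
}
```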
