diff --git a/HISTORY.md b/HISTORY.md index cb10fdd80..0ad4471d4 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -8,8 +8,9 @@ * Provide support for ReadOptions.async_io with direct_io to improve Seek latency by using async IO to parallelize child iterator seek and doing asynchronous prefetching on sequential scans. * Added support for blob caching in order to cache frequently used blobs for BlobDB. * User can configure the new ColumnFamilyOptions `blob_cache` to enable/disable blob caching. - * Either sharing the backend cache with the block cache or using a completely separate cache is supported. - * A new abstraction interface called `BlobSource` for blob read logic gives all users access to blobs, whether they are in the blob cache, secondary cache, or (remote) storage. Blobs can be potentially read both while handling user reads (`Get`, `MultiGet`, or iterator) and during compaction (while dealing with compaction filters, Merges, or garbage collection) but eventually all blob reads go through `Version::GetBlob` or, for MultiGet, `Version::MultiGetBlob` (and then get dispatched to the interface -- `BlobSource`). + * Either sharing the backend cache with the block cache or using a completely separate cache is supported. + * A new abstraction interface called `BlobSource` for blob read logic gives all users access to blobs, whether they are in the blob cache, secondary cache, or (remote) storage. Blobs can be potentially read both while handling user reads (`Get`, `MultiGet`, or iterator) and during compaction (while dealing with compaction filters, Merges, or garbage collection) but eventually all blob reads go through `Version::GetBlob` or, for MultiGet, `Version::MultiGetBlob` (and then get dispatched to the interface -- `BlobSource`). + * Added `prepopulate_blob_cache` to ColumnFamilyOptions. If enabled, prepopulate warm/hot blobs which are already in memory into blob cache at the time of flush. On a flush, the blob that is in memory (in memtables) get flushed to the device. If using Direct IO, additional IO is incurred to read this blob back into memory again, which is avoided by enabling this option. This further helps if the workload exhibits high temporal locality, where most of the reads go to recently written data. This also helps in case of the remote file system since it involves network traffic and higher latencies. * Support using secondary cache with the blob cache. When creating a blob cache, the user can set a secondary blob cache by configuring `secondary_cache` in LRUCacheOptions. * Add experimental tiered compaction feature `AdvancedColumnFamilyOptions::preclude_last_level_data_seconds`, which makes sure the new data inserted within preclude_last_level_data_seconds won't be placed on cold tier (the feature is not complete). @@ -23,6 +24,8 @@ * When using block cache strict capacity limit (`LRUCache` with `strict_capacity_limit=true`), DB operations now fail with Status code `kAborted` subcode `kMemoryLimit` (`IsMemoryLimit()`) instead of `kIncomplete` (`IsIncomplete()`) when the capacity limit is reached, because Incomplete can mean other specific things for some operations. In more detail, `Cache::Insert()` now returns the updated Status code and this usually propagates through RocksDB to the user on failure. * NewClockCache calls temporarily return an LRUCache (with similar characteristics as the desired ClockCache). This is because ClockCache is being replaced by a new version (the old one had unknown bugs) but this is still under development. 
* Add two functions `int ReserveThreads(int threads_to_be_reserved)` and `int ReleaseThreads(threads_to_be_released)` into `Env` class. In the default implementation, both return 0. Newly added `xxxEnv` class that inherits `Env` should implement these two functions for thread reservation/releasing features. +* Add `rocksdb_options_get_prepopulate_blob_cache` and `rocksdb_options_set_prepopulate_blob_cache` to C API. +* Add `prepopulateBlobCache` and `setPrepopulateBlobCache` to Java API. ### Bug Fixes * Fix a bug in which backup/checkpoint can include a WAL deleted by RocksDB. diff --git a/db/blob/blob_file_builder.cc b/db/blob/blob_file_builder.cc index 48817984a..0e6fa46aa 100644 --- a/db/blob/blob_file_builder.cc +++ b/db/blob/blob_file_builder.cc @@ -32,9 +32,9 @@ BlobFileBuilder::BlobFileBuilder( VersionSet* versions, FileSystem* fs, const ImmutableOptions* immutable_options, const MutableCFOptions* mutable_cf_options, const FileOptions* file_options, - int job_id, uint32_t column_family_id, - const std::string& column_family_name, Env::IOPriority io_priority, - Env::WriteLifeTimeHint write_hint, + std::string db_id, std::string db_session_id, int job_id, + uint32_t column_family_id, const std::string& column_family_name, + Env::IOPriority io_priority, Env::WriteLifeTimeHint write_hint, const std::shared_ptr& io_tracer, BlobFileCompletionCallback* blob_callback, BlobFileCreationReason creation_reason, @@ -42,17 +42,18 @@ BlobFileBuilder::BlobFileBuilder( std::vector* blob_file_additions) : BlobFileBuilder([versions]() { return versions->NewFileNumber(); }, fs, immutable_options, mutable_cf_options, file_options, - job_id, column_family_id, column_family_name, io_priority, - write_hint, io_tracer, blob_callback, creation_reason, - blob_file_paths, blob_file_additions) {} + db_id, db_session_id, job_id, column_family_id, + column_family_name, io_priority, write_hint, io_tracer, + blob_callback, creation_reason, blob_file_paths, + blob_file_additions) {} BlobFileBuilder::BlobFileBuilder( std::function file_number_generator, FileSystem* fs, const ImmutableOptions* immutable_options, const MutableCFOptions* mutable_cf_options, const FileOptions* file_options, - int job_id, uint32_t column_family_id, - const std::string& column_family_name, Env::IOPriority io_priority, - Env::WriteLifeTimeHint write_hint, + std::string db_id, std::string db_session_id, int job_id, + uint32_t column_family_id, const std::string& column_family_name, + Env::IOPriority io_priority, Env::WriteLifeTimeHint write_hint, const std::shared_ptr& io_tracer, BlobFileCompletionCallback* blob_callback, BlobFileCreationReason creation_reason, @@ -64,7 +65,10 @@ BlobFileBuilder::BlobFileBuilder( min_blob_size_(mutable_cf_options->min_blob_size), blob_file_size_(mutable_cf_options->blob_file_size), blob_compression_type_(mutable_cf_options->blob_compression_type), + prepopulate_blob_cache_(mutable_cf_options->prepopulate_blob_cache), file_options_(file_options), + db_id_(std::move(db_id)), + db_session_id_(std::move(db_session_id)), job_id_(job_id), column_family_id_(column_family_id), column_family_name_(column_family_name), @@ -133,6 +137,16 @@ Status BlobFileBuilder::Add(const Slice& key, const Slice& value, } } + { + const Status s = + PutBlobIntoCacheIfNeeded(value, blob_file_number, blob_offset); + if (!s.ok()) { + ROCKS_LOG_WARN(immutable_options_->info_log, + "Failed to pre-populate the blob into blob cache: %s", + s.ToString().c_str()); + } + } + BlobIndex::EncodeBlob(blob_index, blob_file_number, blob_offset, 
blob.size(), blob_compression_type_); @@ -372,4 +386,52 @@ void BlobFileBuilder::Abandon(const Status& s) { blob_count_ = 0; blob_bytes_ = 0; } + +Status BlobFileBuilder::PutBlobIntoCacheIfNeeded(const Slice& blob, + uint64_t blob_file_number, + uint64_t blob_offset) const { + Status s = Status::OK(); + + auto blob_cache = immutable_options_->blob_cache; + auto statistics = immutable_options_->statistics.get(); + bool warm_cache = + prepopulate_blob_cache_ == PrepopulateBlobCache::kFlushOnly && + creation_reason_ == BlobFileCreationReason::kFlush; + + if (blob_cache && warm_cache) { + // The blob file during flush is unknown to be exactly how big it is. + // Therefore, we set the file size to kMaxOffsetStandardEncoding. For any + // max_offset <= this value, the same encoding scheme is guaranteed. + const OffsetableCacheKey base_cache_key( + db_id_, db_session_id_, blob_file_number, + OffsetableCacheKey::kMaxOffsetStandardEncoding); + const CacheKey cache_key = base_cache_key.WithOffset(blob_offset); + const Slice key = cache_key.AsSlice(); + + const Cache::Priority priority = Cache::Priority::LOW; + + // Objects to be put into the cache have to be heap-allocated and + // self-contained, i.e. own their contents. The Cache has to be able to + // take unique ownership of them. Therefore, we copy the blob into a + // string directly, and insert that into the cache. + std::unique_ptr buf = std::make_unique(); + buf->assign(blob.data(), blob.size()); + + // TODO: support custom allocators and provide a better estimated memory + // usage using malloc_usable_size. + s = blob_cache->Insert(key, buf.get(), buf->size(), + &DeleteCacheEntry, + nullptr /* cache_handle */, priority); + if (s.ok()) { + RecordTick(statistics, BLOB_DB_CACHE_ADD); + RecordTick(statistics, BLOB_DB_CACHE_BYTES_WRITE, buf->size()); + buf.release(); + } else { + RecordTick(statistics, BLOB_DB_CACHE_ADD_FAILURES); + } + } + + return s; +} + } // namespace ROCKSDB_NAMESPACE diff --git a/db/blob/blob_file_builder.h b/db/blob/blob_file_builder.h index 745af20eb..8e7aab502 100644 --- a/db/blob/blob_file_builder.h +++ b/db/blob/blob_file_builder.h @@ -10,6 +10,7 @@ #include #include +#include "rocksdb/advanced_options.h" #include "rocksdb/compression_type.h" #include "rocksdb/env.h" #include "rocksdb/rocksdb_namespace.h" @@ -35,7 +36,8 @@ class BlobFileBuilder { BlobFileBuilder(VersionSet* versions, FileSystem* fs, const ImmutableOptions* immutable_options, const MutableCFOptions* mutable_cf_options, - const FileOptions* file_options, int job_id, + const FileOptions* file_options, std::string db_id, + std::string db_session_id, int job_id, uint32_t column_family_id, const std::string& column_family_name, Env::IOPriority io_priority, @@ -49,7 +51,8 @@ class BlobFileBuilder { BlobFileBuilder(std::function file_number_generator, FileSystem* fs, const ImmutableOptions* immutable_options, const MutableCFOptions* mutable_cf_options, - const FileOptions* file_options, int job_id, + const FileOptions* file_options, std::string db_id, + std::string db_session_id, int job_id, uint32_t column_family_id, const std::string& column_family_name, Env::IOPriority io_priority, @@ -78,13 +81,19 @@ class BlobFileBuilder { Status CloseBlobFile(); Status CloseBlobFileIfNeeded(); + Status PutBlobIntoCacheIfNeeded(const Slice& blob, uint64_t blob_file_number, + uint64_t blob_offset) const; + std::function file_number_generator_; FileSystem* fs_; const ImmutableOptions* immutable_options_; uint64_t min_blob_size_; uint64_t blob_file_size_; CompressionType 
blob_compression_type_; + PrepopulateBlobCache prepopulate_blob_cache_; const FileOptions* file_options_; + const std::string db_id_; + const std::string db_session_id_; int job_id_; uint32_t column_family_id_; std::string column_family_name_; diff --git a/db/blob/blob_file_builder_test.cc b/db/blob/blob_file_builder_test.cc index 4b30fcfbf..663681966 100644 --- a/db/blob/blob_file_builder_test.cc +++ b/db/blob/blob_file_builder_test.cc @@ -144,8 +144,9 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckOneFile) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); std::vector> expected_key_value_pairs( @@ -228,8 +229,9 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckMultipleFiles) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); std::vector> expected_key_value_pairs( @@ -315,8 +317,9 @@ TEST_F(BlobFileBuilderTest, InlinedValues) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); for (size_t i = 0; i < number_of_blobs; ++i) { @@ -369,8 +372,9 @@ TEST_F(BlobFileBuilderTest, Compression) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); const std::string key("1"); @@ -452,8 +456,9 @@ TEST_F(BlobFileBuilderTest, CompressionError) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); 
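The `PutBlobIntoCacheIfNeeded()` implementation above (and the `BlobSource::PutBlobIntoCache()` change further down) follows the usual RocksDB pattern for handing a self-owned copy to a `Cache`: pass a raw pointer plus a deleter, and give up ownership only if `Insert()` succeeds. Below is a minimal standalone sketch of that pattern using the deleter-based `Cache::Insert()`; `InsertOwnedCopy` and `DeleteStringEntry` are illustrative names, not part of this patch (the patch uses the `DeleteCacheEntry` helper).

```cpp
#include <memory>
#include <string>

#include "rocksdb/cache.h"
#include "rocksdb/slice.h"
#include "rocksdb/status.h"

namespace {

// Deleter invoked by the cache when the entry is evicted or erased.
void DeleteStringEntry(const rocksdb::Slice& /*key*/, void* value) {
  delete static_cast<std::string*>(value);
}

// Copy `blob` into a heap-allocated string and hand it to `cache`. The copy
// is released to the cache only if Insert() succeeds; otherwise the
// unique_ptr frees it when this function returns.
rocksdb::Status InsertOwnedCopy(rocksdb::Cache* cache,
                                const rocksdb::Slice& key,
                                const rocksdb::Slice& blob) {
  auto buf = std::make_unique<std::string>(blob.data(), blob.size());
  rocksdb::Status s =
      cache->Insert(key, buf.get(), buf->size(), &DeleteStringEntry,
                    nullptr /* handle */, rocksdb::Cache::Priority::LOW);
  if (s.ok()) {
    buf.release();  // the cache now owns the copy
  }
  return s;
}

}  // namespace
```

This is why both call sites in the patch call `release()` only on the success path: on failure the temporary copy must still be freed by the caller.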
SyncPoint::GetInstance()->SetCallBack("CompressData:TamperWithReturnValue", @@ -531,8 +536,9 @@ TEST_F(BlobFileBuilderTest, Checksum) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); const std::string key("1"); @@ -628,8 +634,9 @@ TEST_P(BlobFileBuilderIOErrorTest, IOError) { BlobFileBuilder builder( TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, - &file_options_, job_id, column_family_id, column_family_name, io_priority, - write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, + &file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, + column_family_id, column_family_name, io_priority, write_hint, + nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); SyncPoint::GetInstance()->SetCallBack(sync_point_, [this](void* arg) { diff --git a/db/blob/blob_source.cc b/db/blob/blob_source.cc index 70595ef06..c02fae9df 100644 --- a/db/blob/blob_source.cc +++ b/db/blob/blob_source.cc @@ -63,15 +63,16 @@ Status BlobSource::PutBlobIntoCache(const Slice& cache_key, // self-contained, i.e. own their contents. The Cache has to be able to take // unique ownership of them. Therefore, we copy the blob into a string // directly, and insert that into the cache. - std::string* buf = new std::string(); + std::unique_ptr buf = std::make_unique(); buf->assign(blob->data(), blob->size()); // TODO: support custom allocators and provide a better estimated memory // usage using malloc_usable_size. 
Cache::Handle* cache_handle = nullptr; - s = InsertEntryIntoCache(cache_key, buf, buf->size(), &cache_handle, + s = InsertEntryIntoCache(cache_key, buf.get(), buf->size(), &cache_handle, priority); if (s.ok()) { + buf.release(); assert(cache_handle != nullptr); *cached_blob = CacheHandleGuard(blob_cache_.get(), cache_handle); diff --git a/db/blob/db_blob_basic_test.cc b/db/blob/db_blob_basic_test.cc index 417e5c739..18ab1ef7d 100644 --- a/db/blob/db_blob_basic_test.cc +++ b/db/blob/db_blob_basic_test.cc @@ -1432,6 +1432,126 @@ TEST_P(DBBlobBasicIOErrorTest, CompactionFilterReadBlob_IOError) { SyncPoint::GetInstance()->ClearAllCallBacks(); } +TEST_F(DBBlobBasicTest, WarmCacheWithBlobsDuringFlush) { + Options options = GetDefaultOptions(); + + LRUCacheOptions co; + co.capacity = 1 << 25; + co.num_shard_bits = 2; + co.metadata_charge_policy = kDontChargeCacheMetadata; + auto backing_cache = NewLRUCache(co); + + options.blob_cache = backing_cache; + + BlockBasedTableOptions block_based_options; + block_based_options.no_block_cache = false; + block_based_options.block_cache = backing_cache; + block_based_options.cache_index_and_filter_blocks = true; + options.table_factory.reset(NewBlockBasedTableFactory(block_based_options)); + + options.enable_blob_files = true; + options.create_if_missing = true; + options.disable_auto_compactions = true; + options.enable_blob_garbage_collection = true; + options.blob_garbage_collection_age_cutoff = 1.0; + options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + options.statistics = ROCKSDB_NAMESPACE::CreateDBStatistics(); + + DestroyAndReopen(options); + + constexpr size_t kNumBlobs = 10; + constexpr size_t kValueSize = 100; + + std::string value(kValueSize, 'a'); + + for (size_t i = 1; i <= kNumBlobs; i++) { + ASSERT_OK(Put(std::to_string(i), value)); + ASSERT_OK(Put(std::to_string(i + kNumBlobs), value)); // Add some overlap + ASSERT_OK(Flush()); + ASSERT_EQ(i * 2, options.statistics->getTickerCount(BLOB_DB_CACHE_ADD)); + ASSERT_EQ(value, Get(std::to_string(i))); + ASSERT_EQ(value, Get(std::to_string(i + kNumBlobs))); + ASSERT_EQ(0, options.statistics->getTickerCount(BLOB_DB_CACHE_MISS)); + ASSERT_EQ(i * 2, options.statistics->getTickerCount(BLOB_DB_CACHE_HIT)); + } + + // Verify compaction not counted + ASSERT_OK(db_->CompactRange(CompactRangeOptions(), /*begin=*/nullptr, + /*end=*/nullptr)); + EXPECT_EQ(kNumBlobs * 2, + options.statistics->getTickerCount(BLOB_DB_CACHE_ADD)); +} + +#ifndef ROCKSDB_LITE +TEST_F(DBBlobBasicTest, DynamicallyWarmCacheDuringFlush) { + Options options = GetDefaultOptions(); + + LRUCacheOptions co; + co.capacity = 1 << 25; + co.num_shard_bits = 2; + co.metadata_charge_policy = kDontChargeCacheMetadata; + auto backing_cache = NewLRUCache(co); + + options.blob_cache = backing_cache; + + BlockBasedTableOptions block_based_options; + block_based_options.no_block_cache = false; + block_based_options.block_cache = backing_cache; + block_based_options.cache_index_and_filter_blocks = true; + options.table_factory.reset(NewBlockBasedTableFactory(block_based_options)); + + options.enable_blob_files = true; + options.create_if_missing = true; + options.disable_auto_compactions = true; + options.enable_blob_garbage_collection = true; + options.blob_garbage_collection_age_cutoff = 1.0; + options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + options.statistics = ROCKSDB_NAMESPACE::CreateDBStatistics(); + + DestroyAndReopen(options); + + constexpr size_t kNumBlobs = 10; + constexpr size_t kValueSize = 100; + + 
std::string value(kValueSize, 'a'); + + for (size_t i = 1; i <= 5; i++) { + ASSERT_OK(Put(std::to_string(i), value)); + ASSERT_OK(Put(std::to_string(i + kNumBlobs), value)); // Add some overlap + ASSERT_OK(Flush()); + ASSERT_EQ(2, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD)); + + ASSERT_EQ(value, Get(std::to_string(i))); + ASSERT_EQ(value, Get(std::to_string(i + kNumBlobs))); + ASSERT_EQ(0, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD)); + ASSERT_EQ(0, + options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_MISS)); + ASSERT_EQ(2, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_HIT)); + } + + ASSERT_OK(dbfull()->SetOptions({{"prepopulate_blob_cache", "kDisable"}})); + + for (size_t i = 6; i <= kNumBlobs; i++) { + ASSERT_OK(Put(std::to_string(i), value)); + ASSERT_OK(Put(std::to_string(i + kNumBlobs), value)); // Add some overlap + ASSERT_OK(Flush()); + ASSERT_EQ(0, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD)); + + ASSERT_EQ(value, Get(std::to_string(i))); + ASSERT_EQ(value, Get(std::to_string(i + kNumBlobs))); + ASSERT_EQ(2, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD)); + ASSERT_EQ(2, + options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_MISS)); + ASSERT_EQ(0, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_HIT)); + } + + // Verify compaction not counted + ASSERT_OK(db_->CompactRange(CompactRangeOptions(), /*begin=*/nullptr, + /*end=*/nullptr)); + EXPECT_EQ(0, options.statistics->getTickerCount(BLOB_DB_CACHE_ADD)); +} +#endif // !ROCKSDB_LITE + } // namespace ROCKSDB_NAMESPACE int main(int argc, char** argv) { diff --git a/db/builder.cc b/db/builder.cc index 7d8244af9..03760ec91 100644 --- a/db/builder.cc +++ b/db/builder.cc @@ -188,10 +188,10 @@ Status BuildTable( blob_file_additions) ? 
new BlobFileBuilder( versions, fs, &ioptions, &mutable_cf_options, &file_options, - job_id, tboptions.column_family_id, - tboptions.column_family_name, io_priority, write_hint, - io_tracer, blob_callback, blob_creation_reason, - &blob_file_paths, blob_file_additions) + tboptions.db_id, tboptions.db_session_id, job_id, + tboptions.column_family_id, tboptions.column_family_name, + io_priority, write_hint, io_tracer, blob_callback, + blob_creation_reason, &blob_file_paths, blob_file_additions) : nullptr); const std::atomic kManualCompactionCanceledFalse{false}; diff --git a/db/c.cc b/db/c.cc index de5744ee4..4ac5e5797 100644 --- a/db/c.cc +++ b/db/c.cc @@ -99,6 +99,7 @@ using ROCKSDB_NAMESPACE::Options; using ROCKSDB_NAMESPACE::PerfContext; using ROCKSDB_NAMESPACE::PerfLevel; using ROCKSDB_NAMESPACE::PinnableSlice; +using ROCKSDB_NAMESPACE::PrepopulateBlobCache; using ROCKSDB_NAMESPACE::RandomAccessFile; using ROCKSDB_NAMESPACE::Range; using ROCKSDB_NAMESPACE::RateLimiter; @@ -3140,6 +3141,14 @@ void rocksdb_options_set_blob_cache(rocksdb_options_t* opt, opt->rep.blob_cache = blob_cache->rep; } +void rocksdb_options_set_prepopulate_blob_cache(rocksdb_options_t* opt, int t) { + opt->rep.prepopulate_blob_cache = static_cast(t); +} + +int rocksdb_options_get_prepopulate_blob_cache(rocksdb_options_t* opt) { + return static_cast(opt->rep.prepopulate_blob_cache); +} + void rocksdb_options_set_num_levels(rocksdb_options_t* opt, int n) { opt->rep.num_levels = n; } diff --git a/db/c_test.c b/db/c_test.c index 343ce3cf3..12d6fd143 100644 --- a/db/c_test.c +++ b/db/c_test.c @@ -2068,6 +2068,9 @@ int main(int argc, char** argv) { rocksdb_options_set_blob_file_starting_level(o, 5); CheckCondition(5 == rocksdb_options_get_blob_file_starting_level(o)); + rocksdb_options_set_prepopulate_blob_cache(o, 1 /* flush only */); + CheckCondition(1 == rocksdb_options_get_prepopulate_blob_cache(o)); + // Create a copy that should be equal to the original. rocksdb_options_t* copy; copy = rocksdb_options_create_copy(o); diff --git a/db/compaction/compaction_job.cc b/db/compaction/compaction_job.cc index e5334cb99..b914f5e9d 100644 --- a/db/compaction/compaction_job.cc +++ b/db/compaction/compaction_job.cc @@ -999,10 +999,10 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) { ? 
new BlobFileBuilder( versions_, fs_.get(), sub_compact->compaction->immutable_options(), - mutable_cf_options, &file_options_, job_id_, cfd->GetID(), - cfd->GetName(), Env::IOPriority::IO_LOW, write_hint_, - io_tracer_, blob_callback_, BlobFileCreationReason::kCompaction, - &blob_file_paths, + mutable_cf_options, &file_options_, db_id_, db_session_id_, + job_id_, cfd->GetID(), cfd->GetName(), Env::IOPriority::IO_LOW, + write_hint_, io_tracer_, blob_callback_, + BlobFileCreationReason::kCompaction, &blob_file_paths, sub_compact->Current().GetBlobFileAdditionsPtr()) : nullptr); diff --git a/db_stress_tool/db_stress_common.h b/db_stress_tool/db_stress_common.h index b822518eb..c445f1e9d 100644 --- a/db_stress_tool/db_stress_common.h +++ b/db_stress_tool/db_stress_common.h @@ -272,6 +272,7 @@ DECLARE_bool(use_blob_cache); DECLARE_bool(use_shared_block_and_blob_cache); DECLARE_uint64(blob_cache_size); DECLARE_int32(blob_cache_numshardbits); +DECLARE_int32(prepopulate_blob_cache); DECLARE_int32(approximate_size_one_in); DECLARE_bool(sync_fault_injection); diff --git a/db_stress_tool/db_stress_gflags.cc b/db_stress_tool/db_stress_gflags.cc index db7837e20..5bfccd2cd 100644 --- a/db_stress_tool/db_stress_gflags.cc +++ b/db_stress_tool/db_stress_gflags.cc @@ -474,6 +474,10 @@ DEFINE_int32(blob_cache_numshardbits, 6, "the block and blob caches are different " "(use_shared_block_and_blob_cache = false)."); +DEFINE_int32(prepopulate_blob_cache, 0, + "[Integrated BlobDB] Pre-populate hot/warm blobs in blob cache. 0 " + "to disable and 1 to insert during flush."); + static const bool FLAGS_subcompactions_dummy __attribute__((__unused__)) = RegisterFlagValidator(&FLAGS_subcompactions, &ValidateUint32Range); diff --git a/db_stress_tool/db_stress_test_base.cc b/db_stress_tool/db_stress_test_base.cc index ee9baddfd..3847d6c8a 100644 --- a/db_stress_tool/db_stress_test_base.cc +++ b/db_stress_tool/db_stress_test_base.cc @@ -270,6 +270,8 @@ bool StressTest::BuildOptionsTable() { std::vector{"0", "1M", "4M"}); options_tbl.emplace("blob_file_starting_level", std::vector{"0", "1", "2"}); + options_tbl.emplace("prepopulate_blob_cache", + std::vector{"kDisable", "kFlushOnly"}); } options_table_ = std::move(options_tbl); @@ -2401,9 +2403,12 @@ void StressTest::Open(SharedState* shared) { fprintf(stdout, "Integrated BlobDB: blob cache enabled, block and blob caches " "shared: %d, blob cache size %" PRIu64 - ", blob cache num shard bits: %d\n", + ", blob cache num shard bits: %d, blob cache prepopulated: %s\n", FLAGS_use_shared_block_and_blob_cache, FLAGS_blob_cache_size, - FLAGS_blob_cache_numshardbits); + FLAGS_blob_cache_numshardbits, + options_.prepopulate_blob_cache == PrepopulateBlobCache::kFlushOnly + ? "flush only" + : "disable"); } else { fprintf(stdout, "Integrated BlobDB: blob cache disabled\n"); } @@ -3043,6 +3048,17 @@ void InitializeOptionsFromFlags( exit(1); } } + switch (FLAGS_prepopulate_blob_cache) { + case 0: + options.prepopulate_blob_cache = PrepopulateBlobCache::kDisable; + break; + case 1: + options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + break; + default: + fprintf(stderr, "Unknown prepopulate blob cache mode\n"); + exit(1); + } } options.wal_compression = diff --git a/include/rocksdb/advanced_options.h b/include/rocksdb/advanced_options.h index 32598d04d..cd2582e8a 100644 --- a/include/rocksdb/advanced_options.h +++ b/include/rocksdb/advanced_options.h @@ -246,6 +246,11 @@ enum UpdateStatus { // Return status For inplace update callback UPDATED = 2, // No inplace update. 
Merged value set }; +enum class PrepopulateBlobCache : uint8_t { + kDisable = 0x0, // Disable prepopulate blob cache + kFlushOnly = 0x1, // Prepopulate blobs during flush only +}; + struct AdvancedColumnFamilyOptions { // The maximum number of write buffers that are built up in memory. // The default and the minimum number is 2, so that when 1 write buffer @@ -992,6 +997,20 @@ struct AdvancedColumnFamilyOptions { // Default: nullptr (disabled) std::shared_ptr blob_cache = nullptr; + // If enabled, prepopulate warm/hot blobs which are already in memory into + // blob cache at the time of flush. On a flush, the blob that is in memory (in + // memtables) get flushed to the device. If using Direct IO, additional IO is + // incurred to read this blob back into memory again, which is avoided by + // enabling this option. This further helps if the workload exhibits high + // temporal locality, where most of the reads go to recently written data. + // This also helps in case of the remote file system since it involves network + // traffic and higher latencies. + // + // Default: disabled + // + // Dynamically changeable through the SetOptions() API + PrepopulateBlobCache prepopulate_blob_cache = PrepopulateBlobCache::kDisable; + // Create ColumnFamilyOptions with default values for all fields AdvancedColumnFamilyOptions(); // Create ColumnFamilyOptions from Options diff --git a/include/rocksdb/c.h b/include/rocksdb/c.h index 3c42f904e..8cc51b57c 100644 --- a/include/rocksdb/c.h +++ b/include/rocksdb/c.h @@ -1302,6 +1302,17 @@ extern ROCKSDB_LIBRARY_API int rocksdb_options_get_blob_file_starting_level( extern ROCKSDB_LIBRARY_API void rocksdb_options_set_blob_cache( rocksdb_options_t* opt, rocksdb_cache_t* blob_cache); +enum { + rocksdb_prepopulate_blob_disable = 0, + rocksdb_prepopulate_blob_flush_only = 1 +}; + +extern ROCKSDB_LIBRARY_API void rocksdb_options_set_prepopulate_blob_cache( + rocksdb_options_t* opt, int val); + +extern ROCKSDB_LIBRARY_API int rocksdb_options_get_prepopulate_blob_cache( + rocksdb_options_t* opt); + /* returns a pointer to a malloc()-ed, null terminated string */ extern ROCKSDB_LIBRARY_API char* rocksdb_options_statistics_get_string( rocksdb_options_t* opt); diff --git a/include/rocksdb/utilities/ldb_cmd.h b/include/rocksdb/utilities/ldb_cmd.h index cddb531e9..007638192 100644 --- a/include/rocksdb/utilities/ldb_cmd.h +++ b/include/rocksdb/utilities/ldb_cmd.h @@ -70,6 +70,7 @@ class LDBCommand { static const std::string ARG_BLOB_GARBAGE_COLLECTION_FORCE_THRESHOLD; static const std::string ARG_BLOB_COMPACTION_READAHEAD_SIZE; static const std::string ARG_BLOB_FILE_STARTING_LEVEL; + static const std::string ARG_PREPOPULATE_BLOB_CACHE; static const std::string ARG_DECODE_BLOB_INDEX; static const std::string ARG_DUMP_UNCOMPRESSED_BLOBS; diff --git a/java/CMakeLists.txt b/java/CMakeLists.txt index 8eada17e8..5d62630fd 100644 --- a/java/CMakeLists.txt +++ b/java/CMakeLists.txt @@ -194,6 +194,7 @@ set(JAVA_MAIN_CLASSES src/main/java/org/rocksdb/OptionsUtil.java src/main/java/org/rocksdb/PersistentCache.java src/main/java/org/rocksdb/PlainTableConfig.java + src/main/java/org/rocksdb/PrepopulateBlobCache.java src/main/java/org/rocksdb/Priority.java src/main/java/org/rocksdb/Range.java src/main/java/org/rocksdb/RateLimiter.java diff --git a/java/rocksjni/options.cc b/java/rocksjni/options.cc index 6fc232c7f..34eb900b3 100644 --- a/java/rocksjni/options.cc +++ b/java/rocksjni/options.cc @@ -3885,6 +3885,31 @@ jint Java_org_rocksdb_Options_blobFileStartingLevel(JNIEnv*, jobject, 
return static_cast(opts->blob_file_starting_level); } +/* + * Class: org_rocksdb_Options + * Method: setPrepopulateBlobCache + * Signature: (JB)V + */ +void Java_org_rocksdb_Options_setPrepopulateBlobCache( + JNIEnv*, jobject, jlong jhandle, jbyte jprepopulate_blob_cache_value) { + auto* opts = reinterpret_cast(jhandle); + opts->prepopulate_blob_cache = + ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toCppPrepopulateBlobCache( + jprepopulate_blob_cache_value); +} + +/* + * Class: org_rocksdb_Options + * Method: prepopulateBlobCache + * Signature: (J)B + */ +jbyte Java_org_rocksdb_Options_prepopulateBlobCache(JNIEnv*, jobject, + jlong jhandle) { + auto* opts = reinterpret_cast(jhandle); + return ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toJavaPrepopulateBlobCache( + opts->prepopulate_blob_cache); +} + ////////////////////////////////////////////////////////////////////////////// // ROCKSDB_NAMESPACE::ColumnFamilyOptions @@ -5717,6 +5742,34 @@ jint Java_org_rocksdb_ColumnFamilyOptions_blobFileStartingLevel(JNIEnv*, return static_cast(opts->blob_file_starting_level); } +/* + * Class: org_rocksdb_ColumnFamilyOptions + * Method: setPrepopulateBlobCache + * Signature: (JB)V + */ +void Java_org_rocksdb_ColumnFamilyOptions_setPrepopulateBlobCache( + JNIEnv*, jobject, jlong jhandle, jbyte jprepopulate_blob_cache_value) { + auto* opts = + reinterpret_cast(jhandle); + opts->prepopulate_blob_cache = + ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toCppPrepopulateBlobCache( + jprepopulate_blob_cache_value); +} + +/* + * Class: org_rocksdb_ColumnFamilyOptions + * Method: prepopulateBlobCache + * Signature: (J)B + */ +jbyte Java_org_rocksdb_ColumnFamilyOptions_prepopulateBlobCache(JNIEnv*, + jobject, + jlong jhandle) { + auto* opts = + reinterpret_cast(jhandle); + return ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toJavaPrepopulateBlobCache( + opts->prepopulate_blob_cache); +} + ///////////////////////////////////////////////////////////////////// // ROCKSDB_NAMESPACE::DBOptions diff --git a/java/rocksjni/portal.h b/java/rocksjni/portal.h index bf2135070..b9fde668e 100644 --- a/java/rocksjni/portal.h +++ b/java/rocksjni/portal.h @@ -7894,6 +7894,40 @@ class SanityLevelJni { } }; +// The portal class for org.rocksdb.PrepopulateBlobCache +class PrepopulateBlobCacheJni { + public: + // Returns the equivalent org.rocksdb.PrepopulateBlobCache for the provided + // C++ ROCKSDB_NAMESPACE::PrepopulateBlobCache enum + static jbyte toJavaPrepopulateBlobCache( + ROCKSDB_NAMESPACE::PrepopulateBlobCache prepopulate_blob_cache) { + switch (prepopulate_blob_cache) { + case ROCKSDB_NAMESPACE::PrepopulateBlobCache::kDisable: + return 0x0; + case ROCKSDB_NAMESPACE::PrepopulateBlobCache::kFlushOnly: + return 0x1; + default: + return 0x7f; // undefined + } + } + + // Returns the equivalent C++ ROCKSDB_NAMESPACE::PrepopulateBlobCache enum for + // the provided Java org.rocksdb.PrepopulateBlobCache + static ROCKSDB_NAMESPACE::PrepopulateBlobCache toCppPrepopulateBlobCache( + jbyte jprepopulate_blob_cache) { + switch (jprepopulate_blob_cache) { + case 0x0: + return ROCKSDB_NAMESPACE::PrepopulateBlobCache::kDisable; + case 0x1: + return ROCKSDB_NAMESPACE::PrepopulateBlobCache::kFlushOnly; + case 0x7F: + default: + // undefined/default + return ROCKSDB_NAMESPACE::PrepopulateBlobCache::kDisable; + } + } +}; + // The portal class for org.rocksdb.AbstractListener.EnabledEventCallback class EnabledEventCallbackJni { public: diff --git a/java/src/main/java/org/rocksdb/AbstractMutableOptions.java 
b/java/src/main/java/org/rocksdb/AbstractMutableOptions.java index 243b8040a..7189272b8 100644 --- a/java/src/main/java/org/rocksdb/AbstractMutableOptions.java +++ b/java/src/main/java/org/rocksdb/AbstractMutableOptions.java @@ -341,8 +341,18 @@ public abstract class AbstractMutableOptions { return setIntArray(key, value); case ENUM: - final CompressionType compressionType = CompressionType.getFromInternal(valueStr); - return setEnum(key, compressionType); + String optionName = key.name(); + if (optionName.equals("prepopulate_blob_cache")) { + final PrepopulateBlobCache prepopulateBlobCache = + PrepopulateBlobCache.getFromInternal(valueStr); + return setEnum(key, prepopulateBlobCache); + } else if (optionName.equals("compression") + || optionName.equals("blob_compression_type")) { + final CompressionType compressionType = CompressionType.getFromInternal(valueStr); + return setEnum(key, compressionType); + } else { + throw new IllegalArgumentException("Unknown enum type: " + key.name()); + } default: throw new IllegalStateException(key + " has unknown value type: " + key.getValueType()); diff --git a/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java b/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java index 928750446..162d15d80 100644 --- a/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java +++ b/java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java @@ -801,6 +801,29 @@ public interface AdvancedMutableColumnFamilyOptionsInterface< */ int blobFileStartingLevel(); + /** + * Set a certain prepopulate blob cache option. + * + * Default: 0 + * + * Dynamically changeable through + * {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)}. + * + * @param prepopulateBlobCache the prepopulate blob cache option + * + * @return the reference to the current options. + */ + T setPrepopulateBlobCache(final PrepopulateBlobCache prepopulateBlobCache); + + /** + * Get the prepopulate blob cache option. + * + * Default: 0 + * + * @return the current prepopulate blob cache option. + */ + PrepopulateBlobCache prepopulateBlobCache(); + // // END options for blobs (integrated BlobDB) // diff --git a/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java b/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java index 433fbcf08..a642cb6fa 100644 --- a/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java +++ b/java/src/main/java/org/rocksdb/ColumnFamilyOptions.java @@ -1280,6 +1280,37 @@ public class ColumnFamilyOptions extends RocksObject return blobFileStartingLevel(nativeHandle_); } + /** + * Set a certain prepopulate blob cache option. + * + * Default: 0 + * + * Dynamically changeable through + * {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)}. + * + * @param prepopulateBlobCache the prepopulate blob cache option + * + * @return the reference to the current options. + */ + @Override + public ColumnFamilyOptions setPrepopulateBlobCache( + final PrepopulateBlobCache prepopulateBlobCache) { + setPrepopulateBlobCache(nativeHandle_, prepopulateBlobCache.getValue()); + return this; + } + + /** + * Get the prepopulate blob cache option. + * + * Default: 0 + * + * @return the current prepopulate blob cache option. 
+ */ + @Override + public PrepopulateBlobCache prepopulateBlobCache() { + return PrepopulateBlobCache.getPrepopulateBlobCache(prepopulateBlobCache(nativeHandle_)); + } + // // END options for blobs (integrated BlobDB) // @@ -1488,6 +1519,9 @@ public class ColumnFamilyOptions extends RocksObject private native void setBlobFileStartingLevel( final long nativeHandle_, final int blobFileStartingLevel); private native int blobFileStartingLevel(final long nativeHandle_); + private native void setPrepopulateBlobCache( + final long nativeHandle_, final byte prepopulateBlobCache); + private native byte prepopulateBlobCache(final long nativeHandle_); // instance variables // NOTE: If you add new member variables, please update the copy constructor above! diff --git a/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java b/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java index 8f55c5201..af28fa8ce 100644 --- a/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java +++ b/java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java @@ -123,7 +123,8 @@ public class MutableColumnFamilyOptions blob_garbage_collection_age_cutoff(ValueType.DOUBLE), blob_garbage_collection_force_threshold(ValueType.DOUBLE), blob_compaction_readahead_size(ValueType.LONG), - blob_file_starting_level(ValueType.INT); + blob_file_starting_level(ValueType.INT), + prepopulate_blob_cache(ValueType.ENUM); private final ValueType valueType; BlobOption(final ValueType valueType) { @@ -607,5 +608,16 @@ public class MutableColumnFamilyOptions public int blobFileStartingLevel() { return getInt(BlobOption.blob_file_starting_level); } + + @Override + public MutableColumnFamilyOptionsBuilder setPrepopulateBlobCache( + final PrepopulateBlobCache prepopulateBlobCache) { + return setEnum(BlobOption.prepopulate_blob_cache, prepopulateBlobCache); + } + + @Override + public PrepopulateBlobCache prepopulateBlobCache() { + return (PrepopulateBlobCache) getEnum(BlobOption.prepopulate_blob_cache); + } } } diff --git a/java/src/main/java/org/rocksdb/Options.java b/java/src/main/java/org/rocksdb/Options.java index f7e725f07..1f1e5507a 100644 --- a/java/src/main/java/org/rocksdb/Options.java +++ b/java/src/main/java/org/rocksdb/Options.java @@ -2104,6 +2104,17 @@ public class Options extends RocksObject return blobFileStartingLevel(nativeHandle_); } + @Override + public Options setPrepopulateBlobCache(final PrepopulateBlobCache prepopulateBlobCache) { + setPrepopulateBlobCache(nativeHandle_, prepopulateBlobCache.getValue()); + return this; + } + + @Override + public PrepopulateBlobCache prepopulateBlobCache() { + return PrepopulateBlobCache.getPrepopulateBlobCache(prepopulateBlobCache(nativeHandle_)); + } + // // END options for blobs (integrated BlobDB) // @@ -2541,6 +2552,9 @@ public class Options extends RocksObject private native void setBlobFileStartingLevel( final long nativeHandle_, final int blobFileStartingLevel); private native int blobFileStartingLevel(final long nativeHandle_); + private native void setPrepopulateBlobCache( + final long nativeHandle_, final byte prepopulateBlobCache); + private native byte prepopulateBlobCache(final long nativeHandle_); // instance variables // NOTE: If you add new member variables, please update the copy constructor above! 
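Beyond the Java accessors above, the option is exposed with the same semantics in the C++ and C APIs. The following is a minimal C++ usage sketch mirroring the `db_blob_basic_test` cases earlier in this patch; the DB path, cache capacity, and value size are illustrative only.

```cpp
#include <cassert>
#include <string>

#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/statistics.h"
#include "rocksdb/table.h"

int main() {
  // One LRU cache backing both the block cache and the blob cache.
  rocksdb::LRUCacheOptions cache_opts;
  cache_opts.capacity = 64 << 20;
  std::shared_ptr<rocksdb::Cache> backing_cache =
      rocksdb::NewLRUCache(cache_opts);

  rocksdb::Options options;
  options.create_if_missing = true;
  options.enable_blob_files = true;
  options.blob_cache = backing_cache;
  options.prepopulate_blob_cache = rocksdb::PrepopulateBlobCache::kFlushOnly;
  options.statistics = rocksdb::CreateDBStatistics();

  rocksdb::BlockBasedTableOptions table_opts;
  table_opts.block_cache = backing_cache;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_opts));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/prepopulate_demo", &db);
  assert(s.ok());

  // Blobs sitting in the memtable are inserted into the blob cache when the
  // memtable is flushed, so a subsequent read can be a blob cache hit.
  assert(db->Put(rocksdb::WriteOptions(), "key", std::string(1024, 'v')).ok());
  assert(db->Flush(rocksdb::FlushOptions()).ok());
  assert(options.statistics->getTickerCount(rocksdb::BLOB_DB_CACHE_ADD) > 0);

  // The option is dynamically changeable via SetOptions().
  assert(db->SetOptions({{"prepopulate_blob_cache", "kDisable"}}).ok());

  delete db;
  return 0;
}
```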
diff --git a/java/src/main/java/org/rocksdb/PrepopulateBlobCache.java b/java/src/main/java/org/rocksdb/PrepopulateBlobCache.java new file mode 100644 index 000000000..f1237aa7c --- /dev/null +++ b/java/src/main/java/org/rocksdb/PrepopulateBlobCache.java @@ -0,0 +1,117 @@ +// Copyright (c) Meta Platforms, Inc. and affiliates. +// This source code is licensed under both the GPLv2 (found in the +// COPYING file in the root directory) and Apache 2.0 License +// (found in the LICENSE.Apache file in the root directory). + +package org.rocksdb; + +/** + * Enum PrepopulateBlobCache + * + *
<p>
Prepopulate warm/hot blobs which are already in memory into the blob + * cache at the time of flush. On a flush, the blobs that are in memory + * (in the memtables) get flushed to the device. If using Direct IO, + * additional IO is incurred to read these blobs back into memory again, + * which is avoided by enabling this option. This further helps if the + * workload exhibits high temporal locality, where most of the reads go + * to recently written data. It also helps in the case of a remote file + * system, since reads there involve network traffic and higher latencies.
</p>
+ */ +public enum PrepopulateBlobCache { + PREPOPULATE_BLOB_DISABLE((byte) 0x0, "prepopulate_blob_disable", "kDisable"), + PREPOPULATE_BLOB_FLUSH_ONLY((byte) 0x1, "prepopulate_blob_flush_only", "kFlushOnly"); + + /** + *
<p>
Get the PrepopulateBlobCache enumeration value by + * passing the library name to this method.
</p>
+ * + *
<p>
If the library cannot be found, the enumeration + * value {@code PREPOPULATE_BLOB_DISABLE} will be returned.
</p>
+ * + * @param libraryName prepopulate blob cache library name. + * + * @return PrepopulateBlobCache instance. + */ + public static PrepopulateBlobCache getPrepopulateBlobCache(String libraryName) { + if (libraryName != null) { + for (PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) { + if (prepopulateBlobCache.getLibraryName() != null + && prepopulateBlobCache.getLibraryName().equals(libraryName)) { + return prepopulateBlobCache; + } + } + } + return PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE; + } + + /** + *
<p>
Get the PrepopulateBlobCache enumeration value by + * passing the byte identifier to this method.
</p>
+ * + * @param byteIdentifier of PrepopulateBlobCache. + * + * @return PrepopulateBlobCache instance. + * + * @throws IllegalArgumentException If PrepopulateBlobCache cannot be found for the + * provided byteIdentifier + */ + public static PrepopulateBlobCache getPrepopulateBlobCache(byte byteIdentifier) { + for (final PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) { + if (prepopulateBlobCache.getValue() == byteIdentifier) { + return prepopulateBlobCache; + } + } + + throw new IllegalArgumentException("Illegal value provided for PrepopulateBlobCache."); + } + + /** + *
<p>
Get a PrepopulateBlobCache value based on the string key in the C++ options output. + * This gets used in support of getting options into Java from an options string, + * which is generated at the C++ level. + *
</p>
+ * + * @param internalName the internal (C++) name by which the option is known. + * + * @return PrepopulateBlobCache instance (optional) + */ + static PrepopulateBlobCache getFromInternal(final String internalName) { + for (final PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) { + if (prepopulateBlobCache.internalName_.equals(internalName)) { + return prepopulateBlobCache; + } + } + + throw new IllegalArgumentException( + "Illegal internalName '" + internalName + " ' provided for PrepopulateBlobCache."); + } + + /** + *
<p>
Returns the byte value of the enumeration value.
</p>
+ * + * @return byte representation + */ + public byte getValue() { + return value_; + } + + /** + *
<p>
Returns the library name of the prepopulate blob cache mode + * identified by the enumeration value.
</p>
+ * + * @return library name + */ + public String getLibraryName() { + return libraryName_; + } + + PrepopulateBlobCache(final byte value, final String libraryName, final String internalName) { + value_ = value; + libraryName_ = libraryName; + internalName_ = internalName; + } + + private final byte value_; + private final String libraryName_; + private final String internalName_; +} diff --git a/java/src/test/java/org/rocksdb/BlobOptionsTest.java b/java/src/test/java/org/rocksdb/BlobOptionsTest.java index 7b94702da..fe3d9b246 100644 --- a/java/src/test/java/org/rocksdb/BlobOptionsTest.java +++ b/java/src/test/java/org/rocksdb/BlobOptionsTest.java @@ -79,6 +79,8 @@ public class BlobOptionsTest { assertThat(options.blobGarbageCollectionAgeCutoff()).isEqualTo(0.25); assertThat(options.blobGarbageCollectionForceThreshold()).isEqualTo(1.0); assertThat(options.blobCompactionReadaheadSize()).isEqualTo(0); + assertThat(options.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); assertThat(options.setEnableBlobFiles(true)).isEqualTo(options); assertThat(options.setMinBlobSize(132768L)).isEqualTo(options); @@ -90,6 +92,8 @@ public class BlobOptionsTest { assertThat(options.setBlobGarbageCollectionForceThreshold(0.80)).isEqualTo(options); assertThat(options.setBlobCompactionReadaheadSize(262144L)).isEqualTo(options); assertThat(options.setBlobFileStartingLevel(0)).isEqualTo(options); + assertThat(options.setPrepopulateBlobCache(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY)) + .isEqualTo(options); assertThat(options.enableBlobFiles()).isEqualTo(true); assertThat(options.minBlobSize()).isEqualTo(132768L); @@ -100,6 +104,8 @@ public class BlobOptionsTest { assertThat(options.blobGarbageCollectionForceThreshold()).isEqualTo(0.80); assertThat(options.blobCompactionReadaheadSize()).isEqualTo(262144L); assertThat(options.blobFileStartingLevel()).isEqualTo(0); + assertThat(options.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY); } } @@ -130,6 +136,9 @@ public class BlobOptionsTest { assertThat(columnFamilyOptions.setBlobCompactionReadaheadSize(262144L)) .isEqualTo(columnFamilyOptions); assertThat(columnFamilyOptions.setBlobFileStartingLevel(0)).isEqualTo(columnFamilyOptions); + assertThat(columnFamilyOptions.setPrepopulateBlobCache( + PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE)) + .isEqualTo(columnFamilyOptions); assertThat(columnFamilyOptions.enableBlobFiles()).isEqualTo(true); assertThat(columnFamilyOptions.minBlobSize()).isEqualTo(132768L); @@ -141,6 +150,8 @@ public class BlobOptionsTest { assertThat(columnFamilyOptions.blobGarbageCollectionForceThreshold()).isEqualTo(0.80); assertThat(columnFamilyOptions.blobCompactionReadaheadSize()).isEqualTo(262144L); assertThat(columnFamilyOptions.blobFileStartingLevel()).isEqualTo(0); + assertThat(columnFamilyOptions.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); } } @@ -156,7 +167,8 @@ public class BlobOptionsTest { .setBlobGarbageCollectionAgeCutoff(0.89) .setBlobGarbageCollectionForceThreshold(0.80) .setBlobCompactionReadaheadSize(262144) - .setBlobFileStartingLevel(1); + .setBlobFileStartingLevel(1) + .setPrepopulateBlobCache(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY); assertThat(builder.enableBlobFiles()).isEqualTo(true); assertThat(builder.minBlobSize()).isEqualTo(1024); @@ -167,6 +179,8 @@ public class BlobOptionsTest { assertThat(builder.blobGarbageCollectionForceThreshold()).isEqualTo(0.80); 
assertThat(builder.blobCompactionReadaheadSize()).isEqualTo(262144); assertThat(builder.blobFileStartingLevel()).isEqualTo(1); + assertThat(builder.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY); builder.setEnableBlobFiles(false) .setMinBlobSize(4096) @@ -176,7 +190,8 @@ public class BlobOptionsTest { .setBlobGarbageCollectionAgeCutoff(0.91) .setBlobGarbageCollectionForceThreshold(0.96) .setBlobCompactionReadaheadSize(1024) - .setBlobFileStartingLevel(0); + .setBlobFileStartingLevel(0) + .setPrepopulateBlobCache(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); assertThat(builder.enableBlobFiles()).isEqualTo(false); assertThat(builder.minBlobSize()).isEqualTo(4096); @@ -187,16 +202,19 @@ public class BlobOptionsTest { assertThat(builder.blobGarbageCollectionForceThreshold()).isEqualTo(0.96); assertThat(builder.blobCompactionReadaheadSize()).isEqualTo(1024); assertThat(builder.blobFileStartingLevel()).isEqualTo(0); + assertThat(builder.prepopulateBlobCache()) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); final MutableColumnFamilyOptions options = builder.build(); assertThat(options.getKeys()) .isEqualTo(new String[] {"enable_blob_files", "min_blob_size", "blob_file_size", "blob_compression_type", "enable_blob_garbage_collection", "blob_garbage_collection_age_cutoff", "blob_garbage_collection_force_threshold", - "blob_compaction_readahead_size", "blob_file_starting_level"}); + "blob_compaction_readahead_size", "blob_file_starting_level", + "prepopulate_blob_cache"}); assertThat(options.getValues()) - .isEqualTo(new String[] { - "false", "4096", "2048", "LZ4_COMPRESSION", "false", "0.91", "0.96", "1024", "0"}); + .isEqualTo(new String[] {"false", "4096", "2048", "LZ4_COMPRESSION", "false", "0.91", + "0.96", "1024", "0", "PREPOPULATE_BLOB_DISABLE"}); } /** diff --git a/java/src/test/java/org/rocksdb/MutableColumnFamilyOptionsTest.java b/java/src/test/java/org/rocksdb/MutableColumnFamilyOptionsTest.java index 66b458a9c..b2b2599a7 100644 --- a/java/src/test/java/org/rocksdb/MutableColumnFamilyOptionsTest.java +++ b/java/src/test/java/org/rocksdb/MutableColumnFamilyOptionsTest.java @@ -96,8 +96,9 @@ public class MutableColumnFamilyOptionsTest { public void mutableColumnFamilyOptions_parse_getOptions_output() { final String optionsString = "bottommost_compression=kDisableCompressionOption; sample_for_compression=0; " - + "blob_garbage_collection_age_cutoff=0.250000; blob_garbage_collection_force_threshold=0.800000; arena_block_size=1048576; enable_blob_garbage_collection=false; " - + "level0_stop_writes_trigger=36; min_blob_size=65536; blob_compaction_readahead_size=262144; blob_file_starting_level=5; " + + "blob_garbage_collection_age_cutoff=0.250000; blob_garbage_collection_force_threshold=0.800000;" + + "arena_block_size=1048576; enable_blob_garbage_collection=false; level0_stop_writes_trigger=36; min_blob_size=65536;" + + "blob_compaction_readahead_size=262144; blob_file_starting_level=5; prepopulate_blob_cache=kDisable;" + "compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;" + "compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;size_ratio=1;}; " + "target_file_size_base=67108864; max_bytes_for_level_base=268435456; memtable_whole_key_filtering=false; " @@ -133,6 +134,7 @@ public class MutableColumnFamilyOptionsTest { assertThat(cf.minBlobSize()).isEqualTo(65536); assertThat(cf.blobCompactionReadaheadSize()).isEqualTo(262144); 
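The Java test above parses `prepopulate_blob_cache=kDisable` out of a C++-generated options string; the same token is accepted on the C++ side once the enum is registered with the options framework (see the `cf_options.cc` and `options_helper.cc` hunks that follow). A hedged sketch using the public convenience API:

```cpp
#include <cassert>
#include <string>

#include "rocksdb/convenience.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::ConfigOptions config_options;
  rocksdb::ColumnFamilyOptions base;
  rocksdb::ColumnFamilyOptions parsed;

  // "kFlushOnly" and "kDisable" are the tokens registered in
  // prepopulate_blob_cache_string_map.
  rocksdb::Status s = rocksdb::GetColumnFamilyOptionsFromString(
      config_options, base, "prepopulate_blob_cache=kFlushOnly", &parsed);
  assert(s.ok());
  assert(parsed.prepopulate_blob_cache ==
         rocksdb::PrepopulateBlobCache::kFlushOnly);
  return 0;
}
```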
assertThat(cf.blobFileStartingLevel()).isEqualTo(5); + assertThat(cf.prepopulateBlobCache()).isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); assertThat(cf.targetFileSizeBase()).isEqualTo(67108864); assertThat(cf.maxBytesForLevelBase()).isEqualTo(268435456); assertThat(cf.softPendingCompactionBytesLimit()).isEqualTo(68719476736L); diff --git a/java/src/test/java/org/rocksdb/OptionsTest.java b/java/src/test/java/org/rocksdb/OptionsTest.java index 612b1bd7c..129f1c39a 100644 --- a/java/src/test/java/org/rocksdb/OptionsTest.java +++ b/java/src/test/java/org/rocksdb/OptionsTest.java @@ -1033,6 +1033,18 @@ public class OptionsTest { } } + @Test + public void prepopulateBlobCache() { + try (final Options options = new Options()) { + for (final PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) { + options.setPrepopulateBlobCache(prepopulateBlobCache); + assertThat(options.prepopulateBlobCache()).isEqualTo(prepopulateBlobCache); + assertThat(PrepopulateBlobCache.valueOf("PREPOPULATE_BLOB_DISABLE")) + .isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE); + } + } + } + @Test public void compressionPerLevel() { try (final Options options = new Options()) { diff --git a/options/cf_options.cc b/options/cf_options.cc index 6fbc31151..c8c45ecf8 100644 --- a/options/cf_options.cc +++ b/options/cf_options.cc @@ -451,6 +451,10 @@ static std::unordered_map {offsetof(struct MutableCFOptions, blob_file_starting_level), OptionType::kInt, OptionVerificationType::kNormal, OptionTypeFlags::kMutable}}, + {"prepopulate_blob_cache", + OptionTypeInfo::Enum( + offsetof(struct MutableCFOptions, prepopulate_blob_cache), + &prepopulate_blob_cache_string_map, OptionTypeFlags::kMutable)}, {"sample_for_compression", {offsetof(struct MutableCFOptions, sample_for_compression), OptionType::kUInt64T, OptionVerificationType::kNormal, @@ -1097,7 +1101,10 @@ void MutableCFOptions::Dump(Logger* log) const { blob_compaction_readahead_size); ROCKS_LOG_INFO(log, " blob_file_starting_level: %d", blob_file_starting_level); - + ROCKS_LOG_INFO(log, " prepopulate_blob_cache: %s", + prepopulate_blob_cache == PrepopulateBlobCache::kFlushOnly + ? 
"flush only" + : "disable"); ROCKS_LOG_INFO(log, " bottommost_temperature: %d", static_cast(bottommost_temperature)); } diff --git a/options/cf_options.h b/options/cf_options.h index feba50522..ff4df0d7a 100644 --- a/options/cf_options.h +++ b/options/cf_options.h @@ -147,6 +147,7 @@ struct MutableCFOptions { options.blob_garbage_collection_force_threshold), blob_compaction_readahead_size(options.blob_compaction_readahead_size), blob_file_starting_level(options.blob_file_starting_level), + prepopulate_blob_cache(options.prepopulate_blob_cache), max_sequential_skip_in_iterations( options.max_sequential_skip_in_iterations), check_flush_compaction_key_order( @@ -198,6 +199,7 @@ struct MutableCFOptions { blob_garbage_collection_force_threshold(0.0), blob_compaction_readahead_size(0), blob_file_starting_level(0), + prepopulate_blob_cache(PrepopulateBlobCache::kDisable), max_sequential_skip_in_iterations(0), check_flush_compaction_key_order(true), paranoid_file_checks(false), @@ -281,6 +283,7 @@ struct MutableCFOptions { double blob_garbage_collection_force_threshold; uint64_t blob_compaction_readahead_size; int blob_file_starting_level; + PrepopulateBlobCache prepopulate_blob_cache; // Misc options uint64_t max_sequential_skip_in_iterations; diff --git a/options/options.cc b/options/options.cc index b870bad28..6dff5e62f 100644 --- a/options/options.cc +++ b/options/options.cc @@ -105,7 +105,8 @@ AdvancedColumnFamilyOptions::AdvancedColumnFamilyOptions(const Options& options) options.blob_garbage_collection_force_threshold), blob_compaction_readahead_size(options.blob_compaction_readahead_size), blob_file_starting_level(options.blob_file_starting_level), - blob_cache(options.blob_cache) { + blob_cache(options.blob_cache), + prepopulate_blob_cache(options.prepopulate_blob_cache) { assert(memtable_factory.get() != nullptr); if (max_bytes_for_level_multiplier_additional.size() < static_cast(num_levels)) { @@ -428,6 +429,11 @@ void ColumnFamilyOptions::Dump(Logger* log) const { blob_cache->Name()); ROCKS_LOG_HEADER(log, " blob_cache options: %s", blob_cache->GetPrintableOptions().c_str()); + ROCKS_LOG_HEADER( + log, " blob_cache prepopulated: %s", + prepopulate_blob_cache == PrepopulateBlobCache::kFlushOnly + ? 
"flush only" + : "disabled"); } ROCKS_LOG_HEADER(log, "Options.experimental_mempurge_threshold: %f", experimental_mempurge_threshold); diff --git a/options/options_helper.cc b/options/options_helper.cc index 280f59a4f..0424ba3a5 100644 --- a/options/options_helper.cc +++ b/options/options_helper.cc @@ -257,6 +257,7 @@ void UpdateColumnFamilyOptions(const MutableCFOptions& moptions, cf_opts->blob_compaction_readahead_size = moptions.blob_compaction_readahead_size; cf_opts->blob_file_starting_level = moptions.blob_file_starting_level; + cf_opts->prepopulate_blob_cache = moptions.prepopulate_blob_cache; // Misc options cf_opts->max_sequential_skip_in_iterations = @@ -850,6 +851,11 @@ std::unordered_map {"kWarm", Temperature::kWarm}, {"kCold", Temperature::kCold}}; +std::unordered_map + OptionsHelper::prepopulate_blob_cache_string_map = { + {"kDisable", PrepopulateBlobCache::kDisable}, + {"kFlushOnly", PrepopulateBlobCache::kFlushOnly}}; + Status OptionTypeInfo::NextToken(const std::string& opts, char delimiter, size_t pos, size_t* end, std::string* token) { while (pos < opts.size() && isspace(opts[pos])) { diff --git a/options/options_helper.h b/options/options_helper.h index 60b7dac49..7c751fc25 100644 --- a/options/options_helper.h +++ b/options/options_helper.h @@ -82,6 +82,8 @@ struct OptionsHelper { static std::unordered_map checksum_type_string_map; static std::unordered_map compression_type_string_map; + static std::unordered_map + prepopulate_blob_cache_string_map; #ifndef ROCKSDB_LITE static std::unordered_map compaction_stop_style_string_map; @@ -113,6 +115,8 @@ static auto& compaction_style_string_map = static auto& compaction_pri_string_map = OptionsHelper::compaction_pri_string_map; static auto& temperature_string_map = OptionsHelper::temperature_string_map; +static auto& prepopulate_blob_cache_string_map = + OptionsHelper::prepopulate_blob_cache_string_map; #endif // !ROCKSDB_LITE } // namespace ROCKSDB_NAMESPACE diff --git a/options/options_settable_test.cc b/options/options_settable_test.cc index e4b932a44..86edbff41 100644 --- a/options/options_settable_test.cc +++ b/options/options_settable_test.cc @@ -526,6 +526,7 @@ TEST_F(OptionsSettableTest, ColumnFamilyOptionsAllFieldsSettable) { "blob_garbage_collection_force_threshold=0.75;" "blob_compaction_readahead_size=262144;" "blob_file_starting_level=1;" + "prepopulate_blob_cache=kDisable;" "bottommost_temperature=kWarm;" "preclude_last_level_data_seconds=86400;" "compaction_options_fifo={max_table_files_size=3;allow_" diff --git a/options/options_test.cc b/options/options_test.cc index 0eeaca484..fbd078ba6 100644 --- a/options/options_test.cc +++ b/options/options_test.cc @@ -127,6 +127,7 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) { {"blob_garbage_collection_force_threshold", "0.75"}, {"blob_compaction_readahead_size", "256K"}, {"blob_file_starting_level", "1"}, + {"prepopulate_blob_cache", "kDisable"}, {"bottommost_temperature", "kWarm"}, }; @@ -266,6 +267,7 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) { ASSERT_EQ(new_cf_opt.blob_garbage_collection_force_threshold, 0.75); ASSERT_EQ(new_cf_opt.blob_compaction_readahead_size, 262144); ASSERT_EQ(new_cf_opt.blob_file_starting_level, 1); + ASSERT_EQ(new_cf_opt.prepopulate_blob_cache, PrepopulateBlobCache::kDisable); ASSERT_EQ(new_cf_opt.bottommost_temperature, Temperature::kWarm); cf_options_map["write_buffer_size"] = "hello"; @@ -2356,6 +2358,7 @@ TEST_F(OptionsOldApiTest, GetOptionsFromMapTest) { {"blob_garbage_collection_force_threshold", "0.75"}, 
{"blob_compaction_readahead_size", "256K"}, {"blob_file_starting_level", "1"}, + {"prepopulate_blob_cache", "kDisable"}, {"bottommost_temperature", "kWarm"}, }; @@ -2489,6 +2492,7 @@ TEST_F(OptionsOldApiTest, GetOptionsFromMapTest) { ASSERT_EQ(new_cf_opt.blob_garbage_collection_force_threshold, 0.75); ASSERT_EQ(new_cf_opt.blob_compaction_readahead_size, 262144); ASSERT_EQ(new_cf_opt.blob_file_starting_level, 1); + ASSERT_EQ(new_cf_opt.prepopulate_blob_cache, PrepopulateBlobCache::kDisable); ASSERT_EQ(new_cf_opt.bottommost_temperature, Temperature::kWarm); cf_options_map["write_buffer_size"] = "hello"; diff --git a/tools/benchmark.sh b/tools/benchmark.sh index 105d6f83e..1773f9d6e 100755 --- a/tools/benchmark.sh +++ b/tools/benchmark.sh @@ -95,6 +95,7 @@ function display_usage() { echo -e "\tUSE_SHARED_BLOCK_AND_BLOB_CACHE\t\t\tUse the same backing cache for block cache and blob cache (default: 1)" echo -e "\tBLOB_CACHE_SIZE\t\t\tSize of the blob cache (default: 16GB)" echo -e "\tBLOB_CACHE_NUMSHARDBITS\t\t\tNumber of shards for the blob cache is 2 ** blob_cache_numshardbits (default: 6)" + echo -e "\tPREPOPULATE_BLOB_CACHE\t\t\tPre-populate hot/warm blobs in blob cache (default: 0)" } if [ $# -lt 1 ]; then @@ -239,6 +240,7 @@ use_blob_cache=${USE_BLOB_CACHE:-1} use_shared_block_and_blob_cache=${USE_SHARED_BLOCK_AND_BLOB_CACHE:-1} blob_cache_size=${BLOB_CACHE_SIZE:-$(( 16 * $G ))} blob_cache_numshardbits=${BLOB_CACHE_NUMSHARDBITS:-6} +prepopulate_blob_cache=${PREPOPULATE_BLOB_CACHE:-0} const_params_base=" --db=$DB_DIR \ @@ -306,7 +308,8 @@ blob_const_params=" --use_shared_block_and_blob_cache=$use_shared_block_and_blob_cache \ --blob_cache_size=$blob_cache_size \ --blob_cache_numshardbits=$blob_cache_numshardbits \ - --undefok=use_blob_cache,use_shared_block_and_blob_cache,blob_cache_size,blob_cache_numshardbits \ + --prepopulate_blob_cache=$prepopulate_blob_cache \ + --undefok=use_blob_cache,use_shared_block_and_blob_cache,blob_cache_size,blob_cache_numshardbits,prepopulate_blob_cache \ " # TODO: diff --git a/tools/db_bench_tool.cc b/tools/db_bench_tool.cc index 61aa184d3..3df9fcb78 100644 --- a/tools/db_bench_tool.cc +++ b/tools/db_bench_tool.cc @@ -1096,6 +1096,10 @@ DEFINE_int32(blob_cache_numshardbits, 6, "the block and blob caches are different " "(use_shared_block_and_blob_cache = false)."); +DEFINE_int32(prepopulate_blob_cache, 0, + "[Integrated BlobDB] Pre-populate hot/warm blobs in blob cache. 
0 " + "to disable and 1 to insert during flush."); + #ifndef ROCKSDB_LITE // Secondary DB instance Options @@ -4524,12 +4528,24 @@ class Benchmark { exit(1); } } - fprintf(stdout, - "Integrated BlobDB: blob cache enabled, block and blob caches " - "shared: %d, blob cache size %" PRIu64 - ", blob cache num shard bits: %d\n", - FLAGS_use_shared_block_and_blob_cache, FLAGS_blob_cache_size, - FLAGS_blob_cache_numshardbits); + switch (FLAGS_prepopulate_blob_cache) { + case 0: + options.prepopulate_blob_cache = PrepopulateBlobCache::kDisable; + break; + case 1: + options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + break; + default: + fprintf(stderr, "Unknown prepopulate blob cache mode\n"); + exit(1); + } + fprintf( + stdout, + "Integrated BlobDB: blob cache enabled, block and blob caches " + "shared: %d, blob cache size %" PRIu64 + ", blob cache num shard bits: %d, hot/warm blobs prepopulated: %d\n", + FLAGS_use_shared_block_and_blob_cache, FLAGS_blob_cache_size, + FLAGS_blob_cache_numshardbits, FLAGS_prepopulate_blob_cache); } else { fprintf(stdout, "Integrated BlobDB: blob cache disabled\n"); } diff --git a/tools/db_bench_tool_test.cc b/tools/db_bench_tool_test.cc index 9dc2bf277..7d09c67a5 100644 --- a/tools/db_bench_tool_test.cc +++ b/tools/db_bench_tool_test.cc @@ -276,6 +276,7 @@ const std::string options_file_content = R"OPTIONS_FILE( blob_garbage_collection_force_threshold=0.75 blob_compaction_readahead_size=262144 blob_file_starting_level=0 + prepopulate_blob_cache=kDisable; [TableOptions/BlockBasedTable "default"] format_version=0 diff --git a/tools/db_crashtest.py b/tools/db_crashtest.py index 56aa9380b..dbb8bc9e9 100644 --- a/tools/db_crashtest.py +++ b/tools/db_crashtest.py @@ -349,6 +349,7 @@ blob_params = { "use_blob_cache": lambda: random.randint(0, 1), "use_shared_block_and_blob_cache": lambda: random.randint(0, 1), "blob_cache_size": lambda: random.choice([1048576, 2097152, 4194304, 8388608]), + "prepopulate_blob_cache": lambda: random.randint(0, 1), } ts_params = { diff --git a/tools/ldb_cmd.cc b/tools/ldb_cmd.cc index 25c4463ce..01dac5ff9 100644 --- a/tools/ldb_cmd.cc +++ b/tools/ldb_cmd.cc @@ -102,6 +102,8 @@ const std::string LDBCommand::ARG_BLOB_COMPACTION_READAHEAD_SIZE = "blob_compaction_readahead_size"; const std::string LDBCommand::ARG_BLOB_FILE_STARTING_LEVEL = "blob_file_starting_level"; +const std::string LDBCommand::ARG_PREPOPULATE_BLOB_CACHE = + "prepopulate_blob_cache"; const std::string LDBCommand::ARG_DECODE_BLOB_INDEX = "decode_blob_index"; const std::string LDBCommand::ARG_DUMP_UNCOMPRESSED_BLOBS = "dump_uncompressed_blobs"; @@ -556,6 +558,7 @@ std::vector LDBCommand::BuildCmdLineOptions( ARG_BLOB_GARBAGE_COLLECTION_FORCE_THRESHOLD, ARG_BLOB_COMPACTION_READAHEAD_SIZE, ARG_BLOB_FILE_STARTING_LEVEL, + ARG_PREPOPULATE_BLOB_CACHE, ARG_IGNORE_UNKNOWN_OPTIONS, ARG_CF_NAME}; ret.insert(ret.end(), options.begin(), options.end()); @@ -833,6 +836,23 @@ void LDBCommand::OverrideBaseCFOptions(ColumnFamilyOptions* cf_opts) { } } + int prepopulate_blob_cache; + if (ParseIntOption(option_map_, ARG_PREPOPULATE_BLOB_CACHE, + prepopulate_blob_cache, exec_state_)) { + switch (prepopulate_blob_cache) { + case 0: + cf_opts->prepopulate_blob_cache = PrepopulateBlobCache::kDisable; + break; + case 1: + cf_opts->prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly; + break; + default: + exec_state_ = LDBCommandExecuteResult::Failed( + ARG_PREPOPULATE_BLOB_CACHE + + " must be 0 (disable) or 1 (flush only)."); + } + } + auto itr = 
option_map_.find(ARG_AUTO_COMPACTION); if (itr != option_map_.end()) { cf_opts->disable_auto_compactions = !StringToBool(itr->second); diff --git a/tools/run_blob_bench.sh b/tools/run_blob_bench.sh index 32b45717a..3755a9e56 100755 --- a/tools/run_blob_bench.sh +++ b/tools/run_blob_bench.sh @@ -59,6 +59,7 @@ function display_usage() { echo -e "\tUSE_SHARED_BLOCK_AND_BLOB_CACHE\t\t\tUse the same backing cache for block cache and blob cache. (default: 1)" echo -e "\tBLOB_CACHE_SIZE\t\t\tSize of the blob cache (default: 16GB)" echo -e "\tBLOB_CACHE_NUMSHARDBITS\t\t\tNumber of shards for the blob cache is 2 ** blob_cache_numshardbits (default: 6)" + echo -e "\tPREPOPULATE_BLOB_CACHE\t\t\tPre-populate hot/warm blobs in blob cache (default: 0)" echo -e "\tTARGET_FILE_SIZE_BASE\t\tTarget SST file size for compactions (default: write buffer size, scaled down if blob files are enabled)" echo -e "\tMAX_BYTES_FOR_LEVEL_BASE\tMaximum size for the base level (default: 8 * target SST file size)" } @@ -123,6 +124,7 @@ use_blob_cache=${USE_BLOB_CACHE:-1} use_shared_block_and_blob_cache=${USE_SHARED_BLOCK_AND_BLOB_CACHE:-1} blob_cache_size=${BLOB_CACHE_SIZE:-$((16 * G))} blob_cache_numshardbits=${BLOB_CACHE_NUMSHARDBITS:-6} +prepopulate_blob_cache=${PREPOPULATE_BLOB_CACHE:-0} if [ "$enable_blob_files" == "1" ]; then target_file_size_base=${TARGET_FILE_SIZE_BASE:-$((32 * write_buffer_size / value_size))} @@ -157,6 +159,7 @@ echo -e "Blob cache enabled:\t\t\t$use_blob_cache" echo -e "Blob cache and block cache shared:\t\t\t$use_shared_block_and_blob_cache" echo -e "Blob cache size:\t\t$blob_cache_size" echo -e "Blob cache number of shard bits:\t\t$blob_cache_numshardbits" +echo -e "Blob cache prepopulated:\t\t\t$prepopulate_blob_cache" echo -e "Target SST file size:\t\t\t$target_file_size_base" echo -e "Maximum size of base level:\t\t$max_bytes_for_level_base" echo "=================================================================" @@ -187,6 +190,7 @@ PARAMS="\ --use_shared_block_and_blob_cache=$use_shared_block_and_blob_cache \ --blob_cache_size=$blob_cache_size \ --blob_cache_numshardbits=$blob_cache_numshardbits \ + --prepopulate_blob_cache=$prepopulate_blob_cache \ --write_buffer_size=$write_buffer_size \ --target_file_size_base=$target_file_size_base \ --max_bytes_for_level_base=$max_bytes_for_level_base"
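Usage sketch (illustrative only, not part of the patch): with blob files and a blob cache enabled, an application opts into flush-time warming by setting the new ColumnFamilyOptions field as below; the cache size and DB path are arbitrary example values.

    #include <cassert>

    #include "rocksdb/cache.h"
    #include "rocksdb/db.h"
    #include "rocksdb/options.h"

    int main() {
      rocksdb::Options options;
      options.create_if_missing = true;

      // Store large values in blob files and cache them in an LRU blob cache.
      options.enable_blob_files = true;
      options.blob_cache = rocksdb::NewLRUCache(1024 * 1024 * 1024);  // example: 1 GiB

      // New option introduced by this change: warm the blob cache at flush time.
      options.prepopulate_blob_cache = rocksdb::PrepopulateBlobCache::kFlushOnly;

      rocksdb::DB* db = nullptr;
      rocksdb::Status s =
          rocksdb::DB::Open(options, "/tmp/blob_cache_example", &db);  // example path
      assert(s.ok());
      // ... reads of recently flushed blobs can now be served from the blob cache ...
      delete db;
      return 0;
    }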