Support prepopulating/warming the blob cache (#10298)

Summary:
Many workloads have temporal locality, where recently written items are read back in a short period of time. When using remote file systems, this is inefficient since it involves network traffic and higher latencies. Because of this, we would like to support prepopulating the blob cache during flush.

This task is a part of https://github.com/facebook/rocksdb/issues/10156

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10298

Reviewed By: ltamasi

Differential Revision: D37908743

Pulled By: gangliao

fbshipit-source-id: 9feaed234bc719d38f0c02975c1ad19fa4bb37d1
Branch: main
Author: Gang Liao (committed by Facebook GitHub Bot)
Parent: f5ef36a29a
Commit: ec4ebeff30
  1. HISTORY.md (7)
  2. db/blob/blob_file_builder.cc (80)
  3. db/blob/blob_file_builder.h (13)
  4. db/blob/blob_file_builder_test.cc (35)
  5. db/blob/blob_source.cc (5)
  6. db/blob/db_blob_basic_test.cc (120)
  7. db/builder.cc (8)
  8. db/c.cc (9)
  9. db/c_test.c (3)
  10. db/compaction/compaction_job.cc (8)
  11. db_stress_tool/db_stress_common.h (1)
  12. db_stress_tool/db_stress_gflags.cc (4)
  13. db_stress_tool/db_stress_test_base.cc (20)
  14. include/rocksdb/advanced_options.h (19)
  15. include/rocksdb/c.h (11)
  16. include/rocksdb/utilities/ldb_cmd.h (1)
  17. java/CMakeLists.txt (1)
  18. java/rocksjni/options.cc (53)
  19. java/rocksjni/portal.h (34)
  20. java/src/main/java/org/rocksdb/AbstractMutableOptions.java (14)
  21. java/src/main/java/org/rocksdb/AdvancedMutableColumnFamilyOptionsInterface.java (23)
  22. java/src/main/java/org/rocksdb/ColumnFamilyOptions.java (34)
  23. java/src/main/java/org/rocksdb/MutableColumnFamilyOptions.java (14)
  24. java/src/main/java/org/rocksdb/Options.java (14)
  25. java/src/main/java/org/rocksdb/PrepopulateBlobCache.java (117)
  26. java/src/test/java/org/rocksdb/BlobOptionsTest.java (28)
  27. java/src/test/java/org/rocksdb/MutableColumnFamilyOptionsTest.java (6)
  28. java/src/test/java/org/rocksdb/OptionsTest.java (12)
  29. options/cf_options.cc (9)
  30. options/cf_options.h (3)
  31. options/options.cc (8)
  32. options/options_helper.cc (6)
  33. options/options_helper.h (4)
  34. options/options_settable_test.cc (1)
  35. options/options_test.cc (4)
  36. tools/benchmark.sh (5)
  37. tools/db_bench_tool.cc (28)
  38. tools/db_bench_tool_test.cc (1)
  39. tools/db_crashtest.py (1)
  40. tools/ldb_cmd.cc (20)
  41. tools/run_blob_bench.sh (4)

@@ -8,8 +8,9 @@
* Provide support for ReadOptions.async_io with direct_io to improve Seek latency by using async IO to parallelize child iterator seek and doing asynchronous prefetching on sequential scans.
* Added support for blob caching in order to cache frequently used blobs for BlobDB.
* User can configure the new ColumnFamilyOptions `blob_cache` to enable/disable blob caching.
* Either sharing the backend cache with the block cache or using a completely separate cache is supported.
* A new abstraction interface called `BlobSource` for blob read logic gives all users access to blobs, whether they are in the blob cache, secondary cache, or (remote) storage. Blobs can be potentially read both while handling user reads (`Get`, `MultiGet`, or iterator) and during compaction (while dealing with compaction filters, Merges, or garbage collection) but eventually all blob reads go through `Version::GetBlob` or, for MultiGet, `Version::MultiGetBlob` (and then get dispatched to the interface -- `BlobSource`).
* Added `prepopulate_blob_cache` to ColumnFamilyOptions. If enabled, warm/hot blobs that are already in memory are prepopulated into the blob cache at the time of flush. On a flush, the blobs that are in memory (in memtables) get flushed to the device. If using Direct IO, additional IO is incurred to read these blobs back into memory again, which is avoided by enabling this option. This further helps if the workload exhibits high temporal locality, where most of the reads go to recently written data. This also helps with remote file systems, since reading the data back involves network traffic and higher latencies.
* Support using secondary cache with the blob cache. When creating a blob cache, the user can set a secondary blob cache by configuring `secondary_cache` in LRUCacheOptions.
* Add experimental tiered compaction feature `AdvancedColumnFamilyOptions::preclude_last_level_data_seconds`, which makes sure the new data inserted within preclude_last_level_data_seconds won't be placed on cold tier (the feature is not complete).
@@ -23,6 +24,8 @@
* When using block cache strict capacity limit (`LRUCache` with `strict_capacity_limit=true`), DB operations now fail with Status code `kAborted` subcode `kMemoryLimit` (`IsMemoryLimit()`) instead of `kIncomplete` (`IsIncomplete()`) when the capacity limit is reached, because Incomplete can mean other specific things for some operations. In more detail, `Cache::Insert()` now returns the updated Status code and this usually propagates through RocksDB to the user on failure.
* NewClockCache calls temporarily return an LRUCache (with similar characteristics as the desired ClockCache). This is because ClockCache is being replaced by a new version (the old one had unknown bugs) but this is still under development.
* Add two functions `int ReserveThreads(int threads_to_be_reserved)` and `int ReleaseThreads(threads_to_be_released)` into `Env` class. In the default implementation, both return 0. Newly added `xxxEnv` class that inherits `Env` should implement these two functions for thread reservation/releasing features.
* Add `rocksdb_options_get_prepopulate_blob_cache` and `rocksdb_options_set_prepopulate_blob_cache` to C API.
* Add `prepopulateBlobCache` and `setPrepopulateBlobCache` to Java API.
### Bug Fixes
* Fix a bug in which backup/checkpoint can include a WAL deleted by RocksDB.

@@ -32,9 +32,9 @@ BlobFileBuilder::BlobFileBuilder(
VersionSet* versions, FileSystem* fs,
const ImmutableOptions* immutable_options,
const MutableCFOptions* mutable_cf_options, const FileOptions* file_options,
int job_id, uint32_t column_family_id,
const std::string& column_family_name, Env::IOPriority io_priority,
Env::WriteLifeTimeHint write_hint,
std::string db_id, std::string db_session_id, int job_id,
uint32_t column_family_id, const std::string& column_family_name,
Env::IOPriority io_priority, Env::WriteLifeTimeHint write_hint,
const std::shared_ptr<IOTracer>& io_tracer,
BlobFileCompletionCallback* blob_callback,
BlobFileCreationReason creation_reason,
@@ -42,17 +42,18 @@ BlobFileBuilder::BlobFileBuilder(
std::vector<BlobFileAddition>* blob_file_additions)
: BlobFileBuilder([versions]() { return versions->NewFileNumber(); }, fs,
immutable_options, mutable_cf_options, file_options,
job_id, column_family_id, column_family_name, io_priority,
write_hint, io_tracer, blob_callback, creation_reason,
blob_file_paths, blob_file_additions) {}
db_id, db_session_id, job_id, column_family_id,
column_family_name, io_priority, write_hint, io_tracer,
blob_callback, creation_reason, blob_file_paths,
blob_file_additions) {}
BlobFileBuilder::BlobFileBuilder(
std::function<uint64_t()> file_number_generator, FileSystem* fs,
const ImmutableOptions* immutable_options,
const MutableCFOptions* mutable_cf_options, const FileOptions* file_options,
int job_id, uint32_t column_family_id,
const std::string& column_family_name, Env::IOPriority io_priority,
Env::WriteLifeTimeHint write_hint,
std::string db_id, std::string db_session_id, int job_id,
uint32_t column_family_id, const std::string& column_family_name,
Env::IOPriority io_priority, Env::WriteLifeTimeHint write_hint,
const std::shared_ptr<IOTracer>& io_tracer,
BlobFileCompletionCallback* blob_callback,
BlobFileCreationReason creation_reason,
@@ -64,7 +65,10 @@ BlobFileBuilder::BlobFileBuilder(
min_blob_size_(mutable_cf_options->min_blob_size),
blob_file_size_(mutable_cf_options->blob_file_size),
blob_compression_type_(mutable_cf_options->blob_compression_type),
prepopulate_blob_cache_(mutable_cf_options->prepopulate_blob_cache),
file_options_(file_options),
db_id_(std::move(db_id)),
db_session_id_(std::move(db_session_id)),
job_id_(job_id),
column_family_id_(column_family_id),
column_family_name_(column_family_name),
@@ -133,6 +137,16 @@ Status BlobFileBuilder::Add(const Slice& key, const Slice& value,
}
}
{
const Status s =
PutBlobIntoCacheIfNeeded(value, blob_file_number, blob_offset);
if (!s.ok()) {
ROCKS_LOG_WARN(immutable_options_->info_log,
"Failed to pre-populate the blob into blob cache: %s",
s.ToString().c_str());
}
}
BlobIndex::EncodeBlob(blob_index, blob_file_number, blob_offset, blob.size(),
blob_compression_type_);
@@ -372,4 +386,52 @@ void BlobFileBuilder::Abandon(const Status& s) {
blob_count_ = 0;
blob_bytes_ = 0;
}
Status BlobFileBuilder::PutBlobIntoCacheIfNeeded(const Slice& blob,
uint64_t blob_file_number,
uint64_t blob_offset) const {
Status s = Status::OK();
auto blob_cache = immutable_options_->blob_cache;
auto statistics = immutable_options_->statistics.get();
bool warm_cache =
prepopulate_blob_cache_ == PrepopulateBlobCache::kFlushOnly &&
creation_reason_ == BlobFileCreationReason::kFlush;
if (blob_cache && warm_cache) {
// During flush, the exact size of the blob file is not yet known. Therefore,
// we set the file size to kMaxOffsetStandardEncoding; for any
// max_offset <= this value, the same encoding scheme is guaranteed.
const OffsetableCacheKey base_cache_key(
db_id_, db_session_id_, blob_file_number,
OffsetableCacheKey::kMaxOffsetStandardEncoding);
const CacheKey cache_key = base_cache_key.WithOffset(blob_offset);
const Slice key = cache_key.AsSlice();
const Cache::Priority priority = Cache::Priority::LOW;
// Objects to be put into the cache have to be heap-allocated and
// self-contained, i.e. own their contents. The Cache has to be able to
// take unique ownership of them. Therefore, we copy the blob into a
// string directly, and insert that into the cache.
std::unique_ptr<std::string> buf = std::make_unique<std::string>();
buf->assign(blob.data(), blob.size());
// TODO: support custom allocators and provide a better estimated memory
// usage using malloc_usable_size.
s = blob_cache->Insert(key, buf.get(), buf->size(),
&DeleteCacheEntry<std::string>,
nullptr /* cache_handle */, priority);
if (s.ok()) {
RecordTick(statistics, BLOB_DB_CACHE_ADD);
RecordTick(statistics, BLOB_DB_CACHE_BYTES_WRITE, buf->size());
buf.release();
} else {
RecordTick(statistics, BLOB_DB_CACHE_ADD_FAILURES);
}
}
return s;
}
} // namespace ROCKSDB_NAMESPACE

@@ -10,6 +10,7 @@
#include <string>
#include <vector>
#include "rocksdb/advanced_options.h"
#include "rocksdb/compression_type.h"
#include "rocksdb/env.h"
#include "rocksdb/rocksdb_namespace.h"
@@ -35,7 +36,8 @@ class BlobFileBuilder {
BlobFileBuilder(VersionSet* versions, FileSystem* fs,
const ImmutableOptions* immutable_options,
const MutableCFOptions* mutable_cf_options,
const FileOptions* file_options, int job_id,
const FileOptions* file_options, std::string db_id,
std::string db_session_id, int job_id,
uint32_t column_family_id,
const std::string& column_family_name,
Env::IOPriority io_priority,
@@ -49,7 +51,8 @@ class BlobFileBuilder {
BlobFileBuilder(std::function<uint64_t()> file_number_generator,
FileSystem* fs, const ImmutableOptions* immutable_options,
const MutableCFOptions* mutable_cf_options,
const FileOptions* file_options, int job_id,
const FileOptions* file_options, std::string db_id,
std::string db_session_id, int job_id,
uint32_t column_family_id,
const std::string& column_family_name,
Env::IOPriority io_priority,
@@ -78,13 +81,19 @@ class BlobFileBuilder {
Status CloseBlobFile();
Status CloseBlobFileIfNeeded();
Status PutBlobIntoCacheIfNeeded(const Slice& blob, uint64_t blob_file_number,
uint64_t blob_offset) const;
std::function<uint64_t()> file_number_generator_;
FileSystem* fs_;
const ImmutableOptions* immutable_options_;
uint64_t min_blob_size_;
uint64_t blob_file_size_;
CompressionType blob_compression_type_;
PrepopulateBlobCache prepopulate_blob_cache_;
const FileOptions* file_options_;
const std::string db_id_;
const std::string db_session_id_;
int job_id_;
uint32_t column_family_id_;
std::string column_family_name_;

@@ -144,8 +144,9 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckOneFile) {
BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, job_id, column_family_id, column_family_name, io_priority,
write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id,
column_family_id, column_family_name, io_priority, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
std::vector<std::pair<std::string, std::string>> expected_key_value_pairs(
@@ -228,8 +229,9 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckMultipleFiles) {
BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, job_id, column_family_id, column_family_name, io_priority,
write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id,
column_family_id, column_family_name, io_priority, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
std::vector<std::pair<std::string, std::string>> expected_key_value_pairs(
@@ -315,8 +317,9 @@ TEST_F(BlobFileBuilderTest, InlinedValues) {
BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, job_id, column_family_id, column_family_name, io_priority,
write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id,
column_family_id, column_family_name, io_priority, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
for (size_t i = 0; i < number_of_blobs; ++i) {
@@ -369,8 +372,9 @@ TEST_F(BlobFileBuilderTest, Compression) {
BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, job_id, column_family_id, column_family_name, io_priority,
write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id,
column_family_id, column_family_name, io_priority, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
const std::string key("1");
@@ -452,8 +456,9 @@ TEST_F(BlobFileBuilderTest, CompressionError) {
BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, job_id, column_family_id, column_family_name, io_priority,
write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id,
column_family_id, column_family_name, io_priority, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
SyncPoint::GetInstance()->SetCallBack("CompressData:TamperWithReturnValue",
@@ -531,8 +536,9 @@ TEST_F(BlobFileBuilderTest, Checksum) {
BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, job_id, column_family_id, column_family_name, io_priority,
write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id,
column_family_id, column_family_name, io_priority, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
const std::string key("1");
@@ -628,8 +634,9 @@ TEST_P(BlobFileBuilderIOErrorTest, IOError) {
BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, job_id, column_family_id, column_family_name, io_priority,
write_hint, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id,
column_family_id, column_family_name, io_priority, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
SyncPoint::GetInstance()->SetCallBack(sync_point_, [this](void* arg) {

@@ -63,15 +63,16 @@ Status BlobSource::PutBlobIntoCache(const Slice& cache_key,
// self-contained, i.e. own their contents. The Cache has to be able to take
// unique ownership of them. Therefore, we copy the blob into a string
// directly, and insert that into the cache.
std::string* buf = new std::string();
std::unique_ptr<std::string> buf = std::make_unique<std::string>();
buf->assign(blob->data(), blob->size());
// TODO: support custom allocators and provide a better estimated memory
// usage using malloc_usable_size.
Cache::Handle* cache_handle = nullptr;
s = InsertEntryIntoCache(cache_key, buf, buf->size(), &cache_handle,
s = InsertEntryIntoCache(cache_key, buf.get(), buf->size(), &cache_handle,
priority);
if (s.ok()) {
buf.release();
assert(cache_handle != nullptr);
*cached_blob =
CacheHandleGuard<std::string>(blob_cache_.get(), cache_handle);

@@ -1432,6 +1432,126 @@ TEST_P(DBBlobBasicIOErrorTest, CompactionFilterReadBlob_IOError) {
SyncPoint::GetInstance()->ClearAllCallBacks();
}
TEST_F(DBBlobBasicTest, WarmCacheWithBlobsDuringFlush) {
Options options = GetDefaultOptions();
LRUCacheOptions co;
co.capacity = 1 << 25;
co.num_shard_bits = 2;
co.metadata_charge_policy = kDontChargeCacheMetadata;
auto backing_cache = NewLRUCache(co);
options.blob_cache = backing_cache;
BlockBasedTableOptions block_based_options;
block_based_options.no_block_cache = false;
block_based_options.block_cache = backing_cache;
block_based_options.cache_index_and_filter_blocks = true;
options.table_factory.reset(NewBlockBasedTableFactory(block_based_options));
options.enable_blob_files = true;
options.create_if_missing = true;
options.disable_auto_compactions = true;
options.enable_blob_garbage_collection = true;
options.blob_garbage_collection_age_cutoff = 1.0;
options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly;
options.statistics = ROCKSDB_NAMESPACE::CreateDBStatistics();
DestroyAndReopen(options);
constexpr size_t kNumBlobs = 10;
constexpr size_t kValueSize = 100;
std::string value(kValueSize, 'a');
for (size_t i = 1; i <= kNumBlobs; i++) {
ASSERT_OK(Put(std::to_string(i), value));
ASSERT_OK(Put(std::to_string(i + kNumBlobs), value)); // Add some overlap
ASSERT_OK(Flush());
ASSERT_EQ(i * 2, options.statistics->getTickerCount(BLOB_DB_CACHE_ADD));
ASSERT_EQ(value, Get(std::to_string(i)));
ASSERT_EQ(value, Get(std::to_string(i + kNumBlobs)));
ASSERT_EQ(0, options.statistics->getTickerCount(BLOB_DB_CACHE_MISS));
ASSERT_EQ(i * 2, options.statistics->getTickerCount(BLOB_DB_CACHE_HIT));
}
// Verify compaction not counted
ASSERT_OK(db_->CompactRange(CompactRangeOptions(), /*begin=*/nullptr,
/*end=*/nullptr));
EXPECT_EQ(kNumBlobs * 2,
options.statistics->getTickerCount(BLOB_DB_CACHE_ADD));
}
#ifndef ROCKSDB_LITE
TEST_F(DBBlobBasicTest, DynamicallyWarmCacheDuringFlush) {
Options options = GetDefaultOptions();
LRUCacheOptions co;
co.capacity = 1 << 25;
co.num_shard_bits = 2;
co.metadata_charge_policy = kDontChargeCacheMetadata;
auto backing_cache = NewLRUCache(co);
options.blob_cache = backing_cache;
BlockBasedTableOptions block_based_options;
block_based_options.no_block_cache = false;
block_based_options.block_cache = backing_cache;
block_based_options.cache_index_and_filter_blocks = true;
options.table_factory.reset(NewBlockBasedTableFactory(block_based_options));
options.enable_blob_files = true;
options.create_if_missing = true;
options.disable_auto_compactions = true;
options.enable_blob_garbage_collection = true;
options.blob_garbage_collection_age_cutoff = 1.0;
options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly;
options.statistics = ROCKSDB_NAMESPACE::CreateDBStatistics();
DestroyAndReopen(options);
constexpr size_t kNumBlobs = 10;
constexpr size_t kValueSize = 100;
std::string value(kValueSize, 'a');
for (size_t i = 1; i <= 5; i++) {
ASSERT_OK(Put(std::to_string(i), value));
ASSERT_OK(Put(std::to_string(i + kNumBlobs), value)); // Add some overlap
ASSERT_OK(Flush());
ASSERT_EQ(2, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD));
ASSERT_EQ(value, Get(std::to_string(i)));
ASSERT_EQ(value, Get(std::to_string(i + kNumBlobs)));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD));
ASSERT_EQ(0,
options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_MISS));
ASSERT_EQ(2, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_HIT));
}
ASSERT_OK(dbfull()->SetOptions({{"prepopulate_blob_cache", "kDisable"}}));
for (size_t i = 6; i <= kNumBlobs; i++) {
ASSERT_OK(Put(std::to_string(i), value));
ASSERT_OK(Put(std::to_string(i + kNumBlobs), value)); // Add some overlap
ASSERT_OK(Flush());
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD));
ASSERT_EQ(value, Get(std::to_string(i)));
ASSERT_EQ(value, Get(std::to_string(i + kNumBlobs)));
ASSERT_EQ(2, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_ADD));
ASSERT_EQ(2,
options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_MISS));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(BLOB_DB_CACHE_HIT));
}
// Verify compaction not counted
ASSERT_OK(db_->CompactRange(CompactRangeOptions(), /*begin=*/nullptr,
/*end=*/nullptr));
EXPECT_EQ(0, options.statistics->getTickerCount(BLOB_DB_CACHE_ADD));
}
#endif // !ROCKSDB_LITE
} // namespace ROCKSDB_NAMESPACE
int main(int argc, char** argv) {

@@ -188,10 +188,10 @@ Status BuildTable(
blob_file_additions)
? new BlobFileBuilder(
versions, fs, &ioptions, &mutable_cf_options, &file_options,
job_id, tboptions.column_family_id,
tboptions.column_family_name, io_priority, write_hint,
io_tracer, blob_callback, blob_creation_reason,
&blob_file_paths, blob_file_additions)
tboptions.db_id, tboptions.db_session_id, job_id,
tboptions.column_family_id, tboptions.column_family_name,
io_priority, write_hint, io_tracer, blob_callback,
blob_creation_reason, &blob_file_paths, blob_file_additions)
: nullptr);
const std::atomic<bool> kManualCompactionCanceledFalse{false};

@@ -99,6 +99,7 @@ using ROCKSDB_NAMESPACE::Options;
using ROCKSDB_NAMESPACE::PerfContext;
using ROCKSDB_NAMESPACE::PerfLevel;
using ROCKSDB_NAMESPACE::PinnableSlice;
using ROCKSDB_NAMESPACE::PrepopulateBlobCache;
using ROCKSDB_NAMESPACE::RandomAccessFile;
using ROCKSDB_NAMESPACE::Range;
using ROCKSDB_NAMESPACE::RateLimiter;
@@ -3140,6 +3141,14 @@ void rocksdb_options_set_blob_cache(rocksdb_options_t* opt,
opt->rep.blob_cache = blob_cache->rep;
}
void rocksdb_options_set_prepopulate_blob_cache(rocksdb_options_t* opt, int t) {
opt->rep.prepopulate_blob_cache = static_cast<PrepopulateBlobCache>(t);
}
int rocksdb_options_get_prepopulate_blob_cache(rocksdb_options_t* opt) {
return static_cast<int>(opt->rep.prepopulate_blob_cache);
}
void rocksdb_options_set_num_levels(rocksdb_options_t* opt, int n) {
opt->rep.num_levels = n;
}

@@ -2068,6 +2068,9 @@ int main(int argc, char** argv) {
rocksdb_options_set_blob_file_starting_level(o, 5);
CheckCondition(5 == rocksdb_options_get_blob_file_starting_level(o));
rocksdb_options_set_prepopulate_blob_cache(o, 1 /* flush only */);
CheckCondition(1 == rocksdb_options_get_prepopulate_blob_cache(o));
// Create a copy that should be equal to the original.
rocksdb_options_t* copy;
copy = rocksdb_options_create_copy(o);

@@ -999,10 +999,10 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
? new BlobFileBuilder(
versions_, fs_.get(),
sub_compact->compaction->immutable_options(),
mutable_cf_options, &file_options_, job_id_, cfd->GetID(),
cfd->GetName(), Env::IOPriority::IO_LOW, write_hint_,
io_tracer_, blob_callback_, BlobFileCreationReason::kCompaction,
&blob_file_paths,
mutable_cf_options, &file_options_, db_id_, db_session_id_,
job_id_, cfd->GetID(), cfd->GetName(), Env::IOPriority::IO_LOW,
write_hint_, io_tracer_, blob_callback_,
BlobFileCreationReason::kCompaction, &blob_file_paths,
sub_compact->Current().GetBlobFileAdditionsPtr())
: nullptr);

@@ -272,6 +272,7 @@ DECLARE_bool(use_blob_cache);
DECLARE_bool(use_shared_block_and_blob_cache);
DECLARE_uint64(blob_cache_size);
DECLARE_int32(blob_cache_numshardbits);
DECLARE_int32(prepopulate_blob_cache);
DECLARE_int32(approximate_size_one_in);
DECLARE_bool(sync_fault_injection);

@@ -474,6 +474,10 @@ DEFINE_int32(blob_cache_numshardbits, 6,
"the block and blob caches are different "
"(use_shared_block_and_blob_cache = false).");
DEFINE_int32(prepopulate_blob_cache, 0,
"[Integrated BlobDB] Pre-populate hot/warm blobs in blob cache. 0 "
"to disable and 1 to insert during flush.");
static const bool FLAGS_subcompactions_dummy __attribute__((__unused__)) =
RegisterFlagValidator(&FLAGS_subcompactions, &ValidateUint32Range);

@@ -270,6 +270,8 @@ bool StressTest::BuildOptionsTable() {
std::vector<std::string>{"0", "1M", "4M"});
options_tbl.emplace("blob_file_starting_level",
std::vector<std::string>{"0", "1", "2"});
options_tbl.emplace("prepopulate_blob_cache",
std::vector<std::string>{"kDisable", "kFlushOnly"});
}
options_table_ = std::move(options_tbl);
@@ -2401,9 +2403,12 @@ void StressTest::Open(SharedState* shared) {
fprintf(stdout,
"Integrated BlobDB: blob cache enabled, block and blob caches "
"shared: %d, blob cache size %" PRIu64
", blob cache num shard bits: %d\n",
", blob cache num shard bits: %d, blob cache prepopulated: %s\n",
FLAGS_use_shared_block_and_blob_cache, FLAGS_blob_cache_size,
FLAGS_blob_cache_numshardbits);
FLAGS_blob_cache_numshardbits,
options_.prepopulate_blob_cache == PrepopulateBlobCache::kFlushOnly
? "flush only"
: "disable");
} else {
fprintf(stdout, "Integrated BlobDB: blob cache disabled\n");
}
@@ -3043,6 +3048,17 @@ void InitializeOptionsFromFlags(
exit(1);
}
}
switch (FLAGS_prepopulate_blob_cache) {
case 0:
options.prepopulate_blob_cache = PrepopulateBlobCache::kDisable;
break;
case 1:
options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly;
break;
default:
fprintf(stderr, "Unknown prepopulate blob cache mode\n");
exit(1);
}
}
options.wal_compression =

@@ -246,6 +246,11 @@ enum UpdateStatus { // Return status For inplace update callback
UPDATED = 2, // No inplace update. Merged value set
};
enum class PrepopulateBlobCache : uint8_t {
kDisable = 0x0, // Disable prepopulate blob cache
kFlushOnly = 0x1, // Prepopulate blobs during flush only
};
struct AdvancedColumnFamilyOptions {
// The maximum number of write buffers that are built up in memory.
// The default and the minimum number is 2, so that when 1 write buffer
@@ -992,6 +997,20 @@ struct AdvancedColumnFamilyOptions {
// Default: nullptr (disabled)
std::shared_ptr<Cache> blob_cache = nullptr;
// If enabled, prepopulate warm/hot blobs that are already in memory into the
// blob cache at the time of flush. On a flush, the blobs that are in memory
// (in memtables) get flushed to the device. If using Direct IO, additional IO
// is incurred to read these blobs back into memory again, which is avoided by
// enabling this option. This further helps if the workload exhibits high
// temporal locality, where most of the reads go to recently written data.
// This also helps with remote file systems, since reading the data back
// involves network traffic and higher latencies.
//
// Default: disabled
//
// Dynamically changeable through the SetOptions() API
PrepopulateBlobCache prepopulate_blob_cache = PrepopulateBlobCache::kDisable;
// Create ColumnFamilyOptions with default values for all fields
AdvancedColumnFamilyOptions();
// Create ColumnFamilyOptions from Options

@@ -1302,6 +1302,17 @@ extern ROCKSDB_LIBRARY_API int rocksdb_options_get_blob_file_starting_level(
extern ROCKSDB_LIBRARY_API void rocksdb_options_set_blob_cache(
rocksdb_options_t* opt, rocksdb_cache_t* blob_cache);
enum {
rocksdb_prepopulate_blob_disable = 0,
rocksdb_prepopulate_blob_flush_only = 1
};
extern ROCKSDB_LIBRARY_API void rocksdb_options_set_prepopulate_blob_cache(
rocksdb_options_t* opt, int val);
extern ROCKSDB_LIBRARY_API int rocksdb_options_get_prepopulate_blob_cache(
rocksdb_options_t* opt);
/* returns a pointer to a malloc()-ed, null terminated string */
extern ROCKSDB_LIBRARY_API char* rocksdb_options_statistics_get_string(
rocksdb_options_t* opt);

@@ -70,6 +70,7 @@ class LDBCommand {
static const std::string ARG_BLOB_GARBAGE_COLLECTION_FORCE_THRESHOLD;
static const std::string ARG_BLOB_COMPACTION_READAHEAD_SIZE;
static const std::string ARG_BLOB_FILE_STARTING_LEVEL;
static const std::string ARG_PREPOPULATE_BLOB_CACHE;
static const std::string ARG_DECODE_BLOB_INDEX;
static const std::string ARG_DUMP_UNCOMPRESSED_BLOBS;

@@ -194,6 +194,7 @@ set(JAVA_MAIN_CLASSES
src/main/java/org/rocksdb/OptionsUtil.java
src/main/java/org/rocksdb/PersistentCache.java
src/main/java/org/rocksdb/PlainTableConfig.java
src/main/java/org/rocksdb/PrepopulateBlobCache.java
src/main/java/org/rocksdb/Priority.java
src/main/java/org/rocksdb/Range.java
src/main/java/org/rocksdb/RateLimiter.java

@@ -3885,6 +3885,31 @@ jint Java_org_rocksdb_Options_blobFileStartingLevel(JNIEnv*, jobject,
return static_cast<jint>(opts->blob_file_starting_level);
}
/*
* Class: org_rocksdb_Options
* Method: setPrepopulateBlobCache
* Signature: (JB)V
*/
void Java_org_rocksdb_Options_setPrepopulateBlobCache(
JNIEnv*, jobject, jlong jhandle, jbyte jprepopulate_blob_cache_value) {
auto* opts = reinterpret_cast<ROCKSDB_NAMESPACE::Options*>(jhandle);
opts->prepopulate_blob_cache =
ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toCppPrepopulateBlobCache(
jprepopulate_blob_cache_value);
}
/*
* Class: org_rocksdb_Options
* Method: prepopulateBlobCache
* Signature: (J)B
*/
jbyte Java_org_rocksdb_Options_prepopulateBlobCache(JNIEnv*, jobject,
jlong jhandle) {
auto* opts = reinterpret_cast<ROCKSDB_NAMESPACE::Options*>(jhandle);
return ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toJavaPrepopulateBlobCache(
opts->prepopulate_blob_cache);
}
//////////////////////////////////////////////////////////////////////////////
// ROCKSDB_NAMESPACE::ColumnFamilyOptions
@@ -5717,6 +5742,34 @@ jint Java_org_rocksdb_ColumnFamilyOptions_blobFileStartingLevel(JNIEnv*,
return static_cast<jint>(opts->blob_file_starting_level);
}
/*
* Class: org_rocksdb_ColumnFamilyOptions
* Method: setPrepopulateBlobCache
* Signature: (JB)V
*/
void Java_org_rocksdb_ColumnFamilyOptions_setPrepopulateBlobCache(
JNIEnv*, jobject, jlong jhandle, jbyte jprepopulate_blob_cache_value) {
auto* opts =
reinterpret_cast<ROCKSDB_NAMESPACE::ColumnFamilyOptions*>(jhandle);
opts->prepopulate_blob_cache =
ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toCppPrepopulateBlobCache(
jprepopulate_blob_cache_value);
}
/*
* Class: org_rocksdb_ColumnFamilyOptions
* Method: prepopulateBlobCache
* Signature: (J)B
*/
jbyte Java_org_rocksdb_ColumnFamilyOptions_prepopulateBlobCache(JNIEnv*,
jobject,
jlong jhandle) {
auto* opts =
reinterpret_cast<ROCKSDB_NAMESPACE::ColumnFamilyOptions*>(jhandle);
return ROCKSDB_NAMESPACE::PrepopulateBlobCacheJni::toJavaPrepopulateBlobCache(
opts->prepopulate_blob_cache);
}
/////////////////////////////////////////////////////////////////////
// ROCKSDB_NAMESPACE::DBOptions

@@ -7894,6 +7894,40 @@ class SanityLevelJni {
}
};
// The portal class for org.rocksdb.PrepopulateBlobCache
class PrepopulateBlobCacheJni {
public:
// Returns the equivalent org.rocksdb.PrepopulateBlobCache for the provided
// C++ ROCKSDB_NAMESPACE::PrepopulateBlobCache enum
static jbyte toJavaPrepopulateBlobCache(
ROCKSDB_NAMESPACE::PrepopulateBlobCache prepopulate_blob_cache) {
switch (prepopulate_blob_cache) {
case ROCKSDB_NAMESPACE::PrepopulateBlobCache::kDisable:
return 0x0;
case ROCKSDB_NAMESPACE::PrepopulateBlobCache::kFlushOnly:
return 0x1;
default:
return 0x7f; // undefined
}
}
// Returns the equivalent C++ ROCKSDB_NAMESPACE::PrepopulateBlobCache enum for
// the provided Java org.rocksdb.PrepopulateBlobCache
static ROCKSDB_NAMESPACE::PrepopulateBlobCache toCppPrepopulateBlobCache(
jbyte jprepopulate_blob_cache) {
switch (jprepopulate_blob_cache) {
case 0x0:
return ROCKSDB_NAMESPACE::PrepopulateBlobCache::kDisable;
case 0x1:
return ROCKSDB_NAMESPACE::PrepopulateBlobCache::kFlushOnly;
case 0x7F:
default:
// undefined/default
return ROCKSDB_NAMESPACE::PrepopulateBlobCache::kDisable;
}
}
};
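The byte convention this portal class uses across the JNI boundary (0x0 for kDisable, 0x1 for kFlushOnly, 0x7f for undefined) can be sketched standalone. The class and method names below are illustrative only, not part of RocksJava:

```java
// Standalone sketch of the JNI byte mapping above; names are hypothetical.
public class PrepopulateBlobCachePortalSketch {
  enum Mode { DISABLE, FLUSH_ONLY }

  // Mirrors toJavaPrepopulateBlobCache: enum -> wire byte.
  static byte toByte(Mode mode) {
    switch (mode) {
      case DISABLE:
        return 0x0;
      case FLUSH_ONLY:
        return 0x1;
      default:
        return 0x7f; // undefined
    }
  }

  // Mirrors toCppPrepopulateBlobCache: wire byte -> enum,
  // defaulting to DISABLE for unknown/undefined bytes.
  static Mode fromByte(byte value) {
    switch (value) {
      case 0x1:
        return Mode.FLUSH_ONLY;
      case 0x0:
      default:
        return Mode.DISABLE;
    }
  }

  public static void main(String[] args) {
    for (Mode mode : Mode.values()) {
      // Round-trip each mode across the "boundary".
      System.out.println(mode + " -> " + toByte(mode) + " -> " + fromByte(toByte(mode)));
    }
  }
}
```

Note that `fromByte`, like `toCppPrepopulateBlobCache` above, falls back to the disabled mode for unknown bytes rather than throwing.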
// The portal class for org.rocksdb.AbstractListener.EnabledEventCallback
class EnabledEventCallbackJni {
public:

@@ -341,8 +341,18 @@ public abstract class AbstractMutableOptions {
return setIntArray(key, value);
case ENUM:
final CompressionType compressionType = CompressionType.getFromInternal(valueStr);
return setEnum(key, compressionType);
final String optionName = key.name();
if (optionName.equals("prepopulate_blob_cache")) {
final PrepopulateBlobCache prepopulateBlobCache =
PrepopulateBlobCache.getFromInternal(valueStr);
return setEnum(key, prepopulateBlobCache);
} else if (optionName.equals("compression")
|| optionName.equals("blob_compression_type")) {
final CompressionType compressionType = CompressionType.getFromInternal(valueStr);
return setEnum(key, compressionType);
} else {
throw new IllegalArgumentException("Unknown enum type: " + optionName);
}
default:
throw new IllegalStateException(key + " has unknown value type: " + key.getValueType());

@@ -801,6 +801,29 @@ public interface AdvancedMutableColumnFamilyOptionsInterface<
*/
int blobFileStartingLevel();
/**
* Set the prepopulate blob cache option.
*
* Default: {@link PrepopulateBlobCache#PREPOPULATE_BLOB_DISABLE}
*
* Dynamically changeable through
* {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)}.
*
* @param prepopulateBlobCache the prepopulate blob cache option
*
* @return the reference to the current options.
*/
T setPrepopulateBlobCache(final PrepopulateBlobCache prepopulateBlobCache);
/**
* Get the prepopulate blob cache option.
*
* Default: {@link PrepopulateBlobCache#PREPOPULATE_BLOB_DISABLE}
*
* @return the current prepopulate blob cache option.
*/
PrepopulateBlobCache prepopulateBlobCache();
//
// END options for blobs (integrated BlobDB)
//

@@ -1280,6 +1280,37 @@ public class ColumnFamilyOptions extends RocksObject
return blobFileStartingLevel(nativeHandle_);
}
/**
* Set the prepopulate blob cache option.
*
* Default: {@link PrepopulateBlobCache#PREPOPULATE_BLOB_DISABLE}
*
* Dynamically changeable through
* {@link RocksDB#setOptions(ColumnFamilyHandle, MutableColumnFamilyOptions)}.
*
* @param prepopulateBlobCache the prepopulate blob cache option
*
* @return the reference to the current options.
*/
@Override
public ColumnFamilyOptions setPrepopulateBlobCache(
final PrepopulateBlobCache prepopulateBlobCache) {
setPrepopulateBlobCache(nativeHandle_, prepopulateBlobCache.getValue());
return this;
}
/**
* Get the prepopulate blob cache option.
*
* Default: {@link PrepopulateBlobCache#PREPOPULATE_BLOB_DISABLE}
*
* @return the current prepopulate blob cache option.
*/
@Override
public PrepopulateBlobCache prepopulateBlobCache() {
return PrepopulateBlobCache.getPrepopulateBlobCache(prepopulateBlobCache(nativeHandle_));
}
//
// END options for blobs (integrated BlobDB)
//
@@ -1488,6 +1519,9 @@ public class ColumnFamilyOptions extends RocksObject
private native void setBlobFileStartingLevel(
final long nativeHandle_, final int blobFileStartingLevel);
private native int blobFileStartingLevel(final long nativeHandle_);
private native void setPrepopulateBlobCache(
final long nativeHandle_, final byte prepopulateBlobCache);
private native byte prepopulateBlobCache(final long nativeHandle_);
// instance variables
// NOTE: If you add new member variables, please update the copy constructor above!

@@ -123,7 +123,8 @@ public class MutableColumnFamilyOptions
blob_garbage_collection_age_cutoff(ValueType.DOUBLE),
blob_garbage_collection_force_threshold(ValueType.DOUBLE),
blob_compaction_readahead_size(ValueType.LONG),
blob_file_starting_level(ValueType.INT);
blob_file_starting_level(ValueType.INT),
prepopulate_blob_cache(ValueType.ENUM);
private final ValueType valueType;
BlobOption(final ValueType valueType) {
@@ -607,5 +608,16 @@ public class MutableColumnFamilyOptions
public int blobFileStartingLevel() {
return getInt(BlobOption.blob_file_starting_level);
}
@Override
public MutableColumnFamilyOptionsBuilder setPrepopulateBlobCache(
final PrepopulateBlobCache prepopulateBlobCache) {
return setEnum(BlobOption.prepopulate_blob_cache, prepopulateBlobCache);
}
@Override
public PrepopulateBlobCache prepopulateBlobCache() {
return (PrepopulateBlobCache) getEnum(BlobOption.prepopulate_blob_cache);
}
}
}

@@ -2104,6 +2104,17 @@ public class Options extends RocksObject
return blobFileStartingLevel(nativeHandle_);
}
@Override
public Options setPrepopulateBlobCache(final PrepopulateBlobCache prepopulateBlobCache) {
setPrepopulateBlobCache(nativeHandle_, prepopulateBlobCache.getValue());
return this;
}
@Override
public PrepopulateBlobCache prepopulateBlobCache() {
return PrepopulateBlobCache.getPrepopulateBlobCache(prepopulateBlobCache(nativeHandle_));
}
//
// END options for blobs (integrated BlobDB)
//
@@ -2541,6 +2552,9 @@ public class Options extends RocksObject
private native void setBlobFileStartingLevel(
final long nativeHandle_, final int blobFileStartingLevel);
private native int blobFileStartingLevel(final long nativeHandle_);
private native void setPrepopulateBlobCache(
final long nativeHandle_, final byte prepopulateBlobCache);
private native byte prepopulateBlobCache(final long nativeHandle_);
// instance variables
// NOTE: If you add new member variables, please update the copy constructor above!

@@ -0,0 +1,117 @@
// Copyright (c) Meta Platforms, Inc. and affiliates.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
package org.rocksdb;
/**
* Enum PrepopulateBlobCache
*
* <p>Prepopulate warm/hot blobs which are already in memory into the blob
* cache at the time of flush. On a flush, the blobs that are in memory
* (in the memtables) get flushed to the device. If using Direct IO,
* additional IO is incurred to read these blobs back into memory again,
* which is avoided by enabling this option. This further helps if the
* workload exhibits high temporal locality, where most of the reads go
* to recently written data. It also helps in the case of remote file
* systems, since reads there involve network traffic and higher latencies.</p>
*/
public enum PrepopulateBlobCache {
PREPOPULATE_BLOB_DISABLE((byte) 0x0, "prepopulate_blob_disable", "kDisable"),
PREPOPULATE_BLOB_FLUSH_ONLY((byte) 0x1, "prepopulate_blob_flush_only", "kFlushOnly");
/**
* <p>Get the PrepopulateBlobCache enumeration value by
* passing the library name to this method.</p>
*
* <p>If the library name cannot be matched, the enumeration
* value {@code PREPOPULATE_BLOB_DISABLE} will be returned.</p>
*
* @param libraryName prepopulate blob cache library name.
*
* @return PrepopulateBlobCache instance.
*/
public static PrepopulateBlobCache getPrepopulateBlobCache(String libraryName) {
if (libraryName != null) {
for (PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) {
if (prepopulateBlobCache.getLibraryName() != null
&& prepopulateBlobCache.getLibraryName().equals(libraryName)) {
return prepopulateBlobCache;
}
}
}
return PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE;
}
/**
* <p>Get the PrepopulateBlobCache enumeration value by
* passing the byte identifier to this method.</p>
*
* @param byteIdentifier of PrepopulateBlobCache.
*
* @return PrepopulateBlobCache instance.
*
* @throws IllegalArgumentException If PrepopulateBlobCache cannot be found for the
* provided byteIdentifier
*/
public static PrepopulateBlobCache getPrepopulateBlobCache(byte byteIdentifier) {
for (final PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) {
if (prepopulateBlobCache.getValue() == byteIdentifier) {
return prepopulateBlobCache;
}
}
throw new IllegalArgumentException("Illegal value provided for PrepopulateBlobCache.");
}
/**
* <p>Get a PrepopulateBlobCache value based on the string key in the C++ options output.
* This gets used in support of getting options into Java from an options string,
* which is generated at the C++ level.
* </p>
*
* @param internalName the internal (C++) name by which the option is known.
*
* @return PrepopulateBlobCache instance (optional)
*/
static PrepopulateBlobCache getFromInternal(final String internalName) {
for (final PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) {
if (prepopulateBlobCache.internalName_.equals(internalName)) {
return prepopulateBlobCache;
}
}
throw new IllegalArgumentException(
"Illegal internalName '" + internalName + "' provided for PrepopulateBlobCache.");
}
/**
* <p>Returns the byte value of the enumerations value.</p>
*
* @return byte representation
*/
public byte getValue() {
return value_;
}
/**
* <p>Returns the library name of the prepopulate blob cache mode
* identified by the enumeration value.</p>
*
* @return library name
*/
public String getLibraryName() {
return libraryName_;
}
PrepopulateBlobCache(final byte value, final String libraryName, final String internalName) {
value_ = value;
libraryName_ = libraryName;
internalName_ = internalName;
}
private final byte value_;
private final String libraryName_;
private final String internalName_;
}
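Both the Java enum above and the C++ options parser resolve the same internal names, kDisable and kFlushOnly. A minimal standalone sketch of that lookup (hypothetical class, not the RocksJava API):

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the internal-name lookup; names are hypothetical.
public class PrepopulateBlobCacheNameLookup {
  enum Mode { DISABLE, FLUSH_ONLY }

  // Mirrors prepopulate_blob_cache_string_map on the C++ side.
  private static final Map<String, Mode> INTERNAL_NAMES = new HashMap<>();
  static {
    INTERNAL_NAMES.put("kDisable", Mode.DISABLE);
    INTERNAL_NAMES.put("kFlushOnly", Mode.FLUSH_ONLY);
  }

  // Mirrors getFromInternal: unknown names are rejected, unlike the
  // byte mapping, which silently falls back to the disabled mode.
  static Mode fromInternal(String internalName) {
    Mode mode = INTERNAL_NAMES.get(internalName);
    if (mode == null) {
      throw new IllegalArgumentException(
          "Illegal internalName '" + internalName + "' provided for PrepopulateBlobCache.");
    }
    return mode;
  }

  public static void main(String[] args) {
    System.out.println(fromInternal("kFlushOnly")); // FLUSH_ONLY
    System.out.println(fromInternal("kDisable"));   // DISABLE
  }
}
```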

@@ -79,6 +79,8 @@ public class BlobOptionsTest {
assertThat(options.blobGarbageCollectionAgeCutoff()).isEqualTo(0.25);
assertThat(options.blobGarbageCollectionForceThreshold()).isEqualTo(1.0);
assertThat(options.blobCompactionReadaheadSize()).isEqualTo(0);
assertThat(options.prepopulateBlobCache())
.isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE);
assertThat(options.setEnableBlobFiles(true)).isEqualTo(options);
assertThat(options.setMinBlobSize(132768L)).isEqualTo(options);
@@ -90,6 +92,8 @@ public class BlobOptionsTest {
assertThat(options.setBlobGarbageCollectionForceThreshold(0.80)).isEqualTo(options);
assertThat(options.setBlobCompactionReadaheadSize(262144L)).isEqualTo(options);
assertThat(options.setBlobFileStartingLevel(0)).isEqualTo(options);
assertThat(options.setPrepopulateBlobCache(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY))
.isEqualTo(options);
assertThat(options.enableBlobFiles()).isEqualTo(true);
assertThat(options.minBlobSize()).isEqualTo(132768L);
@@ -100,6 +104,8 @@ public class BlobOptionsTest {
assertThat(options.blobGarbageCollectionForceThreshold()).isEqualTo(0.80);
assertThat(options.blobCompactionReadaheadSize()).isEqualTo(262144L);
assertThat(options.blobFileStartingLevel()).isEqualTo(0);
assertThat(options.prepopulateBlobCache())
.isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY);
}
}
@@ -130,6 +136,9 @@ public class BlobOptionsTest {
assertThat(columnFamilyOptions.setBlobCompactionReadaheadSize(262144L))
.isEqualTo(columnFamilyOptions);
assertThat(columnFamilyOptions.setBlobFileStartingLevel(0)).isEqualTo(columnFamilyOptions);
assertThat(columnFamilyOptions.setPrepopulateBlobCache(
PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE))
.isEqualTo(columnFamilyOptions);
assertThat(columnFamilyOptions.enableBlobFiles()).isEqualTo(true);
assertThat(columnFamilyOptions.minBlobSize()).isEqualTo(132768L);
@@ -141,6 +150,8 @@ public class BlobOptionsTest {
assertThat(columnFamilyOptions.blobGarbageCollectionForceThreshold()).isEqualTo(0.80);
assertThat(columnFamilyOptions.blobCompactionReadaheadSize()).isEqualTo(262144L);
assertThat(columnFamilyOptions.blobFileStartingLevel()).isEqualTo(0);
assertThat(columnFamilyOptions.prepopulateBlobCache())
.isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE);
}
}
@@ -156,7 +167,8 @@ public class BlobOptionsTest {
.setBlobGarbageCollectionAgeCutoff(0.89)
.setBlobGarbageCollectionForceThreshold(0.80)
.setBlobCompactionReadaheadSize(262144)
.setBlobFileStartingLevel(1);
.setBlobFileStartingLevel(1)
.setPrepopulateBlobCache(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY);
assertThat(builder.enableBlobFiles()).isEqualTo(true);
assertThat(builder.minBlobSize()).isEqualTo(1024);
@@ -167,6 +179,8 @@ public class BlobOptionsTest {
assertThat(builder.blobGarbageCollectionForceThreshold()).isEqualTo(0.80);
assertThat(builder.blobCompactionReadaheadSize()).isEqualTo(262144);
assertThat(builder.blobFileStartingLevel()).isEqualTo(1);
assertThat(builder.prepopulateBlobCache())
.isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_FLUSH_ONLY);
builder.setEnableBlobFiles(false)
.setMinBlobSize(4096)
@@ -176,7 +190,8 @@ public class BlobOptionsTest {
.setBlobGarbageCollectionAgeCutoff(0.91)
.setBlobGarbageCollectionForceThreshold(0.96)
.setBlobCompactionReadaheadSize(1024)
.setBlobFileStartingLevel(0);
.setBlobFileStartingLevel(0)
.setPrepopulateBlobCache(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE);
assertThat(builder.enableBlobFiles()).isEqualTo(false);
assertThat(builder.minBlobSize()).isEqualTo(4096);
@@ -187,16 +202,19 @@ public class BlobOptionsTest {
assertThat(builder.blobGarbageCollectionForceThreshold()).isEqualTo(0.96);
assertThat(builder.blobCompactionReadaheadSize()).isEqualTo(1024);
assertThat(builder.blobFileStartingLevel()).isEqualTo(0);
assertThat(builder.prepopulateBlobCache())
.isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE);
final MutableColumnFamilyOptions options = builder.build();
assertThat(options.getKeys())
.isEqualTo(new String[] {"enable_blob_files", "min_blob_size", "blob_file_size",
"blob_compression_type", "enable_blob_garbage_collection",
"blob_garbage_collection_age_cutoff", "blob_garbage_collection_force_threshold",
"blob_compaction_readahead_size", "blob_file_starting_level"});
"blob_compaction_readahead_size", "blob_file_starting_level",
"prepopulate_blob_cache"});
assertThat(options.getValues())
.isEqualTo(new String[] {
"false", "4096", "2048", "LZ4_COMPRESSION", "false", "0.91", "0.96", "1024", "0"});
.isEqualTo(new String[] {"false", "4096", "2048", "LZ4_COMPRESSION", "false", "0.91",
"0.96", "1024", "0", "PREPOPULATE_BLOB_DISABLE"});
}
/**

@@ -96,8 +96,9 @@ public class MutableColumnFamilyOptionsTest {
public void mutableColumnFamilyOptions_parse_getOptions_output() {
final String optionsString =
"bottommost_compression=kDisableCompressionOption; sample_for_compression=0; "
+ "blob_garbage_collection_age_cutoff=0.250000; blob_garbage_collection_force_threshold=0.800000; arena_block_size=1048576; enable_blob_garbage_collection=false; "
+ "level0_stop_writes_trigger=36; min_blob_size=65536; blob_compaction_readahead_size=262144; blob_file_starting_level=5; "
+ "blob_garbage_collection_age_cutoff=0.250000; blob_garbage_collection_force_threshold=0.800000;"
+ "arena_block_size=1048576; enable_blob_garbage_collection=false; level0_stop_writes_trigger=36; min_blob_size=65536;"
+ "blob_compaction_readahead_size=262144; blob_file_starting_level=5; prepopulate_blob_cache=kDisable;"
+ "compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;min_merge_width=2;"
+ "compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;size_ratio=1;}; "
+ "target_file_size_base=67108864; max_bytes_for_level_base=268435456; memtable_whole_key_filtering=false; "
@@ -133,6 +134,7 @@ public class MutableColumnFamilyOptionsTest {
assertThat(cf.minBlobSize()).isEqualTo(65536);
assertThat(cf.blobCompactionReadaheadSize()).isEqualTo(262144);
assertThat(cf.blobFileStartingLevel()).isEqualTo(5);
assertThat(cf.prepopulateBlobCache()).isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE);
assertThat(cf.targetFileSizeBase()).isEqualTo(67108864);
assertThat(cf.maxBytesForLevelBase()).isEqualTo(268435456);
assertThat(cf.softPendingCompactionBytesLimit()).isEqualTo(68719476736L);

@@ -1033,6 +1033,18 @@ public class OptionsTest {
}
}
@Test
public void prepopulateBlobCache() {
try (final Options options = new Options()) {
for (final PrepopulateBlobCache prepopulateBlobCache : PrepopulateBlobCache.values()) {
options.setPrepopulateBlobCache(prepopulateBlobCache);
assertThat(options.prepopulateBlobCache()).isEqualTo(prepopulateBlobCache);
assertThat(PrepopulateBlobCache.valueOf("PREPOPULATE_BLOB_DISABLE"))
.isEqualTo(PrepopulateBlobCache.PREPOPULATE_BLOB_DISABLE);
}
}
}
@Test
public void compressionPerLevel() {
try (final Options options = new Options()) {

@@ -451,6 +451,10 @@ static std::unordered_map<std::string, OptionTypeInfo>
{offsetof(struct MutableCFOptions, blob_file_starting_level),
OptionType::kInt, OptionVerificationType::kNormal,
OptionTypeFlags::kMutable}},
{"prepopulate_blob_cache",
OptionTypeInfo::Enum<PrepopulateBlobCache>(
offsetof(struct MutableCFOptions, prepopulate_blob_cache),
&prepopulate_blob_cache_string_map, OptionTypeFlags::kMutable)},
{"sample_for_compression",
{offsetof(struct MutableCFOptions, sample_for_compression),
OptionType::kUInt64T, OptionVerificationType::kNormal,
@@ -1097,7 +1101,10 @@ void MutableCFOptions::Dump(Logger* log) const {
blob_compaction_readahead_size);
ROCKS_LOG_INFO(log, " blob_file_starting_level: %d",
blob_file_starting_level);
ROCKS_LOG_INFO(log, " prepopulate_blob_cache: %s",
prepopulate_blob_cache == PrepopulateBlobCache::kFlushOnly
? "flush only"
: "disable");
ROCKS_LOG_INFO(log, " bottommost_temperature: %d",
static_cast<int>(bottommost_temperature));
}

@@ -147,6 +147,7 @@ struct MutableCFOptions {
options.blob_garbage_collection_force_threshold),
blob_compaction_readahead_size(options.blob_compaction_readahead_size),
blob_file_starting_level(options.blob_file_starting_level),
prepopulate_blob_cache(options.prepopulate_blob_cache),
max_sequential_skip_in_iterations(
options.max_sequential_skip_in_iterations),
check_flush_compaction_key_order(
@@ -198,6 +199,7 @@ struct MutableCFOptions {
blob_garbage_collection_force_threshold(0.0),
blob_compaction_readahead_size(0),
blob_file_starting_level(0),
prepopulate_blob_cache(PrepopulateBlobCache::kDisable),
max_sequential_skip_in_iterations(0),
check_flush_compaction_key_order(true),
paranoid_file_checks(false),
@@ -281,6 +283,7 @@ struct MutableCFOptions {
double blob_garbage_collection_force_threshold;
uint64_t blob_compaction_readahead_size;
int blob_file_starting_level;
PrepopulateBlobCache prepopulate_blob_cache;
// Misc options
uint64_t max_sequential_skip_in_iterations;

@@ -105,7 +105,8 @@ AdvancedColumnFamilyOptions::AdvancedColumnFamilyOptions(const Options& options)
options.blob_garbage_collection_force_threshold),
blob_compaction_readahead_size(options.blob_compaction_readahead_size),
blob_file_starting_level(options.blob_file_starting_level),
blob_cache(options.blob_cache) {
blob_cache(options.blob_cache),
prepopulate_blob_cache(options.prepopulate_blob_cache) {
assert(memtable_factory.get() != nullptr);
if (max_bytes_for_level_multiplier_additional.size() <
static_cast<unsigned int>(num_levels)) {
@@ -428,6 +429,11 @@ void ColumnFamilyOptions::Dump(Logger* log) const {
blob_cache->Name());
ROCKS_LOG_HEADER(log, " blob_cache options: %s",
blob_cache->GetPrintableOptions().c_str());
ROCKS_LOG_HEADER(
log, " blob_cache prepopulated: %s",
prepopulate_blob_cache == PrepopulateBlobCache::kFlushOnly
? "flush only"
: "disabled");
}
ROCKS_LOG_HEADER(log, "Options.experimental_mempurge_threshold: %f",
experimental_mempurge_threshold);

@@ -257,6 +257,7 @@ void UpdateColumnFamilyOptions(const MutableCFOptions& moptions,
cf_opts->blob_compaction_readahead_size =
moptions.blob_compaction_readahead_size;
cf_opts->blob_file_starting_level = moptions.blob_file_starting_level;
cf_opts->prepopulate_blob_cache = moptions.prepopulate_blob_cache;
// Misc options
cf_opts->max_sequential_skip_in_iterations =
@@ -850,6 +851,11 @@ std::unordered_map<std::string, Temperature>
{"kWarm", Temperature::kWarm},
{"kCold", Temperature::kCold}};
std::unordered_map<std::string, PrepopulateBlobCache>
OptionsHelper::prepopulate_blob_cache_string_map = {
{"kDisable", PrepopulateBlobCache::kDisable},
{"kFlushOnly", PrepopulateBlobCache::kFlushOnly}};
Status OptionTypeInfo::NextToken(const std::string& opts, char delimiter,
size_t pos, size_t* end, std::string* token) {
while (pos < opts.size() && isspace(opts[pos])) {

@@ -82,6 +82,8 @@ struct OptionsHelper {
static std::unordered_map<std::string, ChecksumType> checksum_type_string_map;
static std::unordered_map<std::string, CompressionType>
compression_type_string_map;
static std::unordered_map<std::string, PrepopulateBlobCache>
prepopulate_blob_cache_string_map;
#ifndef ROCKSDB_LITE
static std::unordered_map<std::string, CompactionStopStyle>
compaction_stop_style_string_map;
@@ -113,6 +115,8 @@ static auto& compaction_style_string_map =
static auto& compaction_pri_string_map =
OptionsHelper::compaction_pri_string_map;
static auto& temperature_string_map = OptionsHelper::temperature_string_map;
static auto& prepopulate_blob_cache_string_map =
OptionsHelper::prepopulate_blob_cache_string_map;
#endif // !ROCKSDB_LITE
} // namespace ROCKSDB_NAMESPACE

@@ -526,6 +526,7 @@ TEST_F(OptionsSettableTest, ColumnFamilyOptionsAllFieldsSettable) {
"blob_garbage_collection_force_threshold=0.75;"
"blob_compaction_readahead_size=262144;"
"blob_file_starting_level=1;"
"prepopulate_blob_cache=kDisable;"
"bottommost_temperature=kWarm;"
"preclude_last_level_data_seconds=86400;"
"compaction_options_fifo={max_table_files_size=3;allow_"

@@ -127,6 +127,7 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) {
{"blob_garbage_collection_force_threshold", "0.75"},
{"blob_compaction_readahead_size", "256K"},
{"blob_file_starting_level", "1"},
{"prepopulate_blob_cache", "kDisable"},
{"bottommost_temperature", "kWarm"},
};
@@ -266,6 +267,7 @@ TEST_F(OptionsTest, GetOptionsFromMapTest) {
ASSERT_EQ(new_cf_opt.blob_garbage_collection_force_threshold, 0.75);
ASSERT_EQ(new_cf_opt.blob_compaction_readahead_size, 262144);
ASSERT_EQ(new_cf_opt.blob_file_starting_level, 1);
ASSERT_EQ(new_cf_opt.prepopulate_blob_cache, PrepopulateBlobCache::kDisable);
ASSERT_EQ(new_cf_opt.bottommost_temperature, Temperature::kWarm);
cf_options_map["write_buffer_size"] = "hello";
@@ -2356,6 +2358,7 @@ TEST_F(OptionsOldApiTest, GetOptionsFromMapTest) {
{"blob_garbage_collection_force_threshold", "0.75"},
{"blob_compaction_readahead_size", "256K"},
{"blob_file_starting_level", "1"},
{"prepopulate_blob_cache", "kDisable"},
{"bottommost_temperature", "kWarm"},
};
@@ -2489,6 +2492,7 @@ TEST_F(OptionsOldApiTest, GetOptionsFromMapTest) {
ASSERT_EQ(new_cf_opt.blob_garbage_collection_force_threshold, 0.75);
ASSERT_EQ(new_cf_opt.blob_compaction_readahead_size, 262144);
ASSERT_EQ(new_cf_opt.blob_file_starting_level, 1);
ASSERT_EQ(new_cf_opt.prepopulate_blob_cache, PrepopulateBlobCache::kDisable);
ASSERT_EQ(new_cf_opt.bottommost_temperature, Temperature::kWarm);
cf_options_map["write_buffer_size"] = "hello";

@@ -95,6 +95,7 @@ function display_usage() {
echo -e "\tUSE_SHARED_BLOCK_AND_BLOB_CACHE\t\t\tUse the same backing cache for block cache and blob cache (default: 1)"
echo -e "\tBLOB_CACHE_SIZE\t\t\tSize of the blob cache (default: 16GB)"
echo -e "\tBLOB_CACHE_NUMSHARDBITS\t\t\tNumber of shards for the blob cache is 2 ** blob_cache_numshardbits (default: 6)"
echo -e "\tPREPOPULATE_BLOB_CACHE\t\t\tPre-populate hot/warm blobs in blob cache (default: 0)"
}
if [ $# -lt 1 ]; then
@@ -239,6 +240,7 @@ use_blob_cache=${USE_BLOB_CACHE:-1}
use_shared_block_and_blob_cache=${USE_SHARED_BLOCK_AND_BLOB_CACHE:-1}
blob_cache_size=${BLOB_CACHE_SIZE:-$(( 16 * $G ))}
blob_cache_numshardbits=${BLOB_CACHE_NUMSHARDBITS:-6}
prepopulate_blob_cache=${PREPOPULATE_BLOB_CACHE:-0}
const_params_base="
--db=$DB_DIR \
@@ -306,7 +308,8 @@ blob_const_params="
--use_shared_block_and_blob_cache=$use_shared_block_and_blob_cache \
--blob_cache_size=$blob_cache_size \
--blob_cache_numshardbits=$blob_cache_numshardbits \
--undefok=use_blob_cache,use_shared_block_and_blob_cache,blob_cache_size,blob_cache_numshardbits \
--prepopulate_blob_cache=$prepopulate_blob_cache \
--undefok=use_blob_cache,use_shared_block_and_blob_cache,blob_cache_size,blob_cache_numshardbits,prepopulate_blob_cache \
"
# TODO:

@@ -1096,6 +1096,10 @@ DEFINE_int32(blob_cache_numshardbits, 6,
"the block and blob caches are different "
"(use_shared_block_and_blob_cache = false).");
DEFINE_int32(prepopulate_blob_cache, 0,
"[Integrated BlobDB] Pre-populate hot/warm blobs in blob cache. 0 "
"to disable and 1 to insert during flush.");
#ifndef ROCKSDB_LITE
// Secondary DB instance Options
@@ -4524,12 +4528,24 @@ class Benchmark {
exit(1);
}
}
fprintf(stdout,
"Integrated BlobDB: blob cache enabled, block and blob caches "
"shared: %d, blob cache size %" PRIu64
", blob cache num shard bits: %d\n",
FLAGS_use_shared_block_and_blob_cache, FLAGS_blob_cache_size,
FLAGS_blob_cache_numshardbits);
switch (FLAGS_prepopulate_blob_cache) {
case 0:
options.prepopulate_blob_cache = PrepopulateBlobCache::kDisable;
break;
case 1:
options.prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly;
break;
default:
fprintf(stderr, "Unknown prepopulate blob cache mode\n");
exit(1);
}
fprintf(
stdout,
"Integrated BlobDB: blob cache enabled, block and blob caches "
"shared: %d, blob cache size %" PRIu64
", blob cache num shard bits: %d, hot/warm blobs prepopulated: %d\n",
FLAGS_use_shared_block_and_blob_cache, FLAGS_blob_cache_size,
FLAGS_blob_cache_numshardbits, FLAGS_prepopulate_blob_cache);
} else {
fprintf(stdout, "Integrated BlobDB: blob cache disabled\n");
}

@@ -276,6 +276,7 @@ const std::string options_file_content = R"OPTIONS_FILE(
blob_garbage_collection_force_threshold=0.75
blob_compaction_readahead_size=262144
blob_file_starting_level=0
prepopulate_blob_cache=kDisable
[TableOptions/BlockBasedTable "default"]
format_version=0

@@ -349,6 +349,7 @@ blob_params = {
"use_blob_cache": lambda: random.randint(0, 1),
"use_shared_block_and_blob_cache": lambda: random.randint(0, 1),
"blob_cache_size": lambda: random.choice([1048576, 2097152, 4194304, 8388608]),
"prepopulate_blob_cache": lambda: random.randint(0, 1),
}
ts_params = {

@@ -102,6 +102,8 @@ const std::string LDBCommand::ARG_BLOB_COMPACTION_READAHEAD_SIZE =
"blob_compaction_readahead_size";
const std::string LDBCommand::ARG_BLOB_FILE_STARTING_LEVEL =
"blob_file_starting_level";
const std::string LDBCommand::ARG_PREPOPULATE_BLOB_CACHE =
"prepopulate_blob_cache";
const std::string LDBCommand::ARG_DECODE_BLOB_INDEX = "decode_blob_index";
const std::string LDBCommand::ARG_DUMP_UNCOMPRESSED_BLOBS =
"dump_uncompressed_blobs";
@@ -556,6 +558,7 @@ std::vector<std::string> LDBCommand::BuildCmdLineOptions(
ARG_BLOB_GARBAGE_COLLECTION_FORCE_THRESHOLD,
ARG_BLOB_COMPACTION_READAHEAD_SIZE,
ARG_BLOB_FILE_STARTING_LEVEL,
ARG_PREPOPULATE_BLOB_CACHE,
ARG_IGNORE_UNKNOWN_OPTIONS,
ARG_CF_NAME};
ret.insert(ret.end(), options.begin(), options.end());
@@ -833,6 +836,23 @@ void LDBCommand::OverrideBaseCFOptions(ColumnFamilyOptions* cf_opts) {
}
}
int prepopulate_blob_cache;
if (ParseIntOption(option_map_, ARG_PREPOPULATE_BLOB_CACHE,
prepopulate_blob_cache, exec_state_)) {
switch (prepopulate_blob_cache) {
case 0:
cf_opts->prepopulate_blob_cache = PrepopulateBlobCache::kDisable;
break;
case 1:
cf_opts->prepopulate_blob_cache = PrepopulateBlobCache::kFlushOnly;
break;
default:
exec_state_ = LDBCommandExecuteResult::Failed(
ARG_PREPOPULATE_BLOB_CACHE +
" must be 0 (disable) or 1 (flush only).");
}
}
auto itr = option_map_.find(ARG_AUTO_COMPACTION);
if (itr != option_map_.end()) {
cf_opts->disable_auto_compactions = !StringToBool(itr->second);

@@ -59,6 +59,7 @@ function display_usage() {
echo -e "\tUSE_SHARED_BLOCK_AND_BLOB_CACHE\t\t\tUse the same backing cache for block cache and blob cache. (default: 1)"
echo -e "\tBLOB_CACHE_SIZE\t\t\tSize of the blob cache (default: 16GB)"
echo -e "\tBLOB_CACHE_NUMSHARDBITS\t\t\tNumber of shards for the blob cache is 2 ** blob_cache_numshardbits (default: 6)"
echo -e "\tPREPOPULATE_BLOB_CACHE\t\t\tPre-populate hot/warm blobs in blob cache (default: 0)"
echo -e "\tTARGET_FILE_SIZE_BASE\t\tTarget SST file size for compactions (default: write buffer size, scaled down if blob files are enabled)"
echo -e "\tMAX_BYTES_FOR_LEVEL_BASE\tMaximum size for the base level (default: 8 * target SST file size)"
}
@@ -123,6 +124,7 @@ use_blob_cache=${USE_BLOB_CACHE:-1}
use_shared_block_and_blob_cache=${USE_SHARED_BLOCK_AND_BLOB_CACHE:-1}
blob_cache_size=${BLOB_CACHE_SIZE:-$((16 * G))}
blob_cache_numshardbits=${BLOB_CACHE_NUMSHARDBITS:-6}
prepopulate_blob_cache=${PREPOPULATE_BLOB_CACHE:-0}
if [ "$enable_blob_files" == "1" ]; then
target_file_size_base=${TARGET_FILE_SIZE_BASE:-$((32 * write_buffer_size / value_size))}
@@ -157,6 +159,7 @@ echo -e "Blob cache enabled:\t\t\t$use_blob_cache"
echo -e "Blob cache and block cache shared:\t\t\t$use_shared_block_and_blob_cache"
echo -e "Blob cache size:\t\t$blob_cache_size"
echo -e "Blob cache number of shard bits:\t\t$blob_cache_numshardbits"
echo -e "Blob cache prepopulated:\t\t\t$prepopulate_blob_cache"
echo -e "Target SST file size:\t\t\t$target_file_size_base"
echo -e "Maximum size of base level:\t\t$max_bytes_for_level_base"
echo "================================================================="
@@ -187,6 +190,7 @@ PARAMS="\
--use_shared_block_and_blob_cache=$use_shared_block_and_blob_cache \
--blob_cache_size=$blob_cache_size \
--blob_cache_numshardbits=$blob_cache_numshardbits \
--prepopulate_blob_cache=$prepopulate_blob_cache \
--write_buffer_size=$write_buffer_size \
--target_file_size_base=$target_file_size_base \
--max_bytes_for_level_base=$max_bytes_for_level_base"
