Support using secondary cache with the blob cache (#10349)

Summary:
RocksDB supports a two-level cache hierarchy (see https://rocksdb.org/blog/2021/05/27/rocksdb-secondary-cache.html), where items evicted from the primary cache can be spilled over to the secondary cache, or items from the secondary cache can be promoted to the primary one. We have a CacheLib-based non-volatile secondary cache implementation that can be used to improve read latencies and reduce the amount of network bandwidth when using distributed file systems. In addition, we have recently implemented a compressed secondary cache that can be used as a replacement for the OS page cache when e.g. direct I/O is used. The goals of this task are to add support for using a secondary cache with the blob cache and to measure the potential performance gains using `db_bench`.

This task is a part of https://github.com/facebook/rocksdb/issues/10156

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10349

Reviewed By: ltamasi

Differential Revision: D37896773

Pulled By: gangliao

fbshipit-source-id: 7804619ce4a44b73d9e11ad606640f9385969c84
main
Gang Liao 3 years ago committed by Facebook GitHub Bot
parent efdb428edc
commit 95ef007adc
  1. 1
      HISTORY.md
  2. 2
      cache/lru_cache.cc
  3. 53
      db/blob/blob_source.cc
  4. 14
      db/blob/blob_source.h
  5. 238
      db/blob/blob_source_test.cc

@ -10,6 +10,7 @@
* User can configure the new ColumnFamilyOptions `blob_cache` to enable/disable blob caching. * User can configure the new ColumnFamilyOptions `blob_cache` to enable/disable blob caching.
* Either sharing the backend cache with the block cache or using a completely separate cache is supported. * Either sharing the backend cache with the block cache or using a completely separate cache is supported.
* A new abstraction interface called `BlobSource` for blob read logic gives all users access to blobs, whether they are in the blob cache, secondary cache, or (remote) storage. Blobs can be potentially read both while handling user reads (`Get`, `MultiGet`, or iterator) and during compaction (while dealing with compaction filters, Merges, or garbage collection) but eventually all blob reads go through `Version::GetBlob` or, for MultiGet, `Version::MultiGetBlob` (and then get dispatched to the interface -- `BlobSource`). * A new abstraction interface called `BlobSource` for blob read logic gives all users access to blobs, whether they are in the blob cache, secondary cache, or (remote) storage. Blobs can be potentially read both while handling user reads (`Get`, `MultiGet`, or iterator) and during compaction (while dealing with compaction filters, Merges, or garbage collection) but eventually all blob reads go through `Version::GetBlob` or, for MultiGet, `Version::MultiGetBlob` (and then get dispatched to the interface -- `BlobSource`).
* Support using secondary cache with the blob cache. When creating a blob cache, the user can set a secondary blob cache by configuring `secondary_cache` in LRUCacheOptions.
* Add experimental tiered compaction feature `AdvancedColumnFamilyOptions::preclude_last_level_data_seconds`, which makes sure the new data inserted within preclude_last_level_data_seconds won't be placed on cold tier (the feature is not complete). * Add experimental tiered compaction feature `AdvancedColumnFamilyOptions::preclude_last_level_data_seconds`, which makes sure the new data inserted within preclude_last_level_data_seconds won't be placed on cold tier (the feature is not complete).
### Public API changes ### Public API changes

@ -529,7 +529,7 @@ bool LRUCacheShard::Release(Cache::Handle* handle, bool erase_if_last_ref) {
// If it was the last reference, and the entry is either not secondary // If it was the last reference, and the entry is either not secondary
// cache compatible (i.e a dummy entry for accounting), or is secondary // cache compatible (i.e a dummy entry for accounting), or is secondary
// cache compatible and has a non-null value, then decrement the cache // cache compatible and has a non-null value, then decrement the cache
// usage. If value is null in the latter case, taht means the lookup // usage. If value is null in the latter case, that means the lookup
// failed and we didn't charge the cache. // failed and we didn't charge the cache.
if (last_reference && (!e->IsSecondaryCacheCompatible() || e->value)) { if (last_reference && (!e->IsSecondaryCacheCompatible() || e->value)) {
assert(usage_ >= e->total_charge); assert(usage_ >= e->total_charge);

@ -25,7 +25,8 @@ BlobSource::BlobSource(const ImmutableOptions* immutable_options,
db_session_id_(db_session_id), db_session_id_(db_session_id),
statistics_(immutable_options->statistics.get()), statistics_(immutable_options->statistics.get()),
blob_file_cache_(blob_file_cache), blob_file_cache_(blob_file_cache),
blob_cache_(immutable_options->blob_cache) {} blob_cache_(immutable_options->blob_cache),
lowest_used_cache_tier_(immutable_options->lowest_used_cache_tier) {}
BlobSource::~BlobSource() = default; BlobSource::~BlobSource() = default;
@ -81,7 +82,24 @@ Status BlobSource::PutBlobIntoCache(const Slice& cache_key,
Cache::Handle* BlobSource::GetEntryFromCache(const Slice& key) const { Cache::Handle* BlobSource::GetEntryFromCache(const Slice& key) const {
Cache::Handle* cache_handle = nullptr; Cache::Handle* cache_handle = nullptr;
if (lowest_used_cache_tier_ == CacheTier::kNonVolatileBlockTier) {
Cache::CreateCallback create_cb = [&](const void* buf, size_t size,
void** out_obj,
size_t* charge) -> Status {
std::string* blob = new std::string();
blob->assign(static_cast<const char*>(buf), size);
*out_obj = blob;
*charge = size;
return Status::OK();
};
cache_handle = blob_cache_->Lookup(key, GetCacheItemHelper(), create_cb,
Cache::Priority::LOW,
true /* wait_for_cache */, statistics_);
} else {
cache_handle = blob_cache_->Lookup(key, statistics_); cache_handle = blob_cache_->Lookup(key, statistics_);
}
if (cache_handle != nullptr) { if (cache_handle != nullptr) {
PERF_COUNTER_ADD(blob_cache_hit_count, 1); PERF_COUNTER_ADD(blob_cache_hit_count, 1);
RecordTick(statistics_, BLOB_DB_CACHE_HIT); RecordTick(statistics_, BLOB_DB_CACHE_HIT);
@ -97,9 +115,16 @@ Status BlobSource::InsertEntryIntoCache(const Slice& key, std::string* value,
size_t charge, size_t charge,
Cache::Handle** cache_handle, Cache::Handle** cache_handle,
Cache::Priority priority) const { Cache::Priority priority) const {
const Status s = Status s;
blob_cache_->Insert(key, value, charge, &DeleteCacheEntry<std::string>,
if (lowest_used_cache_tier_ == CacheTier::kNonVolatileBlockTier) {
s = blob_cache_->Insert(key, value, GetCacheItemHelper(), charge,
cache_handle, priority); cache_handle, priority);
} else {
s = blob_cache_->Insert(key, value, charge, &DeleteCacheEntry<std::string>,
cache_handle, priority);
}
if (s.ok()) { if (s.ok()) {
assert(*cache_handle != nullptr); assert(*cache_handle != nullptr);
RecordTick(statistics_, BLOB_DB_CACHE_ADD); RecordTick(statistics_, BLOB_DB_CACHE_ADD);
@ -108,6 +133,7 @@ Status BlobSource::InsertEntryIntoCache(const Slice& key, std::string* value,
} else { } else {
RecordTick(statistics_, BLOB_DB_CACHE_ADD_FAILURES); RecordTick(statistics_, BLOB_DB_CACHE_ADD_FAILURES);
} }
return s; return s;
} }
@ -402,4 +428,25 @@ bool BlobSource::TEST_BlobInCache(uint64_t file_number, uint64_t file_size,
return false; return false;
} }
// Callbacks for secondary blob cache
size_t BlobSource::SizeCallback(void* obj) {
assert(obj != nullptr);
return static_cast<const std::string*>(obj)->size();
}
Status BlobSource::SaveToCallback(void* from_obj, size_t from_offset,
size_t length, void* out) {
assert(from_obj != nullptr);
const std::string* buf = static_cast<const std::string*>(from_obj);
assert(buf->size() >= from_offset + length);
memcpy(out, buf->data() + from_offset, length);
return Status::OK();
}
Cache::CacheItemHelper* BlobSource::GetCacheItemHelper() {
static Cache::CacheItemHelper cache_helper(SizeCallback, SaveToCallback,
&DeleteCacheEntry<std::string>);
return &cache_helper;
}
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -123,6 +123,14 @@ class BlobSource {
return base_cache_key.WithOffset(offset); return base_cache_key.WithOffset(offset);
} }
// Callbacks for secondary blob cache
static size_t SizeCallback(void* obj);
static Status SaveToCallback(void* from_obj, size_t from_offset,
size_t length, void* out);
static Cache::CacheItemHelper* GetCacheItemHelper();
const std::string& db_id_; const std::string& db_id_;
const std::string& db_session_id_; const std::string& db_session_id_;
@ -133,6 +141,12 @@ class BlobSource {
// A cache to store uncompressed blobs. // A cache to store uncompressed blobs.
std::shared_ptr<Cache> blob_cache_; std::shared_ptr<Cache> blob_cache_;
// The control option of how the cache tiers will be used. Currently rocksdb
// support block/blob cache (volatile tier) and secondary cache (this tier
// isn't strictly speaking a non-volatile tier since the compressed cache in
// this tier is in volatile memory).
const CacheTier lowest_used_cache_tier_;
}; };
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -11,6 +11,7 @@
#include <memory> #include <memory>
#include <string> #include <string>
#include "cache/compressed_secondary_cache.h"
#include "db/blob/blob_file_cache.h" #include "db/blob/blob_file_cache.h"
#include "db/blob/blob_file_reader.h" #include "db/blob/blob_file_reader.h"
#include "db/blob/blob_log_format.h" #include "db/blob/blob_log_format.h"
@ -21,6 +22,7 @@
#include "options/cf_options.h" #include "options/cf_options.h"
#include "rocksdb/options.h" #include "rocksdb/options.h"
#include "util/compression.h" #include "util/compression.h"
#include "util/random.h"
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {
@ -1020,6 +1022,242 @@ TEST_F(BlobSourceTest, MultiGetBlobsFromCache) {
} }
} }
class BlobSecondaryCacheTest : public DBTestBase {
protected:
public:
explicit BlobSecondaryCacheTest()
: DBTestBase("blob_secondary_cache_test", /*env_do_fsync=*/true) {
options_.env = env_;
options_.enable_blob_files = true;
options_.create_if_missing = true;
// Set a small cache capacity to evict entries from the cache, and to test
// that secondary cache is used properly.
lru_cache_ops_.capacity = 1024;
lru_cache_ops_.num_shard_bits = 0;
lru_cache_ops_.strict_capacity_limit = true;
lru_cache_ops_.metadata_charge_policy = kDontChargeCacheMetadata;
secondary_cache_opts_.capacity = 8 << 20; // 8 MB
secondary_cache_opts_.num_shard_bits = 0;
secondary_cache_opts_.metadata_charge_policy = kDontChargeCacheMetadata;
// Read blobs from the secondary cache if they are not in the primary cache
options_.lowest_used_cache_tier = CacheTier::kNonVolatileBlockTier;
assert(db_->GetDbIdentity(db_id_).ok());
assert(db_->GetDbSessionId(db_session_id_).ok());
}
Options options_;
LRUCacheOptions lru_cache_ops_;
CompressedSecondaryCacheOptions secondary_cache_opts_;
std::string db_id_;
std::string db_session_id_;
};
TEST_F(BlobSecondaryCacheTest, GetBlobsFromSecondaryCache) {
if (!Snappy_Supported()) {
return;
}
secondary_cache_opts_.compression_type = kSnappyCompression;
lru_cache_ops_.secondary_cache =
NewCompressedSecondaryCache(secondary_cache_opts_);
options_.blob_cache = NewLRUCache(lru_cache_ops_);
options_.cf_paths.emplace_back(
test::PerThreadDBPath(
env_, "BlobSecondaryCacheTest_GetBlobsFromSecondaryCache"),
0);
options_.statistics = CreateDBStatistics();
Statistics* statistics = options_.statistics.get();
assert(statistics);
DestroyAndReopen(options_);
ImmutableOptions immutable_options(options_);
constexpr uint32_t column_family_id = 1;
constexpr bool has_ttl = false;
constexpr ExpirationRange expiration_range;
constexpr uint64_t file_number = 1;
Random rnd(301);
std::vector<std::string> key_strs{"key0", "key1"};
std::vector<std::string> blob_strs{rnd.RandomString(1010),
rnd.RandomString(1020)};
std::vector<Slice> keys{key_strs[0], key_strs[1]};
std::vector<Slice> blobs{blob_strs[0], blob_strs[1]};
std::vector<uint64_t> blob_offsets(keys.size());
std::vector<uint64_t> blob_sizes(keys.size());
WriteBlobFile(immutable_options, column_family_id, has_ttl, expiration_range,
expiration_range, file_number, keys, blobs, kNoCompression,
blob_offsets, blob_sizes);
constexpr size_t capacity = 1024;
std::shared_ptr<Cache> backing_cache = NewLRUCache(capacity);
FileOptions file_options;
constexpr HistogramImpl* blob_file_read_hist = nullptr;
std::unique_ptr<BlobFileCache> blob_file_cache(new BlobFileCache(
backing_cache.get(), &immutable_options, &file_options, column_family_id,
blob_file_read_hist, nullptr /*IOTracer*/));
BlobSource blob_source(&immutable_options, db_id_, db_session_id_,
blob_file_cache.get());
CacheHandleGuard<BlobFileReader> file_reader;
ASSERT_OK(blob_source.GetBlobFileReader(file_number, &file_reader));
ASSERT_NE(file_reader.GetValue(), nullptr);
const uint64_t file_size = file_reader.GetValue()->GetFileSize();
ASSERT_EQ(file_reader.GetValue()->GetCompressionType(), kNoCompression);
ReadOptions read_options;
read_options.verify_checksums = true;
auto blob_cache = options_.blob_cache;
auto secondary_cache = lru_cache_ops_.secondary_cache;
Cache::CreateCallback create_cb = [&](const void* buf, size_t size,
void** out_obj,
size_t* charge) -> Status {
std::string* blob = new std::string();
blob->assign(static_cast<const char*>(buf), size);
*out_obj = blob;
*charge = size;
return Status::OK();
};
{
// GetBlob
std::vector<PinnableSlice> values(keys.size());
read_options.fill_cache = true;
get_perf_context()->Reset();
// key0 should be filled to the primary cache from the blob file.
ASSERT_OK(blob_source.GetBlob(read_options, keys[0], file_number,
blob_offsets[0], file_size, blob_sizes[0],
kNoCompression, nullptr /* prefetch_buffer */,
&values[0], nullptr /* bytes_read */));
ASSERT_EQ(values[0], blobs[0]);
ASSERT_TRUE(
blob_source.TEST_BlobInCache(file_number, file_size, blob_offsets[0]));
// key0 should be demoted to the secondary cache, and key1 should be filled
// to the primary cache from the blob file.
ASSERT_OK(blob_source.GetBlob(read_options, keys[1], file_number,
blob_offsets[1], file_size, blob_sizes[1],
kNoCompression, nullptr /* prefetch_buffer */,
&values[1], nullptr /* bytes_read */));
ASSERT_EQ(values[1], blobs[1]);
ASSERT_TRUE(
blob_source.TEST_BlobInCache(file_number, file_size, blob_offsets[1]));
OffsetableCacheKey base_cache_key(db_id_, db_session_id_, file_number,
file_size);
// blob_cache here only looks at the primary cache since we didn't provide
// the cache item helper for the secondary cache. However, since key0 is
// demoted to the secondary cache, we shouldn't be able to find it in the
// primary cache.
{
CacheKey cache_key = base_cache_key.WithOffset(blob_offsets[0]);
const Slice key0 = cache_key.AsSlice();
auto handle0 = blob_cache->Lookup(key0, statistics);
ASSERT_EQ(handle0, nullptr);
// key0 should be in the secondary cache. After looking up key0 in the
// secondary cache, it will be erased from the secondary cache.
bool is_in_sec_cache = false;
auto sec_handle0 =
secondary_cache->Lookup(key0, create_cb, true, is_in_sec_cache);
ASSERT_FALSE(is_in_sec_cache);
ASSERT_NE(sec_handle0, nullptr);
ASSERT_TRUE(sec_handle0->IsReady());
auto value = static_cast<std::string*>(sec_handle0->Value());
ASSERT_EQ(*value, blobs[0]);
delete value;
// key0 doesn't exist in the blob cache
ASSERT_FALSE(blob_source.TEST_BlobInCache(file_number, file_size,
blob_offsets[0]));
}
// key1 should exist in the primary cache.
{
CacheKey cache_key = base_cache_key.WithOffset(blob_offsets[1]);
const Slice key1 = cache_key.AsSlice();
auto handle1 = blob_cache->Lookup(key1, statistics);
ASSERT_NE(handle1, nullptr);
blob_cache->Release(handle1);
bool is_in_sec_cache = false;
auto sec_handle1 =
secondary_cache->Lookup(key1, create_cb, true, is_in_sec_cache);
ASSERT_FALSE(is_in_sec_cache);
ASSERT_EQ(sec_handle1, nullptr);
ASSERT_TRUE(blob_source.TEST_BlobInCache(file_number, file_size,
blob_offsets[1]));
}
{
// fetch key0 from the blob file to the primary cache.
ASSERT_OK(blob_source.GetBlob(
read_options, keys[0], file_number, blob_offsets[0], file_size,
blob_sizes[0], kNoCompression, nullptr /* prefetch_buffer */,
&values[0], nullptr /* bytes_read */));
ASSERT_EQ(values[0], blobs[0]);
// key0 should be in the primary cache.
CacheKey cache_key0 = base_cache_key.WithOffset(blob_offsets[0]);
const Slice key0 = cache_key0.AsSlice();
auto handle0 = blob_cache->Lookup(key0, statistics);
ASSERT_NE(handle0, nullptr);
auto value = static_cast<std::string*>(blob_cache->Value(handle0));
ASSERT_EQ(*value, blobs[0]);
blob_cache->Release(handle0);
// key1 is not in the primary cache, and it should be demoted to the
// secondary cache.
CacheKey cache_key1 = base_cache_key.WithOffset(blob_offsets[1]);
const Slice key1 = cache_key1.AsSlice();
auto handle1 = blob_cache->Lookup(key1, statistics);
ASSERT_EQ(handle1, nullptr);
// erase key0 from the primary cache.
blob_cache->Erase(key0);
handle0 = blob_cache->Lookup(key0, statistics);
ASSERT_EQ(handle0, nullptr);
// key1 promotion should succeed due to the primary cache being empty. we
// did't call secondary cache's Lookup() here, because it will remove the
// key but it won't be able to promote the key to the primary cache.
// Instead we use the end-to-end blob source API to promote the key to
// the primary cache.
ASSERT_TRUE(blob_source.TEST_BlobInCache(file_number, file_size,
blob_offsets[1]));
// key1 should be in the primary cache.
handle1 = blob_cache->Lookup(key1, statistics);
ASSERT_NE(handle1, nullptr);
value = static_cast<std::string*>(blob_cache->Value(handle1));
ASSERT_EQ(*value, blobs[1]);
blob_cache->Release(handle1);
}
}
}
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE
int main(int argc, char** argv) { int main(int argc, char** argv) {

Loading…
Cancel
Save