Refactor block cache tracing APIs (#10811)

Summary:
Refactor the classes, APIs and data structures for block cache tracing to allow a user provided trace writer to be used. Currently, only a TraceWriter is supported, with a default built-in implementation of FileTraceWriter. The TraceWriter, however, takes a flat trace record and is thus only suitable for file tracing. This PR introduces an abstract BlockCacheTraceWriter class that takes a structured BlockCacheTraceRecord. The BlockCacheTraceWriter implementation can then format and log the record in whatever way it sees fit. The default BlockCacheTraceWriterImpl does file tracing using a user provided TraceWriter.

`DB::StartBlockTrace` will internally redirect to changed `BlockCacheTrace::StartBlockCacheTrace`.
New API `DB::StartBlockTrace` is also added that directly takes `BlockCacheTraceWriter` pointer.

This same philosophy can be applied to KV and IO tracing as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10811

Test Plan:
existing unit tests
Old API DB::StartBlockTrace checked with db_bench tool
create database
```
./db_bench --benchmarks="fillseq" \
--key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \
--cache_index_and_filter_blocks --cache_size=1048576 \
--disable_auto_compactions=1 --disable_wal=1 --compression_type=none \
--min_level_to_compress=-1 --compression_ratio=1 --num=10000000
```

To trace block cache accesses when running readrandom benchmark:
```
./db_bench --benchmarks="readrandom" --use_existing_db --duration=60 \
--key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \
--cache_index_and_filter_blocks --cache_size=1048576 \
--disable_auto_compactions=1 --disable_wal=1 --compression_type=none \
--min_level_to_compress=-1 --compression_ratio=1 --num=10000000 \
--threads=16 \
-block_cache_trace_file="/tmp/binary_trace_test_example" \
-block_cache_trace_max_trace_file_size_in_bytes=1073741824 \
-block_cache_trace_sampling_frequency=1

```

Reviewed By: anand1976

Differential Revision: D40435289

Pulled By: akankshamahajan15

fbshipit-source-id: fa2755f4788185e19f4605e731641cfd21ab3282
main
akankshamahajan 2 years ago committed by Facebook GitHub Bot
parent b6e33dbc0e
commit 0e7b27bfcf
  1. 3
      HISTORY.md
  2. 20
      db/db_impl/db_impl.cc
  3. 6
      db/db_impl/db_impl.h
  4. 149
      include/rocksdb/block_cache_trace_writer.h
  5. 9
      include/rocksdb/db.h
  6. 0
      include/rocksdb/table_reader_caller.h
  7. 8
      include/rocksdb/utilities/stackable_db.h
  8. 2
      table/table_reader.h
  9. 48
      table/table_test.cc
  10. 2
      tools/block_cache_analyzer/block_cache_trace_analyzer.cc
  11. 2
      tools/block_cache_analyzer/block_cache_trace_analyzer.h
  12. 41
      tools/block_cache_analyzer/block_cache_trace_analyzer_test.cc
  13. 49
      trace_replay/block_cache_tracer.cc
  14. 86
      trace_replay/block_cache_tracer.h
  15. 112
      trace_replay/block_cache_tracer_test.cc
  16. 4
      utilities/simulator_cache/cache_simulator.cc
  17. 26
      utilities/simulator_cache/cache_simulator_test.cc

@ -32,6 +32,9 @@
### Behavior Changes ### Behavior Changes
* Sanitize min_write_buffer_number_to_merge to 1 if atomic flush is enabled to prevent unexpected data loss when WAL is disabled in a multi-column-family setting (#10773). * Sanitize min_write_buffer_number_to_merge to 1 if atomic flush is enabled to prevent unexpected data loss when WAL is disabled in a multi-column-family setting (#10773).
### Public API changes
* Refactor the classes, APIs and data structures for block cache tracing to allow a user provided trace writer to be used. Introduced an abstract BlockCacheTraceWriter class that takes a structured BlockCacheTraceRecord. The BlockCacheTraceWriter implementation can then format and log the record in whatever way it sees fit. The default BlockCacheTraceWriterImpl does file tracing using a user provided TraceWriter. More details in rocksdb/includb/block_cache_trace_writer.h.
## 7.7.0 (09/18/2022) ## 7.7.0 (09/18/2022)
### Bug Fixes ### Bug Fixes
* Fixed a hang when an operation such as `GetLiveFiles` or `CreateNewBackup` is asked to trigger and wait for memtable flush on a read-only DB. Such indirect requests for memtable flush are now ignored on a read-only DB. * Fixed a hang when an operation such as `GetLiveFiles` or `CreateNewBackup` is asked to trigger and wait for memtable flush on a read-only DB. Such indirect requests for memtable flush are now ignored on a read-only DB.

@ -5782,8 +5782,24 @@ Status DBImpl::NewDefaultReplayer(
Status DBImpl::StartBlockCacheTrace( Status DBImpl::StartBlockCacheTrace(
const TraceOptions& trace_options, const TraceOptions& trace_options,
std::unique_ptr<TraceWriter>&& trace_writer) { std::unique_ptr<TraceWriter>&& trace_writer) {
return block_cache_tracer_.StartTrace(immutable_db_options_.clock, BlockCacheTraceOptions block_trace_opts;
trace_options, std::move(trace_writer)); block_trace_opts.sampling_frequency = trace_options.sampling_frequency;
BlockCacheTraceWriterOptions trace_writer_opt;
trace_writer_opt.max_trace_file_size = trace_options.max_trace_file_size;
std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
NewBlockCacheTraceWriter(env_->GetSystemClock().get(), trace_writer_opt,
std::move(trace_writer));
return block_cache_tracer_.StartTrace(block_trace_opts,
std::move(block_cache_trace_writer));
}
Status DBImpl::StartBlockCacheTrace(
const BlockCacheTraceOptions& trace_options,
std::unique_ptr<BlockCacheTraceWriter>&& trace_writer) {
return block_cache_tracer_.StartTrace(trace_options, std::move(trace_writer));
} }
Status DBImpl::EndBlockCacheTrace() { Status DBImpl::EndBlockCacheTrace() {

@ -568,9 +568,13 @@ class DBImpl : public DB {
using DB::StartBlockCacheTrace; using DB::StartBlockCacheTrace;
Status StartBlockCacheTrace( Status StartBlockCacheTrace(
const TraceOptions& options, const TraceOptions& trace_options,
std::unique_ptr<TraceWriter>&& trace_writer) override; std::unique_ptr<TraceWriter>&& trace_writer) override;
Status StartBlockCacheTrace(
const BlockCacheTraceOptions& options,
std::unique_ptr<BlockCacheTraceWriter>&& trace_writer) override;
using DB::EndBlockCacheTrace; using DB::EndBlockCacheTrace;
Status EndBlockCacheTrace() override; Status EndBlockCacheTrace() override;

@ -0,0 +1,149 @@
// Copyright (c) 2022, Meta, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#pragma once
#include "rocksdb/options.h"
#include "rocksdb/system_clock.h"
#include "rocksdb/table_reader_caller.h"
#include "rocksdb/trace_reader_writer.h"
#include "rocksdb/trace_record.h"
namespace ROCKSDB_NAMESPACE {
// A record for block cache lookups/inserts. This is passed by the table
// reader to the BlockCacheTraceWriter for every block cache op.
struct BlockCacheTraceRecord {
// Required fields for all accesses.
uint64_t access_timestamp = 0;
// Info related to the block being looked up or inserted
//
// 1. The cache key for the block
std::string block_key;
// 2. The type of block
TraceType block_type = TraceType::kTraceMax;
// 3. Size of the block
uint64_t block_size = 0;
// Info about the SST file the block is in
//
// 1. Column family ID
uint64_t cf_id = 0;
// 2. Column family name
std::string cf_name;
// 3. LSM level of the file
uint32_t level = 0;
// 4. SST file number
uint64_t sst_fd_number = 0;
// Info about the calling context
//
// 1. The higher level request triggering the block cache request
TableReaderCaller caller = TableReaderCaller::kMaxBlockCacheLookupCaller;
// 2. Cache lookup hit/miss. Not relevant for inserts
bool is_cache_hit = false;
// 3. Whether this request is a lookup
bool no_insert = false;
// Get/MultiGet specific info
//
// 1. A unique ID for Get/MultiGet
uint64_t get_id = kReservedGetId;
// 2. Whether the Get/MultiGet is from a user-specified snapshot
bool get_from_user_specified_snapshot = false;
// 3. The target user key in the block
std::string referenced_key;
// Required fields for data block and user Get/Multi-Get only.
//
// 1. Size of te useful data in the block
uint64_t referenced_data_size = 0;
// 2. Only for MultiGet, number of keys from the batch found in the block
uint64_t num_keys_in_block = 0;
// 3. Whether the key was found in the block or not (false positive)
bool referenced_key_exist_in_block = false;
static const uint64_t kReservedGetId;
BlockCacheTraceRecord() {}
BlockCacheTraceRecord(uint64_t _access_timestamp, std::string _block_key,
TraceType _block_type, uint64_t _block_size,
uint64_t _cf_id, std::string _cf_name, uint32_t _level,
uint64_t _sst_fd_number, TableReaderCaller _caller,
bool _is_cache_hit, bool _no_insert, uint64_t _get_id,
bool _get_from_user_specified_snapshot = false,
std::string _referenced_key = "",
uint64_t _referenced_data_size = 0,
uint64_t _num_keys_in_block = 0,
bool _referenced_key_exist_in_block = false)
: access_timestamp(_access_timestamp),
block_key(_block_key),
block_type(_block_type),
block_size(_block_size),
cf_id(_cf_id),
cf_name(_cf_name),
level(_level),
sst_fd_number(_sst_fd_number),
caller(_caller),
is_cache_hit(_is_cache_hit),
no_insert(_no_insert),
get_id(_get_id),
get_from_user_specified_snapshot(_get_from_user_specified_snapshot),
referenced_key(_referenced_key),
referenced_data_size(_referenced_data_size),
num_keys_in_block(_num_keys_in_block),
referenced_key_exist_in_block(_referenced_key_exist_in_block) {}
};
// Options for tracing block cache accesses
struct BlockCacheTraceOptions {
// Specify trace sampling option, i.e. capture one per how many requests.
// Default to 1 (capture every request).
uint64_t sampling_frequency = 1;
};
// Options for the built-in implementation of BlockCacheTraceWriter
struct BlockCacheTraceWriterOptions {
uint64_t max_trace_file_size = uint64_t{64} * 1024 * 1024 * 1024;
};
// BlockCacheTraceWriter is an abstract class that captures all RocksDB block
// cache accesses. Every RocksDB operation is passed to WriteBlockAccess()
// with a BlockCacheTraceRecord.
class BlockCacheTraceWriter {
public:
virtual ~BlockCacheTraceWriter() {}
// Pass Slice references to avoid copy.
virtual Status WriteBlockAccess(const BlockCacheTraceRecord& record,
const Slice& block_key, const Slice& cf_name,
const Slice& referenced_key) = 0;
// Write a trace header at the beginning, typically on initiating a trace,
// with some metadata like a magic number and RocksDB version.
virtual Status WriteHeader() = 0;
};
// Allocate an instance of the built-in BlockCacheTraceWriter implementation,
// that traces all block cache accesses to a user-provided TraceWriter. Each
// access is traced to a file with a timestamp and type, followed by the
// payload.
std::unique_ptr<BlockCacheTraceWriter> NewBlockCacheTraceWriter(
SystemClock* clock, const BlockCacheTraceWriterOptions& trace_options,
std::unique_ptr<TraceWriter>&& trace_writer);
} // namespace ROCKSDB_NAMESPACE

@ -17,6 +17,7 @@
#include <unordered_map> #include <unordered_map>
#include <vector> #include <vector>
#include "rocksdb/block_cache_trace_writer.h"
#include "rocksdb/iterator.h" #include "rocksdb/iterator.h"
#include "rocksdb/listener.h" #include "rocksdb/listener.h"
#include "rocksdb/metadata.h" #include "rocksdb/metadata.h"
@ -1748,11 +1749,17 @@ class DB {
// Trace block cache accesses. Use EndBlockCacheTrace() to stop tracing. // Trace block cache accesses. Use EndBlockCacheTrace() to stop tracing.
virtual Status StartBlockCacheTrace( virtual Status StartBlockCacheTrace(
const TraceOptions& /*options*/, const TraceOptions& /*trace_options*/,
std::unique_ptr<TraceWriter>&& /*trace_writer*/) { std::unique_ptr<TraceWriter>&& /*trace_writer*/) {
return Status::NotSupported("StartBlockCacheTrace() is not implemented."); return Status::NotSupported("StartBlockCacheTrace() is not implemented.");
} }
virtual Status StartBlockCacheTrace(
const BlockCacheTraceOptions& /*options*/,
std::unique_ptr<BlockCacheTraceWriter>&& /*trace_writer*/) {
return Status::NotSupported("StartBlockCacheTrace() is not implemented.");
}
virtual Status EndBlockCacheTrace() { virtual Status EndBlockCacheTrace() {
return Status::NotSupported("EndBlockCacheTrace() is not implemented."); return Status::NotSupported("EndBlockCacheTrace() is not implemented.");
} }

@ -404,8 +404,14 @@ class StackableDB : public DB {
using DB::StartBlockCacheTrace; using DB::StartBlockCacheTrace;
Status StartBlockCacheTrace( Status StartBlockCacheTrace(
const TraceOptions& options, const TraceOptions& trace_options,
std::unique_ptr<TraceWriter>&& trace_writer) override { std::unique_ptr<TraceWriter>&& trace_writer) override {
return db_->StartBlockCacheTrace(trace_options, std::move(trace_writer));
}
Status StartBlockCacheTrace(
const BlockCacheTraceOptions& options,
std::unique_ptr<BlockCacheTraceWriter>&& trace_writer) override {
return db_->StartBlockCacheTrace(options, std::move(trace_writer)); return db_->StartBlockCacheTrace(options, std::move(trace_writer));
} }

@ -15,10 +15,10 @@
#include "folly/experimental/coro/Task.h" #include "folly/experimental/coro/Task.h"
#endif #endif
#include "rocksdb/slice_transform.h" #include "rocksdb/slice_transform.h"
#include "rocksdb/table_reader_caller.h"
#include "table/get_context.h" #include "table/get_context.h"
#include "table/internal_iterator.h" #include "table/internal_iterator.h"
#include "table/multiget_context.h" #include "table/multiget_context.h"
#include "table/table_reader_caller.h"
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {

@ -1147,15 +1147,21 @@ class BlockBasedTableTest
test_path_ = test::PerThreadDBPath("block_based_table_tracing_test"); test_path_ = test::PerThreadDBPath("block_based_table_tracing_test");
EXPECT_OK(env_->CreateDir(test_path_)); EXPECT_OK(env_->CreateDir(test_path_));
trace_file_path_ = test_path_ + "/block_cache_trace_file"; trace_file_path_ = test_path_ + "/block_cache_trace_file";
TraceOptions trace_opt;
BlockCacheTraceWriterOptions trace_writer_opt;
BlockCacheTraceOptions trace_opt;
std::unique_ptr<TraceWriter> trace_writer; std::unique_ptr<TraceWriter> trace_writer;
EXPECT_OK(NewFileTraceWriter(env_, EnvOptions(), trace_file_path_, EXPECT_OK(NewFileTraceWriter(env_, EnvOptions(), trace_file_path_,
&trace_writer)); &trace_writer));
std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
NewBlockCacheTraceWriter(env_->GetSystemClock().get(), trace_writer_opt,
std::move(trace_writer));
ASSERT_NE(block_cache_trace_writer, nullptr);
// Always return Status::OK(). // Always return Status::OK().
assert(c->block_cache_tracer_ assert(c->block_cache_tracer_
.StartTrace(env_->GetSystemClock().get(), trace_opt, .StartTrace(trace_opt, std::move(block_cache_trace_writer))
std::move(trace_writer))
.ok()); .ok());
{ {
std::string user_key = "k01"; std::string user_key = "k01";
InternalKey internal_key(user_key, 0, kTypeValue); InternalKey internal_key(user_key, 0, kTypeValue);
@ -1213,10 +1219,10 @@ class BlockBasedTableTest
} else { } else {
EXPECT_EQ(access.referenced_key, ""); EXPECT_EQ(access.referenced_key, "");
EXPECT_EQ(access.get_id, 0); EXPECT_EQ(access.get_id, 0);
EXPECT_TRUE(access.get_from_user_specified_snapshot == Boolean::kFalse); EXPECT_FALSE(access.get_from_user_specified_snapshot);
EXPECT_EQ(access.referenced_data_size, 0); EXPECT_EQ(access.referenced_data_size, 0);
EXPECT_EQ(access.num_keys_in_block, 0); EXPECT_EQ(access.num_keys_in_block, 0);
EXPECT_TRUE(access.referenced_key_exist_in_block == Boolean::kFalse); EXPECT_FALSE(access.referenced_key_exist_in_block);
} }
index++; index++;
} }
@ -3051,8 +3057,8 @@ TEST_P(BlockBasedTableTest, TracingGetTest) {
BlockCacheTraceRecord record; BlockCacheTraceRecord record;
record.block_type = TraceType::kBlockTraceIndexBlock; record.block_type = TraceType::kBlockTraceIndexBlock;
record.caller = TableReaderCaller::kPrefetch; record.caller = TableReaderCaller::kPrefetch;
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
record.no_insert = Boolean::kFalse; record.no_insert = false;
expected_records.push_back(record); expected_records.push_back(record);
record.block_type = TraceType::kBlockTraceFilterBlock; record.block_type = TraceType::kBlockTraceFilterBlock;
expected_records.push_back(record); expected_records.push_back(record);
@ -3061,22 +3067,22 @@ TEST_P(BlockBasedTableTest, TracingGetTest) {
record.get_id = 1; record.get_id = 1;
record.block_type = TraceType::kBlockTraceFilterBlock; record.block_type = TraceType::kBlockTraceFilterBlock;
record.caller = TableReaderCaller::kUserGet; record.caller = TableReaderCaller::kUserGet;
record.get_from_user_specified_snapshot = Boolean::kFalse; record.get_from_user_specified_snapshot = false;
record.referenced_key = encoded_key; record.referenced_key = encoded_key;
record.referenced_key_exist_in_block = Boolean::kTrue; record.referenced_key_exist_in_block = true;
record.is_cache_hit = Boolean::kTrue; record.is_cache_hit = true;
expected_records.push_back(record); expected_records.push_back(record);
record.block_type = TraceType::kBlockTraceIndexBlock; record.block_type = TraceType::kBlockTraceIndexBlock;
expected_records.push_back(record); expected_records.push_back(record);
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
record.block_type = TraceType::kBlockTraceDataBlock; record.block_type = TraceType::kBlockTraceDataBlock;
expected_records.push_back(record); expected_records.push_back(record);
// The second get should all observe cache hits. // The second get should all observe cache hits.
record.is_cache_hit = Boolean::kTrue; record.is_cache_hit = true;
record.get_id = 2; record.get_id = 2;
record.block_type = TraceType::kBlockTraceFilterBlock; record.block_type = TraceType::kBlockTraceFilterBlock;
record.caller = TableReaderCaller::kUserGet; record.caller = TableReaderCaller::kUserGet;
record.get_from_user_specified_snapshot = Boolean::kFalse; record.get_from_user_specified_snapshot = false;
record.referenced_key = encoded_key; record.referenced_key = encoded_key;
expected_records.push_back(record); expected_records.push_back(record);
record.block_type = TraceType::kBlockTraceIndexBlock; record.block_type = TraceType::kBlockTraceIndexBlock;
@ -3116,15 +3122,15 @@ TEST_P(BlockBasedTableTest, TracingApproximateOffsetOfTest) {
BlockCacheTraceRecord record; BlockCacheTraceRecord record;
record.block_type = TraceType::kBlockTraceIndexBlock; record.block_type = TraceType::kBlockTraceIndexBlock;
record.caller = TableReaderCaller::kPrefetch; record.caller = TableReaderCaller::kPrefetch;
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
record.no_insert = Boolean::kFalse; record.no_insert = false;
expected_records.push_back(record); expected_records.push_back(record);
record.block_type = TraceType::kBlockTraceFilterBlock; record.block_type = TraceType::kBlockTraceFilterBlock;
expected_records.push_back(record); expected_records.push_back(record);
// Then we should have two records for only index blocks. // Then we should have two records for only index blocks.
record.block_type = TraceType::kBlockTraceIndexBlock; record.block_type = TraceType::kBlockTraceIndexBlock;
record.caller = TableReaderCaller::kUserApproximateSize; record.caller = TableReaderCaller::kUserApproximateSize;
record.is_cache_hit = Boolean::kTrue; record.is_cache_hit = true;
expected_records.push_back(record); expected_records.push_back(record);
expected_records.push_back(record); expected_records.push_back(record);
VerifyBlockAccessTrace(&c, expected_records); VerifyBlockAccessTrace(&c, expected_records);
@ -3169,24 +3175,24 @@ TEST_P(BlockBasedTableTest, TracingIterator) {
BlockCacheTraceRecord record; BlockCacheTraceRecord record;
record.block_type = TraceType::kBlockTraceIndexBlock; record.block_type = TraceType::kBlockTraceIndexBlock;
record.caller = TableReaderCaller::kPrefetch; record.caller = TableReaderCaller::kPrefetch;
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
record.no_insert = Boolean::kFalse; record.no_insert = false;
expected_records.push_back(record); expected_records.push_back(record);
record.block_type = TraceType::kBlockTraceFilterBlock; record.block_type = TraceType::kBlockTraceFilterBlock;
expected_records.push_back(record); expected_records.push_back(record);
// Then we should have three records for index and two data block access. // Then we should have three records for index and two data block access.
record.block_type = TraceType::kBlockTraceIndexBlock; record.block_type = TraceType::kBlockTraceIndexBlock;
record.caller = TableReaderCaller::kUserIterator; record.caller = TableReaderCaller::kUserIterator;
record.is_cache_hit = Boolean::kTrue; record.is_cache_hit = true;
expected_records.push_back(record); expected_records.push_back(record);
record.block_type = TraceType::kBlockTraceDataBlock; record.block_type = TraceType::kBlockTraceDataBlock;
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
expected_records.push_back(record); expected_records.push_back(record);
expected_records.push_back(record); expected_records.push_back(record);
// When we iterate this file for the second time, we should observe all cache // When we iterate this file for the second time, we should observe all cache
// hits. // hits.
record.block_type = TraceType::kBlockTraceIndexBlock; record.block_type = TraceType::kBlockTraceIndexBlock;
record.is_cache_hit = Boolean::kTrue; record.is_cache_hit = true;
expected_records.push_back(record); expected_records.push_back(record);
record.block_type = TraceType::kBlockTraceDataBlock; record.block_type = TraceType::kBlockTraceDataBlock;
expected_records.push_back(record); expected_records.push_back(record);

@ -1568,7 +1568,7 @@ Status BlockCacheTraceAnalyzer::Analyze() {
trace_end_timestamp_in_seconds_ = access.access_timestamp / kMicrosInSecond; trace_end_timestamp_in_seconds_ = access.access_timestamp / kMicrosInSecond;
miss_ratio_stats_.UpdateMetrics(access.access_timestamp, miss_ratio_stats_.UpdateMetrics(access.access_timestamp,
is_user_access(access.caller), is_user_access(access.caller),
access.is_cache_hit == Boolean::kFalse); !access.is_cache_hit);
if (cache_simulator_) { if (cache_simulator_) {
cache_simulator_->Access(access); cache_simulator_->Access(access);
} }

@ -95,7 +95,7 @@ struct BlockAccessInfo {
if (BlockCacheTraceHelper::IsGetOrMultiGetOnDataBlock(access.block_type, if (BlockCacheTraceHelper::IsGetOrMultiGetOnDataBlock(access.block_type,
access.caller)) { access.caller)) {
num_keys = access.num_keys_in_block; num_keys = access.num_keys_in_block;
if (access.referenced_key_exist_in_block == Boolean::kTrue) { if (access.referenced_key_exist_in_block) {
if (key_num_access_map.find(access.referenced_key) == if (key_num_access_map.find(access.referenced_key) ==
key_num_access_map.end()) { key_num_access_map.end()) {
referenced_data_size += access.referenced_data_size; referenced_data_size += access.referenced_data_size;

@ -114,14 +114,14 @@ class BlockCacheTracerTest : public testing::Test {
} else { } else {
record.sst_fd_number = kSSTStoringOddKeys; record.sst_fd_number = kSSTStoringOddKeys;
} }
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
record.no_insert = Boolean::kFalse; record.no_insert = false;
// Provide these fields for all block types. // Provide these fields for all block types.
// The writer should only write these fields for data blocks and the // The writer should only write these fields for data blocks and the
// caller is either GET or MGET. // caller is either GET or MGET.
record.referenced_key = record.referenced_key =
kRefKeyPrefix + std::to_string(key_id) + std::string(8, 0); kRefKeyPrefix + std::to_string(key_id) + std::string(8, 0);
record.referenced_key_exist_in_block = Boolean::kTrue; record.referenced_key_exist_in_block = true;
record.num_keys_in_block = kNumKeysInBlock; record.num_keys_in_block = kNumKeysInBlock;
ASSERT_OK(writer->WriteBlockAccess( ASSERT_OK(writer->WriteBlockAccess(
record, record.block_key, record.cf_name, record.referenced_key)); record, record.block_key, record.cf_name, record.referenced_key));
@ -223,15 +223,18 @@ class BlockCacheTracerTest : public testing::Test {
TEST_F(BlockCacheTracerTest, BlockCacheAnalyzer) { TEST_F(BlockCacheTracerTest, BlockCacheAnalyzer) {
{ {
// Generate a trace file. // Generate a trace file.
TraceOptions trace_opt; BlockCacheTraceWriterOptions trace_writer_opt;
std::unique_ptr<TraceWriter> trace_writer; std::unique_ptr<TraceWriter> trace_writer;
ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_, ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_,
&trace_writer)); &trace_writer));
const auto& clock = env_->GetSystemClock(); const auto& clock = env_->GetSystemClock();
BlockCacheTraceWriter writer(clock.get(), trace_opt, std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
NewBlockCacheTraceWriter(clock.get(), trace_writer_opt,
std::move(trace_writer)); std::move(trace_writer));
ASSERT_OK(writer.WriteHeader()); ASSERT_NE(block_cache_trace_writer, nullptr);
WriteBlockAccess(&writer, 0, TraceType::kBlockTraceDataBlock, 50); ASSERT_OK(block_cache_trace_writer->WriteHeader());
WriteBlockAccess(block_cache_trace_writer.get(), 0,
TraceType::kBlockTraceDataBlock, 50);
ASSERT_OK(env_->FileExists(trace_file_path_)); ASSERT_OK(env_->FileExists(trace_file_path_));
} }
{ {
@ -612,21 +615,27 @@ TEST_F(BlockCacheTracerTest, MixedBlocks) {
// It contains two SST files with 25 blocks of odd numbered block_key in // It contains two SST files with 25 blocks of odd numbered block_key in
// kSSTStoringOddKeys and 25 blocks of even numbered blocks_key in // kSSTStoringOddKeys and 25 blocks of even numbered blocks_key in
// kSSTStoringEvenKeys. // kSSTStoringEvenKeys.
TraceOptions trace_opt; BlockCacheTraceWriterOptions trace_writer_opt;
std::unique_ptr<TraceWriter> trace_writer; std::unique_ptr<TraceWriter> trace_writer;
const auto& clock = env_->GetSystemClock(); const auto& clock = env_->GetSystemClock();
ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_, ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_,
&trace_writer)); &trace_writer));
BlockCacheTraceWriter writer(clock.get(), trace_opt, std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
NewBlockCacheTraceWriter(clock.get(), trace_writer_opt,
std::move(trace_writer)); std::move(trace_writer));
ASSERT_OK(writer.WriteHeader()); ASSERT_NE(block_cache_trace_writer, nullptr);
ASSERT_OK(block_cache_trace_writer->WriteHeader());
// Write blocks of different types. // Write blocks of different types.
WriteBlockAccess(&writer, 0, TraceType::kBlockTraceUncompressionDictBlock, WriteBlockAccess(block_cache_trace_writer.get(), 0,
10); TraceType::kBlockTraceUncompressionDictBlock, 10);
WriteBlockAccess(&writer, 10, TraceType::kBlockTraceDataBlock, 10); WriteBlockAccess(block_cache_trace_writer.get(), 10,
WriteBlockAccess(&writer, 20, TraceType::kBlockTraceFilterBlock, 10); TraceType::kBlockTraceDataBlock, 10);
WriteBlockAccess(&writer, 30, TraceType::kBlockTraceIndexBlock, 10); WriteBlockAccess(block_cache_trace_writer.get(), 20,
WriteBlockAccess(&writer, 40, TraceType::kBlockTraceRangeDeletionBlock, 10); TraceType::kBlockTraceFilterBlock, 10);
WriteBlockAccess(block_cache_trace_writer.get(), 30,
TraceType::kBlockTraceIndexBlock, 10);
WriteBlockAccess(block_cache_trace_writer.get(), 40,
TraceType::kBlockTraceRangeDeletionBlock, 10);
ASSERT_OK(env_->FileExists(trace_file_path_)); ASSERT_OK(env_->FileExists(trace_file_path_));
} }

@ -20,7 +20,8 @@
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {
namespace { namespace {
bool ShouldTrace(const Slice& block_key, const TraceOptions& trace_options) { bool ShouldTrace(const Slice& block_key,
const BlockCacheTraceOptions& trace_options) {
if (trace_options.sampling_frequency == 0 || if (trace_options.sampling_frequency == 0 ||
trace_options.sampling_frequency == 1) { trace_options.sampling_frequency == 1) {
return true; return true;
@ -36,6 +37,7 @@ const uint64_t kSecondInMinute = 60;
const uint64_t kSecondInHour = 3600; const uint64_t kSecondInHour = 3600;
const std::string BlockCacheTraceHelper::kUnknownColumnFamilyName = const std::string BlockCacheTraceHelper::kUnknownColumnFamilyName =
"UnknownColumnFamily"; "UnknownColumnFamily";
const uint64_t BlockCacheTraceRecord::kReservedGetId = 0;
const uint64_t BlockCacheTraceHelper::kReservedGetId = 0; const uint64_t BlockCacheTraceHelper::kReservedGetId = 0;
bool BlockCacheTraceHelper::IsGetOrMultiGetOnDataBlock( bool BlockCacheTraceHelper::IsGetOrMultiGetOnDataBlock(
@ -79,9 +81,9 @@ uint64_t BlockCacheTraceHelper::GetSequenceNumber(
if (!IsGetOrMultiGet(access.caller)) { if (!IsGetOrMultiGet(access.caller)) {
return 0; return 0;
} }
return access.get_from_user_specified_snapshot == Boolean::kFalse return access.get_from_user_specified_snapshot
? 0 ? 1 + GetInternalKeySeqno(access.referenced_key)
: 1 + GetInternalKeySeqno(access.referenced_key); : 0;
} }
uint64_t BlockCacheTraceHelper::GetBlockOffsetInFile( uint64_t BlockCacheTraceHelper::GetBlockOffsetInFile(
@ -99,14 +101,14 @@ uint64_t BlockCacheTraceHelper::GetBlockOffsetInFile(
return offset; return offset;
} }
BlockCacheTraceWriter::BlockCacheTraceWriter( BlockCacheTraceWriterImpl::BlockCacheTraceWriterImpl(
SystemClock* clock, const TraceOptions& trace_options, SystemClock* clock, const BlockCacheTraceWriterOptions& trace_options,
std::unique_ptr<TraceWriter>&& trace_writer) std::unique_ptr<TraceWriter>&& trace_writer)
: clock_(clock), : clock_(clock),
trace_options_(trace_options), trace_options_(trace_options),
trace_writer_(std::move(trace_writer)) {} trace_writer_(std::move(trace_writer)) {}
Status BlockCacheTraceWriter::WriteBlockAccess( Status BlockCacheTraceWriterImpl::WriteBlockAccess(
const BlockCacheTraceRecord& record, const Slice& block_key, const BlockCacheTraceRecord& record, const Slice& block_key,
const Slice& cf_name, const Slice& referenced_key) { const Slice& cf_name, const Slice& referenced_key) {
uint64_t trace_file_size = trace_writer_->GetFileSize(); uint64_t trace_file_size = trace_writer_->GetFileSize();
@ -141,7 +143,7 @@ Status BlockCacheTraceWriter::WriteBlockAccess(
return trace_writer_->Write(encoded_trace); return trace_writer_->Write(encoded_trace);
} }
Status BlockCacheTraceWriter::WriteHeader() { Status BlockCacheTraceWriterImpl::WriteHeader() {
Trace trace; Trace trace;
trace.ts = clock_->NowMicros(); trace.ts = clock_->NowMicros();
trace.type = TraceType::kTraceBegin; trace.type = TraceType::kTraceBegin;
@ -255,13 +257,13 @@ Status BlockCacheTraceReader::ReadAccess(BlockCacheTraceRecord* record) {
return Status::Incomplete( return Status::Incomplete(
"Incomplete access record: Failed to read is_cache_hit."); "Incomplete access record: Failed to read is_cache_hit.");
} }
record->is_cache_hit = static_cast<Boolean>(enc_slice[0]); record->is_cache_hit = static_cast<char>(enc_slice[0]);
enc_slice.remove_prefix(kCharSize); enc_slice.remove_prefix(kCharSize);
if (enc_slice.empty()) { if (enc_slice.empty()) {
return Status::Incomplete( return Status::Incomplete(
"Incomplete access record: Failed to read no_insert."); "Incomplete access record: Failed to read no_insert.");
} }
record->no_insert = static_cast<Boolean>(enc_slice[0]); record->no_insert = static_cast<char>(enc_slice[0]);
enc_slice.remove_prefix(kCharSize); enc_slice.remove_prefix(kCharSize);
if (BlockCacheTraceHelper::IsGetOrMultiGet(record->caller)) { if (BlockCacheTraceHelper::IsGetOrMultiGet(record->caller)) {
if (!GetFixed64(&enc_slice, &record->get_id)) { if (!GetFixed64(&enc_slice, &record->get_id)) {
@ -273,8 +275,7 @@ Status BlockCacheTraceReader::ReadAccess(BlockCacheTraceRecord* record) {
"Incomplete access record: Failed to read " "Incomplete access record: Failed to read "
"get_from_user_specified_snapshot."); "get_from_user_specified_snapshot.");
} }
record->get_from_user_specified_snapshot = record->get_from_user_specified_snapshot = static_cast<char>(enc_slice[0]);
static_cast<Boolean>(enc_slice[0]);
enc_slice.remove_prefix(kCharSize); enc_slice.remove_prefix(kCharSize);
Slice referenced_key; Slice referenced_key;
if (!GetLengthPrefixedSlice(&enc_slice, &referenced_key)) { if (!GetLengthPrefixedSlice(&enc_slice, &referenced_key)) {
@ -299,7 +300,7 @@ Status BlockCacheTraceReader::ReadAccess(BlockCacheTraceRecord* record) {
"Incomplete access record: Failed to read " "Incomplete access record: Failed to read "
"referenced_key_exist_in_block."); "referenced_key_exist_in_block.");
} }
record->referenced_key_exist_in_block = static_cast<Boolean>(enc_slice[0]); record->referenced_key_exist_in_block = static_cast<char>(enc_slice[0]);
} }
return Status::OK(); return Status::OK();
} }
@ -391,14 +392,14 @@ Status BlockCacheHumanReadableTraceReader::ReadAccess(
record->level = static_cast<uint32_t>(ParseUint64(record_strs[6])); record->level = static_cast<uint32_t>(ParseUint64(record_strs[6]));
record->sst_fd_number = ParseUint64(record_strs[7]); record->sst_fd_number = ParseUint64(record_strs[7]);
record->caller = static_cast<TableReaderCaller>(ParseUint64(record_strs[8])); record->caller = static_cast<TableReaderCaller>(ParseUint64(record_strs[8]));
record->no_insert = static_cast<Boolean>(ParseUint64(record_strs[9])); record->no_insert = static_cast<char>(ParseUint64(record_strs[9]));
record->get_id = ParseUint64(record_strs[10]); record->get_id = ParseUint64(record_strs[10]);
uint64_t get_key_id = ParseUint64(record_strs[11]); uint64_t get_key_id = ParseUint64(record_strs[11]);
record->referenced_data_size = ParseUint64(record_strs[12]); record->referenced_data_size = ParseUint64(record_strs[12]);
record->is_cache_hit = static_cast<Boolean>(ParseUint64(record_strs[13])); record->is_cache_hit = static_cast<char>(ParseUint64(record_strs[13]));
record->referenced_key_exist_in_block = record->referenced_key_exist_in_block =
static_cast<Boolean>(ParseUint64(record_strs[14])); static_cast<char>(ParseUint64(record_strs[14]));
record->num_keys_in_block = ParseUint64(record_strs[15]); record->num_keys_in_block = ParseUint64(record_strs[15]);
uint64_t table_id = ParseUint64(record_strs[16]); uint64_t table_id = ParseUint64(record_strs[16]);
if (table_id > 0) { if (table_id > 0) {
@ -408,7 +409,7 @@ Status BlockCacheHumanReadableTraceReader::ReadAccess(
} }
uint64_t get_sequence_number = ParseUint64(record_strs[17]); uint64_t get_sequence_number = ParseUint64(record_strs[17]);
if (get_sequence_number > 0) { if (get_sequence_number > 0) {
record->get_from_user_specified_snapshot = Boolean::kTrue; record->get_from_user_specified_snapshot = true;
// Decrement since valid seq number in the trace file equals traced seq // Decrement since valid seq number in the trace file equals traced seq
// number + 1. // number + 1.
get_sequence_number -= 1; get_sequence_number -= 1;
@ -445,16 +446,15 @@ BlockCacheTracer::BlockCacheTracer() { writer_.store(nullptr); }
BlockCacheTracer::~BlockCacheTracer() { EndTrace(); } BlockCacheTracer::~BlockCacheTracer() { EndTrace(); }
Status BlockCacheTracer::StartTrace( Status BlockCacheTracer::StartTrace(
SystemClock* clock, const TraceOptions& trace_options, const BlockCacheTraceOptions& trace_options,
std::unique_ptr<TraceWriter>&& trace_writer) { std::unique_ptr<BlockCacheTraceWriter>&& trace_writer) {
InstrumentedMutexLock lock_guard(&trace_writer_mutex_); InstrumentedMutexLock lock_guard(&trace_writer_mutex_);
if (writer_.load()) { if (writer_.load()) {
return Status::Busy(); return Status::Busy();
} }
get_id_counter_.store(1); get_id_counter_.store(1);
trace_options_ = trace_options; trace_options_ = trace_options;
writer_.store( writer_.store(trace_writer.release());
new BlockCacheTraceWriter(clock, trace_options, std::move(trace_writer)));
return writer_.load()->WriteHeader(); return writer_.load()->WriteHeader();
} }
@ -494,4 +494,11 @@ uint64_t BlockCacheTracer::NextGetId() {
return prev_value; return prev_value;
} }
std::unique_ptr<BlockCacheTraceWriter> NewBlockCacheTraceWriter(
SystemClock* clock, const BlockCacheTraceWriterOptions& trace_options,
std::unique_ptr<TraceWriter>&& trace_writer) {
return std::unique_ptr<BlockCacheTraceWriter>(new BlockCacheTraceWriterImpl(
clock, trace_options, std::move(trace_writer)));
}
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -9,9 +9,10 @@
#include <fstream> #include <fstream>
#include "monitoring/instrumented_mutex.h" #include "monitoring/instrumented_mutex.h"
#include "rocksdb/block_cache_trace_writer.h"
#include "rocksdb/options.h" #include "rocksdb/options.h"
#include "rocksdb/table_reader_caller.h"
#include "rocksdb/trace_reader_writer.h" #include "rocksdb/trace_reader_writer.h"
#include "table/table_reader_caller.h"
#include "trace_replay/trace_replay.h" #include "trace_replay/trace_replay.h"
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {
@ -102,65 +103,6 @@ struct BlockCacheLookupContext {
} }
}; };
enum Boolean : char { kTrue = 1, kFalse = 0 };
struct BlockCacheTraceRecord {
// Required fields for all accesses.
uint64_t access_timestamp = 0;
std::string block_key;
TraceType block_type = TraceType::kTraceMax;
uint64_t block_size = 0;
uint64_t cf_id = 0;
std::string cf_name;
uint32_t level = 0;
uint64_t sst_fd_number = 0;
TableReaderCaller caller = TableReaderCaller::kMaxBlockCacheLookupCaller;
Boolean is_cache_hit = Boolean::kFalse;
Boolean no_insert = Boolean::kFalse;
// Required field for Get and MultiGet
uint64_t get_id = BlockCacheTraceHelper::kReservedGetId;
Boolean get_from_user_specified_snapshot = Boolean::kFalse;
std::string referenced_key;
// Required fields for data block and user Get/Multi-Get only.
uint64_t referenced_data_size = 0;
uint64_t num_keys_in_block = 0;
Boolean referenced_key_exist_in_block = Boolean::kFalse;
BlockCacheTraceRecord() {}
BlockCacheTraceRecord(
uint64_t _access_timestamp, std::string _block_key, TraceType _block_type,
uint64_t _block_size, uint64_t _cf_id, std::string _cf_name,
uint32_t _level, uint64_t _sst_fd_number, TableReaderCaller _caller,
bool _is_cache_hit, bool _no_insert,
uint64_t _get_id = BlockCacheTraceHelper::kReservedGetId,
bool _get_from_user_specified_snapshot = false,
std::string _referenced_key = "", uint64_t _referenced_data_size = 0,
uint64_t _num_keys_in_block = 0,
bool _referenced_key_exist_in_block = false)
: access_timestamp(_access_timestamp),
block_key(_block_key),
block_type(_block_type),
block_size(_block_size),
cf_id(_cf_id),
cf_name(_cf_name),
level(_level),
sst_fd_number(_sst_fd_number),
caller(_caller),
is_cache_hit(_is_cache_hit ? Boolean::kTrue : Boolean::kFalse),
no_insert(_no_insert ? Boolean::kTrue : Boolean::kFalse),
get_id(_get_id),
get_from_user_specified_snapshot(_get_from_user_specified_snapshot
? Boolean::kTrue
: Boolean::kFalse),
referenced_key(_referenced_key),
referenced_data_size(_referenced_data_size),
num_keys_in_block(_num_keys_in_block),
referenced_key_exist_in_block(
_referenced_key_exist_in_block ? Boolean::kTrue : Boolean::kFalse) {
}
};
struct BlockCacheTraceHeader { struct BlockCacheTraceHeader {
uint64_t start_time; uint64_t start_time;
uint32_t rocksdb_major_version; uint32_t rocksdb_major_version;
@ -171,16 +113,18 @@ struct BlockCacheTraceHeader {
// user-provided TraceWriter. Every RocksDB operation is written as a single // user-provided TraceWriter. Every RocksDB operation is written as a single
// trace. Each trace will have a timestamp and type, followed by the trace // trace. Each trace will have a timestamp and type, followed by the trace
// payload. // payload.
class BlockCacheTraceWriter { class BlockCacheTraceWriterImpl : public BlockCacheTraceWriter {
public: public:
BlockCacheTraceWriter(SystemClock* clock, const TraceOptions& trace_options, BlockCacheTraceWriterImpl(SystemClock* clock,
const BlockCacheTraceWriterOptions& trace_options,
std::unique_ptr<TraceWriter>&& trace_writer); std::unique_ptr<TraceWriter>&& trace_writer);
~BlockCacheTraceWriter() = default; ~BlockCacheTraceWriterImpl() = default;
// No copy and move. // No copy and move.
BlockCacheTraceWriter(const BlockCacheTraceWriter&) = delete; BlockCacheTraceWriterImpl(const BlockCacheTraceWriterImpl&) = delete;
BlockCacheTraceWriter& operator=(const BlockCacheTraceWriter&) = delete; BlockCacheTraceWriterImpl& operator=(const BlockCacheTraceWriterImpl&) =
BlockCacheTraceWriter(BlockCacheTraceWriter&&) = delete; delete;
BlockCacheTraceWriter& operator=(BlockCacheTraceWriter&&) = delete; BlockCacheTraceWriterImpl(BlockCacheTraceWriterImpl&&) = delete;
BlockCacheTraceWriterImpl& operator=(BlockCacheTraceWriterImpl&&) = delete;
// Pass Slice references to avoid copy. // Pass Slice references to avoid copy.
Status WriteBlockAccess(const BlockCacheTraceRecord& record, Status WriteBlockAccess(const BlockCacheTraceRecord& record,
@ -193,7 +137,7 @@ class BlockCacheTraceWriter {
private: private:
SystemClock* clock_; SystemClock* clock_;
TraceOptions trace_options_; BlockCacheTraceWriterOptions trace_options_;
std::unique_ptr<TraceWriter> trace_writer_; std::unique_ptr<TraceWriter> trace_writer_;
}; };
@ -267,8 +211,8 @@ class BlockCacheTracer {
BlockCacheTracer& operator=(BlockCacheTracer&&) = delete; BlockCacheTracer& operator=(BlockCacheTracer&&) = delete;
// Start writing block cache accesses to the trace_writer. // Start writing block cache accesses to the trace_writer.
Status StartTrace(SystemClock* clock, const TraceOptions& trace_options, Status StartTrace(const BlockCacheTraceOptions& trace_options,
std::unique_ptr<TraceWriter>&& trace_writer); std::unique_ptr<BlockCacheTraceWriter>&& trace_writer);
// Stop writing block cache accesses to the trace_writer. // Stop writing block cache accesses to the trace_writer.
void EndTrace(); void EndTrace();
@ -285,7 +229,7 @@ class BlockCacheTracer {
uint64_t NextGetId(); uint64_t NextGetId();
private: private:
TraceOptions trace_options_; BlockCacheTraceOptions trace_options_;
// A mutex protects the writer_. // A mutex protects the writer_.
InstrumentedMutex trace_writer_mutex_; InstrumentedMutex trace_writer_mutex_;
std::atomic<BlockCacheTraceWriter*> writer_; std::atomic<BlockCacheTraceWriter*> writer_;

@ -74,17 +74,17 @@ class BlockCacheTracerTest : public testing::Test {
record.caller = GetCaller(key_id); record.caller = GetCaller(key_id);
record.level = kLevel; record.level = kLevel;
record.sst_fd_number = kSSTFDNumber + key_id; record.sst_fd_number = kSSTFDNumber + key_id;
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
record.no_insert = Boolean::kFalse; record.no_insert = false;
// Provide get_id for all callers. The writer should only write get_id // Provide get_id for all callers. The writer should only write get_id
// when the caller is either GET or MGET. // when the caller is either GET or MGET.
record.get_id = key_id + 1; record.get_id = key_id + 1;
record.get_from_user_specified_snapshot = Boolean::kTrue; record.get_from_user_specified_snapshot = true;
// Provide these fields for all block types. // Provide these fields for all block types.
// The writer should only write these fields for data blocks and the // The writer should only write these fields for data blocks and the
// caller is either GET or MGET. // caller is either GET or MGET.
record.referenced_key = (kRefKeyPrefix + std::to_string(key_id)); record.referenced_key = (kRefKeyPrefix + std::to_string(key_id));
record.referenced_key_exist_in_block = Boolean::kTrue; record.referenced_key_exist_in_block = true;
record.num_keys_in_block = kNumKeysInBlock; record.num_keys_in_block = kNumKeysInBlock;
record.referenced_data_size = kReferencedDataSize + key_id; record.referenced_data_size = kReferencedDataSize + key_id;
ASSERT_OK(writer->WriteBlockAccess( ASSERT_OK(writer->WriteBlockAccess(
@ -104,10 +104,10 @@ class BlockCacheTracerTest : public testing::Test {
record.caller = GetCaller(key_id); record.caller = GetCaller(key_id);
record.level = kLevel; record.level = kLevel;
record.sst_fd_number = kSSTFDNumber + key_id; record.sst_fd_number = kSSTFDNumber + key_id;
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
record.no_insert = Boolean::kFalse; record.no_insert = false;
record.referenced_key = kRefKeyPrefix + std::to_string(key_id); record.referenced_key = kRefKeyPrefix + std::to_string(key_id);
record.referenced_key_exist_in_block = Boolean::kTrue; record.referenced_key_exist_in_block = true;
record.num_keys_in_block = kNumKeysInBlock; record.num_keys_in_block = kNumKeysInBlock;
return record; return record;
} }
@ -127,28 +127,28 @@ class BlockCacheTracerTest : public testing::Test {
ASSERT_EQ(GetCaller(key_id), record.caller); ASSERT_EQ(GetCaller(key_id), record.caller);
ASSERT_EQ(kLevel, record.level); ASSERT_EQ(kLevel, record.level);
ASSERT_EQ(kSSTFDNumber + key_id, record.sst_fd_number); ASSERT_EQ(kSSTFDNumber + key_id, record.sst_fd_number);
ASSERT_EQ(Boolean::kFalse, record.is_cache_hit); ASSERT_FALSE(record.is_cache_hit);
ASSERT_EQ(Boolean::kFalse, record.no_insert); ASSERT_FALSE(record.no_insert);
if (record.caller == TableReaderCaller::kUserGet || if (record.caller == TableReaderCaller::kUserGet ||
record.caller == TableReaderCaller::kUserMultiGet) { record.caller == TableReaderCaller::kUserMultiGet) {
ASSERT_EQ(key_id + 1, record.get_id); ASSERT_EQ(key_id + 1, record.get_id);
ASSERT_EQ(Boolean::kTrue, record.get_from_user_specified_snapshot); ASSERT_TRUE(record.get_from_user_specified_snapshot);
ASSERT_EQ(kRefKeyPrefix + std::to_string(key_id), ASSERT_EQ(kRefKeyPrefix + std::to_string(key_id),
record.referenced_key); record.referenced_key);
} else { } else {
ASSERT_EQ(BlockCacheTraceHelper::kReservedGetId, record.get_id); ASSERT_EQ(BlockCacheTraceHelper::kReservedGetId, record.get_id);
ASSERT_EQ(Boolean::kFalse, record.get_from_user_specified_snapshot); ASSERT_FALSE(record.get_from_user_specified_snapshot);
ASSERT_EQ("", record.referenced_key); ASSERT_EQ("", record.referenced_key);
} }
if (block_type == TraceType::kBlockTraceDataBlock && if (block_type == TraceType::kBlockTraceDataBlock &&
(record.caller == TableReaderCaller::kUserGet || (record.caller == TableReaderCaller::kUserGet ||
record.caller == TableReaderCaller::kUserMultiGet)) { record.caller == TableReaderCaller::kUserMultiGet)) {
ASSERT_EQ(Boolean::kTrue, record.referenced_key_exist_in_block); ASSERT_TRUE(record.referenced_key_exist_in_block);
ASSERT_EQ(kNumKeysInBlock, record.num_keys_in_block); ASSERT_EQ(kNumKeysInBlock, record.num_keys_in_block);
ASSERT_EQ(kReferencedDataSize + key_id, record.referenced_data_size); ASSERT_EQ(kReferencedDataSize + key_id, record.referenced_data_size);
continue; continue;
} }
ASSERT_EQ(Boolean::kFalse, record.referenced_key_exist_in_block); ASSERT_FALSE(record.referenced_key_exist_in_block);
ASSERT_EQ(0, record.num_keys_in_block); ASSERT_EQ(0, record.num_keys_in_block);
ASSERT_EQ(0, record.referenced_data_size); ASSERT_EQ(0, record.referenced_data_size);
} }
@ -188,12 +188,18 @@ TEST_F(BlockCacheTracerTest, AtomicWriteBeforeStartTrace) {
TEST_F(BlockCacheTracerTest, AtomicWrite) { TEST_F(BlockCacheTracerTest, AtomicWrite) {
BlockCacheTraceRecord record = GenerateAccessRecord(); BlockCacheTraceRecord record = GenerateAccessRecord();
{ {
TraceOptions trace_opt; BlockCacheTraceWriterOptions trace_writer_opt;
BlockCacheTraceOptions trace_opt;
std::unique_ptr<TraceWriter> trace_writer; std::unique_ptr<TraceWriter> trace_writer;
ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_, ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_,
&trace_writer)); &trace_writer));
std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
NewBlockCacheTraceWriter(env_->GetSystemClock().get(), trace_writer_opt,
std::move(trace_writer));
ASSERT_NE(block_cache_trace_writer, nullptr);
BlockCacheTracer writer; BlockCacheTracer writer;
ASSERT_OK(writer.StartTrace(clock_, trace_opt, std::move(trace_writer))); ASSERT_OK(
writer.StartTrace(trace_opt, std::move(block_cache_trace_writer)));
ASSERT_OK(writer.WriteBlockAccess(record, record.block_key, record.cf_name, ASSERT_OK(writer.WriteBlockAccess(record, record.block_key, record.cf_name,
record.referenced_key)); record.referenced_key));
ASSERT_OK(env_->FileExists(trace_file_path_)); ASSERT_OK(env_->FileExists(trace_file_path_));
@ -214,25 +220,36 @@ TEST_F(BlockCacheTracerTest, AtomicWrite) {
} }
TEST_F(BlockCacheTracerTest, ConsecutiveStartTrace) { TEST_F(BlockCacheTracerTest, ConsecutiveStartTrace) {
TraceOptions trace_opt; BlockCacheTraceWriterOptions trace_writer_opt;
BlockCacheTraceOptions trace_opt;
std::unique_ptr<TraceWriter> trace_writer; std::unique_ptr<TraceWriter> trace_writer;
ASSERT_OK( ASSERT_OK(
NewFileTraceWriter(env_, env_options_, trace_file_path_, &trace_writer)); NewFileTraceWriter(env_, env_options_, trace_file_path_, &trace_writer));
std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
NewBlockCacheTraceWriter(env_->GetSystemClock().get(), trace_writer_opt,
std::move(trace_writer));
ASSERT_NE(block_cache_trace_writer, nullptr);
BlockCacheTracer writer; BlockCacheTracer writer;
ASSERT_OK(writer.StartTrace(clock_, trace_opt, std::move(trace_writer))); ASSERT_OK(writer.StartTrace(trace_opt, std::move(block_cache_trace_writer)));
ASSERT_NOK(writer.StartTrace(clock_, trace_opt, std::move(trace_writer))); ASSERT_NOK(writer.StartTrace(trace_opt, std::move(block_cache_trace_writer)));
ASSERT_OK(env_->FileExists(trace_file_path_)); ASSERT_OK(env_->FileExists(trace_file_path_));
} }
TEST_F(BlockCacheTracerTest, AtomicNoWriteAfterEndTrace) { TEST_F(BlockCacheTracerTest, AtomicNoWriteAfterEndTrace) {
BlockCacheTraceRecord record = GenerateAccessRecord(); BlockCacheTraceRecord record = GenerateAccessRecord();
{ {
TraceOptions trace_opt; BlockCacheTraceWriterOptions trace_writer_opt;
BlockCacheTraceOptions trace_opt;
std::unique_ptr<TraceWriter> trace_writer; std::unique_ptr<TraceWriter> trace_writer;
ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_, ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_,
&trace_writer)); &trace_writer));
std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
NewBlockCacheTraceWriter(env_->GetSystemClock().get(), trace_writer_opt,
std::move(trace_writer));
ASSERT_NE(block_cache_trace_writer, nullptr);
BlockCacheTracer writer; BlockCacheTracer writer;
ASSERT_OK(writer.StartTrace(clock_, trace_opt, std::move(trace_writer))); ASSERT_OK(
writer.StartTrace(trace_opt, std::move(block_cache_trace_writer)));
ASSERT_OK(writer.WriteBlockAccess(record, record.block_key, record.cf_name, ASSERT_OK(writer.WriteBlockAccess(record, record.block_key, record.cf_name,
record.referenced_key)); record.referenced_key));
writer.EndTrace(); writer.EndTrace();
@ -260,14 +277,20 @@ TEST_F(BlockCacheTracerTest, AtomicNoWriteAfterEndTrace) {
TEST_F(BlockCacheTracerTest, NextGetId) { TEST_F(BlockCacheTracerTest, NextGetId) {
BlockCacheTracer writer; BlockCacheTracer writer;
{ {
TraceOptions trace_opt; BlockCacheTraceWriterOptions trace_writer_opt;
BlockCacheTraceOptions trace_opt;
std::unique_ptr<TraceWriter> trace_writer; std::unique_ptr<TraceWriter> trace_writer;
ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_, ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_,
&trace_writer)); &trace_writer));
std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
NewBlockCacheTraceWriter(env_->GetSystemClock().get(), trace_writer_opt,
std::move(trace_writer));
ASSERT_NE(block_cache_trace_writer, nullptr);
// next get id should always return 0 before we call StartTrace. // next get id should always return 0 before we call StartTrace.
ASSERT_EQ(0, writer.NextGetId()); ASSERT_EQ(0, writer.NextGetId());
ASSERT_EQ(0, writer.NextGetId()); ASSERT_EQ(0, writer.NextGetId());
ASSERT_OK(writer.StartTrace(clock_, trace_opt, std::move(trace_writer))); ASSERT_OK(
writer.StartTrace(trace_opt, std::move(block_cache_trace_writer)));
ASSERT_EQ(1, writer.NextGetId()); ASSERT_EQ(1, writer.NextGetId());
ASSERT_EQ(2, writer.NextGetId()); ASSERT_EQ(2, writer.NextGetId());
writer.EndTrace(); writer.EndTrace();
@ -277,11 +300,17 @@ TEST_F(BlockCacheTracerTest, NextGetId) {
// Start trace again and next get id should return 1. // Start trace again and next get id should return 1.
{ {
TraceOptions trace_opt; BlockCacheTraceWriterOptions trace_writer_opt;
BlockCacheTraceOptions trace_opt;
std::unique_ptr<TraceWriter> trace_writer; std::unique_ptr<TraceWriter> trace_writer;
ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_, ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_,
&trace_writer)); &trace_writer));
ASSERT_OK(writer.StartTrace(clock_, trace_opt, std::move(trace_writer))); std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
NewBlockCacheTraceWriter(env_->GetSystemClock().get(), trace_writer_opt,
std::move(trace_writer));
ASSERT_NE(block_cache_trace_writer, nullptr);
ASSERT_OK(
writer.StartTrace(trace_opt, std::move(block_cache_trace_writer)));
ASSERT_EQ(1, writer.NextGetId()); ASSERT_EQ(1, writer.NextGetId());
} }
} }
@ -289,19 +318,26 @@ TEST_F(BlockCacheTracerTest, NextGetId) {
TEST_F(BlockCacheTracerTest, MixedBlocks) { TEST_F(BlockCacheTracerTest, MixedBlocks) {
{ {
// Generate a trace file containing a mix of blocks. // Generate a trace file containing a mix of blocks.
TraceOptions trace_opt; BlockCacheTraceWriterOptions trace_writer_opt;
std::unique_ptr<TraceWriter> trace_writer; std::unique_ptr<TraceWriter> trace_writer;
ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_, ASSERT_OK(NewFileTraceWriter(env_, env_options_, trace_file_path_,
&trace_writer)); &trace_writer));
BlockCacheTraceWriter writer(clock_, trace_opt, std::move(trace_writer)); std::unique_ptr<BlockCacheTraceWriter> block_cache_trace_writer =
ASSERT_OK(writer.WriteHeader()); NewBlockCacheTraceWriter(env_->GetSystemClock().get(), trace_writer_opt,
std::move(trace_writer));
ASSERT_NE(block_cache_trace_writer, nullptr);
ASSERT_OK(block_cache_trace_writer->WriteHeader());
// Write blocks of different types. // Write blocks of different types.
WriteBlockAccess(&writer, 0, TraceType::kBlockTraceUncompressionDictBlock, WriteBlockAccess(block_cache_trace_writer.get(), 0,
10); TraceType::kBlockTraceUncompressionDictBlock, 10);
WriteBlockAccess(&writer, 10, TraceType::kBlockTraceDataBlock, 10); WriteBlockAccess(block_cache_trace_writer.get(), 10,
WriteBlockAccess(&writer, 20, TraceType::kBlockTraceFilterBlock, 10); TraceType::kBlockTraceDataBlock, 10);
WriteBlockAccess(&writer, 30, TraceType::kBlockTraceIndexBlock, 10); WriteBlockAccess(block_cache_trace_writer.get(), 20,
WriteBlockAccess(&writer, 40, TraceType::kBlockTraceRangeDeletionBlock, 10); TraceType::kBlockTraceFilterBlock, 10);
WriteBlockAccess(block_cache_trace_writer.get(), 30,
TraceType::kBlockTraceIndexBlock, 10);
WriteBlockAccess(block_cache_trace_writer.get(), 40,
TraceType::kBlockTraceRangeDeletionBlock, 10);
ASSERT_OK(env_->FileExists(trace_file_path_)); ASSERT_OK(env_->FileExists(trace_file_path_));
} }
@ -332,7 +368,7 @@ TEST_F(BlockCacheTracerTest, HumanReadableTrace) {
record.get_id = 1; record.get_id = 1;
record.referenced_key = ""; record.referenced_key = "";
record.caller = TableReaderCaller::kUserGet; record.caller = TableReaderCaller::kUserGet;
record.get_from_user_specified_snapshot = Boolean::kTrue; record.get_from_user_specified_snapshot = true;
record.referenced_data_size = kReferencedDataSize; record.referenced_data_size = kReferencedDataSize;
PutFixed32(&record.referenced_key, 111); PutFixed32(&record.referenced_key, 111);
PutLengthPrefixedSlice(&record.referenced_key, "get_key"); PutLengthPrefixedSlice(&record.referenced_key, "get_key");
@ -359,11 +395,11 @@ TEST_F(BlockCacheTracerTest, HumanReadableTrace) {
ASSERT_EQ(TableReaderCaller::kUserGet, read_record.caller); ASSERT_EQ(TableReaderCaller::kUserGet, read_record.caller);
ASSERT_EQ(kLevel, read_record.level); ASSERT_EQ(kLevel, read_record.level);
ASSERT_EQ(kSSTFDNumber, read_record.sst_fd_number); ASSERT_EQ(kSSTFDNumber, read_record.sst_fd_number);
ASSERT_EQ(Boolean::kFalse, read_record.is_cache_hit); ASSERT_FALSE(read_record.is_cache_hit);
ASSERT_EQ(Boolean::kFalse, read_record.no_insert); ASSERT_FALSE(read_record.no_insert);
ASSERT_EQ(1, read_record.get_id); ASSERT_EQ(1, read_record.get_id);
ASSERT_EQ(Boolean::kTrue, read_record.get_from_user_specified_snapshot); ASSERT_TRUE(read_record.get_from_user_specified_snapshot);
ASSERT_EQ(Boolean::kTrue, read_record.referenced_key_exist_in_block); ASSERT_TRUE(read_record.referenced_key_exist_in_block);
ASSERT_EQ(kNumKeysInBlock, read_record.num_keys_in_block); ASSERT_EQ(kNumKeysInBlock, read_record.num_keys_in_block);
ASSERT_EQ(kReferencedDataSize, read_record.referenced_data_size); ASSERT_EQ(kReferencedDataSize, read_record.referenced_data_size);
ASSERT_EQ(record.block_key.size(), read_record.block_key.size()); ASSERT_EQ(record.block_key.size(), read_record.block_key.size());

@ -41,7 +41,7 @@ void CacheSimulator::Access(const BlockCacheTraceRecord& access) {
const bool is_user_access = const bool is_user_access =
BlockCacheTraceHelper::IsUserAccess(access.caller); BlockCacheTraceHelper::IsUserAccess(access.caller);
bool is_cache_miss = true; bool is_cache_miss = true;
if (ghost_cache_ && access.no_insert == Boolean::kFalse) { if (ghost_cache_ && !access.no_insert) {
admit = ghost_cache_->Admit(access.block_key); admit = ghost_cache_->Admit(access.block_key);
} }
auto handle = sim_cache_->Lookup(access.block_key); auto handle = sim_cache_->Lookup(access.block_key);
@ -49,7 +49,7 @@ void CacheSimulator::Access(const BlockCacheTraceRecord& access) {
sim_cache_->Release(handle); sim_cache_->Release(handle);
is_cache_miss = false; is_cache_miss = false;
} else { } else {
if (access.no_insert == Boolean::kFalse && admit && access.block_size > 0) { if (!access.no_insert && admit && access.block_size > 0) {
// Ignore errors on insert // Ignore errors on insert
auto s = sim_cache_->Insert(access.block_key, /*value=*/nullptr, auto s = sim_cache_->Insert(access.block_key, /*value=*/nullptr,
access.block_size, access.block_size,

@ -43,11 +43,11 @@ class CacheSimulatorTest : public testing::Test {
record.level = 6; record.level = 6;
record.sst_fd_number = 0; record.sst_fd_number = 0;
record.get_id = getid; record.get_id = getid;
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
record.no_insert = Boolean::kFalse; record.no_insert = false;
record.referenced_key = record.referenced_key =
kRefKeyPrefix + std::to_string(kGetId) + kRefKeySequenceNumber; kRefKeyPrefix + std::to_string(kGetId) + kRefKeySequenceNumber;
record.referenced_key_exist_in_block = Boolean::kTrue; record.referenced_key_exist_in_block = true;
record.referenced_data_size = 100; record.referenced_data_size = 100;
record.num_keys_in_block = 300; record.num_keys_in_block = 300;
return record; return record;
@ -64,8 +64,8 @@ class CacheSimulatorTest : public testing::Test {
record.caller = TableReaderCaller::kCompaction; record.caller = TableReaderCaller::kCompaction;
record.level = 6; record.level = 6;
record.sst_fd_number = kCompactionBlockId; record.sst_fd_number = kCompactionBlockId;
record.is_cache_hit = Boolean::kFalse; record.is_cache_hit = false;
record.no_insert = Boolean::kTrue; record.no_insert = true;
return record; return record;
} }
@ -200,17 +200,17 @@ TEST_F(CacheSimulatorTest, GhostPrioritizedCacheSimulator) {
TEST_F(CacheSimulatorTest, HybridRowBlockCacheSimulator) { TEST_F(CacheSimulatorTest, HybridRowBlockCacheSimulator) {
uint64_t block_id = 100; uint64_t block_id = 100;
BlockCacheTraceRecord first_get = GenerateGetRecord(kGetId); BlockCacheTraceRecord first_get = GenerateGetRecord(kGetId);
first_get.get_from_user_specified_snapshot = Boolean::kTrue; first_get.get_from_user_specified_snapshot = true;
BlockCacheTraceRecord second_get = GenerateGetRecord(kGetId + 1); BlockCacheTraceRecord second_get = GenerateGetRecord(kGetId + 1);
second_get.referenced_data_size = 0; second_get.referenced_data_size = 0;
second_get.referenced_key_exist_in_block = Boolean::kFalse; second_get.referenced_key_exist_in_block = false;
second_get.get_from_user_specified_snapshot = Boolean::kTrue; second_get.get_from_user_specified_snapshot = true;
BlockCacheTraceRecord third_get = GenerateGetRecord(kGetId + 2); BlockCacheTraceRecord third_get = GenerateGetRecord(kGetId + 2);
third_get.referenced_data_size = 0; third_get.referenced_data_size = 0;
third_get.referenced_key_exist_in_block = Boolean::kFalse; third_get.referenced_key_exist_in_block = false;
third_get.referenced_key = kRefKeyPrefix + "third_get"; third_get.referenced_key = kRefKeyPrefix + "third_get";
// We didn't find the referenced key in the third get. // We didn't find the referenced key in the third get.
third_get.referenced_key_exist_in_block = Boolean::kFalse; third_get.referenced_key_exist_in_block = false;
third_get.referenced_data_size = 0; third_get.referenced_data_size = 0;
std::shared_ptr<Cache> sim_cache = std::shared_ptr<Cache> sim_cache =
NewLRUCache(/*capacity=*/kCacheSize, /*num_shard_bits=*/1, NewLRUCache(/*capacity=*/kCacheSize, /*num_shard_bits=*/1,
@ -308,12 +308,12 @@ TEST_F(CacheSimulatorTest, HybridRowBlockCacheSimulatorGetTest) {
get.access_timestamp = 0; get.access_timestamp = 0;
get.block_key = "1"; get.block_key = "1";
get.get_id = 1; get.get_id = 1;
get.get_from_user_specified_snapshot = Boolean::kFalse; get.get_from_user_specified_snapshot = false;
get.referenced_key = get.referenced_key =
kRefKeyPrefix + std::to_string(1) + kRefKeySequenceNumber; kRefKeyPrefix + std::to_string(1) + kRefKeySequenceNumber;
get.no_insert = Boolean::kFalse; get.no_insert = false;
get.sst_fd_number = 0; get.sst_fd_number = 0;
get.get_from_user_specified_snapshot = Boolean::kFalse; get.get_from_user_specified_snapshot = false;
LRUCacheOptions co; LRUCacheOptions co;
co.capacity = 16; co.capacity = 16;

Loading…
Cancel
Save