Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288)

Summary:
**Context:**
The existing stat `rocksdb.sst.read.micros` aggregates the reads done by compaction and flush instead of reporting each case separately, which makes it hard to understand the IO read behavior of either one.

**Summary**
- Update `StopWatch` and `RandomAccessFileReader` to record `rocksdb.sst.read.micros` and `rocksdb.file.read.{flush|compaction}.micros`
   - Fixed the default histogram in `RandomAccessFileReader`
- New field `ReadOptions/IOOptions::io_activity`; pass `ReadOptions` through the DB open, flush, and compaction paths to the point where `IOOptions` can be prepared and passed to `RandomAccessFileReader`
- Use `thread_status_util` for an assertion in `DbStressFSWrapper`, so that continuous testing checks that the correct `io_activity` is passed under DB open, flush, and compaction
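The recording scheme in the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not RocksDB's implementation. `IOActivity`, `SimpleHistogram`, `ReadStats`, `DualStopWatch`, and `TimedRead` are hypothetical stand-ins for the `io_activity` tag, `StopWatch`, and the reader's histograms. Every timed read goes into the aggregate histogram. Reads tagged as flush or compaction also go into the matching per-activity histogram.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the io_activity tag carried by ReadOptions/IOOptions.
enum class IOActivity { kUnknown, kFlush, kCompaction };

// Minimal histogram stand-in: tracks only COUNT and SUM (microseconds).
struct SimpleHistogram {
  uint64_t count = 0;
  uint64_t sum = 0;
  void Add(uint64_t micros) {
    ++count;
    sum += micros;
  }
};

struct ReadStats {
  SimpleHistogram sst_read_micros;              // aggregate: every read
  SimpleHistogram file_read_flush_micros;       // reads tagged kFlush
  SimpleHistogram file_read_compaction_micros;  // reads tagged kCompaction

  // Per-activity histogram for a tag, or nullptr if the read is untagged.
  SimpleHistogram* ForActivity(IOActivity activity) {
    switch (activity) {
      case IOActivity::kFlush:
        return &file_read_flush_micros;
      case IOActivity::kCompaction:
        return &file_read_compaction_micros;
      default:
        return nullptr;
    }
  }
};

// RAII timer in the spirit of StopWatch: on destruction it records the
// elapsed time into the aggregate histogram and, if given, a second one.
class DualStopWatch {
 public:
  DualStopWatch(SimpleHistogram* aggregate, SimpleHistogram* per_activity)
      : aggregate_(aggregate),
        per_activity_(per_activity),
        start_(std::chrono::steady_clock::now()) {
    assert(aggregate_ != nullptr);
  }
  // Copying would record the same interval twice, so forbid it.
  DualStopWatch(const DualStopWatch&) = delete;
  DualStopWatch& operator=(const DualStopWatch&) = delete;

  ~DualStopWatch() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    const uint64_t micros = static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count());
    aggregate_->Add(micros);
    if (per_activity_ != nullptr) {
      per_activity_->Add(micros);
    }
  }

 private:
  SimpleHistogram* aggregate_;
  SimpleHistogram* per_activity_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical read path: times one read tagged with `activity`.
void TimedRead(ReadStats& stats, IOActivity activity) {
  DualStopWatch timer(&stats.sst_read_micros, stats.ForActivity(activity));
  // ... the actual file read would happen here ...
}
```

With this shape, the aggregate COUNT equals the sum of the two per-activity COUNTs only when every read is tagged as flush or compaction. Untagged reads land in the aggregate histogram alone.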

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11288

Test Plan:
- **Stress test**
- **Db bench 1: `rocksdb.sst.read.micros` COUNT ≈ sum of the COUNTs of `rocksdb.file.read.flush.micros` and `rocksdb.file.read.compaction.micros`** (without blob)
     - The counts may not match exactly because `HistogramStat::Add` only guarantees atomicity, not accuracy, across threads.
```
./db_bench -db=/dev/shm/testdb/ -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3
# For PlainTable, additionally pass: -use_plain_table=1 -prefix_size=10
```
```
// BlockBasedTable
rocksdb.sst.read.micros P50 : 2.009374 P95 : 4.968548 P99 : 8.110362 P100 : 43.000000 COUNT : 40456 SUM : 114805
rocksdb.file.read.flush.micros P50 : 1.871841 P95 : 3.872407 P99 : 5.540541 P100 : 43.000000 COUNT : 2250 SUM : 6116
rocksdb.file.read.compaction.micros P50 : 2.023109 P95 : 5.029149 P99 : 8.196910 P100 : 26.000000 COUNT : 38206 SUM : 108689

// PlainTable
Does not apply
```
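The Db bench 1 relation can be stated as a small check. `CountsRoughlyMatch` and its `tolerance` parameter are hypothetical helpers, not part of RocksDB. The tolerance is there because concurrent `HistogramStat::Add` calls only guarantee atomicity, not accuracy, so separately read histograms need not agree exactly.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the aggregate COUNT is within `tolerance` samples of the
// sum of the flush and compaction COUNTs.
bool CountsRoughlyMatch(uint64_t aggregate, uint64_t flush,
                        uint64_t compaction, uint64_t tolerance) {
  assert(flush <= UINT64_MAX - compaction);  // the sum below must not overflow
  const uint64_t parts = flush + compaction;
  const uint64_t diff =
      aggregate > parts ? aggregate - parts : parts - aggregate;
  return diff <= tolerance;
}
```

For the BlockBasedTable run above, 2250 + 38206 = 40456, so `CountsRoughlyMatch(40456, 2250, 38206, 0)` holds. The SUM columns also agree exactly: 6116 + 108689 = 114805.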
- **Db bench 2: performance**

**Read**

SETUP: db with 900 files
```
./db_bench -db=/dev/shm/testdb/ -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655  -disable_auto_compactions=true -target_file_size_base=655 -compression_type=none
```
Run till convergence:
```
./db_bench -seed=1678564177044286 -use_existing_db=true -db=/dev/shm/testdb -benchmarks=readrandom[-X60] -statistics=true -num=1000000 -disable_auto_compactions=true -compression_type=none -bloom_bits=3
```
Pre-change
`readrandom [AVG 60 runs] : 21568 (± 248) ops/sec`
Post-change (no regression, ~-0.4%)
`readrandom [AVG 60 runs] : 21486 (± 236) ops/sec`

**Compaction/Flush**

Run till convergence:
```
./db_bench -db=/dev/shm/testdb2/ -seed=1678564177044286 -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655  -disable_auto_compactions=false -target_file_size_base=655 -compression_type=none

rocksdb.sst.read.micros COUNT : 33820
rocksdb.file.read.flush.micros COUNT : 1800
rocksdb.file.read.compaction.micros COUNT : 32020
```
Pre-change
`fillseq [AVG 46 runs] : 1391 (± 214) ops/sec;    0.7 (± 0.1) MB/sec`

Post-change (no regression, ~-0.4%)
`fillseq [AVG 46 runs] : 1385 (± 216) ops/sec;    0.7 (± 0.1) MB/sec`
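The regression figures follow directly from the averages. The helper below is only a worked-arithmetic sketch, and `PercentChange` is a hypothetical name.

- readrandom changes by (21486 - 21568) / 21568 ≈ -0.38%.
- fillseq changes by (1385 - 1391) / 1391 ≈ -0.43%.

Both differences (82 and 6 ops/sec) are far smaller than the reported ± spreads (±248 and ±214 ops/sec).

```cpp
#include <cassert>

// Relative change of `after` versus `before`, in percent.
double PercentChange(double before, double after) {
  assert(before != 0.0);  // undefined for a zero baseline
  return (after - before) / before * 100.0;
}
```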

Reviewed By: ajkr

Differential Revision: D44007011

Pulled By: hx235

fbshipit-source-id: a54c89e4846dfc9a135389edf3f3eedfea257132
oxigraph-8.3.2
Hui Xiao 2 years ago committed by Facebook GitHub Bot
parent 0a774a102f
commit 151242ce46
Changed files (changed lines per file in parentheses):

1. `HISTORY.md` (1)
2. `db/blob/blob_file_cache.cc` (4)
3. `db/blob/blob_file_cache.h` (3)
4. `db/blob/blob_file_cache_test.cc` (25)
5. `db/blob/blob_file_reader.cc` (59)
6. `db/blob/blob_file_reader.h` (6)
7. `db/blob/blob_file_reader_test.cc` (83)
8. `db/blob/blob_source.cc` (7)
9. `db/blob/blob_source.h` (4)
10. `db/blob/blob_source_test.cc` (8)
11. `db/builder.cc` (9)
12. `db/builder.h` (4)
13. `db/column_family.cc` (1)
14. `db/compaction/compaction_iterator.cc` (1)
15. `db/compaction/compaction_job.cc` (32)
16. `db/compaction/compaction_job_test.cc` (8)
17. `db/compaction/compaction_outputs.cc` (3)
18. `db/convenience.cc` (10)
19. `db/corruption_test.cc` (9)
20. `db/db_impl/compacted_db_impl.cc` (5)
21. `db/db_impl/db_impl.cc` (127)
22. `db/db_impl/db_impl.h` (2)
23. `db/db_impl/db_impl_compaction_flush.cc` (44)
24. `db/db_impl/db_impl_experimental.cc` (6)
25. `db/db_impl/db_impl_open.cc` (18)
26. `db/db_impl/db_impl_readonly.cc` (10)
27. `db/db_impl/db_impl_secondary.cc` (15)
28. `db/db_impl/db_impl_write.cc` (16)
29. `db/db_iter.cc` (1)
30. `db/db_properties_test.cc` (5)
31. `db/db_rate_limiter_test.cc` (5)
32. `db/experimental.cc` (4)
33. `db/external_sst_file_ingestion_job.cc` (4)
34. `db/flush_job.cc` (32)
35. `db/import_column_family_job.cc` (2)
36. `db/internal_stats.cc` (27)
37. `db/memtable.cc` (1)
38. `db/memtable_list.cc` (12)
39. `db/perf_context_test.cc` (3)
40. `db/plain_table_db_test.cc` (9)
41. `db/repair.cc` (28)
42. `db/table_cache.cc` (24)
43. `db/table_cache.h` (11)
44. `db/table_properties_collector_test.cc` (9)
45. `db/version_builder.cc` (13)
46. `db/version_builder.h` (2)
47. `db/version_edit_handler.cc` (28)
48. `db/version_edit_handler.h` (42)
49. `db/version_set.cc` (143)
50. `db/version_set.h` (70)
51. `db/version_set_test.cc` (43)
52. `db/version_util.h` (9)
53. `db/write_batch.cc` (2)
54. `db_stress_tool/db_stress_common.h` (2)
55. `db_stress_tool/db_stress_driver.cc` (9)
56. `db_stress_tool/db_stress_env_wrapper.h` (34)
57. `db_stress_tool/db_stress_gflags.cc` (5)
58. `db_stress_tool/db_stress_test_base.cc` (2)
59. `db_stress_tool/db_stress_tool.cc` (7)
60. `db_stress_tool/multi_ops_txns_stress.cc` (4)
61. `db_stress_tool/multi_ops_txns_stress.h` (8)
62. `file/file_util.cc` (6)
63. `file/file_util.h` (2)
64. `file/random_access_file_reader.cc` (26)
65. `file/random_access_file_reader.h` (5)
66. `include/rocksdb/env.h` (8)
67. `include/rocksdb/file_system.h` (2)
68. `include/rocksdb/options.h` (3)
69. `include/rocksdb/statistics.h` (6)
70. `include/rocksdb/thread_status.h` (1)
71. `include/rocksdb/utilities/transaction.h` (1)
72. `java/rocksjni/portal.h` (12)
73. `java/src/main/java/org/rocksdb/HistogramType.java` (4)
74. `java/src/main/java/org/rocksdb/OperationType.java` (3)
75. `microbench/db_basic_bench.cc` (3)
76. `monitoring/statistics.cc` (2)
77. `monitoring/thread_status_updater.cc` (24)
78. `monitoring/thread_status_updater.h` (17)
79. `monitoring/thread_status_util.cc` (37)
80. `monitoring/thread_status_util.h` (12)
81. `monitoring/thread_status_util_debug.cc` (14)
82. `options/options.cc` (35)
83. `table/block_based/block_based_table_reader.cc` (52)
84. `table/block_based/block_based_table_reader.h` (16)
85. `table/block_based/block_based_table_reader_impl.h` (2)
86. `table/block_based/block_based_table_reader_sync_and_async.h` (2)
87. `table/block_based/hash_index_reader.cc` (10)
88. `table/block_based/partitioned_index_reader.cc` (1)
89. `table/block_based/uncompression_dict_reader.cc` (6)
90. `table/block_based/uncompression_dict_reader.h` (5)
91. `table/cuckoo/cuckoo_table_builder_test.cc` (5)
92. `table/cuckoo/cuckoo_table_reader.cc` (7)
93. `table/cuckoo/cuckoo_table_reader.h` (6)
94. `table/meta_blocks.cc` (52)
95. `table/meta_blocks.h` (4)
96. `table/mock_table.cc` (6)
97. `table/plain/plain_table_reader.cc` (28)
98. `table/plain/plain_table_reader.h` (7)
99. `table/sst_file_dumper.cc` (6)
100. `table/sst_file_reader.cc` (3)

Some files were not shown because too many files have changed in this diff.

--- a/HISTORY.md
+++ b/HISTORY.md
@@ -17,6 +17,7 @@
 ### New Features
 * Add experimental `PerfContext` counters `iter_{next|prev|seek}_count` for db iterator, each counting the times of corresponding API being called.
 * Allow runtime changes to whether `WriteBufferManager` allows stall or not by calling `SetAllowStall()`
+* New statistics `rocksdb.file.read.{flush|compaction}.micros` that measure read time of block-based SST tables or blob files during flush or compaction.
 ### Bug Fixes
 * In block cache tracing, fixed some cases of bad hit/miss information (and more) with MultiGet.

--- a/db/blob/blob_file_cache.cc
+++ b/db/blob/blob_file_cache.cc
@@ -37,7 +37,7 @@ BlobFileCache::BlobFileCache(Cache* cache,
 }
 Status BlobFileCache::GetBlobFileReader(
-uint64_t blob_file_number,
+const ReadOptions& read_options, uint64_t blob_file_number,
 CacheHandleGuard<BlobFileReader>* blob_file_reader) {
 assert(blob_file_reader);
 assert(blob_file_reader->IsEmpty());
@@ -73,7 +73,7 @@ Status BlobFileCache::GetBlobFileReader(
 {
 assert(file_options_);
 const Status s = BlobFileReader::Create(
-*immutable_options_, *file_options_, column_family_id_,
+*immutable_options_, read_options, *file_options_, column_family_id_,
 blob_file_read_hist_, blob_file_number, io_tracer_, &reader);
 if (!s.ok()) {
 RecordTick(statistics, NO_FILE_ERRORS);

--- a/db/blob/blob_file_cache.h
+++ b/db/blob/blob_file_cache.h
@@ -32,7 +32,8 @@ class BlobFileCache {
 BlobFileCache(const BlobFileCache&) = delete;
 BlobFileCache& operator=(const BlobFileCache&) = delete;
-Status GetBlobFileReader(uint64_t blob_file_number,
+Status GetBlobFileReader(const ReadOptions& read_options,
+uint64_t blob_file_number,
 CacheHandleGuard<BlobFileReader>* blob_file_reader);
 private:

--- a/db/blob/blob_file_cache_test.cc
+++ b/db/blob/blob_file_cache_test.cc
@@ -118,7 +118,9 @@ TEST_F(BlobFileCacheTest, GetBlobFileReader) {
 // First try: reader should be opened and put in cache
 CacheHandleGuard<BlobFileReader> first;
-ASSERT_OK(blob_file_cache.GetBlobFileReader(blob_file_number, &first));
+const ReadOptions read_options;
+ASSERT_OK(blob_file_cache.GetBlobFileReader(read_options, blob_file_number,
+&first));
 ASSERT_NE(first.GetValue(), nullptr);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_OPENS), 1);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_ERRORS), 0);
@@ -126,7 +128,8 @@ TEST_F(BlobFileCacheTest, GetBlobFileReader) {
 // Second try: reader should be served from cache
 CacheHandleGuard<BlobFileReader> second;
-ASSERT_OK(blob_file_cache.GetBlobFileReader(blob_file_number, &second));
+ASSERT_OK(blob_file_cache.GetBlobFileReader(read_options, blob_file_number,
+&second));
 ASSERT_NE(second.GetValue(), nullptr);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_OPENS), 1);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_ERRORS), 0);
@@ -163,19 +166,21 @@ TEST_F(BlobFileCacheTest, GetBlobFileReader_Race) {
 CacheHandleGuard<BlobFileReader> first;
 CacheHandleGuard<BlobFileReader> second;
+const ReadOptions read_options;
 SyncPoint::GetInstance()->SetCallBack(
 "BlobFileCache::GetBlobFileReader:DoubleCheck", [&](void* /* arg */) {
 // Disabling sync points to prevent infinite recursion
 SyncPoint::GetInstance()->DisableProcessing();
-ASSERT_OK(blob_file_cache.GetBlobFileReader(blob_file_number, &second));
+ASSERT_OK(blob_file_cache.GetBlobFileReader(read_options,
+blob_file_number, &second));
 ASSERT_NE(second.GetValue(), nullptr);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_OPENS), 1);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_ERRORS), 0);
 });
 SyncPoint::GetInstance()->EnableProcessing();
-ASSERT_OK(blob_file_cache.GetBlobFileReader(blob_file_number, &first));
+ASSERT_OK(blob_file_cache.GetBlobFileReader(read_options, blob_file_number,
+&first));
 ASSERT_NE(first.GetValue(), nullptr);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_OPENS), 1);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_ERRORS), 0);
@@ -213,8 +218,10 @@ TEST_F(BlobFileCacheTest, GetBlobFileReader_IOError) {
 CacheHandleGuard<BlobFileReader> reader;
+const ReadOptions read_options;
 ASSERT_TRUE(
-blob_file_cache.GetBlobFileReader(blob_file_number, &reader).IsIOError());
+blob_file_cache.GetBlobFileReader(read_options, blob_file_number, &reader)
+.IsIOError());
 ASSERT_EQ(reader.GetValue(), nullptr);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_OPENS), 1);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_ERRORS), 1);
@@ -253,8 +260,10 @@ TEST_F(BlobFileCacheTest, GetBlobFileReader_CacheFull) {
 // strict_capacity_limit is set
 CacheHandleGuard<BlobFileReader> reader;
-ASSERT_TRUE(blob_file_cache.GetBlobFileReader(blob_file_number, &reader)
-.IsMemoryLimit());
+const ReadOptions read_options;
+ASSERT_TRUE(
+blob_file_cache.GetBlobFileReader(read_options, blob_file_number, &reader)
+.IsMemoryLimit());
 ASSERT_EQ(reader.GetValue(), nullptr);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_OPENS), 1);
 ASSERT_EQ(options.statistics->getTickerCount(NO_FILE_ERRORS), 1);

--- a/db/blob/blob_file_reader.cc
+++ b/db/blob/blob_file_reader.cc
@@ -26,9 +26,10 @@
 namespace ROCKSDB_NAMESPACE {
 Status BlobFileReader::Create(
-const ImmutableOptions& immutable_options, const FileOptions& file_options,
-uint32_t column_family_id, HistogramImpl* blob_file_read_hist,
-uint64_t blob_file_number, const std::shared_ptr<IOTracer>& io_tracer,
+const ImmutableOptions& immutable_options, const ReadOptions& read_options,
+const FileOptions& file_options, uint32_t column_family_id,
+HistogramImpl* blob_file_read_hist, uint64_t blob_file_number,
+const std::shared_ptr<IOTracer>& io_tracer,
 std::unique_ptr<BlobFileReader>* blob_file_reader) {
 assert(blob_file_reader);
 assert(!*blob_file_reader);
@@ -52,15 +53,17 @@ Status BlobFileReader::Create(
 CompressionType compression_type = kNoCompression;
 {
-const Status s = ReadHeader(file_reader.get(), column_family_id, statistics,
-&compression_type);
+const Status s =
+ReadHeader(file_reader.get(), read_options, column_family_id,
+statistics, &compression_type);
 if (!s.ok()) {
 return s;
 }
 }
 {
-const Status s = ReadFooter(file_reader.get(), file_size, statistics);
+const Status s =
+ReadFooter(file_reader.get(), read_options, file_size, statistics);
 if (!s.ok()) {
 return s;
 }
@@ -134,6 +137,7 @@ Status BlobFileReader::OpenFile(
 }
 Status BlobFileReader::ReadHeader(const RandomAccessFileReader* file_reader,
+const ReadOptions& read_options,
 uint32_t column_family_id,
 Statistics* statistics,
 CompressionType* compression_type) {
@@ -151,9 +155,10 @@ Status BlobFileReader::ReadHeader(const RandomAccessFileReader* file_reader,
 constexpr size_t read_size = BlobLogHeader::kSize;
 // TODO: rate limit reading headers from blob files.
-const Status s = ReadFromFile(file_reader, read_offset, read_size,
-statistics, &header_slice, &buf, &aligned_buf,
-Env::IO_TOTAL /* rate_limiter_priority */);
+const Status s =
+ReadFromFile(file_reader, read_options, read_offset, read_size,
+statistics, &header_slice, &buf, &aligned_buf,
+Env::IO_TOTAL /* rate_limiter_priority */);
 if (!s.ok()) {
 return s;
 }
@@ -187,6 +192,7 @@ Status BlobFileReader::ReadHeader(const RandomAccessFileReader* file_reader,
 }
 Status BlobFileReader::ReadFooter(const RandomAccessFileReader* file_reader,
+const ReadOptions& read_options,
 uint64_t file_size, Statistics* statistics) {
 assert(file_size >= BlobLogHeader::kSize + BlobLogFooter::kSize);
 assert(file_reader);
@@ -202,9 +208,10 @@ Status BlobFileReader::ReadFooter(const RandomAccessFileReader* file_reader,
 constexpr size_t read_size = BlobLogFooter::kSize;
 // TODO: rate limit reading footers from blob files.
-const Status s = ReadFromFile(file_reader, read_offset, read_size,
-statistics, &footer_slice, &buf, &aligned_buf,
-Env::IO_TOTAL /* rate_limiter_priority */);
+const Status s =
+ReadFromFile(file_reader, read_options, read_offset, read_size,
+statistics, &footer_slice, &buf, &aligned_buf,
+Env::IO_TOTAL /* rate_limiter_priority */);
 if (!s.ok()) {
 return s;
 }
@@ -232,6 +239,7 @@ Status BlobFileReader::ReadFooter(const RandomAccessFileReader* file_reader,
 }
 Status BlobFileReader::ReadFromFile(const RandomAccessFileReader* file_reader,
+const ReadOptions& read_options,
 uint64_t read_offset, size_t read_size,
 Statistics* statistics, Slice* slice,
 Buffer* buf, AlignedBuf* aligned_buf,
@@ -246,17 +254,23 @@ Status BlobFileReader::ReadFromFile(const RandomAccessFileReader* file_reader,
 Status s;
+IOOptions io_options;
+s = file_reader->PrepareIOOptions(read_options, io_options);
+if (!s.ok()) {
+return s;
+}
 if (file_reader->use_direct_io()) {
 constexpr char* scratch = nullptr;
-s = file_reader->Read(IOOptions(), read_offset, read_size, slice, scratch,
+s = file_reader->Read(io_options, read_offset, read_size, slice, scratch,
 aligned_buf, rate_limiter_priority);
 } else {
 buf->reset(new char[read_size]);
 constexpr AlignedBuf* aligned_scratch = nullptr;
-s = file_reader->Read(IOOptions(), read_offset, read_size, slice,
-buf->get(), aligned_scratch, rate_limiter_priority);
+s = file_reader->Read(io_options, read_offset, read_size, slice, buf->get(),
+aligned_scratch, rate_limiter_priority);
 }
 if (!s.ok()) {
@@ -324,8 +338,13 @@ Status BlobFileReader::GetBlob(
 Status s;
 constexpr bool for_compaction = true;
+IOOptions io_options;
+s = file_reader_->PrepareIOOptions(read_options, io_options);
+if (!s.ok()) {
+return s;
+}
 prefetched = prefetch_buffer->TryReadFromCache(
-IOOptions(), file_reader_.get(), record_offset,
+io_options, file_reader_.get(), record_offset,
 static_cast<size_t>(record_size), &record_slice, &s,
 read_options.rate_limiter_priority, for_compaction);
 if (!s.ok()) {
@@ -338,10 +357,10 @@ Status BlobFileReader::GetBlob(
 PERF_COUNTER_ADD(blob_read_count, 1);
 PERF_COUNTER_ADD(blob_read_byte, record_size);
 PERF_TIMER_GUARD(blob_read_time);
-const Status s = ReadFromFile(file_reader_.get(), record_offset,
-static_cast<size_t>(record_size), statistics_,
-&record_slice, &buf, &aligned_buf,
-read_options.rate_limiter_priority);
+const Status s = ReadFromFile(
+file_reader_.get(), read_options, record_offset,
+static_cast<size_t>(record_size), statistics_, &record_slice, &buf,
+&aligned_buf, read_options.rate_limiter_priority);
 if (!s.ok()) {
 return s;
 }
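The new `ReadFromFile` follows a simple pattern. It derives `IOOptions` from `ReadOptions` once, returns early if that fails, and passes the same `IOOptions` to the read. The sketch below shows that pattern, together with a stress-test style assertion that the request's activity tag matches what the current thread is doing. In RocksDB, `DbStressFSWrapper` performs a comparable check using `thread_status_util`. All types here, plus `tls_current_activity` and `FileReader`, are hypothetical simplified stand-ins and do not reproduce RocksDB's real signatures.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only; not RocksDB's real types.
enum class IOActivity { kUnknown, kFlush, kCompaction };

struct ReadOptions {
  IOActivity io_activity = IOActivity::kUnknown;
};

struct IOOptions {
  IOActivity io_activity = IOActivity::kUnknown;
};

struct Status {
  bool ok_flag = true;
  bool ok() const { return ok_flag; }
};

// Plays the role of the per-thread operation type: a flush or compaction job
// would set it before issuing reads.
thread_local IOActivity tls_current_activity = IOActivity::kUnknown;

class FileReader {
 public:
  // Derives per-call IOOptions from ReadOptions, carrying io_activity over.
  Status PrepareIOOptions(const ReadOptions& ro, IOOptions& opts) const {
    opts.io_activity = ro.io_activity;
    return Status{};
  }

  Status Read(const IOOptions& opts, std::size_t n) {
    // Stress-test style check: the tag must match what this thread is doing.
    assert(opts.io_activity == tls_current_activity);
    last_activity_ = opts.io_activity;
    bytes_read_ += n;
    return Status{};
  }

  IOActivity last_activity() const { return last_activity_; }
  std::size_t bytes_read() const { return bytes_read_; }

 private:
  IOActivity last_activity_ = IOActivity::kUnknown;
  std::size_t bytes_read_ = 0;
};

// Same shape as the new ReadFromFile: prepare IOOptions, bail out on error,
// then hand the prepared IOOptions to the read instead of a default IOOptions().
Status ReadFromFile(FileReader& reader, const ReadOptions& read_options,
                    std::size_t read_size) {
  IOOptions io_options;
  Status s = reader.PrepareIOOptions(read_options, io_options);
  if (!s.ok()) {
    return s;
  }
  return reader.Read(io_options, read_size);
}
```

Passing a default-constructed `IOOptions()`, as the pre-change code did, would drop the tag. A compaction thread would then trip the assertion, and the read could not be attributed to the compaction histogram.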

--- a/db/blob/blob_file_reader.h
+++ b/db/blob/blob_file_reader.h
@@ -29,6 +29,7 @@ class Statistics;
 class BlobFileReader {
 public:
 static Status Create(const ImmutableOptions& immutable_options,
+const ReadOptions& read_options,
 const FileOptions& file_options,
 uint32_t column_family_id,
 HistogramImpl* blob_file_read_hist,
@@ -74,15 +75,18 @@ class BlobFileReader {
 std::unique_ptr<RandomAccessFileReader>* file_reader);
 static Status ReadHeader(const RandomAccessFileReader* file_reader,
+const ReadOptions& read_options,
 uint32_t column_family_id, Statistics* statistics,
 CompressionType* compression_type);
 static Status ReadFooter(const RandomAccessFileReader* file_reader,
-uint64_t file_size, Statistics* statistics);
+const ReadOptions& read_options, uint64_t file_size,
+Statistics* statistics);
 using Buffer = std::unique_ptr<char[]>;
 static Status ReadFromFile(const RandomAccessFileReader* file_reader,
+const ReadOptions& read_options,
 uint64_t read_offset, size_t read_size,
 Statistics* statistics, Slice* slice, Buffer* buf,
 AlignedBuf* aligned_buf,

--- a/db/blob/blob_file_reader_test.cc
+++ b/db/blob/blob_file_reader_test.cc
@@ -172,12 +172,12 @@ TEST_F(BlobFileReaderTest, CreateReaderAndGetBlob) {
 std::unique_ptr<BlobFileReader> reader;
+ReadOptions read_options;
 ASSERT_OK(BlobFileReader::Create(
-immutable_options, FileOptions(), column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/, &reader));
+immutable_options, read_options, FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number, nullptr /*IOTracer*/, &reader));
 // Make sure the blob can be retrieved with and without checksum verification
-ReadOptions read_options;
 read_options.verify_checksums = false;
 constexpr FilePrefetchBuffer* prefetch_buffer = nullptr;
@@ -479,11 +479,11 @@ TEST_F(BlobFileReaderTest, Malformed) {
 constexpr HistogramImpl* blob_file_read_hist = nullptr;
 std::unique_ptr<BlobFileReader> reader;
+const ReadOptions read_options;
-ASSERT_TRUE(BlobFileReader::Create(immutable_options, FileOptions(),
-column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/,
-&reader)
+ASSERT_TRUE(BlobFileReader::Create(immutable_options, read_options,
+FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number,
+nullptr /*IOTracer*/, &reader)
 .IsCorruption());
 }
@@ -513,11 +513,11 @@ TEST_F(BlobFileReaderTest, TTL) {
 constexpr HistogramImpl* blob_file_read_hist = nullptr;
 std::unique_ptr<BlobFileReader> reader;
+const ReadOptions read_options;
-ASSERT_TRUE(BlobFileReader::Create(immutable_options, FileOptions(),
-column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/,
-&reader)
+ASSERT_TRUE(BlobFileReader::Create(immutable_options, read_options,
+FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number,
+nullptr /*IOTracer*/, &reader)
 .IsCorruption());
 }
@@ -552,11 +552,11 @@ TEST_F(BlobFileReaderTest, ExpirationRangeInHeader) {
 constexpr HistogramImpl* blob_file_read_hist = nullptr;
 std::unique_ptr<BlobFileReader> reader;
+const ReadOptions read_options;
-ASSERT_TRUE(BlobFileReader::Create(immutable_options, FileOptions(),
-column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/,
-&reader)
+ASSERT_TRUE(BlobFileReader::Create(immutable_options, read_options,
+FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number,
+nullptr /*IOTracer*/, &reader)
 .IsCorruption());
 }
@@ -591,11 +591,11 @@ TEST_F(BlobFileReaderTest, ExpirationRangeInFooter) {
 constexpr HistogramImpl* blob_file_read_hist = nullptr;
 std::unique_ptr<BlobFileReader> reader;
+const ReadOptions read_options;
-ASSERT_TRUE(BlobFileReader::Create(immutable_options, FileOptions(),
-column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/,
-&reader)
+ASSERT_TRUE(BlobFileReader::Create(immutable_options, read_options,
+FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number,
+nullptr /*IOTracer*/, &reader)
 .IsCorruption());
 }
@@ -629,9 +629,9 @@ TEST_F(BlobFileReaderTest, IncorrectColumnFamily) {
 std::unique_ptr<BlobFileReader> reader;
 constexpr uint32_t incorrect_column_family_id = 2;
+const ReadOptions read_options;
-ASSERT_TRUE(BlobFileReader::Create(immutable_options, FileOptions(),
-incorrect_column_family_id,
+ASSERT_TRUE(BlobFileReader::Create(immutable_options, read_options,
+FileOptions(), incorrect_column_family_id,
 blob_file_read_hist, blob_file_number,
 nullptr /*IOTracer*/, &reader)
 .IsCorruption());
@@ -664,10 +664,10 @@ TEST_F(BlobFileReaderTest, BlobCRCError) {
 constexpr HistogramImpl* blob_file_read_hist = nullptr;
 std::unique_ptr<BlobFileReader> reader;
+const ReadOptions read_options;
 ASSERT_OK(BlobFileReader::Create(
-immutable_options, FileOptions(), column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/, &reader));
+immutable_options, read_options, FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number, nullptr /*IOTracer*/, &reader));
 SyncPoint::GetInstance()->SetCallBack(
 "BlobFileReader::VerifyBlob:CheckBlobCRC", [](void* arg) {
@@ -728,13 +728,12 @@ TEST_F(BlobFileReaderTest, Compression) {
 constexpr HistogramImpl* blob_file_read_hist = nullptr;
 std::unique_ptr<BlobFileReader> reader;
+ReadOptions read_options;
 ASSERT_OK(BlobFileReader::Create(
-immutable_options, FileOptions(), column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/, &reader));
+immutable_options, read_options, FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number, nullptr /*IOTracer*/, &reader));
 // Make sure the blob can be retrieved with and without checksum verification
-ReadOptions read_options;
 read_options.verify_checksums = false;
 constexpr FilePrefetchBuffer* prefetch_buffer = nullptr;
@@ -803,10 +802,10 @@ TEST_F(BlobFileReaderTest, UncompressionError) {
 constexpr HistogramImpl* blob_file_read_hist = nullptr;
 std::unique_ptr<BlobFileReader> reader;
+const ReadOptions read_options;
 ASSERT_OK(BlobFileReader::Create(
-immutable_options, FileOptions(), column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/, &reader));
+immutable_options, read_options, FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number, nullptr /*IOTracer*/, &reader));
 SyncPoint::GetInstance()->SetCallBack(
 "BlobFileReader::UncompressBlobIfNeeded:TamperWithResult", [](void* arg) {
@@ -895,10 +894,10 @@ TEST_P(BlobFileReaderIOErrorTest, IOError) {
 constexpr HistogramImpl* blob_file_read_hist = nullptr;
 std::unique_ptr<BlobFileReader> reader;
+const ReadOptions read_options;
 const Status s = BlobFileReader::Create(
-immutable_options, FileOptions(), column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/, &reader);
+immutable_options, read_options, FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number, nullptr /*IOTracer*/, &reader);
 const bool fail_during_create =
 (sync_point_ != "BlobFileReader::GetBlob:ReadFromFile");
@@ -983,10 +982,10 @@ TEST_P(BlobFileReaderDecodingErrorTest, DecodingError) {
 constexpr HistogramImpl* blob_file_read_hist = nullptr;
 std::unique_ptr<BlobFileReader> reader;
+const ReadOptions read_options;
 const Status s = BlobFileReader::Create(
-immutable_options, FileOptions(), column_family_id, blob_file_read_hist,
-blob_file_number, nullptr /*IOTracer*/, &reader);
+immutable_options, read_options, FileOptions(), column_family_id,
+blob_file_read_hist, blob_file_number, nullptr /*IOTracer*/, &reader);
 const bool fail_during_create =
 sync_point_ != "BlobFileReader::GetBlob:TamperWithResult";

@ -209,7 +209,8 @@ Status BlobSource::GetBlob(const ReadOptions& read_options,
{ {
CacheHandleGuard<BlobFileReader> blob_file_reader; CacheHandleGuard<BlobFileReader> blob_file_reader;
s = blob_file_cache_->GetBlobFileReader(file_number, &blob_file_reader); s = blob_file_cache_->GetBlobFileReader(read_options, file_number,
&blob_file_reader);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
@ -372,8 +373,8 @@ void BlobSource::MultiGetBlobFromOneFile(const ReadOptions& read_options,
} }
CacheHandleGuard<BlobFileReader> blob_file_reader; CacheHandleGuard<BlobFileReader> blob_file_reader;
Status s = Status s = blob_file_cache_->GetBlobFileReader(read_options, file_number,
blob_file_cache_->GetBlobFileReader(file_number, &blob_file_reader); &blob_file_reader);
if (!s.ok()) { if (!s.ok()) {
for (size_t i = 0; i < _blob_reqs.size(); ++i) { for (size_t i = 0; i < _blob_reqs.size(); ++i) {
BlobReadRequest* const req = _blob_reqs[i].first; BlobReadRequest* const req = _blob_reqs[i].first;

@ -95,9 +95,9 @@ class BlobSource {
uint64_t* bytes_read); uint64_t* bytes_read);
inline Status GetBlobFileReader( inline Status GetBlobFileReader(
uint64_t blob_file_number, const ReadOptions& read_options, uint64_t blob_file_number,
CacheHandleGuard<BlobFileReader>* blob_file_reader) { CacheHandleGuard<BlobFileReader>* blob_file_reader) {
return blob_file_cache_->GetBlobFileReader(blob_file_number, return blob_file_cache_->GetBlobFileReader(read_options, blob_file_number,
blob_file_reader); blob_file_reader);
} }

@ -517,7 +517,8 @@ TEST_F(BlobSourceTest, GetCompressedBlobs) {
compression, blob_offsets, blob_sizes); compression, blob_offsets, blob_sizes);
CacheHandleGuard<BlobFileReader> blob_file_reader; CacheHandleGuard<BlobFileReader> blob_file_reader;
ASSERT_OK(blob_source.GetBlobFileReader(file_number, &blob_file_reader)); ASSERT_OK(blob_source.GetBlobFileReader(read_options, file_number,
&blob_file_reader));
ASSERT_NE(blob_file_reader.GetValue(), nullptr); ASSERT_NE(blob_file_reader.GetValue(), nullptr);
const uint64_t file_size = blob_file_reader.GetValue()->GetFileSize(); const uint64_t file_size = blob_file_reader.GetValue()->GetFileSize();
@ -1139,12 +1140,13 @@ TEST_F(BlobSecondaryCacheTest, GetBlobsFromSecondaryCache) {
blob_file_cache.get()); blob_file_cache.get());
CacheHandleGuard<BlobFileReader> file_reader; CacheHandleGuard<BlobFileReader> file_reader;
ASSERT_OK(blob_source.GetBlobFileReader(file_number, &file_reader)); ReadOptions read_options;
ASSERT_OK(
blob_source.GetBlobFileReader(read_options, file_number, &file_reader));
ASSERT_NE(file_reader.GetValue(), nullptr); ASSERT_NE(file_reader.GetValue(), nullptr);
const uint64_t file_size = file_reader.GetValue()->GetFileSize(); const uint64_t file_size = file_reader.GetValue()->GetFileSize();
ASSERT_EQ(file_reader.GetValue()->GetCompressionType(), kNoCompression); ASSERT_EQ(file_reader.GetValue()->GetCompressionType(), kNoCompression);
ReadOptions read_options;
read_options.verify_checksums = true; read_options.verify_checksums = true;
auto blob_cache = options_.blob_cache; auto blob_cache = options_.blob_cache;

@ -56,8 +56,8 @@ TableBuilder* NewTableBuilder(const TableBuilderOptions& tboptions,
Status BuildTable( Status BuildTable(
const std::string& dbname, VersionSet* versions, const std::string& dbname, VersionSet* versions,
const ImmutableDBOptions& db_options, const TableBuilderOptions& tboptions, const ImmutableDBOptions& db_options, const TableBuilderOptions& tboptions,
const FileOptions& file_options, TableCache* table_cache, const FileOptions& file_options, const ReadOptions& read_options,
InternalIterator* iter, TableCache* table_cache, InternalIterator* iter,
std::vector<std::unique_ptr<FragmentedRangeTombstoneIterator>> std::vector<std::unique_ptr<FragmentedRangeTombstoneIterator>>
range_del_iters, range_del_iters,
FileMetaData* meta, std::vector<BlobFileAddition>* blob_file_additions, FileMetaData* meta, std::vector<BlobFileAddition>* blob_file_additions,
@ -255,8 +255,8 @@ Status BuildTable(
SizeApproximationOptions approx_opts; SizeApproximationOptions approx_opts;
approx_opts.files_size_error_margin = 0.1; approx_opts.files_size_error_margin = 0.1;
meta->compensated_range_deletion_size += versions->ApproximateSize( meta->compensated_range_deletion_size += versions->ApproximateSize(
approx_opts, version, kv.first.Encode(), tombstone_end.Encode(), approx_opts, read_options, version, kv.first.Encode(),
0 /* start_level */, -1 /* end_level */, tombstone_end.Encode(), 0 /* start_level */, -1 /* end_level */,
TableReaderCaller::kFlush); TableReaderCaller::kFlush);
} }
last_tombstone_start_user_key = range_del_it->start_key(); last_tombstone_start_user_key = range_del_it->start_key();
@ -369,7 +369,6 @@ Status BuildTable(
// here because this is a special case after we finish the table building. // here because this is a special case after we finish the table building.
// No matter whether use_direct_io_for_flush_and_compaction is true, // No matter whether use_direct_io_for_flush_and_compaction is true,
// the goal is to cache it here for further user reads. // the goal is to cache it here for further user reads.
ReadOptions read_options;
std::unique_ptr<InternalIterator> it(table_cache->NewIterator( std::unique_ptr<InternalIterator> it(table_cache->NewIterator(
read_options, file_options, tboptions.internal_comparator, *meta, read_options, file_options, tboptions.internal_comparator, *meta,
nullptr /* range_del_agg */, mutable_cf_options.prefix_extractor, nullptr /* range_del_agg */, mutable_cf_options.prefix_extractor,

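The `BuildTable` hunk above drops the local `ReadOptions read_options;` and takes `read_options` as a parameter. The reason is that a callee which default-constructs its options silently resets the activity tag, whatever the caller intended. The sketch below illustrates this. `IOActivity` and `ReadOptionsSketch` are hypothetical stand-ins, not RocksDB's real `Env::IOActivity` or `ReadOptions`, and the two helper names are invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for Env::IOActivity (not the real RocksDB enum).
enum class IOActivity { kFlush, kCompaction, kUnknown };

// Hypothetical stand-in for ReadOptions carrying only the activity tag.
struct ReadOptionsSketch {
  ReadOptionsSketch() = default;
  explicit ReadOptionsSketch(IOActivity a) : io_activity(a) {}
  IOActivity io_activity = IOActivity::kUnknown;
};

// Old shape: the helper builds its own options, so any tag set by the
// caller (e.g. a flush) is lost and reads are attributed to kUnknown.
IOActivity VerifyTableOld() {
  ReadOptionsSketch read_options;  // default-constructed: kUnknown
  return read_options.io_activity;
}

// New shape: the helper uses whatever options the caller threads through.
IOActivity VerifyTableNew(const ReadOptionsSketch& read_options) {
  return read_options.io_activity;
}
```

With the parameter threaded through, a flush caller can pass options tagged as a flush and the reads below it are attributed to that activity.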
@ -53,8 +53,8 @@ TableBuilder* NewTableBuilder(const TableBuilderOptions& tboptions,
extern Status BuildTable( extern Status BuildTable(
const std::string& dbname, VersionSet* versions, const std::string& dbname, VersionSet* versions,
const ImmutableDBOptions& db_options, const TableBuilderOptions& tboptions, const ImmutableDBOptions& db_options, const TableBuilderOptions& tboptions,
const FileOptions& file_options, TableCache* table_cache, const FileOptions& file_options, const ReadOptions& read_options,
InternalIterator* iter, TableCache* table_cache, InternalIterator* iter,
std::vector<std::unique_ptr<FragmentedRangeTombstoneIterator>> std::vector<std::unique_ptr<FragmentedRangeTombstoneIterator>>
range_del_iters, range_del_iters,
FileMetaData* meta, std::vector<BlobFileAddition>* blob_file_additions, FileMetaData* meta, std::vector<BlobFileAddition>* blob_file_additions,

@ -1141,6 +1141,7 @@ Status ColumnFamilyData::RangesOverlapWithMemtables(
*overlap = false; *overlap = false;
// Create an InternalIterator over all unflushed memtables // Create an InternalIterator over all unflushed memtables
Arena arena; Arena arena;
// TODO: plumb Env::IOActivity
ReadOptions read_opts; ReadOptions read_opts;
read_opts.total_order_seek = true; read_opts.total_order_seek = true;
MergeIteratorBuilder merge_iter_builder(&internal_comparator_, &arena); MergeIteratorBuilder merge_iter_builder(&internal_comparator_, &arena);

@ -1412,6 +1412,7 @@ std::unique_ptr<BlobFetcher> CompactionIterator::CreateBlobFetcherIfNeeded(
} }
ReadOptions read_options; ReadOptions read_options;
read_options.io_activity = Env::IOActivity::kCompaction;
read_options.fill_cache = false; read_options.fill_cache = false;
return std::unique_ptr<BlobFetcher>(new BlobFetcher(version, read_options)); return std::unique_ptr<BlobFetcher>(new BlobFetcher(version, read_options));

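Tagging `read_options.io_activity = Env::IOActivity::kCompaction` is what lets the file reader split timings by activity. The summary describes this as recording into `rocksdb.sst.read.micros` and also into `rocksdb.file.{flush/compaction}.read.micros`. The sketch below shows one way such routing could look. It is not the actual `RandomAccessFileReader`/`StopWatch` code. `IOActivity`, `Hist`, `StatsSketch`, and `RecordSstRead` are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins; NOT RocksDB's Env::IOActivity or Histograms enum.
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };
enum class Hist { kSstReadMicros, kFileReadFlushMicros, kFileReadCompactionMicros };

// Minimal histogram store that keeps raw samples per histogram id.
class StatsSketch {
 public:
  void Record(Hist h, uint64_t micros) { samples_[h].push_back(micros); }
  std::size_t Count(Hist h) const {
    auto it = samples_.find(h);
    return it == samples_.end() ? 0 : it->second.size();
  }

 private:
  std::map<Hist, std::vector<uint64_t>> samples_;
};

// Records one SST read: always into the aggregate histogram, and also into
// a per-activity histogram when the read was tagged flush or compaction.
void RecordSstRead(StatsSketch& stats, IOActivity activity, uint64_t micros) {
  stats.Record(Hist::kSstReadMicros, micros);
  switch (activity) {
    case IOActivity::kFlush:
      stats.Record(Hist::kFileReadFlushMicros, micros);
      break;
    case IOActivity::kCompaction:
      stats.Record(Hist::kFileReadCompactionMicros, micros);
      break;
    default:
      break;  // other activities only contribute to the aggregate
  }
}
```

Under this routing, a workload whose SST reads are all tagged flush or compaction produces an aggregate COUNT equal to the sum of the two per-activity COUNTs. That matches the relationship checked in the "Db bench 1" test plan (40456 = 2250 + 38206). With concurrent `HistogramStat::Add` calls the match is only approximate.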
@ -192,8 +192,8 @@ CompactionJob::CompactionJob(
assert(log_buffer_ != nullptr); assert(log_buffer_ != nullptr);
const auto* cfd = compact_->compaction->column_family_data(); const auto* cfd = compact_->compaction->column_family_data();
ThreadStatusUtil::SetColumnFamily(cfd, cfd->ioptions()->env, ThreadStatusUtil::SetEnableTracking(db_options_.enable_thread_tracking);
db_options_.enable_thread_tracking); ThreadStatusUtil::SetColumnFamily(cfd);
ThreadStatusUtil::SetThreadOperation(ThreadStatus::OP_COMPACTION); ThreadStatusUtil::SetThreadOperation(ThreadStatus::OP_COMPACTION);
ReportStartedCompaction(compaction); ReportStartedCompaction(compaction);
} }
@ -204,10 +204,6 @@ CompactionJob::~CompactionJob() {
} }
void CompactionJob::ReportStartedCompaction(Compaction* compaction) { void CompactionJob::ReportStartedCompaction(Compaction* compaction) {
const auto* cfd = compact_->compaction->column_family_data();
ThreadStatusUtil::SetColumnFamily(cfd, cfd->ioptions()->env,
db_options_.enable_thread_tracking);
ThreadStatusUtil::SetThreadOperationProperty(ThreadStatus::COMPACTION_JOB_ID, ThreadStatusUtil::SetThreadOperationProperty(ThreadStatus::COMPACTION_JOB_ID,
job_id_); job_id_);
@ -291,12 +287,14 @@ void CompactionJob::Prepare() {
c->immutable_options()->preclude_last_level_data_seconds); c->immutable_options()->preclude_last_level_data_seconds);
if (preserve_time_duration > 0) { if (preserve_time_duration > 0) {
const ReadOptions read_options(Env::IOActivity::kCompaction);
// setup seqno_time_mapping_ // setup seqno_time_mapping_
seqno_time_mapping_.SetMaxTimeDuration(preserve_time_duration); seqno_time_mapping_.SetMaxTimeDuration(preserve_time_duration);
for (const auto& each_level : *c->inputs()) { for (const auto& each_level : *c->inputs()) {
for (const auto& fmd : each_level.files) { for (const auto& fmd : each_level.files) {
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
Status s = cfd->current()->GetTableProperties(&tp, fmd, nullptr); Status s =
cfd->current()->GetTableProperties(read_options, &tp, fmd, nullptr);
if (s.ok()) { if (s.ok()) {
seqno_time_mapping_.Add(tp->seqno_to_time_mapping) seqno_time_mapping_.Add(tp->seqno_to_time_mapping)
.PermitUncheckedError(); .PermitUncheckedError();
@ -472,7 +470,7 @@ void CompactionJob::GenSubcompactionBoundaries() {
// overlap with N-1 other ranges. Since we requested a relatively large number // overlap with N-1 other ranges. Since we requested a relatively large number
// (128) of ranges from each input files, even N range overlapping would // (128) of ranges from each input files, even N range overlapping would
// cause relatively small inaccuracy. // cause relatively small inaccuracy.
const ReadOptions read_options(Env::IOActivity::kCompaction);
auto* c = compact_->compaction; auto* c = compact_->compaction;
if (c->max_subcompactions() <= 1 && if (c->max_subcompactions() <= 1 &&
!(c->immutable_options()->compaction_pri == kRoundRobin && !(c->immutable_options()->compaction_pri == kRoundRobin &&
@ -506,7 +504,7 @@ void CompactionJob::GenSubcompactionBoundaries() {
FileMetaData* f = flevel->files[i].file_metadata; FileMetaData* f = flevel->files[i].file_metadata;
std::vector<TableReader::Anchor> my_anchors; std::vector<TableReader::Anchor> my_anchors;
Status s = cfd->table_cache()->ApproximateKeyAnchors( Status s = cfd->table_cache()->ApproximateKeyAnchors(
ReadOptions(), icomp, *f, my_anchors); read_options, icomp, *f, my_anchors);
if (!s.ok() || my_anchors.empty()) { if (!s.ok() || my_anchors.empty()) {
my_anchors.emplace_back(f->largest.user_key(), f->fd.GetFileSize()); my_anchors.emplace_back(f->largest.user_key(), f->fd.GetFileSize());
} }
@ -722,11 +720,12 @@ Status CompactionJob::Run() {
// use_direct_io_for_flush_and_compaction is true, we will regard this // use_direct_io_for_flush_and_compaction is true, we will regard this
// verification as user reads since the goal is to cache it here for // verification as user reads since the goal is to cache it here for
// further user reads // further user reads
ReadOptions read_options; const ReadOptions verify_table_read_options(
Env::IOActivity::kCompaction);
InternalIterator* iter = cfd->table_cache()->NewIterator( InternalIterator* iter = cfd->table_cache()->NewIterator(
read_options, file_options_, cfd->internal_comparator(), verify_table_read_options, file_options_,
files_output[file_idx]->meta, /*range_del_agg=*/nullptr, cfd->internal_comparator(), files_output[file_idx]->meta,
prefix_extractor, /*range_del_agg=*/nullptr, prefix_extractor,
/*table_reader_ptr=*/nullptr, /*table_reader_ptr=*/nullptr,
cfd->internal_stats()->GetFileReadHist( cfd->internal_stats()->GetFileReadHist(
compact_->compaction->output_level()), compact_->compaction->output_level()),
@ -1032,7 +1031,6 @@ void CompactionJob::NotifyOnSubcompactionCompleted(
void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) { void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
assert(sub_compact); assert(sub_compact);
assert(sub_compact->compaction); assert(sub_compact->compaction);
if (db_options_.compaction_service) { if (db_options_.compaction_service) {
CompactionServiceJobStatus comp_status = CompactionServiceJobStatus comp_status =
ProcessKeyValueCompactionWithCompactionService(sub_compact); ProcessKeyValueCompactionWithCompactionService(sub_compact);
@ -1083,6 +1081,7 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
read_options.verify_checksums = true; read_options.verify_checksums = true;
read_options.fill_cache = false; read_options.fill_cache = false;
read_options.rate_limiter_priority = GetRateLimiterPriority(); read_options.rate_limiter_priority = GetRateLimiterPriority();
read_options.io_activity = Env::IOActivity::kCompaction;
// Compaction iterators shouldn't be confined to a single prefix. // Compaction iterators shouldn't be confined to a single prefix.
// Compactions use Seek() for // Compactions use Seek() for
// (a) concurrent compactions, // (a) concurrent compactions,
@ -1640,6 +1639,7 @@ Status CompactionJob::InstallCompactionResults(
db_mutex_->AssertHeld(); db_mutex_->AssertHeld();
const ReadOptions read_options(Env::IOActivity::kCompaction);
auto* compaction = compact_->compaction; auto* compaction = compact_->compaction;
assert(compaction); assert(compaction);
@ -1717,8 +1717,8 @@ Status CompactionJob::InstallCompactionResults(
} }
return versions_->LogAndApply(compaction->column_family_data(), return versions_->LogAndApply(compaction->column_family_data(),
mutable_cf_options, edit, db_mutex_, mutable_cf_options, read_options, edit,
db_directory_); db_mutex_, db_directory_);
} }
void CompactionJob::RecordCompactionIOStats() { void CompactionJob::RecordCompactionIOStats() {

@ -389,9 +389,9 @@ class CompactionJobTestBase : public testing::Test {
0); 0);
mutex_.Lock(); mutex_.Lock();
EXPECT_OK( EXPECT_OK(versions_->LogAndApply(
versions_->LogAndApply(versions_->GetColumnFamilySet()->GetDefault(), versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
mutable_cf_options_, &edit, &mutex_, nullptr)); read_options_, &edit, &mutex_, nullptr));
mutex_.Unlock(); mutex_.Unlock();
} }
@ -727,6 +727,7 @@ class CompactionJobTestBase : public testing::Test {
ColumnFamilyOptions cf_options_; ColumnFamilyOptions cf_options_;
MutableCFOptions mutable_cf_options_; MutableCFOptions mutable_cf_options_;
MutableDBOptions mutable_db_options_; MutableDBOptions mutable_db_options_;
const ReadOptions read_options_;
std::shared_ptr<Cache> table_cache_; std::shared_ptr<Cache> table_cache_;
WriteController write_controller_; WriteController write_controller_;
WriteBufferManager write_buffer_manager_; WriteBufferManager write_buffer_manager_;
@ -2440,4 +2441,3 @@ int main(int argc, char** argv) {
RegisterCustomObjects(argc, argv); RegisterCustomObjects(argc, argv);
return RUN_ALL_TESTS(); return RUN_ALL_TESTS();
} }

@ -574,6 +574,7 @@ Status CompactionOutputs::AddRangeDels(
auto it = range_del_agg_->NewIterator(lower_bound, upper_bound); auto it = range_del_agg_->NewIterator(lower_bound, upper_bound);
Slice last_tombstone_start_user_key{}; Slice last_tombstone_start_user_key{};
bool reached_lower_bound = false; bool reached_lower_bound = false;
const ReadOptions read_options(Env::IOActivity::kCompaction);
for (it->SeekToFirst(); it->Valid(); it->Next()) { for (it->SeekToFirst(); it->Valid(); it->Next()) {
auto tombstone = it->Tombstone(); auto tombstone = it->Tombstone();
auto kv = tombstone.Serialize(); auto kv = tombstone.Serialize();
@ -713,7 +714,7 @@ Status CompactionOutputs::AddRangeDels(
approx_opts.files_size_error_margin = 0.1; approx_opts.files_size_error_margin = 0.1;
auto approximate_covered_size = auto approximate_covered_size =
compaction_->input_version()->version_set()->ApproximateSize( compaction_->input_version()->version_set()->ApproximateSize(
approx_opts, compaction_->input_version(), approx_opts, read_options, compaction_->input_version(),
tombstone_start.Encode(), tombstone_end.Encode(), tombstone_start.Encode(), tombstone_end.Encode(),
compaction_->output_level() + 1 /* start_level */, compaction_->output_level() + 1 /* start_level */,
-1 /* end_level */, kCompaction); -1 /* end_level */, kCompaction);

@ -33,7 +33,9 @@ Status DeleteFilesInRanges(DB* db, ColumnFamilyHandle* column_family,
Status VerifySstFileChecksum(const Options& options, Status VerifySstFileChecksum(const Options& options,
const EnvOptions& env_options, const EnvOptions& env_options,
const std::string& file_path) { const std::string& file_path) {
return VerifySstFileChecksum(options, env_options, ReadOptions(), file_path); // TODO: plumb Env::IOActivity
const ReadOptions read_options;
return VerifySstFileChecksum(options, env_options, read_options, file_path);
} }
Status VerifySstFileChecksum(const Options& options, Status VerifySstFileChecksum(const Options& options,
const EnvOptions& env_options, const EnvOptions& env_options,
@ -56,8 +58,9 @@ Status VerifySstFileChecksum(const Options& options,
std::unique_ptr<RandomAccessFileReader> file_reader( std::unique_ptr<RandomAccessFileReader> file_reader(
new RandomAccessFileReader( new RandomAccessFileReader(
std::move(file), file_path, ioptions.clock, nullptr /* io_tracer */, std::move(file), file_path, ioptions.clock, nullptr /* io_tracer */,
nullptr /* stats */, 0 /* hist_type */, nullptr /* file_read_hist */, ioptions.stats /* stats */,
ioptions.rate_limiter.get())); Histograms::SST_READ_MICROS /* hist_type */,
nullptr /* file_read_hist */, ioptions.rate_limiter.get()));
const bool kImmortal = true; const bool kImmortal = true;
auto reader_options = TableReaderOptions( auto reader_options = TableReaderOptions(
ioptions, options.prefix_extractor, env_options, internal_comparator, ioptions, options.prefix_extractor, env_options, internal_comparator,
@ -76,4 +79,3 @@ Status VerifySstFileChecksum(const Options& options,
} }
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

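In `VerifySstFileChecksum`, the reader used to be built with `nullptr /* stats */` and `0 /* hist_type */`. It now gets `ioptions.stats` and `Histograms::SST_READ_MICROS`. This appears to be the "Fixed the default histogram" item from the summary. Presumably the issue is that a reader with no statistics sink has nowhere to report its timings. The sketch below models that behavior with hypothetical types (`StatsSink`, `FileReaderSketch`). It is not the real `RandomAccessFileReader` constructor.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical statistics sink; not rocksdb::Statistics.
struct StatsSink {
  std::vector<uint64_t> sst_read_micros;
};

// Hypothetical file reader that reports each read's latency only when a
// statistics sink was supplied at construction time.
class FileReaderSketch {
 public:
  explicit FileReaderSketch(StatsSink* stats) : stats_(stats) {}

  void Read(uint64_t elapsed_micros) {
    if (stats_ != nullptr) {
      stats_->sst_read_micros.push_back(elapsed_micros);
    }
    // With stats_ == nullptr the read still happens, but is not recorded.
  }

 private:
  StatsSink* stats_;
};
```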
@ -762,9 +762,11 @@ TEST_F(CorruptionTest, RangeDeletionCorrupted) {
fs->GetFileSize(filename, file_opts.io_options, &file_size, nullptr)); fs->GetFileSize(filename, file_opts.io_options, &file_size, nullptr));
BlockHandle range_del_handle; BlockHandle range_del_handle;
ASSERT_OK(FindMetaBlockInFile( const ReadOptions read_options;
file_reader.get(), file_size, kBlockBasedTableMagicNumber, ASSERT_OK(FindMetaBlockInFile(file_reader.get(), file_size,
ImmutableOptions(options_), kRangeDelBlockName, &range_del_handle)); kBlockBasedTableMagicNumber,
ImmutableOptions(options_), read_options,
kRangeDelBlockName, &range_del_handle));
ASSERT_OK(TryReopen()); ASSERT_OK(TryReopen());
ASSERT_OK(test::CorruptFile(env_.get(), filename, ASSERT_OK(test::CorruptFile(env_.get(), filename,
@ -1666,4 +1668,3 @@ int main(int argc, char** argv) {
RegisterCustomObjects(argc, argv); RegisterCustomObjects(argc, argv);
return RUN_ALL_TESTS(); return RUN_ALL_TESTS();
} }

@ -46,6 +46,11 @@ Status CompactedDBImpl::Get(const ReadOptions& options, ColumnFamilyHandle*,
Status CompactedDBImpl::Get(const ReadOptions& options, ColumnFamilyHandle*, Status CompactedDBImpl::Get(const ReadOptions& options, ColumnFamilyHandle*,
const Slice& key, PinnableSlice* value, const Slice& key, PinnableSlice* value,
std::string* timestamp) { std::string* timestamp) {
if (options.io_activity != Env::IOActivity::kUnknown) {
return Status::InvalidArgument(
"Cannot call Get with `ReadOptions::io_activity` != "
"`Env::IOActivity::kUnknown`");
}
assert(user_comparator_); assert(user_comparator_);
if (options.timestamp) { if (options.timestamp) {
const Status s = FailIfTsMismatchCf( const Status s = FailIfTsMismatchCf(

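The `CompactedDBImpl::Get` hunk above shows a guard that the rest of this diff repeats in `DBImpl::Get`, `GetEntity`, `MultiGetImpl`, `NewIterator`, and `NewIterators`. User-facing read APIs reject any `ReadOptions::io_activity` other than `Env::IOActivity::kUnknown`, since the flush, compaction, and DB-open tags are reserved for internal paths. Below is a self-contained sketch of that check. `IOActivity`, `ReadOptionsSketch`, `StatusSketch`, and `CheckUserReadOptions` are hypothetical stand-ins, not RocksDB types.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for Env::IOActivity, ReadOptions and Status.
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };

struct ReadOptionsSketch {
  IOActivity io_activity = IOActivity::kUnknown;
};

struct StatusSketch {
  bool ok;
  std::string msg;
  static StatusSketch OK() { return {true, ""}; }
  static StatusSketch InvalidArgument(std::string m) {
    return {false, std::move(m)};
  }
};

// Rejects internally reserved activity tags on a user-facing read API,
// producing the same message shape as the checks added in this diff.
StatusSketch CheckUserReadOptions(const ReadOptionsSketch& ro, const char* api) {
  if (ro.io_activity != IOActivity::kUnknown) {
    return StatusSketch::InvalidArgument(
        std::string("Cannot call ") + api +
        " with `ReadOptions::io_activity` != `Env::IOActivity::kUnknown`");
  }
  return StatusSketch::OK();
}
```

Most call sites return the error, and `NewIterator` wraps it in `NewErrorIterator`. `DBImpl::KeyMayExist` instead only `assert`s the same condition.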
@ -327,6 +327,9 @@ Status DBImpl::Resume() {
// means a new super version wouldn't have been installed // means a new super version wouldn't have been installed
Status DBImpl::ResumeImpl(DBRecoverContext context) { Status DBImpl::ResumeImpl(DBRecoverContext context) {
mutex_.AssertHeld(); mutex_.AssertHeld();
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
WaitForBackgroundWork(); WaitForBackgroundWork();
Status s; Status s;
@ -368,7 +371,7 @@ Status DBImpl::ResumeImpl(DBRecoverContext context) {
assert(cfh); assert(cfh);
ColumnFamilyData* cfd = cfh->cfd(); ColumnFamilyData* cfd = cfh->cfd();
const MutableCFOptions& cf_opts = *cfd->GetLatestMutableCFOptions(); const MutableCFOptions& cf_opts = *cfd->GetLatestMutableCFOptions();
s = versions_->LogAndApply(cfd, cf_opts, &edit, &mutex_, s = versions_->LogAndApply(cfd, cf_opts, read_options, &edit, &mutex_,
directories_.GetDbDir()); directories_.GetDbDir());
if (!s.ok()) { if (!s.ok()) {
io_s = versions_->io_status(); io_s = versions_->io_status();
@ -1147,6 +1150,8 @@ FSDirectory* DBImpl::GetDataDir(ColumnFamilyData* cfd, size_t path_id) const {
Status DBImpl::SetOptions( Status DBImpl::SetOptions(
ColumnFamilyHandle* column_family, ColumnFamilyHandle* column_family,
const std::unordered_map<std::string, std::string>& options_map) { const std::unordered_map<std::string, std::string>& options_map) {
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
auto* cfd = auto* cfd =
static_cast_with_check<ColumnFamilyHandleImpl>(column_family)->cfd(); static_cast_with_check<ColumnFamilyHandleImpl>(column_family)->cfd();
if (options_map.empty()) { if (options_map.empty()) {
@ -1168,8 +1173,8 @@ Status DBImpl::SetOptions(
new_options = *cfd->GetLatestMutableCFOptions(); new_options = *cfd->GetLatestMutableCFOptions();
// Append new version to recompute compaction score. // Append new version to recompute compaction score.
VersionEdit dummy_edit; VersionEdit dummy_edit;
s = versions_->LogAndApply(cfd, new_options, &dummy_edit, &mutex_, s = versions_->LogAndApply(cfd, new_options, read_options, &dummy_edit,
directories_.GetDbDir()); &mutex_, directories_.GetDbDir());
// Trigger possible flush/compactions. This has to be before we persist // Trigger possible flush/compactions. This has to be before we persist
// options to file, otherwise there will be a deadlock with writer // options to file, otherwise there will be a deadlock with writer
// thread. // thread.
@ -1507,7 +1512,9 @@ Status DBImpl::SyncWAL() {
} }
if (status.ok() && synced_wals.IsWalAddition()) { if (status.ok() && synced_wals.IsWalAddition()) {
InstrumentedMutexLock l(&mutex_); InstrumentedMutexLock l(&mutex_);
status = ApplyWALToManifest(&synced_wals); // TODO: plumb Env::IOActivity
const ReadOptions read_options;
status = ApplyWALToManifest(read_options, &synced_wals);
} }
TEST_SYNC_POINT("DBImpl::SyncWAL:BeforeMarkLogsSynced:2"); TEST_SYNC_POINT("DBImpl::SyncWAL:BeforeMarkLogsSynced:2");
@ -1515,11 +1522,13 @@ Status DBImpl::SyncWAL() {
return status; return status;
} }
Status DBImpl::ApplyWALToManifest(VersionEdit* synced_wals) { Status DBImpl::ApplyWALToManifest(const ReadOptions& read_options,
VersionEdit* synced_wals) {
// not empty, write to MANIFEST. // not empty, write to MANIFEST.
mutex_.AssertHeld(); mutex_.AssertHeld();
Status status = versions_->LogAndApplyToDefaultColumnFamily( Status status = versions_->LogAndApplyToDefaultColumnFamily(
synced_wals, &mutex_, directories_.GetDbDir()); read_options, synced_wals, &mutex_, directories_.GetDbDir());
if (!status.ok() && versions_->io_status().IsIOError()) { if (!status.ok() && versions_->io_status().IsIOError()) {
status = error_handler_.SetBGError(versions_->io_status(), status = error_handler_.SetBGError(versions_->io_status(),
BackgroundErrorReason::kManifestWrite); BackgroundErrorReason::kManifestWrite);
@ -1936,6 +1945,12 @@ Status DBImpl::GetEntity(const ReadOptions& read_options,
"Cannot call GetEntity without a PinnableWideColumns object"); "Cannot call GetEntity without a PinnableWideColumns object");
} }
if (read_options.io_activity != Env::IOActivity::kUnknown) {
return Status::InvalidArgument(
"Cannot call GetEntity with `ReadOptions::io_activity` != "
"`Env::IOActivity::kUnknown`");
}
columns->Reset(); columns->Reset();
GetImplOptions get_impl_options; GetImplOptions get_impl_options;
@ -1981,6 +1996,12 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
assert(get_impl_options.column_family); assert(get_impl_options.column_family);
if (read_options.io_activity != Env::IOActivity::kUnknown) {
return Status::InvalidArgument(
"Cannot call Get with `ReadOptions::io_activity` != "
"`Env::IOActivity::kUnknown`");
}
if (read_options.timestamp) { if (read_options.timestamp) {
const Status s = FailIfTsMismatchCf(get_impl_options.column_family, const Status s = FailIfTsMismatchCf(get_impl_options.column_family,
*(read_options.timestamp), *(read_options.timestamp),
@ -2931,6 +2952,11 @@ Status DBImpl::MultiGetImpl(
autovector<KeyContext*, MultiGetContext::MAX_BATCH_SIZE>* sorted_keys, autovector<KeyContext*, MultiGetContext::MAX_BATCH_SIZE>* sorted_keys,
SuperVersion* super_version, SequenceNumber snapshot, SuperVersion* super_version, SequenceNumber snapshot,
ReadCallback* callback) { ReadCallback* callback) {
if (read_options.io_activity != Env::IOActivity::kUnknown) {
return Status::InvalidArgument(
"Cannot call MultiGet with `ReadOptions::io_activity` != "
"`Env::IOActivity::kUnknown`");
}
PERF_CPU_TIMER_GUARD(get_cpu_nanos, immutable_db_options_.clock); PERF_CPU_TIMER_GUARD(get_cpu_nanos, immutable_db_options_.clock);
StopWatch sw(immutable_db_options_.clock, stats_, DB_MULTIGET); StopWatch sw(immutable_db_options_.clock, stats_, DB_MULTIGET);
@ -3129,6 +3155,8 @@ Status DBImpl::CreateColumnFamilies(
Status DBImpl::CreateColumnFamilyImpl(const ColumnFamilyOptions& cf_options, Status DBImpl::CreateColumnFamilyImpl(const ColumnFamilyOptions& cf_options,
const std::string& column_family_name, const std::string& column_family_name,
ColumnFamilyHandle** handle) { ColumnFamilyHandle** handle) {
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
Status s; Status s;
*handle = nullptr; *handle = nullptr;
@ -3169,9 +3197,9 @@ Status DBImpl::CreateColumnFamilyImpl(const ColumnFamilyOptions& cf_options,
write_thread_.EnterUnbatched(&w, &mutex_); write_thread_.EnterUnbatched(&w, &mutex_);
// LogAndApply will both write the creation in MANIFEST and create // LogAndApply will both write the creation in MANIFEST and create
// ColumnFamilyData object // ColumnFamilyData object
s = versions_->LogAndApply(nullptr, MutableCFOptions(cf_options), &edit, s = versions_->LogAndApply(nullptr, MutableCFOptions(cf_options),
&mutex_, directories_.GetDbDir(), false, read_options, &edit, &mutex_,
&cf_options); directories_.GetDbDir(), false, &cf_options);
write_thread_.ExitUnbatched(&w); write_thread_.ExitUnbatched(&w);
} }
if (s.ok()) { if (s.ok()) {
@ -3250,6 +3278,8 @@ Status DBImpl::DropColumnFamilies(
} }
Status DBImpl::DropColumnFamilyImpl(ColumnFamilyHandle* column_family) { Status DBImpl::DropColumnFamilyImpl(ColumnFamilyHandle* column_family) {
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family); auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
auto cfd = cfh->cfd(); auto cfd = cfh->cfd();
if (cfd->GetID() == 0) { if (cfd->GetID() == 0) {
@ -3272,8 +3302,9 @@ Status DBImpl::DropColumnFamilyImpl(ColumnFamilyHandle* column_family) {
// we drop column family from a single write thread // we drop column family from a single write thread
WriteThread::Writer w; WriteThread::Writer w;
write_thread_.EnterUnbatched(&w, &mutex_); write_thread_.EnterUnbatched(&w, &mutex_);
s = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(), &edit, s = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
&mutex_, directories_.GetDbDir()); read_options, &edit, &mutex_,
directories_.GetDbDir());
write_thread_.ExitUnbatched(&w); write_thread_.ExitUnbatched(&w);
} }
if (s.ok()) { if (s.ok()) {
@ -3324,10 +3355,13 @@ bool DBImpl::KeyMayExist(const ReadOptions& read_options,
std::string* value, std::string* timestamp, std::string* value, std::string* timestamp,
bool* value_found) { bool* value_found) {
assert(value != nullptr); assert(value != nullptr);
assert(read_options.io_activity == Env::IOActivity::kUnknown);
if (value_found != nullptr) { if (value_found != nullptr) {
// falsify later if key-may-exist but can't fetch value // falsify later if key-may-exist but can't fetch value
*value_found = true; *value_found = true;
} }
// TODO: plumb Env::IOActivity
ReadOptions roptions = read_options; ReadOptions roptions = read_options;
roptions.read_tier = kBlockCacheTier; // read from block cache only roptions.read_tier = kBlockCacheTier; // read from block cache only
PinnableSlice pinnable_val; PinnableSlice pinnable_val;
@ -3356,6 +3390,11 @@ Iterator* DBImpl::NewIterator(const ReadOptions& read_options,
return NewErrorIterator(Status::NotSupported( return NewErrorIterator(Status::NotSupported(
"ReadTier::kPersistedData is not yet supported in iterators.")); "ReadTier::kPersistedData is not yet supported in iterators."));
} }
if (read_options.io_activity != Env::IOActivity::kUnknown) {
return NewErrorIterator(Status::InvalidArgument(
"Cannot call NewIterator with `ReadOptions::io_activity` != "
"`Env::IOActivity::kUnknown`"));
}
assert(column_family); assert(column_family);
@ -3491,6 +3530,11 @@ Status DBImpl::NewIterators(
return Status::NotSupported( return Status::NotSupported(
"ReadTier::kPersistedData is not yet supported in iterators."); "ReadTier::kPersistedData is not yet supported in iterators.");
} }
if (read_options.io_activity != Env::IOActivity::kUnknown) {
return Status::InvalidArgument(
"Cannot call NewIterators with `ReadOptions::io_activity` != "
"`Env::IOActivity::kUnknown`");
}
if (read_options.timestamp) { if (read_options.timestamp) {
for (auto* cf : column_families) { for (auto* cf : column_families) {
@ -3808,7 +3852,9 @@ Status DBImpl::GetPropertiesOfAllTables(ColumnFamilyHandle* column_family,
version->Ref(); version->Ref();
mutex_.Unlock(); mutex_.Unlock();
auto s = version->GetPropertiesOfAllTables(props); // TODO: plumb Env::IOActivity
const ReadOptions read_options;
auto s = version->GetPropertiesOfAllTables(read_options, props);
// Decrement the ref count // Decrement the ref count
mutex_.Lock(); mutex_.Lock();
@ -3830,7 +3876,9 @@ Status DBImpl::GetPropertiesOfTablesInRange(ColumnFamilyHandle* column_family,
version->Ref(); version->Ref();
mutex_.Unlock(); mutex_.Unlock();
auto s = version->GetPropertiesOfTablesInRange(range, n, props); // TODO: plumb Env::IOActivity
const ReadOptions read_options;
auto s = version->GetPropertiesOfTablesInRange(read_options, range, n, props);
// Decrement the ref count // Decrement the ref count
mutex_.Lock(); mutex_.Lock();
@ -4163,6 +4211,8 @@ Status DBImpl::GetApproximateSizes(const SizeApproximationOptions& options,
SuperVersion* sv = GetAndRefSuperVersion(cfd); SuperVersion* sv = GetAndRefSuperVersion(cfd);
v = sv->current; v = sv->current;
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
for (int i = 0; i < n; i++) { for (int i = 0; i < n; i++) {
Slice start = range[i].start; Slice start = range[i].start;
Slice limit = range[i].limit; Slice limit = range[i].limit;
@@ -4184,7 +4234,7 @@ Status DBImpl::GetApproximateSizes(const SizeApproximationOptions& options,
     sizes[i] = 0;
     if (options.include_files) {
       sizes[i] += versions_->ApproximateSize(
-          options, v, k1.Encode(), k2.Encode(), /*start_level=*/0,
+          options, read_options, v, k1.Encode(), k2.Encode(), /*start_level=*/0,
           /*end_level=*/-1, TableReaderCaller::kUserApproximateSize);
     }
     if (options.include_memtables) {
@@ -4232,6 +4282,8 @@ Status DBImpl::GetUpdatesSince(
 }
 Status DBImpl::DeleteFile(std::string name) {
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   uint64_t number;
   FileType type;
   WalFileType log_type;
@@ -4311,7 +4363,8 @@ Status DBImpl::DeleteFile(std::string name) {
     edit.SetColumnFamily(cfd->GetID());
     edit.DeleteFile(level, number);
     status = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
-                                    &edit, &mutex_, directories_.GetDbDir());
+                                    read_options, &edit, &mutex_,
+                                    directories_.GetDbDir());
     if (status.ok()) {
       InstallSuperVersionAndScheduleWork(cfd,
                                          &job_context.superversion_contexts[0],
@@ -4333,6 +4386,8 @@ Status DBImpl::DeleteFile(std::string name) {
 Status DBImpl::DeleteFilesInRanges(ColumnFamilyHandle* column_family,
                                    const RangePtr* ranges, size_t n,
                                    bool include_end) {
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   Status status = Status::OK();
   auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
   ColumnFamilyData* cfd = cfh->cfd();
@@ -4398,7 +4453,8 @@ Status DBImpl::DeleteFilesInRanges(ColumnFamilyHandle* column_family,
     }
     input_version->Ref();
     status = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
-                                    &edit, &mutex_, directories_.GetDbDir());
+                                    read_options, &edit, &mutex_,
+                                    directories_.GetDbDir());
     if (status.ok()) {
       InstallSuperVersionAndScheduleWork(cfd,
                                          &job_context.superversion_contexts[0],
@@ -5034,6 +5090,7 @@ Status DBImpl::GetLatestSequenceForKey(
   MergeContext merge_context;
   SequenceNumber max_covering_tombstone_seq = 0;
+  // TODO: plumb Env::IOActivity
   ReadOptions read_options;
   SequenceNumber current_seq = versions_->LastSequence();
@@ -5189,6 +5246,8 @@ Status DBImpl::IngestExternalFile(
 Status DBImpl::IngestExternalFiles(
     const std::vector<IngestExternalFileArg>& args) {
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   if (args.empty()) {
     return Status::InvalidArgument("ingestion arg list is empty");
   }
@@ -5406,9 +5465,9 @@ Status DBImpl::IngestExternalFiles(
         }
         assert(0 == num_entries);
       }
-      status =
-          versions_->LogAndApply(cfds_to_commit, mutable_cf_options_list,
-                                 edit_lists, &mutex_, directories_.GetDbDir());
+      status = versions_->LogAndApply(cfds_to_commit, mutable_cf_options_list,
+                                      read_options, edit_lists, &mutex_,
+                                      directories_.GetDbDir());
       // It is safe to update VersionSet last seqno here after LogAndApply since
       // LogAndApply persists last sequence number from VersionEdits,
       // which are from file's largest seqno and not from VersionSet.
@@ -5509,6 +5568,8 @@ Status DBImpl::CreateColumnFamilyWithImport(
     const ExportImportFilesMetaData& metadata, ColumnFamilyHandle** handle) {
   assert(handle != nullptr);
   assert(*handle == nullptr);
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   std::string cf_comparator_name = options.comparator->Name();
   if (cf_comparator_name != metadata.db_comparator_name) {
     return Status::InvalidArgument("Comparator name mismatch");
@@ -5550,8 +5611,9 @@ Status DBImpl::CreateColumnFamilyWithImport(
     // file, we have to make sure the file number will never being reused.
     next_file_number = versions_->FetchAddFileNumber(metadata.files.size());
     auto cf_options = cfd->GetLatestMutableCFOptions();
-    status = versions_->LogAndApply(cfd, *cf_options, &dummy_edit, &mutex_,
-                                    directories_.GetDbDir());
+    status =
+        versions_->LogAndApply(cfd, *cf_options, read_options, &dummy_edit,
+                               &mutex_, directories_.GetDbDir());
     if (status.ok()) {
       InstallSuperVersionAndScheduleWork(cfd, &dummy_sv_ctx, *cf_options);
     }
@@ -5587,8 +5649,9 @@ Status DBImpl::CreateColumnFamilyWithImport(
   // Install job edit [Mutex will be unlocked here]
   if (status.ok()) {
     auto cf_options = cfd->GetLatestMutableCFOptions();
-    status = versions_->LogAndApply(cfd, *cf_options, import_job.edit(),
-                                    &mutex_, directories_.GetDbDir());
+    status = versions_->LogAndApply(cfd, *cf_options, read_options,
+                                    import_job.edit(), &mutex_,
+                                    directories_.GetDbDir());
     if (status.ok()) {
       InstallSuperVersionAndScheduleWork(cfd, &sv_context, *cf_options);
     }
@@ -5648,6 +5711,12 @@ Status DBImpl::VerifyChecksumInternal(const ReadOptions& read_options,
   Status s;
+  if (read_options.io_activity != Env::IOActivity::kUnknown) {
+    s = Status::InvalidArgument(
+        "Cannot verify file checksum with `ReadOptions::io_activity` != "
+        "`Env::IOActivity::kUnknown`");
+    return s;
+  }
   if (use_file_checksum) {
     FileChecksumGenFactory* const file_checksum_gen_factory =
         immutable_db_options_.file_checksum_gen_factory.get();
@@ -5761,6 +5830,12 @@ Status DBImpl::VerifyFullFileChecksum(const std::string& file_checksum_expected,
                                      const std::string& func_name_expected,
                                      const std::string& fname,
                                      const ReadOptions& read_options) {
+  if (read_options.io_activity != Env::IOActivity::kUnknown) {
+    return Status::InvalidArgument(
+        "Cannot call VerifyChecksum with `ReadOptions::io_activity` != "
+        "`Env::IOActivity::kUnknown`");
+  }
   Status s;
   if (file_checksum_expected == kUnknownFileChecksum) {
     return s;
@@ -5893,6 +5968,8 @@ Status DBImpl::ReserveFileNumbersBeforeIngestion(
     ColumnFamilyData* cfd, uint64_t num,
     std::unique_ptr<std::list<uint64_t>::iterator>& pending_output_elem,
     uint64_t* next_file_number) {
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   Status s;
   SuperVersionContext dummy_sv_ctx(true /* create_superversion */);
   assert(nullptr != next_file_number);
@@ -5910,8 +5987,8 @@ Status DBImpl::ReserveFileNumbersBeforeIngestion(
     // reuse the file number that has already assigned to the internal file,
     // and this will overwrite the external file. To protect the external
     // file, we have to make sure the file number will never being reused.
-    s = versions_->LogAndApply(cfd, *cf_options, &dummy_edit, &mutex_,
-                               directories_.GetDbDir());
+    s = versions_->LogAndApply(cfd, *cf_options, read_options, &dummy_edit,
+                               &mutex_, directories_.GetDbDir());
     if (s.ok()) {
       InstallSuperVersionAndScheduleWork(cfd, &dummy_sv_ctx, *cf_options);
     }
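
The early-return guards added above to `NewIterators`, `VerifyChecksumInternal` and `VerifyFullFileChecksum` follow one rule: `ReadOptions::io_activity` is reserved for RocksDB's own DB-open, flush and compaction paths, so a user-facing call that receives anything other than `Env::IOActivity::kUnknown` is rejected with `InvalidArgument`. A minimal standalone sketch of that check follows. `IOActivity`, `MiniReadOptions`, `MiniStatus` and `CheckUserReadOptions` are hypothetical stand-ins, not RocksDB types; only the enumerator names are taken from the diff.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for Env::IOActivity (enumerator names from the diff).
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };

// Hypothetical ReadOptions reduced to the one field the guard inspects.
struct MiniReadOptions {
  IOActivity io_activity = IOActivity::kUnknown;
  MiniReadOptions() = default;
  explicit MiniReadOptions(IOActivity activity) : io_activity(activity) {}
};

// Hypothetical Status: either OK or InvalidArgument with a message.
class MiniStatus {
 public:
  static MiniStatus OK() { return MiniStatus(true, ""); }
  static MiniStatus InvalidArgument(std::string msg) {
    return MiniStatus(false, std::move(msg));
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  MiniStatus(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

// Rejects internal-only activities on a user-facing entry point, in the
// style of the early returns added to NewIterators and VerifyChecksum*.
MiniStatus CheckUserReadOptions(const MiniReadOptions& read_options,
                                const std::string& api_name) {
  if (read_options.io_activity != IOActivity::kUnknown) {
    return MiniStatus::InvalidArgument(
        "Cannot call " + api_name +
        " with `ReadOptions::io_activity` != `Env::IOActivity::kUnknown`");
  }
  return MiniStatus::OK();
}
```

A caller would run the check before touching any file and return the status unchanged when it is not OK, which is how each guard in the diff exits.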

@@ -2127,7 +2127,7 @@ class DBImpl : public DB {
   // helper function to call after some of the logs_ were synced
   void MarkLogsSynced(uint64_t up_to, bool synced_dir, VersionEdit* edit);
-  Status ApplyWALToManifest(VersionEdit* edit);
+  Status ApplyWALToManifest(const ReadOptions& read_options, VersionEdit* edit);
   // WALs with log number up to up_to are not synced successfully.
   void MarkLogsNotSynced(uint64_t up_to);

@@ -229,7 +229,9 @@ Status DBImpl::FlushMemTableToOutputFile(
     log_io_s = SyncClosedLogs(job_context, &synced_wals);
     mutex_.Lock();
     if (log_io_s.ok() && synced_wals.IsWalAddition()) {
-      log_io_s = status_to_io_status(ApplyWALToManifest(&synced_wals));
+      const ReadOptions read_options(Env::IOActivity::kFlush);
+      log_io_s =
+          status_to_io_status(ApplyWALToManifest(read_options, &synced_wals));
       TEST_SYNC_POINT_CALLBACK("DBImpl::FlushMemTableToOutputFile:CommitWal:1",
                                nullptr);
     }
@@ -492,7 +494,9 @@ Status DBImpl::AtomicFlushMemTablesToOutputFiles(
     log_io_s = SyncClosedLogs(job_context, &synced_wals);
     mutex_.Lock();
     if (log_io_s.ok() && synced_wals.IsWalAddition()) {
-      log_io_s = status_to_io_status(ApplyWALToManifest(&synced_wals));
+      const ReadOptions read_options(Env::IOActivity::kFlush);
+      log_io_s =
+          status_to_io_status(ApplyWALToManifest(read_options, &synced_wals));
     }
     if (!log_io_s.ok() && !log_io_s.IsShutdownInProgress() &&
@@ -956,6 +960,9 @@ Status DBImpl::IncreaseFullHistoryTsLowImpl(ColumnFamilyData* cfd,
   VersionEdit edit;
   edit.SetColumnFamily(cfd->GetID());
   edit.SetFullHistoryTsLow(ts_low);
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   TEST_SYNC_POINT_CALLBACK("DBImpl::IncreaseFullHistoryTsLowImpl:BeforeEdit",
                            &edit);
@@ -969,7 +976,8 @@ Status DBImpl::IncreaseFullHistoryTsLowImpl(ColumnFamilyData* cfd,
   }
   Status s = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
-                                    &edit, &mutex_, directories_.GetDbDir());
+                                    read_options, &edit, &mutex_,
+                                    directories_.GetDbDir());
   if (!s.ok()) {
     return s;
   }
@@ -1080,6 +1088,7 @@ Status DBImpl::CompactRangeInternal(const CompactRangeOptions& options,
     ReadOptions ro;
     ro.total_order_seek = true;
+    ro.io_activity = Env::IOActivity::kCompaction;
     bool overlap;
     for (int level = 0;
          level < current_version->storage_info()->num_non_empty_levels();
@@ -1639,6 +1648,8 @@ Status DBImpl::ReFitLevel(ColumnFamilyData* cfd, int level, int target_level) {
     return Status::InvalidArgument("Target level exceeds number of levels");
   }
+  const ReadOptions read_options(Env::IOActivity::kCompaction);
   SuperVersionContext sv_context(/* create_superversion */ true);
   InstrumentedMutexLock guard_lock(&mutex_);
@@ -1753,8 +1764,9 @@ Status DBImpl::ReFitLevel(ColumnFamilyData* cfd, int level, int target_level) {
                     "[%s] Apply version edit:\n%s", cfd->GetName().c_str(),
                     edit.DebugString().data());
-    Status status = versions_->LogAndApply(cfd, mutable_cf_options, &edit,
-                                           &mutex_, directories_.GetDbDir());
+    Status status =
+        versions_->LogAndApply(cfd, mutable_cf_options, read_options, &edit,
+                               &mutex_, directories_.GetDbDir());
     cfd->compaction_picker()->UnregisterCompaction(c.get());
     c.reset();
@@ -3189,6 +3201,8 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
   mutex_.AssertHeld();
   TEST_SYNC_POINT("DBImpl::BackgroundCompaction:Start");
+  const ReadOptions read_options(Env::IOActivity::kCompaction);
   bool is_manual = (manual_compaction != nullptr);
   std::unique_ptr<Compaction> c;
   if (prepicked_compaction != nullptr &&
@@ -3399,9 +3413,9 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
     for (const auto& f : *c->inputs(0)) {
       c->edit()->DeleteFile(c->level(), f->fd.GetNumber());
     }
-    status = versions_->LogAndApply(c->column_family_data(),
-                                    *c->mutable_cf_options(), c->edit(),
-                                    &mutex_, directories_.GetDbDir());
+    status = versions_->LogAndApply(
+        c->column_family_data(), *c->mutable_cf_options(), read_options,
+        c->edit(), &mutex_, directories_.GetDbDir());
     io_s = versions_->io_status();
     InstallSuperVersionAndScheduleWork(c->column_family_data(),
                                        &job_context->superversion_contexts[0],
@@ -3418,9 +3432,7 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
                              c->column_family_data());
     // Instrument for event update
     // TODO(yhchiang): add op details for showing trivial-move.
-    ThreadStatusUtil::SetColumnFamily(
-        c->column_family_data(), c->column_family_data()->ioptions()->env,
-        immutable_db_options_.enable_thread_tracking);
+    ThreadStatusUtil::SetColumnFamily(c->column_family_data());
     ThreadStatusUtil::SetThreadOperation(ThreadStatus::OP_COMPACTION);
     compaction_job_stats.num_input_files = c->num_input_files(0);
@@ -3466,9 +3478,9 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
             vstorage->GetNextCompactCursor(start_level, c->num_input_files(0)));
       }
     }
-    status = versions_->LogAndApply(c->column_family_data(),
-                                    *c->mutable_cf_options(), c->edit(),
-                                    &mutex_, directories_.GetDbDir());
+    status = versions_->LogAndApply(
+        c->column_family_data(), *c->mutable_cf_options(), read_options,
+        c->edit(), &mutex_, directories_.GetDbDir());
     io_s = versions_->io_status();
     // Use latest MutableCFOptions
     InstallSuperVersionAndScheduleWork(c->column_family_data(),
@@ -3782,6 +3794,8 @@ void DBImpl::BuildCompactionJobInfo(
   compaction_job_info->table_properties = c->GetOutputTableProperties();
   compaction_job_info->compaction_reason = c->compaction_reason();
   compaction_job_info->compression = c->output_compression();
+  const ReadOptions read_options(Env::IOActivity::kCompaction);
   for (size_t i = 0; i < c->num_input_levels(); ++i) {
     for (const auto fmd : *c->inputs(i)) {
       const FileDescriptor& desc = fmd->fd;
@@ -3793,7 +3807,7 @@ void DBImpl::BuildCompactionJobInfo(
           static_cast<int>(i), file_number, fmd->oldest_blob_file_number});
       if (compaction_job_info->table_properties.count(fn) == 0) {
         std::shared_ptr<const TableProperties> tp;
-        auto s = current->GetTableProperties(&tp, fmd, &fn);
+        auto s = current->GetTableProperties(read_options, &tp, fmd, &fn);
         if (s.ok()) {
           compaction_job_info->table_properties[fn] = tp;
         }

@@ -61,7 +61,8 @@ Status DBImpl::PromoteL0(ColumnFamilyHandle* column_family, int target_level) {
                    "PromoteL0 FAILED. Invalid target level %d\n", target_level);
     return Status::InvalidArgument("Invalid target level");
   }
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   Status status;
   VersionEdit edit;
   JobContext job_context(next_job_id_.fetch_add(1), true);
@@ -141,7 +142,8 @@ Status DBImpl::PromoteL0(ColumnFamilyHandle* column_family, int target_level) {
     }
     status = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
-                                    &edit, &mutex_, directories_.GetDbDir());
+                                    read_options, &edit, &mutex_,
+                                    directories_.GetDbDir());
     if (status.ok()) {
       InstallSuperVersionAndScheduleWork(cfd,
                                          &job_context.superversion_contexts[0],

@@ -19,6 +19,7 @@
 #include "file/writable_file_writer.h"
 #include "logging/logging.h"
 #include "monitoring/persistent_stats_history.h"
+#include "monitoring/thread_status_util.h"
 #include "options/options_helper.h"
 #include "rocksdb/table.h"
 #include "rocksdb/wal_filter.h"
@@ -923,8 +924,9 @@ Status DBImpl::InitPersistStatsColumnFamily() {
 Status DBImpl::LogAndApplyForRecovery(const RecoveryContext& recovery_ctx) {
   mutex_.AssertHeld();
   assert(versions_->descriptor_log_ == nullptr);
+  const ReadOptions read_options(Env::IOActivity::kDBOpen);
   Status s = versions_->LogAndApply(
-      recovery_ctx.cfds_, recovery_ctx.mutable_cf_opts_,
+      recovery_ctx.cfds_, recovery_ctx.mutable_cf_opts_, read_options,
       recovery_ctx.edit_lists_, &mutex_, directories_.GetDbDir());
   if (s.ok() && !(recovery_ctx.files_to_delete_.empty())) {
     mutex_.Unlock();
@@ -1577,6 +1579,7 @@ Status DBImpl::WriteLevel0TableForRecovery(int job_id, ColumnFamilyData* cfd,
   meta.fd = FileDescriptor(versions_->NewFileNumber(), 0, 0);
   ReadOptions ro;
   ro.total_order_seek = true;
+  ro.io_activity = Env::IOActivity::kDBOpen;
   Arena arena;
   Status s;
   TableProperties table_properties;
@@ -1635,10 +1638,11 @@ Status DBImpl::WriteLevel0TableForRecovery(int job_id, ColumnFamilyData* cfd,
       SeqnoToTimeMapping empty_seqno_time_mapping;
       Version* version = cfd->current();
       version->Ref();
+      const ReadOptions read_option(Env::IOActivity::kDBOpen);
       s = BuildTable(
           dbname_, versions_.get(), immutable_db_options_, tboptions,
-          file_options_for_compaction_, cfd->table_cache(), iter.get(),
-          std::move(range_del_iters), &meta, &blob_file_additions,
+          file_options_for_compaction_, read_option, cfd->table_cache(),
+          iter.get(), std::move(range_del_iters), &meta, &blob_file_additions,
           snapshot_seqs, earliest_write_conflict_snapshot, kMaxSequenceNumber,
           snapshot_checker, paranoid_file_checks, cfd->internal_stats(), &io_s,
           io_tracer_, BlobFileCreationReason::kRecovery,
@@ -1739,8 +1743,12 @@ Status DB::Open(const DBOptions& db_options, const std::string& dbname,
                 std::vector<ColumnFamilyHandle*>* handles, DB** dbptr) {
   const bool kSeqPerBatch = true;
   const bool kBatchPerTxn = true;
-  return DBImpl::Open(db_options, dbname, column_families, handles, dbptr,
-                      !kSeqPerBatch, kBatchPerTxn);
+  ThreadStatusUtil::SetEnableTracking(db_options.enable_thread_tracking);
+  ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType::OP_DBOPEN);
+  Status s = DBImpl::Open(db_options, dbname, column_families, handles, dbptr,
+                          !kSeqPerBatch, kBatchPerTxn);
+  ThreadStatusUtil::ResetThreadStatus();
+  return s;
 }
 // TODO: Implement the trimming in flush code path.
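
The new `DB::Open` wrapper brackets `DBImpl::Open` with `ThreadStatusUtil::SetThreadOperation(...OP_DBOPEN)` and `ThreadStatusUtil::ResetThreadStatus()`, and the PR summary says `DbStressFSWrapper` relies on `thread_status_util` to assert that reads carry the matching `io_activity`. The sketch below models that pairing under stated assumptions: a `thread_local` variable stands in for ThreadStatusUtil's per-thread state, and `ThreadOp`, `current_thread_op`, `ScopedThreadOp` and `ActivityMatchesThread` are hypothetical names. It swaps the diff's explicit set/reset pair for an RAII guard that restores the previous operation on scope exit; the explicit pair in the diff works because the wrapper has a single return path.

```cpp
#include <cassert>

// Hypothetical stand-ins for Env::IOActivity and ThreadStatus::OperationType.
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };
enum class ThreadOp { kUnknown, kFlush, kCompaction, kDBOpen };

// Hypothetical per-thread operation, playing the role of ThreadStatusUtil's
// thread-local state in this sketch.
thread_local ThreadOp current_thread_op = ThreadOp::kUnknown;

// The activity a read issued on a thread running `op` is expected to carry.
IOActivity ExpectedActivity(ThreadOp op) {
  switch (op) {
    case ThreadOp::kFlush:
      return IOActivity::kFlush;
    case ThreadOp::kCompaction:
      return IOActivity::kCompaction;
    case ThreadOp::kDBOpen:
      return IOActivity::kDBOpen;
    case ThreadOp::kUnknown:
      break;
  }
  return IOActivity::kUnknown;
}

// The kind of check a test-only file-system wrapper could run on each read.
bool ActivityMatchesThread(IOActivity tagged) {
  return tagged == ExpectedActivity(current_thread_op);
}

// Sets the thread's operation for the guard's lifetime, then restores the
// value that was there before (kUnknown at the outermost level).
class ScopedThreadOp {
 public:
  explicit ScopedThreadOp(ThreadOp op) : previous_(current_thread_op) {
    current_thread_op = op;
  }
  ~ScopedThreadOp() { current_thread_op = previous_; }
  ScopedThreadOp(const ScopedThreadOp&) = delete;
  ScopedThreadOp& operator=(const ScopedThreadOp&) = delete;

 private:
  ThreadOp previous_;
};
```

Reads issued from an ordinary user thread keep `kUnknown` on both sides, which matches the guards that reject any other `io_activity` on user-facing calls.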

@@ -40,6 +40,11 @@ Status DBImplReadOnly::Get(const ReadOptions& read_options,
                            ColumnFamilyHandle* column_family, const Slice& key,
                            PinnableSlice* pinnable_val,
                            std::string* timestamp) {
+  if (read_options.io_activity != Env::IOActivity::kUnknown) {
+    return Status::InvalidArgument(
+        "Cannot call Get with `ReadOptions::io_activity` != "
+        "`Env::IOActivity::kUnknown`");
+  }
   assert(pinnable_val != nullptr);
   // TODO: stopwatch DB_GET needed?, perf timer needed?
   PERF_TIMER_GUARD(get_snapshot_time);
@@ -112,6 +117,11 @@ Status DBImplReadOnly::Get(const ReadOptions& read_options,
 Iterator* DBImplReadOnly::NewIterator(const ReadOptions& read_options,
                                       ColumnFamilyHandle* column_family) {
+  if (read_options.io_activity != Env::IOActivity::kUnknown) {
+    return NewErrorIterator(Status::InvalidArgument(
+        "Cannot call NewIterator with `ReadOptions::io_activity` != "
+        "`Env::IOActivity::kUnknown`"));
+  }
   assert(column_family);
   if (read_options.timestamp) {
     const Status s = FailIfTsMismatchCf(

@@ -345,6 +345,11 @@ Status DBImplSecondary::GetImpl(const ReadOptions& read_options,
                                 ColumnFamilyHandle* column_family,
                                 const Slice& key, PinnableSlice* pinnable_val,
                                 std::string* timestamp) {
+  if (read_options.io_activity != Env::IOActivity::kUnknown) {
+    return Status::InvalidArgument(
+        "Cannot call Get with `ReadOptions::io_activity` != "
+        "`Env::IOActivity::kUnknown`");
+  }
   assert(pinnable_val != nullptr);
   PERF_CPU_TIMER_GUARD(get_cpu_nanos, immutable_db_options_.clock);
   StopWatch sw(immutable_db_options_.clock, stats_, DB_GET);
@@ -445,6 +450,11 @@ Iterator* DBImplSecondary::NewIterator(const ReadOptions& read_options,
     return NewErrorIterator(Status::NotSupported(
         "ReadTier::kPersistedData is not yet supported in iterators."));
   }
+  if (read_options.io_activity != Env::IOActivity::kUnknown) {
+    return NewErrorIterator(Status::InvalidArgument(
+        "Cannot call NewIterator with `ReadOptions::io_activity` != "
+        "`Env::IOActivity::kUnknown`"));
+  }
   assert(column_family);
   if (read_options.timestamp) {
@@ -511,6 +521,11 @@ Status DBImplSecondary::NewIterators(
     return Status::NotSupported(
         "ReadTier::kPersistedData is not yet supported in iterators.");
   }
+  if (read_options.io_activity != Env::IOActivity::kUnknown) {
+    return Status::InvalidArgument(
+        "Cannot call NewIterators with `ReadOptions::io_activity` != "
+        "`Env::IOActivity::kUnknown`");
+  }
   ReadCallback* read_callback = nullptr;  // No read callback provided.
   if (iterators == nullptr) {
     return Status::InvalidArgument("iterators not allowed to be nullptr");

@@ -606,7 +606,9 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
     log_write_mutex_.Unlock();
     if (status.ok() && synced_wals.IsWalAddition()) {
       InstrumentedMutexLock l(&mutex_);
-      status = ApplyWALToManifest(&synced_wals);
+      // TODO: plumb Env::IOActivity
+      const ReadOptions read_options;
+      status = ApplyWALToManifest(read_options, &synced_wals);
     }
     // Requesting sync with two_write_queues_ is expected to be very rare. We
@@ -767,7 +769,9 @@ Status DBImpl::PipelinedWriteImpl(const WriteOptions& write_options,
       }
       if (w.status.ok() && synced_wals.IsWalAddition()) {
         InstrumentedMutexLock l(&mutex_);
-        w.status = ApplyWALToManifest(&synced_wals);
+        // TODO: plumb Env::IOActivity
+        const ReadOptions read_options;
+        w.status = ApplyWALToManifest(read_options, &synced_wals);
       }
       write_thread_.ExitAsBatchGroupLeader(wal_write_group, w.status);
     }
@@ -1805,7 +1809,7 @@ Status DBImpl::DelayWrite(uint64_t num_bytes, WriteThread& write_thread,
   bool delayed = false;
   {
     StopWatch sw(immutable_db_options_.clock, stats_, WRITE_STALL,
-                 &time_delayed);
+                 Histograms::HISTOGRAM_ENUM_MAX, &time_delayed);
     // To avoid parallel timed delays (bad throttling), only support them
     // on the primary write queue.
     uint64_t delay;
@@ -2086,6 +2090,8 @@ void DBImpl::NotifyOnMemTableSealed(ColumnFamilyData* /*cfd*/,
 // two_write_queues_ is true (This is to simplify the reasoning.)
 Status DBImpl::SwitchMemtable(ColumnFamilyData* cfd, WriteContext* context) {
   mutex_.AssertHeld();
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   log::Writer* new_log = nullptr;
   MemTable* new_mem = nullptr;
   IOStatus io_s;
@@ -2237,8 +2243,8 @@ Status DBImpl::SwitchMemtable(ColumnFamilyData* cfd, WriteContext* context) {
     VersionEdit wal_deletion;
     wal_deletion.DeleteWalsBefore(min_wal_number_to_keep);
-    s = versions_->LogAndApplyToDefaultColumnFamily(&wal_deletion, &mutex_,
-                                                    directories_.GetDbDir());
+    s = versions_->LogAndApplyToDefaultColumnFamily(
+        read_options, &wal_deletion, &mutex_, directories_.GetDbDir());
     if (!s.ok() && versions_->io_status().IsIOError()) {
       s = error_handler_.SetBGError(versions_->io_status(),
                                     BackgroundErrorReason::kManifestWrite);
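
`DelayWrite` now passes `Histograms::HISTOGRAM_ENUM_MAX` as an extra `StopWatch` argument, which appears to act as a "no second histogram" sentinel: per the PR summary, one timed read can be recorded into `rocksdb.sst.read.micros` and also into a per-activity histogram such as `rocksdb.file.read.flush.micros` or `rocksdb.file.read.compaction.micros`. The sketch below assumes that interpretation. `Hist`, `MiniStats`, `ManualClock`, `SecondaryHistFor` and `DualStopWatch` are hypothetical names, not RocksDB's `StopWatch` API; the clock is advanced by hand so recorded values are deterministic, and only the flush and compaction histograms named in the summary are modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical histogram ids; kHistEnumMax mirrors HISTOGRAM_ENUM_MAX and is
// used here to mean "no second histogram".
enum Hist : int {
  kSstReadMicros,
  kFileReadFlushMicros,
  kFileReadCompactionMicros,
  kHistEnumMax
};

// Hypothetical stand-in for Env::IOActivity.
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };

// Hypothetical statistics sink that keeps every sample per histogram.
struct MiniStats {
  std::array<std::vector<std::uint64_t>, static_cast<std::size_t>(kHistEnumMax)>
      samples;
  void Record(Hist h, std::uint64_t micros) { samples[h].push_back(micros); }
};

// Hypothetical clock that only moves when a test advances it.
struct ManualClock {
  std::uint64_t now_micros = 0;
};

// Maps an activity to its per-activity histogram, or kHistEnumMax if none.
Hist SecondaryHistFor(IOActivity activity) {
  switch (activity) {
    case IOActivity::kFlush:
      return kFileReadFlushMicros;
    case IOActivity::kCompaction:
      return kFileReadCompactionMicros;
    default:
      return kHistEnumMax;
  }
}

// RAII timer: on destruction, records the elapsed time into `primary` and,
// unless `secondary` is kHistEnumMax, into `secondary` as well.
class DualStopWatch {
 public:
  DualStopWatch(const ManualClock* clock, MiniStats* stats, Hist primary,
                Hist secondary)
      : clock_(clock),
        stats_(stats),
        primary_(primary),
        secondary_(secondary),
        start_(clock->now_micros) {}
  ~DualStopWatch() {
    const std::uint64_t elapsed = clock_->now_micros - start_;
    stats_->Record(primary_, elapsed);
    if (secondary_ != kHistEnumMax) {
      stats_->Record(secondary_, elapsed);
    }
  }
  DualStopWatch(const DualStopWatch&) = delete;
  DualStopWatch& operator=(const DualStopWatch&) = delete;

 private:
  const ManualClock* clock_;
  MiniStats* stats_;
  Hist primary_;
  Hist secondary_;
  std::uint64_t start_;
};
```

In this sketch the sample count of `kSstReadMicros` equals the sum of the per-activity counts only when every timed read is tagged flush or compaction; reads tagged `kUnknown` land in the aggregate histogram alone.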

@@ -196,6 +196,7 @@ bool DBIter::SetBlobValueIfNeeded(const Slice& user_key,
   // TODO: consider moving ReadOptions from ArenaWrappedDBIter to DBIter to
   // avoid having to copy options back and forth.
+  // TODO: plumb Env::IOActivity
   ReadOptions read_options;
   read_options.read_tier = read_tier_;
   read_options.fill_cache = fill_cache_;

@@ -2333,8 +2333,9 @@ TEST_F(DBPropertiesTest, TableMetaIndexKeys) {
     // Read metaindex
     BlockContents bc;
-    ASSERT_OK(ReadMetaIndexBlockInFile(r.get(), file_size, 0U,
-                                       ImmutableOptions(options), &bc));
+    const ReadOptions read_options;
+    ASSERT_OK(ReadMetaIndexBlockInFile(
+        r.get(), file_size, 0U, ImmutableOptions(options), read_options, &bc));
     Block metaindex_block(std::move(bc));
     std::unique_ptr<InternalIterator> meta_iter;
     meta_iter.reset(metaindex_block.NewMetaIterator());

@@ -235,8 +235,9 @@ TEST_P(DBRateLimiterOnReadTest, VerifyChecksum) {
   ASSERT_EQ(0, options_.rate_limiter->GetTotalRequests(Env::IO_USER));
   ASSERT_OK(db_->VerifyChecksum(GetReadOptions()));
-  // The files are tiny so there should have just been one read per file.
-  int expected = kNumFiles;
+  // There are 3 reads per file: ReadMetaIndexBlock,
+  // VerifyChecksumInMetaBlocks, VerifyChecksumInBlocks
+  int expected = kNumFiles * 3;
   ASSERT_EQ(expected, options_.rate_limiter->GetTotalRequests(Env::IO_USER));
 }

@@ -38,6 +38,8 @@ Status UpdateManifestForFilesState(
     const DBOptions& db_opts, const std::string& db_name,
     const std::vector<ColumnFamilyDescriptor>& column_families,
     const UpdateManifestForFilesStateOptions& opts) {
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   OfflineManifestWriter w(db_opts, db_name);
   Status s = w.Recover(column_families);
@@ -114,7 +116,7 @@ Status UpdateManifestForFilesState(
         std::unique_ptr<FSDirectory> db_dir;
         s = fs->NewDirectory(db_name, IOOptions(), &db_dir, nullptr);
         if (s.ok()) {
-          s = w.LogAndApply(cfd, &edit, db_dir.get());
+          s = w.LogAndApply(read_options, cfd, &edit, db_dir.get());
         }
         if (s.ok()) {
           ++cfs_updated;

@ -692,6 +692,7 @@ Status ExternalSstFileIngestionJob::GetIngestedFileInfo(
// If customized readahead size is needed, we can pass a user option // If customized readahead size is needed, we can pass a user option
// all the way to here. Right now we just rely on the default readahead // all the way to here. Right now we just rely on the default readahead
// to keep things simple. // to keep things simple.
// TODO: plumb Env::IOActivity
ReadOptions ro; ReadOptions ro;
ro.readahead_size = ingestion_options_.verify_checksums_readahead_size; ro.readahead_size = ingestion_options_.verify_checksums_readahead_size;
status = table_reader->VerifyChecksum( status = table_reader->VerifyChecksum(
@ -745,6 +746,7 @@ Status ExternalSstFileIngestionJob::GetIngestedFileInfo(
file_to_ingest->num_range_deletions = props->num_range_deletions; file_to_ingest->num_range_deletions = props->num_range_deletions;
ParsedInternalKey key; ParsedInternalKey key;
// TODO: plumb Env::IOActivity
ReadOptions ro; ReadOptions ro;
std::unique_ptr<InternalIterator> iter(table_reader->NewIterator( std::unique_ptr<InternalIterator> iter(table_reader->NewIterator(
ro, sv->mutable_cf_options.prefix_extractor.get(), /*arena=*/nullptr, ro, sv->mutable_cf_options.prefix_extractor.get(), /*arena=*/nullptr,
@ -855,6 +857,7 @@ Status ExternalSstFileIngestionJob::AssignLevelAndSeqnoForIngestedFile(
bool overlap_with_db = false; bool overlap_with_db = false;
Arena arena; Arena arena;
// TODO: plumb Env::IOActivity
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
int target_level = 0; int target_level = 0;
@ -1088,4 +1091,3 @@ Status ExternalSstFileIngestionJob::SyncIngestedFile(TWritableFile* file) {
} }
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -141,11 +141,12 @@ FlushJob::FlushJob(
FlushJob::~FlushJob() { ThreadStatusUtil::ResetThreadStatus(); } FlushJob::~FlushJob() { ThreadStatusUtil::ResetThreadStatus(); }
void FlushJob::ReportStartedFlush() { void FlushJob::ReportStartedFlush() {
ThreadStatusUtil::SetColumnFamily(cfd_, cfd_->ioptions()->env, ThreadStatusUtil::SetEnableTracking(db_options_.enable_thread_tracking);
db_options_.enable_thread_tracking); ThreadStatusUtil::SetColumnFamily(cfd_);
ThreadStatusUtil::SetThreadOperation(ThreadStatus::OP_FLUSH); ThreadStatusUtil::SetThreadOperation(ThreadStatus::OP_FLUSH);
ThreadStatusUtil::SetThreadOperationProperty(ThreadStatus::COMPACTION_JOB_ID, ThreadStatusUtil::SetThreadOperationProperty(ThreadStatus::COMPACTION_JOB_ID,
job_context_->job_id); job_context_->job_id);
IOSTATS_RESET(bytes_written); IOSTATS_RESET(bytes_written);
} }
@ -379,6 +380,7 @@ Status FlushJob::MemPurge() {
// Create two iterators, one for the memtable data (contains // Create two iterators, one for the memtable data (contains
// info from puts + deletes), and one for the memtable // info from puts + deletes), and one for the memtable
// Range Tombstones (from DeleteRanges). // Range Tombstones (from DeleteRanges).
// TODO: plumb Env::IOActivity
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
Arena arena; Arena arena;
@ -669,6 +671,7 @@ bool FlushJob::MemPurgeDecider(double threshold) {
// Cochran formula for determining sample size. // Cochran formula for determining sample size.
// 95% confidence interval, 7% precision. // 95% confidence interval, 7% precision.
// n0 = (1.96*1.96)*0.25/(0.07*0.07) = 196.0 // n0 = (1.96*1.96)*0.25/(0.07*0.07) = 196.0
// TODO: plumb Env::IOActivity
double n0 = 196.0; double n0 = 196.0;
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
@ -841,6 +844,7 @@ Status FlushJob::WriteLevel0Table() {
range_del_iters; range_del_iters;
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
ro.io_activity = Env::IOActivity::kFlush;
Arena arena; Arena arena;
uint64_t total_num_entries = 0, total_num_deletes = 0; uint64_t total_num_entries = 0, total_num_deletes = 0;
uint64_t total_data_size = 0; uint64_t total_data_size = 0;
@ -930,17 +934,19 @@ Status FlushJob::WriteLevel0Table() {
meta_.fd.GetNumber()); meta_.fd.GetNumber());
const SequenceNumber job_snapshot_seq = const SequenceNumber job_snapshot_seq =
job_context_->GetJobSnapshotSequence(); job_context_->GetJobSnapshotSequence();
s = BuildTable( const ReadOptions read_options(Env::IOActivity::kFlush);
dbname_, versions_, db_options_, tboptions, file_options_, s = BuildTable(dbname_, versions_, db_options_, tboptions, file_options_,
cfd_->table_cache(), iter.get(), std::move(range_del_iters), &meta_, read_options, cfd_->table_cache(), iter.get(),
&blob_file_additions, existing_snapshots_, std::move(range_del_iters), &meta_, &blob_file_additions,
earliest_write_conflict_snapshot_, job_snapshot_seq, existing_snapshots_, earliest_write_conflict_snapshot_,
snapshot_checker_, mutable_cf_options_.paranoid_file_checks, job_snapshot_seq, snapshot_checker_,
cfd_->internal_stats(), &io_s, io_tracer_, mutable_cf_options_.paranoid_file_checks,
BlobFileCreationReason::kFlush, seqno_to_time_mapping_, event_logger_, cfd_->internal_stats(), &io_s, io_tracer_,
job_context_->job_id, io_priority, &table_properties_, write_hint, BlobFileCreationReason::kFlush, seqno_to_time_mapping_,
full_history_ts_low, blob_callback_, base_, &num_input_entries, event_logger_, job_context_->job_id, io_priority,
&memtable_payload_bytes, &memtable_garbage_bytes); &table_properties_, write_hint, full_history_ts_low,
blob_callback_, base_, &num_input_entries,
&memtable_payload_bytes, &memtable_garbage_bytes);
// TODO: Cleanup io_status in BuildTable and table builders // TODO: Cleanup io_status in BuildTable and table builders
assert(!s.ok() || io_s.ok()); assert(!s.ok() || io_s.ok());
io_s.PermitUncheckedError(); io_s.PermitUncheckedError();
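The flush path now builds `ReadOptions read_options(Env::IOActivity::kFlush)` and hands it to `BuildTable`. Per the PR summary, the goal is for `RandomAccessFileReader` to record each read into the aggregate `rocksdb.sst.read.micros` and also into a per-activity histogram. The sketch below shows one way such a tag could select the extra histogram. It is a minimal illustration only. `IOActivity`, `ReadOpts`, `Hist` and `ActivityHistogram` are hypothetical stand-ins, not RocksDB's actual types or functions. Reads whose activity is unknown (for example, the default-constructed `ReadOptions` left behind the `TODO: plumb Env::IOActivity` comments) fall back to the aggregate histogram only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an activity tag carried on read options.
enum class IOActivity : uint8_t { kFlush, kCompaction, kDBOpen, kUnknown };

// Hypothetical histogram ids; kNone acts as a "record nothing extra" sentinel.
enum Hist : uint32_t {
  kSstReadMicros,
  kFlushReadMicros,
  kCompactionReadMicros,
  kDbOpenReadMicros,
  kNone
};

// Hypothetical read options: default-constructed means "activity unknown",
// and the explicit constructor mirrors ReadOptions(Env::IOActivity) usage.
struct ReadOpts {
  IOActivity io_activity = IOActivity::kUnknown;
  ReadOpts() = default;
  explicit ReadOpts(IOActivity a) : io_activity(a) {}
};

// Map an activity to its per-activity histogram; untagged reads only
// update the aggregate histogram (kSstReadMicros) elsewhere.
Hist ActivityHistogram(IOActivity a) {
  switch (a) {
    case IOActivity::kFlush:
      return kFlushReadMicros;
    case IOActivity::kCompaction:
      return kCompactionReadMicros;
    case IOActivity::kDBOpen:
      return kDbOpenReadMicros;
    default:
      return kNone;
  }
}
```

Keeping the tag on the options object means no extra parameter is needed on every intermediate function. Only the layer that actually times the read has to look at it.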

@ -275,6 +275,7 @@ Status ImportColumnFamilyJob::GetIngestedFileInfo(
// in file_meta. // in file_meta.
if (file_meta.smallest.empty()) { if (file_meta.smallest.empty()) {
assert(file_meta.largest.empty()); assert(file_meta.largest.empty());
// TODO: plumb Env::IOActivity
ReadOptions ro; ReadOptions ro;
std::unique_ptr<InternalIterator> iter(table_reader->NewIterator( std::unique_ptr<InternalIterator> iter(table_reader->NewIterator(
ro, sv->mutable_cf_options.prefix_extractor.get(), /*arena=*/nullptr, ro, sv->mutable_cf_options.prefix_extractor.get(), /*arena=*/nullptr,
@ -350,4 +351,3 @@ Status ImportColumnFamilyJob::GetIngestedFileInfo(
return status; return status;
} }
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -1148,7 +1148,9 @@ bool InternalStats::HandleSsTables(std::string* value, Slice /*suffix*/) {
bool InternalStats::HandleAggregatedTableProperties(std::string* value, bool InternalStats::HandleAggregatedTableProperties(std::string* value,
Slice /*suffix*/) { Slice /*suffix*/) {
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
auto s = cfd_->current()->GetAggregatedTableProperties(&tp); // TODO: plumb Env::IOActivity
const ReadOptions read_options;
auto s = cfd_->current()->GetAggregatedTableProperties(read_options, &tp);
if (!s.ok()) { if (!s.ok()) {
return false; return false;
} }
@ -1168,7 +1170,9 @@ static std::map<std::string, std::string> MapUint64ValuesToString(
bool InternalStats::HandleAggregatedTablePropertiesMap( bool InternalStats::HandleAggregatedTablePropertiesMap(
std::map<std::string, std::string>* values, Slice /*suffix*/) { std::map<std::string, std::string>* values, Slice /*suffix*/) {
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
auto s = cfd_->current()->GetAggregatedTableProperties(&tp); // TODO: plumb Env::IOActivity
const ReadOptions read_options;
auto s = cfd_->current()->GetAggregatedTableProperties(read_options, &tp);
if (!s.ok()) { if (!s.ok()) {
return false; return false;
} }
@ -1184,8 +1188,10 @@ bool InternalStats::HandleAggregatedTablePropertiesAtLevel(std::string* values,
return false; return false;
} }
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
auto s = cfd_->current()->GetAggregatedTableProperties( auto s = cfd_->current()->GetAggregatedTableProperties(
&tp, static_cast<int>(level)); read_options, &tp, static_cast<int>(level));
if (!s.ok()) { if (!s.ok()) {
return false; return false;
} }
@ -1201,8 +1207,10 @@ bool InternalStats::HandleAggregatedTablePropertiesAtLevelMap(
return false; return false;
} }
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
auto s = cfd_->current()->GetAggregatedTableProperties( auto s = cfd_->current()->GetAggregatedTableProperties(
&tp, static_cast<int>(level)); read_options, &tp, static_cast<int>(level));
if (!s.ok()) { if (!s.ok()) {
return false; return false;
} }
@ -1397,7 +1405,11 @@ bool InternalStats::HandleEstimatePendingCompactionBytes(uint64_t* value,
bool InternalStats::HandleEstimateTableReadersMem(uint64_t* value, bool InternalStats::HandleEstimateTableReadersMem(uint64_t* value,
DBImpl* /*db*/, DBImpl* /*db*/,
Version* version) { Version* version) {
*value = (version == nullptr) ? 0 : version->GetMemoryUsageByTableReaders(); // TODO: plumb Env::IOActivity
const ReadOptions read_options;
*value = (version == nullptr)
? 0
: version->GetMemoryUsageByTableReaders(read_options);
return true; return true;
} }
@ -1448,9 +1460,10 @@ bool InternalStats::HandleEstimateOldestKeyTime(uint64_t* value, DBImpl* /*db*/,
->compaction_options_fifo.allow_compaction) { ->compaction_options_fifo.allow_compaction) {
return false; return false;
} }
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
TablePropertiesCollection collection; TablePropertiesCollection collection;
auto s = cfd_->current()->GetPropertiesOfAllTables(&collection); auto s = cfd_->current()->GetPropertiesOfAllTables(read_options, &collection);
if (!s.ok()) { if (!s.ok()) {
return false; return false;
} }

@ -599,6 +599,7 @@ void MemTable::ConstructFragmentedRangeTombstones() {
assert(!IsFragmentedRangeTombstonesConstructed(false)); assert(!IsFragmentedRangeTombstonesConstructed(false));
// There should be no concurrent Construction // There should be no concurrent Construction
if (!is_range_del_table_empty_.load(std::memory_order_relaxed)) { if (!is_range_del_table_empty_.load(std::memory_order_relaxed)) {
// TODO: plumb Env::IOActivity
auto* unfragmented_iter = auto* unfragmented_iter =
new MemTableIterator(*this, ReadOptions(), nullptr /* arena */, new MemTableIterator(*this, ReadOptions(), nullptr /* arena */,
true /* use_range_del_table */); true /* use_range_del_table */);

@ -467,6 +467,8 @@ Status MemTableList::TryInstallMemtableFlushResults(
ThreadStatus::STAGE_MEMTABLE_INSTALL_FLUSH_RESULTS); ThreadStatus::STAGE_MEMTABLE_INSTALL_FLUSH_RESULTS);
mu->AssertHeld(); mu->AssertHeld();
const ReadOptions read_options(Env::IOActivity::kFlush);
// Flush was successful // Flush was successful
// Record the status on the memtable object. Either this call or a call by a // Record the status on the memtable object. Either this call or a call by a
// concurrent flush thread will read the status and write it to manifest. // concurrent flush thread will read the status and write it to manifest.
@ -578,8 +580,8 @@ Status MemTableList::TryInstallMemtableFlushResults(
}; };
if (write_edits) { if (write_edits) {
// this can release and reacquire the mutex. // this can release and reacquire the mutex.
s = vset->LogAndApply(cfd, mutable_cf_options, edit_list, mu, s = vset->LogAndApply(cfd, mutable_cf_options, read_options, edit_list,
db_directory, /*new_descriptor_log=*/false, mu, db_directory, /*new_descriptor_log=*/false,
/*column_family_options=*/nullptr, /*column_family_options=*/nullptr,
manifest_write_cb); manifest_write_cb);
} else { } else {
@ -798,6 +800,8 @@ Status InstallMemtableAtomicFlushResults(
ThreadStatus::STAGE_MEMTABLE_INSTALL_FLUSH_RESULTS); ThreadStatus::STAGE_MEMTABLE_INSTALL_FLUSH_RESULTS);
mu->AssertHeld(); mu->AssertHeld();
const ReadOptions read_options(Env::IOActivity::kFlush);
size_t num = mems_list.size(); size_t num = mems_list.size();
assert(cfds.size() == num); assert(cfds.size() == num);
if (imm_lists != nullptr) { if (imm_lists != nullptr) {
@ -875,8 +879,8 @@ Status InstallMemtableAtomicFlushResults(
} }
// this can release and reacquire the mutex. // this can release and reacquire the mutex.
s = vset->LogAndApply(cfds, mutable_cf_options_list, edit_lists, mu, s = vset->LogAndApply(cfds, mutable_cf_options_list, read_options, edit_lists,
db_directory); mu, db_directory);
for (size_t k = 0; k != cfds.size(); ++k) { for (size_t k = 0; k != cfds.size(); ++k) {
auto* imm = (imm_lists == nullptr) ? cfds[k]->imm() : imm_lists->at(k); auto* imm = (imm_lists == nullptr) ? cfds[k]->imm() : imm_lists->at(k);

@ -187,7 +187,8 @@ TEST_F(PerfContextTest, StopWatchOverhead) {
uint64_t elapsed = 0; uint64_t elapsed = 0;
std::vector<uint64_t> timings(kTotalIterations); std::vector<uint64_t> timings(kTotalIterations);
StopWatch timer(SystemClock::Default().get(), nullptr, 0, &elapsed); StopWatch timer(SystemClock::Default().get(), nullptr, 0,
Histograms::HISTOGRAM_ENUM_MAX, &elapsed);
for (auto& timing : timings) { for (auto& timing : timings) {
timing = elapsed; timing = elapsed;
} }
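The `StopWatch` constructor in this test gains an extra histogram argument. Passing `Histograms::HISTOGRAM_ENUM_MAX` there appears to mean "no second histogram". The sketch below shows a stopwatch that records one elapsed time into up to two histograms and skips any sentinel id. It is a hypothetical illustration, not RocksDB's `StopWatch`. `TwoHistStopWatch`, `FakeStats` and `kHistNone` are invented names. The "clock" is a plain counter so the example stays deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical "no histogram" sentinel (the role HISTOGRAM_ENUM_MAX seems to play).
constexpr uint32_t kHistNone = 0xFFFFFFFF;

// Hypothetical statistics sink: per-histogram sample count and sum.
struct FakeStats {
  std::map<uint32_t, uint64_t> count;
  std::map<uint32_t, uint64_t> sum;
  void Record(uint32_t hist, uint64_t v) {
    ++count[hist];
    sum[hist] += v;
  }
};

// RAII timer: captures the clock at construction and, on destruction,
// records the elapsed value into `hist` and, if not the sentinel, `hist2`.
class TwoHistStopWatch {
 public:
  TwoHistStopWatch(const uint64_t* clock, FakeStats* stats, uint32_t hist,
                   uint32_t hist2)
      : clock_(clock), stats_(stats), hist_(hist), hist2_(hist2),
        start_(*clock) {}
  ~TwoHistStopWatch() {
    if (stats_ == nullptr) return;
    const uint64_t elapsed = *clock_ - start_;
    if (hist_ != kHistNone) stats_->Record(hist_, elapsed);
    if (hist2_ != kHistNone) stats_->Record(hist2_, elapsed);
  }

 private:
  const uint64_t* clock_;
  FakeStats* stats_;
  uint32_t hist_, hist2_;
  uint64_t start_;
};
```

Every timed read lands in the aggregate histogram, and tagged reads also land in their activity's histogram. So the aggregate count should roughly equal the sum of the per-activity counts when every read is tagged.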

@ -329,21 +329,23 @@ class TestPlainTableFactory : public PlainTableFactory {
std::unique_ptr<TableReader>* table, std::unique_ptr<TableReader>* table,
bool /*prefetch_index_and_filter_in_cache*/) const override { bool /*prefetch_index_and_filter_in_cache*/) const override {
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
const ReadOptions read_options;
auto s = ReadTableProperties(file.get(), file_size, kPlainTableMagicNumber, auto s = ReadTableProperties(file.get(), file_size, kPlainTableMagicNumber,
table_reader_options.ioptions, &props); table_reader_options.ioptions, read_options,
&props);
EXPECT_TRUE(s.ok()); EXPECT_TRUE(s.ok());
if (store_index_in_file_) { if (store_index_in_file_) {
BlockHandle bloom_block_handle; BlockHandle bloom_block_handle;
s = FindMetaBlockInFile(file.get(), file_size, kPlainTableMagicNumber, s = FindMetaBlockInFile(file.get(), file_size, kPlainTableMagicNumber,
table_reader_options.ioptions, table_reader_options.ioptions, read_options,
BloomBlockBuilder::kBloomBlock, BloomBlockBuilder::kBloomBlock,
&bloom_block_handle); &bloom_block_handle);
EXPECT_TRUE(s.ok()); EXPECT_TRUE(s.ok());
BlockHandle index_block_handle; BlockHandle index_block_handle;
s = FindMetaBlockInFile(file.get(), file_size, kPlainTableMagicNumber, s = FindMetaBlockInFile(file.get(), file_size, kPlainTableMagicNumber,
table_reader_options.ioptions, table_reader_options.ioptions, read_options,
PlainTableIndexBuilder::kPlainTableIndexBlock, PlainTableIndexBuilder::kPlainTableIndexBlock,
&index_block_handle); &index_block_handle);
EXPECT_TRUE(s.ok()); EXPECT_TRUE(s.ok());
@ -1344,4 +1346,3 @@ int main(int argc, char** argv) {
::testing::InitGoogleTest(&argc, argv); ::testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS(); return RUN_ALL_TESTS();
} }

@ -145,6 +145,8 @@ class Repairer {
// Adds a column family to the VersionSet with cf_options_ and updates // Adds a column family to the VersionSet with cf_options_ and updates
// manifest. // manifest.
Status AddColumnFamily(const std::string& cf_name, uint32_t cf_id) { Status AddColumnFamily(const std::string& cf_name, uint32_t cf_id) {
// TODO: plumb Env::IOActivity;
const ReadOptions read_options;
const auto* cf_opts = GetColumnFamilyOptions(cf_name); const auto* cf_opts = GetColumnFamilyOptions(cf_name);
if (cf_opts == nullptr) { if (cf_opts == nullptr) {
return Status::Corruption("Encountered unknown column family with name=" + return Status::Corruption("Encountered unknown column family with name=" +
@ -166,8 +168,9 @@ class Repairer {
Status status = env_->GetFileSystem()->NewDirectory(dbname_, IOOptions(), Status status = env_->GetFileSystem()->NewDirectory(dbname_, IOOptions(),
&db_dir, nullptr); &db_dir, nullptr);
if (status.ok()) { if (status.ok()) {
status = vset_.LogAndApply(cfd, mut_cf_opts, &edit, &mutex_, db_dir.get(), status = vset_.LogAndApply(cfd, mut_cf_opts, read_options, &edit, &mutex_,
false /* new_descriptor_log */, cf_opts); db_dir.get(), false /* new_descriptor_log */,
cf_opts);
} }
mutex_.Unlock(); mutex_.Unlock();
return status; return status;
@ -357,6 +360,9 @@ class Repairer {
} }
}; };
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
// Open the log file // Open the log file
std::string logname = LogFileName(wal_dir, log); std::string logname = LogFileName(wal_dir, log);
const auto& fs = env_->GetFileSystem(); const auto& fs = env_->GetFileSystem();
@ -422,6 +428,7 @@ class Repairer {
FileMetaData meta; FileMetaData meta;
meta.fd = FileDescriptor(next_file_number_++, 0, 0); meta.fd = FileDescriptor(next_file_number_++, 0, 0);
// TODO: plumb Env::IOActivity
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
Arena arena; Arena arena;
@ -456,7 +463,7 @@ class Repairer {
SeqnoToTimeMapping empty_seqno_time_mapping; SeqnoToTimeMapping empty_seqno_time_mapping;
status = BuildTable( status = BuildTable(
dbname_, /* versions */ nullptr, immutable_db_options_, tboptions, dbname_, /* versions */ nullptr, immutable_db_options_, tboptions,
file_options_, table_cache_.get(), iter.get(), file_options_, read_options, table_cache_.get(), iter.get(),
std::move(range_del_iters), &meta, nullptr /* blob_file_additions */, std::move(range_del_iters), &meta, nullptr /* blob_file_additions */,
{}, kMaxSequenceNumber, kMaxSequenceNumber, snapshot_checker, {}, kMaxSequenceNumber, kMaxSequenceNumber, snapshot_checker,
false /* paranoid_file_checks*/, nullptr /* internal_stats */, &io_s, false /* paranoid_file_checks*/, nullptr /* internal_stats */, &io_s,
@ -509,8 +516,10 @@ class Repairer {
file_size); file_size);
std::shared_ptr<const TableProperties> props; std::shared_ptr<const TableProperties> props;
if (status.ok()) { if (status.ok()) {
status = table_cache_->GetTableProperties(file_options_, icmp_, t->meta, // TODO: plumb Env::IOActivity
&props); const ReadOptions read_options;
status = table_cache_->GetTableProperties(file_options_, read_options,
icmp_, t->meta, &props);
} }
if (status.ok()) { if (status.ok()) {
auto s = auto s =
@ -556,6 +565,7 @@ class Repairer {
} }
} }
if (status.ok()) { if (status.ok()) {
// TODO: plumb Env::IOActivity
ReadOptions ropts; ReadOptions ropts;
ropts.total_order_seek = true; ropts.total_order_seek = true;
InternalIterator* iter = table_cache_->NewIterator( InternalIterator* iter = table_cache_->NewIterator(
@ -603,6 +613,7 @@ class Repairer {
// an SST file is a full sorted run. This probably needs the extra logic // an SST file is a full sorted run. This probably needs the extra logic
// from compaction_job.cc around call to UpdateBoundariesForRange (to // from compaction_job.cc around call to UpdateBoundariesForRange (to
// handle range tombstones extendingg beyond range of other entries). // handle range tombstones extendingg beyond range of other entries).
// TODO: plumb Env::IOActivity
ReadOptions ropts; ReadOptions ropts;
std::unique_ptr<FragmentedRangeTombstoneIterator> r_iter; std::unique_ptr<FragmentedRangeTombstoneIterator> r_iter;
status = table_cache_->GetRangeTombstoneIterator( status = table_cache_->GetRangeTombstoneIterator(
@ -625,6 +636,8 @@ class Repairer {
} }
Status AddTables() { Status AddTables() {
// TODO: plumb Env::IOActivity;
const ReadOptions read_options;
std::unordered_map<uint32_t, std::vector<const TableInfo*>> cf_id_to_tables; std::unordered_map<uint32_t, std::vector<const TableInfo*>> cf_id_to_tables;
SequenceNumber max_sequence = 0; SequenceNumber max_sequence = 0;
for (size_t i = 0; i < tables_.size(); i++) { for (size_t i = 0; i < tables_.size(); i++) {
@ -706,8 +719,8 @@ class Repairer {
s = env_->GetFileSystem()->NewDirectory(dbname_, IOOptions(), &db_dir, s = env_->GetFileSystem()->NewDirectory(dbname_, IOOptions(), &db_dir,
nullptr); nullptr);
if (s.ok()) { if (s.ok()) {
s = vset_.LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(), &edit, s = vset_.LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
&mutex_, db_dir.get(), read_options, &edit, &mutex_, db_dir.get(),
false /* new_descriptor_log */); false /* new_descriptor_log */);
} }
mutex_.Unlock(); mutex_.Unlock();
@ -809,4 +822,3 @@ Status RepairDB(const std::string& dbname, const Options& options) {
} }
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -560,7 +560,7 @@ Status TableCache::MultiGetFilter(
} }
Status TableCache::GetTableProperties( Status TableCache::GetTableProperties(
const FileOptions& file_options, const FileOptions& file_options, const ReadOptions& read_options,
const InternalKeyComparator& internal_comparator, const InternalKeyComparator& internal_comparator,
const FileMetaData& file_meta, const FileMetaData& file_meta,
std::shared_ptr<const TableProperties>* properties, std::shared_ptr<const TableProperties>* properties,
@ -574,7 +574,7 @@ Status TableCache::GetTableProperties(
} }
TypedHandle* table_handle = nullptr; TypedHandle* table_handle = nullptr;
Status s = FindTable(ReadOptions(), file_options, internal_comparator, Status s = FindTable(read_options, file_options, internal_comparator,
file_meta, &table_handle, prefix_extractor, no_io); file_meta, &table_handle, prefix_extractor, no_io);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
@ -608,7 +608,7 @@ Status TableCache::ApproximateKeyAnchors(
} }
size_t TableCache::GetMemoryUsageByTableReader( size_t TableCache::GetMemoryUsageByTableReader(
const FileOptions& file_options, const FileOptions& file_options, const ReadOptions& read_options,
const InternalKeyComparator& internal_comparator, const InternalKeyComparator& internal_comparator,
const FileMetaData& file_meta, const FileMetaData& file_meta,
const std::shared_ptr<const SliceTransform>& prefix_extractor) { const std::shared_ptr<const SliceTransform>& prefix_extractor) {
@ -619,7 +619,7 @@ size_t TableCache::GetMemoryUsageByTableReader(
} }
TypedHandle* table_handle = nullptr; TypedHandle* table_handle = nullptr;
Status s = FindTable(ReadOptions(), file_options, internal_comparator, Status s = FindTable(read_options, file_options, internal_comparator,
file_meta, &table_handle, prefix_extractor, true); file_meta, &table_handle, prefix_extractor, true);
if (!s.ok()) { if (!s.ok()) {
return 0; return 0;
@ -636,7 +636,8 @@ void TableCache::Evict(Cache* cache, uint64_t file_number) {
} }
uint64_t TableCache::ApproximateOffsetOf( uint64_t TableCache::ApproximateOffsetOf(
const Slice& key, const FileMetaData& file_meta, TableReaderCaller caller, const ReadOptions& read_options, const Slice& key,
const FileMetaData& file_meta, TableReaderCaller caller,
const InternalKeyComparator& internal_comparator, const InternalKeyComparator& internal_comparator,
const std::shared_ptr<const SliceTransform>& prefix_extractor) { const std::shared_ptr<const SliceTransform>& prefix_extractor) {
uint64_t result = 0; uint64_t result = 0;
@ -645,7 +646,7 @@ uint64_t TableCache::ApproximateOffsetOf(
if (table_reader == nullptr) { if (table_reader == nullptr) {
const bool for_compaction = (caller == TableReaderCaller::kCompaction); const bool for_compaction = (caller == TableReaderCaller::kCompaction);
Status s = Status s =
FindTable(ReadOptions(), file_options_, internal_comparator, file_meta, FindTable(read_options, file_options_, internal_comparator, file_meta,
&table_handle, prefix_extractor, false /* no_io */, &table_handle, prefix_extractor, false /* no_io */,
!for_compaction /* record_read_stats */); !for_compaction /* record_read_stats */);
if (s.ok()) { if (s.ok()) {
@ -654,7 +655,7 @@ uint64_t TableCache::ApproximateOffsetOf(
} }
if (table_reader != nullptr) { if (table_reader != nullptr) {
result = table_reader->ApproximateOffsetOf(key, caller); result = table_reader->ApproximateOffsetOf(read_options, key, caller);
} }
if (table_handle != nullptr) { if (table_handle != nullptr) {
cache_.Release(table_handle); cache_.Release(table_handle);
@ -664,8 +665,9 @@ uint64_t TableCache::ApproximateOffsetOf(
} }
uint64_t TableCache::ApproximateSize( uint64_t TableCache::ApproximateSize(
const Slice& start, const Slice& end, const FileMetaData& file_meta, const ReadOptions& read_options, const Slice& start, const Slice& end,
TableReaderCaller caller, const InternalKeyComparator& internal_comparator, const FileMetaData& file_meta, TableReaderCaller caller,
const InternalKeyComparator& internal_comparator,
const std::shared_ptr<const SliceTransform>& prefix_extractor) { const std::shared_ptr<const SliceTransform>& prefix_extractor) {
uint64_t result = 0; uint64_t result = 0;
TableReader* table_reader = file_meta.fd.table_reader; TableReader* table_reader = file_meta.fd.table_reader;
@ -673,7 +675,7 @@ uint64_t TableCache::ApproximateSize(
if (table_reader == nullptr) { if (table_reader == nullptr) {
const bool for_compaction = (caller == TableReaderCaller::kCompaction); const bool for_compaction = (caller == TableReaderCaller::kCompaction);
Status s = Status s =
FindTable(ReadOptions(), file_options_, internal_comparator, file_meta, FindTable(read_options, file_options_, internal_comparator, file_meta,
&table_handle, prefix_extractor, false /* no_io */, &table_handle, prefix_extractor, false /* no_io */,
!for_compaction /* record_read_stats */); !for_compaction /* record_read_stats */);
if (s.ok()) { if (s.ok()) {
@ -682,7 +684,7 @@ uint64_t TableCache::ApproximateSize(
} }
if (table_reader != nullptr) { if (table_reader != nullptr) {
result = table_reader->ApproximateSize(start, end, caller); result = table_reader->ApproximateSize(read_options, start, end, caller);
} }
if (table_handle != nullptr) { if (table_handle != nullptr) {
cache_.Release(table_handle); cache_.Release(table_handle);

@ -179,7 +179,7 @@ class TableCache {
// return Status::Incomplete() if table is not present in cache and // return Status::Incomplete() if table is not present in cache and
// we set `no_io` to be true. // we set `no_io` to be true.
Status GetTableProperties( Status GetTableProperties(
const FileOptions& toptions, const FileOptions& toptions, const ReadOptions& read_options,
const InternalKeyComparator& internal_comparator, const InternalKeyComparator& internal_comparator,
const FileMetaData& file_meta, const FileMetaData& file_meta,
std::shared_ptr<const TableProperties>* properties, std::shared_ptr<const TableProperties>* properties,
@ -194,22 +194,23 @@ class TableCache {
// Return total memory usage of the table reader of the file. // Return total memory usage of the table reader of the file.
// 0 if table reader of the file is not loaded. // 0 if table reader of the file is not loaded.
size_t GetMemoryUsageByTableReader( size_t GetMemoryUsageByTableReader(
const FileOptions& toptions, const FileOptions& toptions, const ReadOptions& read_options,
const InternalKeyComparator& internal_comparator, const InternalKeyComparator& internal_comparator,
const FileMetaData& file_meta, const FileMetaData& file_meta,
const std::shared_ptr<const SliceTransform>& prefix_extractor = nullptr); const std::shared_ptr<const SliceTransform>& prefix_extractor = nullptr);
// Returns approximated offset of a key in a file represented by fd. // Returns approximated offset of a key in a file represented by fd.
uint64_t ApproximateOffsetOf( uint64_t ApproximateOffsetOf(
const Slice& key, const FileMetaData& file_meta, TableReaderCaller caller, const ReadOptions& read_options, const Slice& key,
const FileMetaData& file_meta, TableReaderCaller caller,
const InternalKeyComparator& internal_comparator, const InternalKeyComparator& internal_comparator,
const std::shared_ptr<const SliceTransform>& prefix_extractor = nullptr); const std::shared_ptr<const SliceTransform>& prefix_extractor = nullptr);
// Returns approximated data size between start and end keys in a file // Returns approximated data size between start and end keys in a file
// represented by fd (the start key must not be greater than the end key). // represented by fd (the start key must not be greater than the end key).
uint64_t ApproximateSize( uint64_t ApproximateSize(
const Slice& start, const Slice& end, const FileMetaData& file_meta, const ReadOptions& read_options, const Slice& start, const Slice& end,
TableReaderCaller caller, const FileMetaData& file_meta, TableReaderCaller caller,
const InternalKeyComparator& internal_comparator, const InternalKeyComparator& internal_comparator,
const std::shared_ptr<const SliceTransform>& prefix_extractor = nullptr); const std::shared_ptr<const SliceTransform>& prefix_extractor = nullptr);
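The `TableCache` changes replace `FindTable(ReadOptions(), ...)` with `FindTable(read_options, ...)`. A freshly constructed `ReadOptions()` inside the callee silently drops whatever activity tag the caller set. The hypothetical sketch below contrasts the two shapes. `ReadOpts`, `FindTableSketch`, `GetPropsOld` and `GetPropsNew` are illustrative names only, not RocksDB APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for an activity tag and the options that carry it.
enum class IOActivity : uint8_t { kFlush, kCompaction, kDBOpen, kUnknown };

struct ReadOpts {
  IOActivity io_activity = IOActivity::kUnknown;
};

// Stand-in for the lowest layer that performs the read; it simply reports
// which activity it would attribute the I/O to.
IOActivity FindTableSketch(const ReadOpts& ro) { return ro.io_activity; }

// Old shape: the callee builds default options, losing the caller's tag.
IOActivity GetPropsOld() { return FindTableSketch(ReadOpts()); }

// New shape: the caller's options are threaded through unchanged.
IOActivity GetPropsNew(const ReadOpts& ro) { return FindTableSketch(ro); }
```

This is also why the PR adds a `const ReadOptions&` parameter to `GetTableProperties`, `GetMemoryUsageByTableReader`, `ApproximateOffsetOf` and `ApproximateSize`. Without it, those paths would always report an unknown activity.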

@ -292,8 +292,9 @@ void TestCustomizedTablePropertiesCollector(
new RandomAccessFileReader(std::move(source), "test")); new RandomAccessFileReader(std::move(source), "test"));
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
const ReadOptions read_options;
Status s = ReadTableProperties(fake_file_reader.get(), fwf->contents().size(), Status s = ReadTableProperties(fake_file_reader.get(), fwf->contents().size(),
magic_number, ioptions, &props); magic_number, ioptions, read_options, &props);
ASSERT_OK(s); ASSERT_OK(s);
auto user_collected = props->user_collected_properties; auto user_collected = props->user_collected_properties;
@ -429,8 +430,10 @@ void TestInternalKeyPropertiesCollector(
new RandomAccessFileReader(std::move(source), "test")); new RandomAccessFileReader(std::move(source), "test"));
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
Status s = ReadTableProperties(reader.get(), fwf->contents().size(), const ReadOptions read_options;
magic_number, ioptions, &props); Status s =
ReadTableProperties(reader.get(), fwf->contents().size(), magic_number,
ioptions, read_options, &props);
ASSERT_OK(s); ASSERT_OK(s);
auto user_collected = props->user_collected_properties; auto user_collected = props->user_collected_properties;

@ -1257,7 +1257,7 @@ class VersionBuilder::Rep {
InternalStats* internal_stats, int max_threads, InternalStats* internal_stats, int max_threads,
bool prefetch_index_and_filter_in_cache, bool is_initial_load, bool prefetch_index_and_filter_in_cache, bool is_initial_load,
const std::shared_ptr<const SliceTransform>& prefix_extractor, const std::shared_ptr<const SliceTransform>& prefix_extractor,
size_t max_file_size_for_l0_meta_pin) { size_t max_file_size_for_l0_meta_pin, const ReadOptions& read_options) {
assert(table_cache_ != nullptr); assert(table_cache_ != nullptr);
size_t table_cache_capacity = size_t table_cache_capacity =
@ -1324,7 +1324,7 @@ class VersionBuilder::Rep {
int level = files_meta[file_idx].second; int level = files_meta[file_idx].second;
TableCache::TypedHandle* handle = nullptr; TableCache::TypedHandle* handle = nullptr;
statuses[file_idx] = table_cache_->FindTable( statuses[file_idx] = table_cache_->FindTable(
ReadOptions(), file_options_, read_options, file_options_,
*(base_vstorage_->InternalComparator()), *file_meta, &handle, *(base_vstorage_->InternalComparator()), *file_meta, &handle,
prefix_extractor, false /*no_io */, true /* record_read_stats */, prefix_extractor, false /*no_io */, true /* record_read_stats */,
internal_stats->GetFileReadHist(level), false, level, internal_stats->GetFileReadHist(level), false, level,
@ -1384,10 +1384,11 @@ Status VersionBuilder::LoadTableHandlers(
InternalStats* internal_stats, int max_threads, InternalStats* internal_stats, int max_threads,
bool prefetch_index_and_filter_in_cache, bool is_initial_load, bool prefetch_index_and_filter_in_cache, bool is_initial_load,
const std::shared_ptr<const SliceTransform>& prefix_extractor, const std::shared_ptr<const SliceTransform>& prefix_extractor,
size_t max_file_size_for_l0_meta_pin) { size_t max_file_size_for_l0_meta_pin, const ReadOptions& read_options) {
return rep_->LoadTableHandlers( return rep_->LoadTableHandlers(internal_stats, max_threads,
internal_stats, max_threads, prefetch_index_and_filter_in_cache, prefetch_index_and_filter_in_cache,
is_initial_load, prefix_extractor, max_file_size_for_l0_meta_pin); is_initial_load, prefix_extractor,
max_file_size_for_l0_meta_pin, read_options);
} }
uint64_t VersionBuilder::GetMinOldestBlobFileNumber() const { uint64_t VersionBuilder::GetMinOldestBlobFileNumber() const {

@@ -48,7 +48,7 @@ class VersionBuilder {
InternalStats* internal_stats, int max_threads, InternalStats* internal_stats, int max_threads,
bool prefetch_index_and_filter_in_cache, bool is_initial_load, bool prefetch_index_and_filter_in_cache, bool is_initial_load,
const std::shared_ptr<const SliceTransform>& prefix_extractor, const std::shared_ptr<const SliceTransform>& prefix_extractor,
size_t max_file_size_for_l0_meta_pin); size_t max_file_size_for_l0_meta_pin, const ReadOptions& read_options);
uint64_t GetMinOldestBlobFileNumber() const; uint64_t GetMinOldestBlobFileNumber() const;
private: private:

@@ -155,8 +155,9 @@ VersionEditHandler::VersionEditHandler(
bool read_only, std::vector<ColumnFamilyDescriptor> column_families, bool read_only, std::vector<ColumnFamilyDescriptor> column_families,
VersionSet* version_set, bool track_missing_files, VersionSet* version_set, bool track_missing_files,
bool no_error_if_files_missing, const std::shared_ptr<IOTracer>& io_tracer, bool no_error_if_files_missing, const std::shared_ptr<IOTracer>& io_tracer,
bool skip_load_table_files, EpochNumberRequirement epoch_number_requirement) const ReadOptions& read_options, bool skip_load_table_files,
: VersionEditHandlerBase(), EpochNumberRequirement epoch_number_requirement)
: VersionEditHandlerBase(read_options),
read_only_(read_only), read_only_(read_only),
column_families_(std::move(column_families)), column_families_(std::move(column_families)),
version_set_(version_set), version_set_(version_set),
@@ -480,7 +481,8 @@ void VersionEditHandler::CheckIterationResult(const log::Reader& reader,
ColumnFamilyData* VersionEditHandler::CreateCfAndInit( ColumnFamilyData* VersionEditHandler::CreateCfAndInit(
const ColumnFamilyOptions& cf_options, const VersionEdit& edit) { const ColumnFamilyOptions& cf_options, const VersionEdit& edit) {
ColumnFamilyData* cfd = version_set_->CreateColumnFamily(cf_options, &edit); ColumnFamilyData* cfd =
version_set_->CreateColumnFamily(cf_options, read_options_, &edit);
assert(cfd != nullptr); assert(cfd != nullptr);
cfd->set_initialized(); cfd->set_initialized();
assert(builders_.find(edit.column_family_) == builders_.end()); assert(builders_.find(edit.column_family_) == builders_.end());
@@ -537,7 +539,7 @@ Status VersionEditHandler::MaybeCreateVersion(const VersionEdit& /*edit*/,
if (s.ok()) { if (s.ok()) {
// Install new version // Install new version
v->PrepareAppend( v->PrepareAppend(
*cfd->GetLatestMutableCFOptions(), *cfd->GetLatestMutableCFOptions(), read_options_,
!(version_set_->db_options_->skip_stats_update_on_db_open)); !(version_set_->db_options_->skip_stats_update_on_db_open));
version_set_->AppendVersion(cfd, v); version_set_->AppendVersion(cfd, v);
} else { } else {
@@ -569,7 +571,8 @@ Status VersionEditHandler::LoadTables(ColumnFamilyData* cfd,
version_set_->db_options_->max_file_opening_threads, version_set_->db_options_->max_file_opening_threads,
prefetch_index_and_filter_in_cache, is_initial_load, prefetch_index_and_filter_in_cache, is_initial_load,
cfd->GetLatestMutableCFOptions()->prefix_extractor, cfd->GetLatestMutableCFOptions()->prefix_extractor,
MaxFileSizeForL0MetaPin(*cfd->GetLatestMutableCFOptions())); MaxFileSizeForL0MetaPin(*cfd->GetLatestMutableCFOptions()),
read_options_);
if ((s.IsPathNotFound() || s.IsCorruption()) && no_error_if_files_missing_) { if ((s.IsPathNotFound() || s.IsCorruption()) && no_error_if_files_missing_) {
s = Status::OK(); s = Status::OK();
} }
@@ -647,11 +650,12 @@ Status VersionEditHandler::ExtractInfoFromVersionEdit(ColumnFamilyData* cfd,
VersionEditHandlerPointInTime::VersionEditHandlerPointInTime( VersionEditHandlerPointInTime::VersionEditHandlerPointInTime(
bool read_only, std::vector<ColumnFamilyDescriptor> column_families, bool read_only, std::vector<ColumnFamilyDescriptor> column_families,
VersionSet* version_set, const std::shared_ptr<IOTracer>& io_tracer, VersionSet* version_set, const std::shared_ptr<IOTracer>& io_tracer,
const ReadOptions& read_options,
EpochNumberRequirement epoch_number_requirement) EpochNumberRequirement epoch_number_requirement)
: VersionEditHandler(read_only, column_families, version_set, : VersionEditHandler(read_only, column_families, version_set,
/*track_missing_files=*/true, /*track_missing_files=*/true,
/*no_error_if_files_missing=*/true, io_tracer, /*no_error_if_files_missing=*/true, io_tracer,
epoch_number_requirement) {} read_options, epoch_number_requirement) {}
VersionEditHandlerPointInTime::~VersionEditHandlerPointInTime() { VersionEditHandlerPointInTime::~VersionEditHandlerPointInTime() {
for (const auto& elem : versions_) { for (const auto& elem : versions_) {
@@ -816,7 +820,8 @@ Status VersionEditHandlerPointInTime::MaybeCreateVersion(
cfd->internal_stats(), cfd->internal_stats(),
version_set_->db_options_->max_file_opening_threads, false, true, version_set_->db_options_->max_file_opening_threads, false, true,
cfd->GetLatestMutableCFOptions()->prefix_extractor, cfd->GetLatestMutableCFOptions()->prefix_extractor,
MaxFileSizeForL0MetaPin(*cfd->GetLatestMutableCFOptions())); MaxFileSizeForL0MetaPin(*cfd->GetLatestMutableCFOptions()),
read_options_);
if (!s.ok()) { if (!s.ok()) {
delete version; delete version;
if (s.IsCorruption()) { if (s.IsCorruption()) {
@@ -827,7 +832,7 @@ Status VersionEditHandlerPointInTime::MaybeCreateVersion(
s = builder->SaveTo(version->storage_info()); s = builder->SaveTo(version->storage_info());
if (s.ok()) { if (s.ok()) {
version->PrepareAppend( version->PrepareAppend(
*cfd->GetLatestMutableCFOptions(), *cfd->GetLatestMutableCFOptions(), read_options_,
!version_set_->db_options_->skip_stats_update_on_db_open); !version_set_->db_options_->skip_stats_update_on_db_open);
auto v_iter = versions_.find(cfd->GetID()); auto v_iter = versions_.find(cfd->GetID());
if (v_iter != versions_.end()) { if (v_iter != versions_.end()) {
@@ -847,7 +852,8 @@ Status VersionEditHandlerPointInTime::VerifyFile(ColumnFamilyData* cfd,
const std::string& fpath, const std::string& fpath,
int level, int level,
const FileMetaData& fmeta) { const FileMetaData& fmeta) {
return version_set_->VerifyFileMetadata(cfd, fpath, level, fmeta); return version_set_->VerifyFileMetadata(read_options_, cfd, fpath, level,
fmeta);
} }
Status VersionEditHandlerPointInTime::VerifyBlobFile( Status VersionEditHandlerPointInTime::VerifyBlobFile(
@@ -856,7 +862,9 @@ Status VersionEditHandlerPointInTime::VerifyBlobFile(
BlobSource* blob_source = cfd->blob_source(); BlobSource* blob_source = cfd->blob_source();
assert(blob_source); assert(blob_source);
CacheHandleGuard<BlobFileReader> blob_file_reader; CacheHandleGuard<BlobFileReader> blob_file_reader;
Status s = blob_source->GetBlobFileReader(blob_file_num, &blob_file_reader);
Status s = blob_source->GetBlobFileReader(read_options_, blob_file_num,
&blob_file_reader);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }

@@ -19,8 +19,9 @@ struct FileMetaData;
class VersionEditHandlerBase { class VersionEditHandlerBase {
public: public:
explicit VersionEditHandlerBase() explicit VersionEditHandlerBase(const ReadOptions& read_options)
: max_manifest_read_size_(std::numeric_limits<uint64_t>::max()) {} : read_options_(read_options),
max_manifest_read_size_(std::numeric_limits<uint64_t>::max()) {}
virtual ~VersionEditHandlerBase() {} virtual ~VersionEditHandlerBase() {}
@@ -31,8 +32,9 @@ class VersionEditHandlerBase {
AtomicGroupReadBuffer& GetReadBuffer() { return read_buffer_; } AtomicGroupReadBuffer& GetReadBuffer() { return read_buffer_; }
protected: protected:
explicit VersionEditHandlerBase(uint64_t max_read_size) explicit VersionEditHandlerBase(const ReadOptions& read_options,
: max_manifest_read_size_(max_read_size) {} uint64_t max_read_size)
: read_options_(read_options), max_manifest_read_size_(max_read_size) {}
virtual Status Initialize() { return Status::OK(); } virtual Status Initialize() { return Status::OK(); }
virtual Status ApplyVersionEdit(VersionEdit& edit, virtual Status ApplyVersionEdit(VersionEdit& edit,
@@ -45,6 +47,8 @@ class VersionEditHandlerBase {
Status status_; Status status_;
const ReadOptions& read_options_;
private: private:
AtomicGroupReadBuffer read_buffer_; AtomicGroupReadBuffer read_buffer_;
const uint64_t max_manifest_read_size_; const uint64_t max_manifest_read_size_;
@@ -52,7 +56,8 @@ class VersionEditHandlerBase {
class ListColumnFamiliesHandler : public VersionEditHandlerBase { class ListColumnFamiliesHandler : public VersionEditHandlerBase {
public: public:
ListColumnFamiliesHandler() : VersionEditHandlerBase() {} explicit ListColumnFamiliesHandler(const ReadOptions& read_options)
: VersionEditHandlerBase(read_options) {}
~ListColumnFamiliesHandler() override {} ~ListColumnFamiliesHandler() override {}
@@ -72,9 +77,9 @@ class ListColumnFamiliesHandler : public VersionEditHandlerBase {
class FileChecksumRetriever : public VersionEditHandlerBase { class FileChecksumRetriever : public VersionEditHandlerBase {
public: public:
FileChecksumRetriever(uint64_t max_read_size, FileChecksumRetriever(const ReadOptions& read_options, uint64_t max_read_size,
FileChecksumList& file_checksum_list) FileChecksumList& file_checksum_list)
: VersionEditHandlerBase(max_read_size), : VersionEditHandlerBase(read_options, max_read_size),
file_checksum_list_(file_checksum_list) {} file_checksum_list_(file_checksum_list) {}
~FileChecksumRetriever() override {} ~FileChecksumRetriever() override {}
@@ -111,12 +116,13 @@ class VersionEditHandler : public VersionEditHandlerBase {
VersionSet* version_set, bool track_missing_files, VersionSet* version_set, bool track_missing_files,
bool no_error_if_files_missing, bool no_error_if_files_missing,
const std::shared_ptr<IOTracer>& io_tracer, const std::shared_ptr<IOTracer>& io_tracer,
const ReadOptions& read_options,
EpochNumberRequirement epoch_number_requirement = EpochNumberRequirement epoch_number_requirement =
EpochNumberRequirement::kMustPresent) EpochNumberRequirement::kMustPresent)
: VersionEditHandler(read_only, column_families, version_set, : VersionEditHandler(
track_missing_files, no_error_if_files_missing, read_only, column_families, version_set, track_missing_files,
io_tracer, /*skip_load_table_files=*/false, no_error_if_files_missing, io_tracer, read_options,
epoch_number_requirement) {} /*skip_load_table_files=*/false, epoch_number_requirement) {}
~VersionEditHandler() override {} ~VersionEditHandler() override {}
@@ -137,7 +143,8 @@ class VersionEditHandler : public VersionEditHandlerBase {
bool read_only, std::vector<ColumnFamilyDescriptor> column_families, bool read_only, std::vector<ColumnFamilyDescriptor> column_families,
VersionSet* version_set, bool track_missing_files, VersionSet* version_set, bool track_missing_files,
bool no_error_if_files_missing, bool no_error_if_files_missing,
const std::shared_ptr<IOTracer>& io_tracer, bool skip_load_table_files, const std::shared_ptr<IOTracer>& io_tracer,
const ReadOptions& read_options, bool skip_load_table_files,
EpochNumberRequirement epoch_number_requirement = EpochNumberRequirement epoch_number_requirement =
EpochNumberRequirement::kMustPresent); EpochNumberRequirement::kMustPresent);
@@ -212,6 +219,7 @@ class VersionEditHandlerPointInTime : public VersionEditHandler {
VersionEditHandlerPointInTime( VersionEditHandlerPointInTime(
bool read_only, std::vector<ColumnFamilyDescriptor> column_families, bool read_only, std::vector<ColumnFamilyDescriptor> column_families,
VersionSet* version_set, const std::shared_ptr<IOTracer>& io_tracer, VersionSet* version_set, const std::shared_ptr<IOTracer>& io_tracer,
const ReadOptions& read_options,
EpochNumberRequirement epoch_number_requirement = EpochNumberRequirement epoch_number_requirement =
EpochNumberRequirement::kMustPresent); EpochNumberRequirement::kMustPresent);
~VersionEditHandlerPointInTime() override; ~VersionEditHandlerPointInTime() override;
@@ -238,10 +246,11 @@ class ManifestTailer : public VersionEditHandlerPointInTime {
explicit ManifestTailer(std::vector<ColumnFamilyDescriptor> column_families, explicit ManifestTailer(std::vector<ColumnFamilyDescriptor> column_families,
VersionSet* version_set, VersionSet* version_set,
const std::shared_ptr<IOTracer>& io_tracer, const std::shared_ptr<IOTracer>& io_tracer,
const ReadOptions& read_options,
EpochNumberRequirement epoch_number_requirement = EpochNumberRequirement epoch_number_requirement =
EpochNumberRequirement::kMustPresent) EpochNumberRequirement::kMustPresent)
: VersionEditHandlerPointInTime(/*read_only=*/false, column_families, : VersionEditHandlerPointInTime(/*read_only=*/false, column_families,
version_set, io_tracer, version_set, io_tracer, read_options,
epoch_number_requirement), epoch_number_requirement),
mode_(Mode::kRecovery) {} mode_(Mode::kRecovery) {}
@@ -281,12 +290,13 @@ class DumpManifestHandler : public VersionEditHandler {
public: public:
DumpManifestHandler(std::vector<ColumnFamilyDescriptor> column_families, DumpManifestHandler(std::vector<ColumnFamilyDescriptor> column_families,
VersionSet* version_set, VersionSet* version_set,
const std::shared_ptr<IOTracer>& io_tracer, bool verbose, const std::shared_ptr<IOTracer>& io_tracer,
bool hex, bool json) const ReadOptions& read_options, bool verbose, bool hex,
bool json)
: VersionEditHandler( : VersionEditHandler(
/*read_only=*/true, column_families, version_set, /*read_only=*/true, column_families, version_set,
/*track_missing_files=*/false, /*track_missing_files=*/false,
/*no_error_if_files_missing=*/false, io_tracer, /*no_error_if_files_missing=*/false, io_tracer, read_options,
/*skip_load_table_files=*/true), /*skip_load_table_files=*/true),
verbose_(verbose), verbose_(verbose),
hex_(hex), hex_(hex),

@@ -1527,13 +1527,14 @@ void LevelIterator::InitFileIterator(size_t new_file_index) {
} }
} // anonymous namespace } // anonymous namespace
Status Version::GetTableProperties(std::shared_ptr<const TableProperties>* tp, Status Version::GetTableProperties(const ReadOptions& read_options,
std::shared_ptr<const TableProperties>* tp,
const FileMetaData* file_meta, const FileMetaData* file_meta,
const std::string* fname) const { const std::string* fname) const {
auto table_cache = cfd_->table_cache(); auto table_cache = cfd_->table_cache();
auto ioptions = cfd_->ioptions(); auto ioptions = cfd_->ioptions();
Status s = table_cache->GetTableProperties( Status s = table_cache->GetTableProperties(
file_options_, cfd_->internal_comparator(), *file_meta, tp, file_options_, read_options, cfd_->internal_comparator(), *file_meta, tp,
mutable_cf_options_.prefix_extractor, true /* no io */); mutable_cf_options_.prefix_extractor, true /* no io */);
if (s.ok()) { if (s.ok()) {
return s; return s;
@@ -1565,14 +1566,16 @@ Status Version::GetTableProperties(std::shared_ptr<const TableProperties>* tp,
// the magic number check in the footer. // the magic number check in the footer.
std::unique_ptr<RandomAccessFileReader> file_reader( std::unique_ptr<RandomAccessFileReader> file_reader(
new RandomAccessFileReader( new RandomAccessFileReader(
std::move(file), file_name, nullptr /* env */, io_tracer_, std::move(file), file_name, ioptions->clock /* clock */, io_tracer_,
nullptr /* stats */, 0 /* hist_type */, nullptr /* file_read_hist */, ioptions->stats /* stats */,
nullptr /* rate_limiter */, ioptions->listeners)); Histograms::SST_READ_MICROS /* hist_type */,
nullptr /* file_read_hist */, nullptr /* rate_limiter */,
ioptions->listeners));
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
s = ReadTableProperties( s = ReadTableProperties(
file_reader.get(), file_meta->fd.GetFileSize(), file_reader.get(), file_meta->fd.GetFileSize(),
Footer::kNullTableMagicNumber /* table's magic number */, *ioptions, Footer::kNullTableMagicNumber /* table's magic number */, *ioptions,
&props); read_options, &props);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
@@ -1581,10 +1584,11 @@ Status Version::GetTableProperties(std::shared_ptr<const TableProperties>* tp,
return s; return s;
} }
Status Version::GetPropertiesOfAllTables(TablePropertiesCollection* props) { Status Version::GetPropertiesOfAllTables(const ReadOptions& read_options,
TablePropertiesCollection* props) {
Status s; Status s;
for (int level = 0; level < storage_info_.num_levels_; level++) { for (int level = 0; level < storage_info_.num_levels_; level++) {
s = GetPropertiesOfAllTables(props, level); s = GetPropertiesOfAllTables(read_options, props, level);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
@@ -1602,6 +1606,8 @@ Status Version::TablesRangeTombstoneSummary(int max_entries_to_print,
std::stringstream ss; std::stringstream ss;
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
for (int level = 0; level < storage_info_.num_levels_; level++) { for (int level = 0; level < storage_info_.num_levels_; level++) {
for (const auto& file_meta : storage_info_.files_[level]) { for (const auto& file_meta : storage_info_.files_[level]) {
auto fname = auto fname =
@@ -1614,7 +1620,7 @@ Status Version::TablesRangeTombstoneSummary(int max_entries_to_print,
std::unique_ptr<FragmentedRangeTombstoneIterator> tombstone_iter; std::unique_ptr<FragmentedRangeTombstoneIterator> tombstone_iter;
Status s = table_cache->GetRangeTombstoneIterator( Status s = table_cache->GetRangeTombstoneIterator(
ReadOptions(), cfd_->internal_comparator(), *file_meta, read_options, cfd_->internal_comparator(), *file_meta,
&tombstone_iter); &tombstone_iter);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
@@ -1648,7 +1654,8 @@ Status Version::TablesRangeTombstoneSummary(int max_entries_to_print,
return Status::OK(); return Status::OK();
} }
Status Version::GetPropertiesOfAllTables(TablePropertiesCollection* props, Status Version::GetPropertiesOfAllTables(const ReadOptions& read_options,
TablePropertiesCollection* props,
int level) { int level) {
for (const auto& file_meta : storage_info_.files_[level]) { for (const auto& file_meta : storage_info_.files_[level]) {
auto fname = auto fname =
@@ -1657,7 +1664,8 @@ Status Version::GetPropertiesOfAllTables(TablePropertiesCollection* props,
// 1. If the table is already present in table cache, load table // 1. If the table is already present in table cache, load table
// properties from there. // properties from there.
std::shared_ptr<const TableProperties> table_properties; std::shared_ptr<const TableProperties> table_properties;
Status s = GetTableProperties(&table_properties, file_meta, &fname); Status s =
GetTableProperties(read_options, &table_properties, file_meta, &fname);
if (s.ok()) { if (s.ok()) {
props->insert({fname, table_properties}); props->insert({fname, table_properties});
} else { } else {
@@ -1669,7 +1677,8 @@ Status Version::GetPropertiesOfAllTables(TablePropertiesCollection* props,
} }
Status Version::GetPropertiesOfTablesInRange( Status Version::GetPropertiesOfTablesInRange(
const Range* range, std::size_t n, TablePropertiesCollection* props) const { const ReadOptions& read_options, const Range* range, std::size_t n,
TablePropertiesCollection* props) const {
for (int level = 0; level < storage_info_.num_non_empty_levels(); level++) { for (int level = 0; level < storage_info_.num_non_empty_levels(); level++) {
for (decltype(n) i = 0; i < n; i++) { for (decltype(n) i = 0; i < n; i++) {
// Convert user_key into a corresponding internal key. // Convert user_key into a corresponding internal key.
@@ -1686,7 +1695,8 @@ Status Version::GetPropertiesOfTablesInRange(
// 1. If the table is already present in table cache, load table // 1. If the table is already present in table cache, load table
// properties from there. // properties from there.
std::shared_ptr<const TableProperties> table_properties; std::shared_ptr<const TableProperties> table_properties;
Status s = GetTableProperties(&table_properties, file_meta, &fname); Status s = GetTableProperties(read_options, &table_properties,
file_meta, &fname);
if (s.ok()) { if (s.ok()) {
props->insert({fname, table_properties}); props->insert({fname, table_properties});
} else { } else {
@@ -1701,13 +1711,14 @@ Status Version::GetPropertiesOfTablesInRange(
} }
Status Version::GetAggregatedTableProperties( Status Version::GetAggregatedTableProperties(
std::shared_ptr<const TableProperties>* tp, int level) { const ReadOptions& read_options, std::shared_ptr<const TableProperties>* tp,
int level) {
TablePropertiesCollection props; TablePropertiesCollection props;
Status s; Status s;
if (level < 0) { if (level < 0) {
s = GetPropertiesOfAllTables(&props); s = GetPropertiesOfAllTables(read_options, &props);
} else { } else {
s = GetPropertiesOfAllTables(&props, level); s = GetPropertiesOfAllTables(read_options, &props, level);
} }
if (!s.ok()) { if (!s.ok()) {
return s; return s;
@@ -1721,12 +1732,12 @@ Status Version::GetAggregatedTableProperties(
return Status::OK(); return Status::OK();
} }
size_t Version::GetMemoryUsageByTableReaders() { size_t Version::GetMemoryUsageByTableReaders(const ReadOptions& read_options) {
size_t total_usage = 0; size_t total_usage = 0;
for (auto& file_level : storage_info_.level_files_brief_) { for (auto& file_level : storage_info_.level_files_brief_) {
for (size_t i = 0; i < file_level.num_files; i++) { for (size_t i = 0; i < file_level.num_files; i++) {
total_usage += cfd_->table_cache()->GetMemoryUsageByTableReader( total_usage += cfd_->table_cache()->GetMemoryUsageByTableReader(
file_options_, cfd_->internal_comparator(), file_options_, read_options, cfd_->internal_comparator(),
*file_level.files[i].file_metadata, *file_level.files[i].file_metadata,
mutable_cf_options_.prefix_extractor); mutable_cf_options_.prefix_extractor);
} }
@@ -2984,24 +2995,26 @@ void VersionStorageInfo::PrepareForVersionAppend(
} }
void Version::PrepareAppend(const MutableCFOptions& mutable_cf_options, void Version::PrepareAppend(const MutableCFOptions& mutable_cf_options,
const ReadOptions& read_options,
bool update_stats) { bool update_stats) {
TEST_SYNC_POINT_CALLBACK( TEST_SYNC_POINT_CALLBACK(
"Version::PrepareAppend:forced_check", "Version::PrepareAppend:forced_check",
reinterpret_cast<void*>(&storage_info_.force_consistency_checks_)); reinterpret_cast<void*>(&storage_info_.force_consistency_checks_));
if (update_stats) { if (update_stats) {
UpdateAccumulatedStats(); UpdateAccumulatedStats(read_options);
} }
storage_info_.PrepareForVersionAppend(*cfd_->ioptions(), mutable_cf_options); storage_info_.PrepareForVersionAppend(*cfd_->ioptions(), mutable_cf_options);
} }
bool Version::MaybeInitializeFileMetaData(FileMetaData* file_meta) { bool Version::MaybeInitializeFileMetaData(const ReadOptions& read_options,
FileMetaData* file_meta) {
if (file_meta->init_stats_from_file || file_meta->compensated_file_size > 0) { if (file_meta->init_stats_from_file || file_meta->compensated_file_size > 0) {
return false; return false;
} }
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
Status s = GetTableProperties(&tp, file_meta); Status s = GetTableProperties(read_options, &tp, file_meta);
file_meta->init_stats_from_file = true; file_meta->init_stats_from_file = true;
if (!s.ok()) { if (!s.ok()) {
ROCKS_LOG_ERROR(vset_->db_options_->info_log, ROCKS_LOG_ERROR(vset_->db_options_->info_log,
@@ -3046,7 +3059,7 @@ void VersionStorageInfo::RemoveCurrentStats(FileMetaData* file_meta) {
} }
} }
void Version::UpdateAccumulatedStats() { void Version::UpdateAccumulatedStats(const ReadOptions& read_options) {
// maximum number of table properties loaded from files. // maximum number of table properties loaded from files.
const int kMaxInitCount = 20; const int kMaxInitCount = 20;
int init_count = 0; int init_count = 0;
@@ -3064,7 +3077,7 @@ void Version::UpdateAccumulatedStats() {
level < storage_info_.num_levels_ && init_count < kMaxInitCount; level < storage_info_.num_levels_ && init_count < kMaxInitCount;
++level) { ++level) {
for (auto* file_meta : storage_info_.files_[level]) { for (auto* file_meta : storage_info_.files_[level]) {
if (MaybeInitializeFileMetaData(file_meta)) { if (MaybeInitializeFileMetaData(read_options, file_meta)) {
// each FileMeta will be initialized only once. // each FileMeta will be initialized only once.
storage_info_.UpdateAccumulatedStats(file_meta); storage_info_.UpdateAccumulatedStats(file_meta);
// when option "max_open_files" is -1, all the file metadata has // when option "max_open_files" is -1, all the file metadata has
@@ -3089,7 +3102,8 @@ void Version::UpdateAccumulatedStats() {
storage_info_.accumulated_raw_value_size_ == 0 && level >= 0; --level) { storage_info_.accumulated_raw_value_size_ == 0 && level >= 0; --level) {
for (int i = static_cast<int>(storage_info_.files_[level].size()) - 1; for (int i = static_cast<int>(storage_info_.files_[level].size()) - 1;
storage_info_.accumulated_raw_value_size_ == 0 && i >= 0; --i) { storage_info_.accumulated_raw_value_size_ == 0 && i >= 0; --i) {
if (MaybeInitializeFileMetaData(storage_info_.files_[level][i])) { if (MaybeInitializeFileMetaData(read_options,
storage_info_.files_[level][i])) {
storage_info_.UpdateAccumulatedStats(storage_info_.files_[level][i]); storage_info_.UpdateAccumulatedStats(storage_info_.files_[level][i]);
} }
} }
@@ -4971,7 +4985,8 @@ void VersionSet::AppendVersion(ColumnFamilyData* column_family_data,
Status VersionSet::ProcessManifestWrites( Status VersionSet::ProcessManifestWrites(
std::deque<ManifestWriter>& writers, InstrumentedMutex* mu, std::deque<ManifestWriter>& writers, InstrumentedMutex* mu,
FSDirectory* dir_contains_current_file, bool new_descriptor_log, FSDirectory* dir_contains_current_file, bool new_descriptor_log,
const ColumnFamilyOptions* new_cf_options) { const ColumnFamilyOptions* new_cf_options,
const ReadOptions& read_options) {
mu->AssertHeld(); mu->AssertHeld();
assert(!writers.empty()); assert(!writers.empty());
ManifestWriter& first_writer = writers.front(); ManifestWriter& first_writer = writers.front();
@@ -5202,7 +5217,7 @@ Status VersionSet::ProcessManifestWrites(
true /* prefetch_index_and_filter_in_cache */, true /* prefetch_index_and_filter_in_cache */,
false /* is_initial_load */, false /* is_initial_load */,
mutable_cf_options_ptrs[i]->prefix_extractor, mutable_cf_options_ptrs[i]->prefix_extractor,
MaxFileSizeForL0MetaPin(*mutable_cf_options_ptrs[i])); MaxFileSizeForL0MetaPin(*mutable_cf_options_ptrs[i]), read_options);
if (!s.ok()) { if (!s.ok()) {
if (db_options_->paranoid_checks) { if (db_options_->paranoid_checks) {
break; break;
@@ -5247,7 +5262,8 @@ Status VersionSet::ProcessManifestWrites(
constexpr bool update_stats = true; constexpr bool update_stats = true;
for (int i = 0; i < static_cast<int>(versions.size()); ++i) { for (int i = 0; i < static_cast<int>(versions.size()); ++i) {
versions[i]->PrepareAppend(*mutable_cf_options_ptrs[i], update_stats); versions[i]->PrepareAppend(*mutable_cf_options_ptrs[i], read_options,
update_stats);
} }
} }
@@ -5359,7 +5375,8 @@ Status VersionSet::ProcessManifestWrites(
assert(batch_edits.size() == 1); assert(batch_edits.size() == 1);
assert(new_cf_options != nullptr); assert(new_cf_options != nullptr);
assert(max_last_sequence == descriptor_last_sequence_); assert(max_last_sequence == descriptor_last_sequence_);
CreateColumnFamily(*new_cf_options, first_writer.edit_list.front()); CreateColumnFamily(*new_cf_options, read_options,
first_writer.edit_list.front());
} else if (first_writer.edit_list.front()->is_column_family_drop_) { } else if (first_writer.edit_list.front()->is_column_family_drop_) {
assert(batch_edits.size() == 1); assert(batch_edits.size() == 1);
assert(max_last_sequence == descriptor_last_sequence_); assert(max_last_sequence == descriptor_last_sequence_);
@@ -5528,6 +5545,7 @@ void VersionSet::WakeUpWaitingManifestWriters() {
Status VersionSet::LogAndApply( Status VersionSet::LogAndApply(
const autovector<ColumnFamilyData*>& column_family_datas, const autovector<ColumnFamilyData*>& column_family_datas,
const autovector<const MutableCFOptions*>& mutable_cf_options_list, const autovector<const MutableCFOptions*>& mutable_cf_options_list,
const ReadOptions& read_options,
const autovector<autovector<VersionEdit*>>& edit_lists, const autovector<autovector<VersionEdit*>>& edit_lists,
InstrumentedMutex* mu, FSDirectory* dir_contains_current_file, InstrumentedMutex* mu, FSDirectory* dir_contains_current_file,
bool new_descriptor_log, const ColumnFamilyOptions* new_cf_options, bool new_descriptor_log, const ColumnFamilyOptions* new_cf_options,
@@ -5605,7 +5623,8 @@ Status VersionSet::LogAndApply(
return Status::ColumnFamilyDropped(); return Status::ColumnFamilyDropped();
} }
return ProcessManifestWrites(writers, mu, dir_contains_current_file, return ProcessManifestWrites(writers, mu, dir_contains_current_file,
new_descriptor_log, new_cf_options); new_descriptor_log, new_cf_options,
read_options);
} }
void VersionSet::LogAndApplyCFHelper(VersionEdit* edit, void VersionSet::LogAndApplyCFHelper(VersionEdit* edit,
@@ -5689,6 +5708,7 @@ Status VersionSet::GetCurrentManifestPath(const std::string& dbname,
Status VersionSet::Recover( Status VersionSet::Recover(
const std::vector<ColumnFamilyDescriptor>& column_families, bool read_only, const std::vector<ColumnFamilyDescriptor>& column_families, bool read_only,
std::string* db_id, bool no_error_if_files_missing) { std::string* db_id, bool no_error_if_files_missing) {
const ReadOptions read_options(Env::IOActivity::kDBOpen);
// Read "CURRENT" file, which contains a pointer to the current manifest // Read "CURRENT" file, which contains a pointer to the current manifest
// file // file
std::string manifest_path; std::string manifest_path;
@@ -5725,7 +5745,7 @@ Status VersionSet::Recover(
VersionEditHandler handler( VersionEditHandler handler(
read_only, column_families, const_cast<VersionSet*>(this), read_only, column_families, const_cast<VersionSet*>(this),
/*track_missing_files=*/false, no_error_if_files_missing, io_tracer_, /*track_missing_files=*/false, no_error_if_files_missing, io_tracer_,
EpochNumberRequirement::kMightMissing); read_options, EpochNumberRequirement::kMightMissing);
handler.Iterate(reader, &log_read_status); handler.Iterate(reader, &log_read_status);
s = handler.status(); s = handler.status();
if (s.ok()) { if (s.ok()) {
@@ -5873,6 +5893,7 @@ Status VersionSet::TryRecoverFromOneManifest(
const std::string& manifest_path, const std::string& manifest_path,
const std::vector<ColumnFamilyDescriptor>& column_families, bool read_only, const std::vector<ColumnFamilyDescriptor>& column_families, bool read_only,
std::string* db_id, bool* has_missing_table_file) { std::string* db_id, bool* has_missing_table_file) {
const ReadOptions read_options(Env::IOActivity::kDBOpen);
ROCKS_LOG_INFO(db_options_->info_log, "Trying to recover from manifest: %s\n", ROCKS_LOG_INFO(db_options_->info_log, "Trying to recover from manifest: %s\n",
manifest_path.c_str()); manifest_path.c_str());
std::unique_ptr<SequentialFileReader> manifest_file_reader; std::unique_ptr<SequentialFileReader> manifest_file_reader;
@@ -5897,7 +5918,7 @@ Status VersionSet::TryRecoverFromOneManifest(
/*checksum=*/true, /*log_num=*/0); /*checksum=*/true, /*log_num=*/0);
VersionEditHandlerPointInTime handler_pit( VersionEditHandlerPointInTime handler_pit(
read_only, column_families, const_cast<VersionSet*>(this), io_tracer_, read_only, column_families, const_cast<VersionSet*>(this), io_tracer_,
EpochNumberRequirement::kMightMissing); read_options, EpochNumberRequirement::kMightMissing);
handler_pit.Iterate(reader, &s); handler_pit.Iterate(reader, &s);
@ -5940,6 +5961,8 @@ Status VersionSet::ListColumnFamilies(std::vector<std::string>* column_families,
Status VersionSet::ListColumnFamiliesFromManifest( Status VersionSet::ListColumnFamiliesFromManifest(
const std::string& manifest_path, FileSystem* fs, const std::string& manifest_path, FileSystem* fs,
std::vector<std::string>* column_families) { std::vector<std::string>* column_families) {
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
std::unique_ptr<SequentialFileReader> file_reader; std::unique_ptr<SequentialFileReader> file_reader;
Status s; Status s;
{ {
@ -5959,7 +5982,7 @@ Status VersionSet::ListColumnFamiliesFromManifest(
log::Reader reader(nullptr, std::move(file_reader), &reporter, log::Reader reader(nullptr, std::move(file_reader), &reporter,
true /* checksum */, 0 /* log_number */); true /* checksum */, 0 /* log_number */);
ListColumnFamiliesHandler handler; ListColumnFamiliesHandler handler(read_options);
handler.Iterate(reader, &s); handler.Iterate(reader, &s);
assert(column_families); assert(column_families);
@ -5982,6 +6005,9 @@ Status VersionSet::ReduceNumberOfLevels(const std::string& dbname,
"Number of levels needs to be bigger than 1"); "Number of levels needs to be bigger than 1");
} }
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
ImmutableDBOptions db_options(*options); ImmutableDBOptions db_options(*options);
ColumnFamilyOptions cf_options(*options); ColumnFamilyOptions cf_options(*options);
std::shared_ptr<Cache> tc(NewLRUCache(options->max_open_files - 10, std::shared_ptr<Cache> tc(NewLRUCache(options->max_open_files - 10,
@ -6069,8 +6095,8 @@ Status VersionSet::ReduceNumberOfLevels(const std::string& dbname,
InstrumentedMutex dummy_mutex; InstrumentedMutex dummy_mutex;
InstrumentedMutexLock l(&dummy_mutex); InstrumentedMutexLock l(&dummy_mutex);
return versions.LogAndApply(versions.GetColumnFamilySet()->GetDefault(), return versions.LogAndApply(versions.GetColumnFamilySet()->GetDefault(),
mutable_cf_options, &ve, &dummy_mutex, nullptr, mutable_cf_options, read_options, &ve,
true); &dummy_mutex, nullptr, true);
} }
// Get the checksum information including the checksum and checksum function // Get the checksum information including the checksum and checksum function
@ -6143,6 +6169,9 @@ Status VersionSet::GetLiveFilesChecksumInfo(FileChecksumList* checksum_list) {
Status VersionSet::DumpManifest(Options& options, std::string& dscname, Status VersionSet::DumpManifest(Options& options, std::string& dscname,
bool verbose, bool hex, bool json) { bool verbose, bool hex, bool json) {
assert(options.env); assert(options.env);
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
std::vector<std::string> column_families; std::vector<std::string> column_families;
Status s = ListColumnFamiliesFromManifest( Status s = ListColumnFamiliesFromManifest(
dscname, options.env->GetFileSystem().get(), &column_families); dscname, options.env->GetFileSystem().get(), &column_families);
@ -6169,7 +6198,8 @@ Status VersionSet::DumpManifest(Options& options, std::string& dscname,
cf_descs.emplace_back(cf, options); cf_descs.emplace_back(cf, options);
} }
DumpManifestHandler handler(cf_descs, this, io_tracer_, verbose, hex, json); DumpManifestHandler handler(cf_descs, this, io_tracer_, read_options, verbose,
hex, json);
{ {
VersionSet::LogReporter reporter; VersionSet::LogReporter reporter;
reporter.status = &s; reporter.status = &s;
@ -6372,6 +6402,7 @@ Status VersionSet::WriteCurrentStateToManifest(
// we avoid doing binary search for the keys b and c twice and instead somehow // we avoid doing binary search for the keys b and c twice and instead somehow
// maintain state of where they first appear in the files. // maintain state of where they first appear in the files.
uint64_t VersionSet::ApproximateSize(const SizeApproximationOptions& options, uint64_t VersionSet::ApproximateSize(const SizeApproximationOptions& options,
const ReadOptions& read_options,
Version* v, const Slice& start, Version* v, const Slice& start,
const Slice& end, int start_level, const Slice& end, int start_level,
int end_level, TableReaderCaller caller) { int end_level, TableReaderCaller caller) {
@ -6451,8 +6482,8 @@ uint64_t VersionSet::ApproximateSize(const SizeApproximationOptions& options,
for (int i = idx_start + 1; i < idx_end; ++i) { for (int i = idx_start + 1; i < idx_end; ++i) {
uint64_t file_size = files_brief.files[i].fd.GetFileSize(); uint64_t file_size = files_brief.files[i].fd.GetFileSize();
// The entire file falls into the range, so we can just take its size. // The entire file falls into the range, so we can just take its size.
assert(file_size == assert(file_size == ApproximateSize(read_options, v, files_brief.files[i],
ApproximateSize(v, files_brief.files[i], start, end, caller)); start, end, caller));
total_full_size += file_size; total_full_size += file_size;
} }
@ -6487,21 +6518,24 @@ uint64_t VersionSet::ApproximateSize(const SizeApproximationOptions& options,
// Estimate for all the first files (might also be last files), at each // Estimate for all the first files (might also be last files), at each
// level // level
for (const auto file_ptr : first_files) { for (const auto file_ptr : first_files) {
total_full_size += ApproximateSize(v, *file_ptr, start, end, caller); total_full_size +=
ApproximateSize(read_options, v, *file_ptr, start, end, caller);
} }
// Estimate for all the last files, at each level // Estimate for all the last files, at each level
for (const auto file_ptr : last_files) { for (const auto file_ptr : last_files) {
// We could use ApproximateSize here, but calling ApproximateOffsetOf // We could use ApproximateSize here, but calling ApproximateOffsetOf
// directly is just more efficient. // directly is just more efficient.
total_full_size += ApproximateOffsetOf(v, *file_ptr, end, caller); total_full_size +=
ApproximateOffsetOf(read_options, v, *file_ptr, end, caller);
} }
} }
return total_full_size; return total_full_size;
} }
uint64_t VersionSet::ApproximateOffsetOf(Version* v, const FdWithKeyRange& f, uint64_t VersionSet::ApproximateOffsetOf(const ReadOptions& read_options,
Version* v, const FdWithKeyRange& f,
const Slice& key, const Slice& key,
TableReaderCaller caller) { TableReaderCaller caller) {
// pre-condition // pre-condition
@ -6521,14 +6555,15 @@ uint64_t VersionSet::ApproximateOffsetOf(Version* v, const FdWithKeyRange& f,
TableCache* table_cache = v->cfd_->table_cache(); TableCache* table_cache = v->cfd_->table_cache();
if (table_cache != nullptr) { if (table_cache != nullptr) {
result = table_cache->ApproximateOffsetOf( result = table_cache->ApproximateOffsetOf(
key, *f.file_metadata, caller, icmp, read_options, key, *f.file_metadata, caller, icmp,
v->GetMutableCFOptions().prefix_extractor); v->GetMutableCFOptions().prefix_extractor);
} }
} }
return result; return result;
} }
uint64_t VersionSet::ApproximateSize(Version* v, const FdWithKeyRange& f, uint64_t VersionSet::ApproximateSize(const ReadOptions& read_options,
Version* v, const FdWithKeyRange& f,
const Slice& start, const Slice& end, const Slice& start, const Slice& end,
TableReaderCaller caller) { TableReaderCaller caller) {
// pre-condition // pre-condition
@ -6544,13 +6579,14 @@ uint64_t VersionSet::ApproximateSize(Version* v, const FdWithKeyRange& f,
if (icmp.Compare(f.smallest_key, start) >= 0) { if (icmp.Compare(f.smallest_key, start) >= 0) {
// Start of the range is before the file start - approximate by end offset // Start of the range is before the file start - approximate by end offset
return ApproximateOffsetOf(v, f, end, caller); return ApproximateOffsetOf(read_options, v, f, end, caller);
} }
if (icmp.Compare(f.largest_key, end) < 0) { if (icmp.Compare(f.largest_key, end) < 0) {
// End of the range is after the file end - approximate by subtracting // End of the range is after the file end - approximate by subtracting
// start offset from the file size // start offset from the file size
uint64_t start_offset = ApproximateOffsetOf(v, f, start, caller); uint64_t start_offset =
ApproximateOffsetOf(read_options, v, f, start, caller);
assert(f.fd.GetFileSize() >= start_offset); assert(f.fd.GetFileSize() >= start_offset);
return f.fd.GetFileSize() - start_offset; return f.fd.GetFileSize() - start_offset;
} }
@ -6561,7 +6597,7 @@ uint64_t VersionSet::ApproximateSize(Version* v, const FdWithKeyRange& f,
return 0; return 0;
} }
return table_cache->ApproximateSize( return table_cache->ApproximateSize(
start, end, *f.file_metadata, caller, icmp, read_options, start, end, *f.file_metadata, caller, icmp,
v->GetMutableCFOptions().prefix_extractor); v->GetMutableCFOptions().prefix_extractor);
} }
@ -6852,7 +6888,8 @@ void VersionSet::GetObsoleteFiles(std::vector<ObsoleteFileInfo>* files,
} }
ColumnFamilyData* VersionSet::CreateColumnFamily( ColumnFamilyData* VersionSet::CreateColumnFamily(
const ColumnFamilyOptions& cf_options, const VersionEdit* edit) { const ColumnFamilyOptions& cf_options, const ReadOptions& read_options,
const VersionEdit* edit) {
assert(edit->is_column_family_add_); assert(edit->is_column_family_add_);
MutableCFOptions dummy_cf_options; MutableCFOptions dummy_cf_options;
@ -6871,7 +6908,8 @@ ColumnFamilyData* VersionSet::CreateColumnFamily(
constexpr bool update_stats = false; constexpr bool update_stats = false;
v->PrepareAppend(*new_cfd->GetLatestMutableCFOptions(), update_stats); v->PrepareAppend(*new_cfd->GetLatestMutableCFOptions(), read_options,
update_stats);
AppendVersion(new_cfd, v); AppendVersion(new_cfd, v);
// GetLatestMutableCFOptions() is safe here without mutex since the // GetLatestMutableCFOptions() is safe here without mutex since the
@ -6936,7 +6974,8 @@ uint64_t VersionSet::GetTotalBlobFileSize(Version* dummy_versions) {
return all_versions_blob_file_size; return all_versions_blob_file_size;
} }
Status VersionSet::VerifyFileMetadata(ColumnFamilyData* cfd, Status VersionSet::VerifyFileMetadata(const ReadOptions& read_options,
ColumnFamilyData* cfd,
const std::string& fpath, int level, const std::string& fpath, int level,
const FileMetaData& meta) { const FileMetaData& meta) {
uint64_t fsize = 0; uint64_t fsize = 0;
@ -6969,7 +7008,7 @@ Status VersionSet::VerifyFileMetadata(ColumnFamilyData* cfd,
TableCache::TypedHandle* handle = nullptr; TableCache::TypedHandle* handle = nullptr;
FileMetaData meta_copy = meta; FileMetaData meta_copy = meta;
status = table_cache->FindTable( status = table_cache->FindTable(
ReadOptions(), file_opts, *icmp, meta_copy, &handle, pe, read_options, file_opts, *icmp, meta_copy, &handle, pe,
/*no_io=*/false, /*record_read_stats=*/true, /*no_io=*/false, /*record_read_stats=*/true,
internal_stats->GetFileReadHist(level), false, level, internal_stats->GetFileReadHist(level), false, level,
/*prefetch_index_and_filter_in_cache*/ false, max_sz_for_l0_meta_pin, /*prefetch_index_and_filter_in_cache*/ false, max_sz_for_l0_meta_pin,
@ -7013,9 +7052,9 @@ Status ReactiveVersionSet::Recover(
log::Reader* reader = manifest_reader->get(); log::Reader* reader = manifest_reader->get();
assert(reader); assert(reader);
manifest_tailer_.reset( manifest_tailer_.reset(new ManifestTailer(
new ManifestTailer(column_families, const_cast<ReactiveVersionSet*>(this), column_families, const_cast<ReactiveVersionSet*>(this), io_tracer_,
io_tracer_, EpochNumberRequirement::kMightMissing)); read_options_, EpochNumberRequirement::kMightMissing));
manifest_tailer_->Iterate(*reader, manifest_reader_status->get()); manifest_tailer_->Iterate(*reader, manifest_reader_status->get());

@ -913,7 +913,7 @@ class Version {
// populates derived data structures. Call without mutex held. It needs to be // populates derived data structures. Call without mutex held. It needs to be
// called before appending the version to the version set. // called before appending the version to the version set.
void PrepareAppend(const MutableCFOptions& mutable_cf_options, void PrepareAppend(const MutableCFOptions& mutable_cf_options,
bool update_stats); const ReadOptions& read_options, bool update_stats);
// Reference count management (so Versions do not disappear out from // Reference count management (so Versions do not disappear out from
// under live iterators) // under live iterators)
@ -943,7 +943,8 @@ class Version {
// specified in "file_meta". If the file name of "file_meta" is // specified in "file_meta". If the file name of "file_meta" is
// known ahead, passing it by a non-null "fname" can save a // known ahead, passing it by a non-null "fname" can save a
// file-name conversion. // file-name conversion.
Status GetTableProperties(std::shared_ptr<const TableProperties>* tp, Status GetTableProperties(const ReadOptions& read_options,
std::shared_ptr<const TableProperties>* tp,
const FileMetaData* file_meta, const FileMetaData* file_meta,
const std::string* fname = nullptr) const; const std::string* fname = nullptr) const;
@ -951,9 +952,12 @@ class Version {
// On success, *props will be populated with all SSTables' table properties. // On success, *props will be populated with all SSTables' table properties.
// The keys of `props` are the sst file name, the values of `props` are the // The keys of `props` are the sst file name, the values of `props` are the
// tables' properties, represented as std::shared_ptr. // tables' properties, represented as std::shared_ptr.
Status GetPropertiesOfAllTables(TablePropertiesCollection* props); Status GetPropertiesOfAllTables(const ReadOptions& read_options,
Status GetPropertiesOfAllTables(TablePropertiesCollection* props, int level); TablePropertiesCollection* props);
Status GetPropertiesOfTablesInRange(const Range* range, std::size_t n, Status GetPropertiesOfAllTables(const ReadOptions& read_options,
TablePropertiesCollection* props, int level);
Status GetPropertiesOfTablesInRange(const ReadOptions& read_options,
const Range* range, std::size_t n,
TablePropertiesCollection* props) const; TablePropertiesCollection* props) const;
// Print summary of range delete tombstones in SST files into out_str, // Print summary of range delete tombstones in SST files into out_str,
@ -965,13 +969,14 @@ class Version {
// On success, "tp" will contains the aggregated table property among // On success, "tp" will contains the aggregated table property among
// the table properties of all sst files in this version. // the table properties of all sst files in this version.
Status GetAggregatedTableProperties( Status GetAggregatedTableProperties(
const ReadOptions& read_options,
std::shared_ptr<const TableProperties>* tp, int level = -1); std::shared_ptr<const TableProperties>* tp, int level = -1);
uint64_t GetEstimatedActiveKeys() { uint64_t GetEstimatedActiveKeys() {
return storage_info_.GetEstimatedActiveKeys(); return storage_info_.GetEstimatedActiveKeys();
} }
size_t GetMemoryUsageByTableReaders(); size_t GetMemoryUsageByTableReaders(const ReadOptions& read_options);
ColumnFamilyData* cfd() const { return cfd_; } ColumnFamilyData* cfd() const { return cfd_; }
@ -1024,11 +1029,12 @@ class Version {
// The helper function of UpdateAccumulatedStats, which may fill the missing // The helper function of UpdateAccumulatedStats, which may fill the missing
// fields of file_meta from its associated TableProperties. // fields of file_meta from its associated TableProperties.
// Returns true if it does initialize FileMetaData. // Returns true if it does initialize FileMetaData.
bool MaybeInitializeFileMetaData(FileMetaData* file_meta); bool MaybeInitializeFileMetaData(const ReadOptions& read_options,
FileMetaData* file_meta);
// Update the accumulated stats associated with the current version. // Update the accumulated stats associated with the current version.
// This accumulated stats will be used in compaction. // This accumulated stats will be used in compaction.
void UpdateAccumulatedStats(); void UpdateAccumulatedStats(const ReadOptions& read_options);
DECLARE_SYNC_AND_ASYNC( DECLARE_SYNC_AND_ASYNC(
/* ret_type */ Status, /* func_name */ MultiGetFromSST, /* ret_type */ Status, /* func_name */ MultiGetFromSST,
@ -1136,13 +1142,13 @@ class VersionSet {
virtual ~VersionSet(); virtual ~VersionSet();
Status LogAndApplyToDefaultColumnFamily( Status LogAndApplyToDefaultColumnFamily(
VersionEdit* edit, InstrumentedMutex* mu, const ReadOptions& read_options, VersionEdit* edit, InstrumentedMutex* mu,
FSDirectory* dir_contains_current_file, bool new_descriptor_log = false, FSDirectory* dir_contains_current_file, bool new_descriptor_log = false,
const ColumnFamilyOptions* column_family_options = nullptr) { const ColumnFamilyOptions* column_family_options = nullptr) {
ColumnFamilyData* default_cf = GetColumnFamilySet()->GetDefault(); ColumnFamilyData* default_cf = GetColumnFamilySet()->GetDefault();
const MutableCFOptions* cf_options = const MutableCFOptions* cf_options =
default_cf->GetLatestMutableCFOptions(); default_cf->GetLatestMutableCFOptions();
return LogAndApply(default_cf, *cf_options, edit, mu, return LogAndApply(default_cf, *cf_options, read_options, edit, mu,
dir_contains_current_file, new_descriptor_log, dir_contains_current_file, new_descriptor_log,
column_family_options); column_family_options);
} }
@ -1155,9 +1161,9 @@ class VersionSet {
// REQUIRES: no other thread concurrently calls LogAndApply() // REQUIRES: no other thread concurrently calls LogAndApply()
Status LogAndApply( Status LogAndApply(
ColumnFamilyData* column_family_data, ColumnFamilyData* column_family_data,
const MutableCFOptions& mutable_cf_options, VersionEdit* edit, const MutableCFOptions& mutable_cf_options,
InstrumentedMutex* mu, FSDirectory* dir_contains_current_file, const ReadOptions& read_options, VersionEdit* edit, InstrumentedMutex* mu,
bool new_descriptor_log = false, FSDirectory* dir_contains_current_file, bool new_descriptor_log = false,
const ColumnFamilyOptions* column_family_options = nullptr) { const ColumnFamilyOptions* column_family_options = nullptr) {
autovector<ColumnFamilyData*> cfds; autovector<ColumnFamilyData*> cfds;
cfds.emplace_back(column_family_data); cfds.emplace_back(column_family_data);
@ -1167,8 +1173,8 @@ class VersionSet {
autovector<VersionEdit*> edit_list; autovector<VersionEdit*> edit_list;
edit_list.emplace_back(edit); edit_list.emplace_back(edit);
edit_lists.emplace_back(edit_list); edit_lists.emplace_back(edit_list);
return LogAndApply(cfds, mutable_cf_options_list, edit_lists, mu, return LogAndApply(cfds, mutable_cf_options_list, read_options, edit_lists,
dir_contains_current_file, new_descriptor_log, mu, dir_contains_current_file, new_descriptor_log,
column_family_options); column_family_options);
} }
// The batch version. If edit_list.size() > 1, caller must ensure that // The batch version. If edit_list.size() > 1, caller must ensure that
@ -1176,6 +1182,7 @@ class VersionSet {
Status LogAndApply( Status LogAndApply(
ColumnFamilyData* column_family_data, ColumnFamilyData* column_family_data,
const MutableCFOptions& mutable_cf_options, const MutableCFOptions& mutable_cf_options,
const ReadOptions& read_options,
const autovector<VersionEdit*>& edit_list, InstrumentedMutex* mu, const autovector<VersionEdit*>& edit_list, InstrumentedMutex* mu,
FSDirectory* dir_contains_current_file, bool new_descriptor_log = false, FSDirectory* dir_contains_current_file, bool new_descriptor_log = false,
const ColumnFamilyOptions* column_family_options = nullptr, const ColumnFamilyOptions* column_family_options = nullptr,
@ -1186,8 +1193,8 @@ class VersionSet {
mutable_cf_options_list.emplace_back(&mutable_cf_options); mutable_cf_options_list.emplace_back(&mutable_cf_options);
autovector<autovector<VersionEdit*>> edit_lists; autovector<autovector<VersionEdit*>> edit_lists;
edit_lists.emplace_back(edit_list); edit_lists.emplace_back(edit_list);
return LogAndApply(cfds, mutable_cf_options_list, edit_lists, mu, return LogAndApply(cfds, mutable_cf_options_list, read_options, edit_lists,
dir_contains_current_file, new_descriptor_log, mu, dir_contains_current_file, new_descriptor_log,
column_family_options, {manifest_wcb}); column_family_options, {manifest_wcb});
} }
@ -1197,6 +1204,7 @@ class VersionSet {
virtual Status LogAndApply( virtual Status LogAndApply(
const autovector<ColumnFamilyData*>& cfds, const autovector<ColumnFamilyData*>& cfds,
const autovector<const MutableCFOptions*>& mutable_cf_options_list, const autovector<const MutableCFOptions*>& mutable_cf_options_list,
const ReadOptions& read_options,
const autovector<autovector<VersionEdit*>>& edit_lists, const autovector<autovector<VersionEdit*>>& edit_lists,
InstrumentedMutex* mu, FSDirectory* dir_contains_current_file, InstrumentedMutex* mu, FSDirectory* dir_contains_current_file,
bool new_descriptor_log = false, bool new_descriptor_log = false,
@ -1427,7 +1435,8 @@ class VersionSet {
// Return the approximate size of data to be scanned for range [start, end) // Return the approximate size of data to be scanned for range [start, end)
// in levels [start_level, end_level). If end_level == -1 it will search // in levels [start_level, end_level). If end_level == -1 it will search
// through all non-empty levels // through all non-empty levels
uint64_t ApproximateSize(const SizeApproximationOptions& options, Version* v, uint64_t ApproximateSize(const SizeApproximationOptions& options,
const ReadOptions& read_options, Version* v,
const Slice& start, const Slice& end, const Slice& start, const Slice& end,
int start_level, int end_level, int start_level, int end_level,
TableReaderCaller caller); TableReaderCaller caller);
@ -1487,7 +1496,8 @@ class VersionSet {
new Version(cfd, this, file_options_, mutable_cf_options, io_tracer_); new Version(cfd, this, file_options_, mutable_cf_options, io_tracer_);
constexpr bool update_stats = false; constexpr bool update_stats = false;
version->PrepareAppend(mutable_cf_options, update_stats); const ReadOptions read_options;
version->PrepareAppend(mutable_cf_options, read_options, update_stats);
AppendVersion(cfd, version); AppendVersion(cfd, version);
} }
@ -1516,14 +1526,15 @@ class VersionSet {
void Reset(); void Reset();
// Returns approximated offset of a key in a file for a given version. // Returns approximated offset of a key in a file for a given version.
uint64_t ApproximateOffsetOf(Version* v, const FdWithKeyRange& f, uint64_t ApproximateOffsetOf(const ReadOptions& read_options, Version* v,
const Slice& key, TableReaderCaller caller); const FdWithKeyRange& f, const Slice& key,
TableReaderCaller caller);
// Returns approximated data size between start and end keys in a file // Returns approximated data size between start and end keys in a file
// for a given version. // for a given version.
uint64_t ApproximateSize(Version* v, const FdWithKeyRange& f, uint64_t ApproximateSize(const ReadOptions& read_options, Version* v,
const Slice& start, const Slice& end, const FdWithKeyRange& f, const Slice& start,
TableReaderCaller caller); const Slice& end, TableReaderCaller caller);
struct MutableCFState { struct MutableCFState {
uint64_t log_number; uint64_t log_number;
@ -1542,9 +1553,11 @@ class VersionSet {
void AppendVersion(ColumnFamilyData* column_family_data, Version* v); void AppendVersion(ColumnFamilyData* column_family_data, Version* v);
ColumnFamilyData* CreateColumnFamily(const ColumnFamilyOptions& cf_options, ColumnFamilyData* CreateColumnFamily(const ColumnFamilyOptions& cf_options,
const ReadOptions& read_options,
const VersionEdit* edit); const VersionEdit* edit);
Status VerifyFileMetadata(ColumnFamilyData* cfd, const std::string& fpath, Status VerifyFileMetadata(const ReadOptions& read_options,
ColumnFamilyData* cfd, const std::string& fpath,
int level, const FileMetaData& meta); int level, const FileMetaData& meta);
// Protected by DB mutex. // Protected by DB mutex.
@ -1620,7 +1633,8 @@ class VersionSet {
InstrumentedMutex* mu, InstrumentedMutex* mu,
FSDirectory* dir_contains_current_file, FSDirectory* dir_contains_current_file,
bool new_descriptor_log, bool new_descriptor_log,
const ColumnFamilyOptions* new_cf_options); const ColumnFamilyOptions* new_cf_options,
const ReadOptions& read_options);
void LogAndApplyCFHelper(VersionEdit* edit, void LogAndApplyCFHelper(VersionEdit* edit,
SequenceNumber* max_last_sequence); SequenceNumber* max_last_sequence);
@ -1672,13 +1686,15 @@ class ReactiveVersionSet : public VersionSet {
private: private:
std::unique_ptr<ManifestTailer> manifest_tailer_; std::unique_ptr<ManifestTailer> manifest_tailer_;
// TODO: plumb Env::IOActivity
const ReadOptions read_options_;
using VersionSet::LogAndApply; using VersionSet::LogAndApply;
using VersionSet::Recover; using VersionSet::Recover;
Status LogAndApply( Status LogAndApply(
const autovector<ColumnFamilyData*>& /*cfds*/, const autovector<ColumnFamilyData*>& /*cfds*/,
const autovector<const MutableCFOptions*>& /*mutable_cf_options_list*/, const autovector<const MutableCFOptions*>& /*mutable_cf_options_list*/,
const ReadOptions& /* read_options */,
const autovector<autovector<VersionEdit*>>& /*edit_lists*/, const autovector<autovector<VersionEdit*>>& /*edit_lists*/,
InstrumentedMutex* /*mu*/, FSDirectory* /*dir_contains_current_file*/, InstrumentedMutex* /*mu*/, FSDirectory* /*dir_contains_current_file*/,
bool /*new_descriptor_log*/, const ColumnFamilyOptions* /*new_cf_option*/, bool /*new_descriptor_log*/, const ColumnFamilyOptions* /*new_cf_option*/,

@ -1307,9 +1307,9 @@ class VersionSetTestBase {
Status LogAndApplyToDefaultCF(VersionEdit& edit) { Status LogAndApplyToDefaultCF(VersionEdit& edit) {
mutex_.Lock(); mutex_.Lock();
Status s = Status s = versions_->LogAndApply(
versions_->LogAndApply(versions_->GetColumnFamilySet()->GetDefault(), versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
mutable_cf_options_, &edit, &mutex_, nullptr); read_options_, &edit, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
return s; return s;
} }
@ -1321,9 +1321,9 @@ class VersionSetTestBase {
vedits.push_back(e.get()); vedits.push_back(e.get());
} }
mutex_.Lock(); mutex_.Lock();
Status s = Status s = versions_->LogAndApply(
versions_->LogAndApply(versions_->GetColumnFamilySet()->GetDefault(), versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
mutable_cf_options_, vedits, &mutex_, nullptr); read_options_, vedits, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
return s; return s;
} }
@ -1335,7 +1335,7 @@ class VersionSetTestBase {
VersionEdit dummy; VersionEdit dummy;
ASSERT_OK(versions_->LogAndApply( ASSERT_OK(versions_->LogAndApply(
versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_, versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
&dummy, &mutex_, db_directory, new_descriptor_log)); read_options_, &dummy, &mutex_, db_directory, new_descriptor_log));
mutex_.Unlock(); mutex_.Unlock();
} }
@ -1350,7 +1350,8 @@ class VersionSetTestBase {
Status s; Status s;
mutex_.Lock(); mutex_.Lock();
s = versions_->LogAndApply(/*column_family_data=*/nullptr, s = versions_->LogAndApply(/*column_family_data=*/nullptr,
MutableCFOptions(cf_options), &new_cf, &mutex_, MutableCFOptions(cf_options), read_options_,
&new_cf, &mutex_,
/*db_directory=*/nullptr, /*db_directory=*/nullptr,
/*new_descriptor_log=*/false, &cf_options); /*new_descriptor_log=*/false, &cf_options);
mutex_.Unlock(); mutex_.Unlock();
@ -1372,6 +1373,7 @@ class VersionSetTestBase {
ColumnFamilyOptions cf_options_; ColumnFamilyOptions cf_options_;
ImmutableOptions immutable_options_; ImmutableOptions immutable_options_;
MutableCFOptions mutable_cf_options_; MutableCFOptions mutable_cf_options_;
const ReadOptions read_options_;
std::shared_ptr<Cache> table_cache_; std::shared_ptr<Cache> table_cache_;
WriteController write_controller_; WriteController write_controller_;
WriteBufferManager write_buffer_manager_; WriteBufferManager write_buffer_manager_;
@ -1395,6 +1397,8 @@ class VersionSetTest : public VersionSetTestBase, public testing::Test {
TEST_F(VersionSetTest, SameColumnFamilyGroupCommit) { TEST_F(VersionSetTest, SameColumnFamilyGroupCommit) {
NewDB(); NewDB();
const int kGroupSize = 5; const int kGroupSize = 5;
const ReadOptions read_options;
autovector<VersionEdit> edits; autovector<VersionEdit> edits;
for (int i = 0; i != kGroupSize; ++i) { for (int i = 0; i != kGroupSize; ++i) {
edits.emplace_back(VersionEdit()); edits.emplace_back(VersionEdit());
@ -1421,8 +1425,8 @@ TEST_F(VersionSetTest, SameColumnFamilyGroupCommit) {
}); });
SyncPoint::GetInstance()->EnableProcessing(); SyncPoint::GetInstance()->EnableProcessing();
mutex_.Lock(); mutex_.Lock();
Status s = versions_->LogAndApply(cfds, all_mutable_cf_options, edit_lists, Status s = versions_->LogAndApply(cfds, all_mutable_cf_options, read_options,
&mutex_, nullptr); edit_lists, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
EXPECT_OK(s); EXPECT_OK(s);
EXPECT_EQ(kGroupSize - 1, count); EXPECT_EQ(kGroupSize - 1, count);
@ -1622,9 +1626,9 @@ TEST_F(VersionSetTest, ObsoleteBlobFile) {
edit.AddBlobFileGarbage(blob_file_number, total_blob_count, total_blob_bytes); edit.AddBlobFileGarbage(blob_file_number, total_blob_count, total_blob_bytes);
mutex_.Lock(); mutex_.Lock();
Status s = Status s = versions_->LogAndApply(
versions_->LogAndApply(versions_->GetColumnFamilySet()->GetDefault(), versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
mutable_cf_options_, &edit, &mutex_, nullptr); read_options_, &edit, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
ASSERT_OK(s); ASSERT_OK(s);
@ -2242,7 +2246,7 @@ class VersionSetWithTimestampTest : public VersionSetTest {
Status s; Status s;
mutex_.Lock(); mutex_.Lock();
s = versions_->LogAndApply(cfd_, *(cfd_->GetLatestMutableCFOptions()), s = versions_->LogAndApply(cfd_, *(cfd_->GetLatestMutableCFOptions()),
edits_, &mutex_, nullptr); read_options_, edits_, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
ASSERT_OK(s); ASSERT_OK(s);
VerifyFullHistoryTsLow(*std::max_element(ts_lbs.begin(), ts_lbs.end())); VerifyFullHistoryTsLow(*std::max_element(ts_lbs.begin(), ts_lbs.end()));
@ -2252,6 +2256,9 @@ class VersionSetWithTimestampTest : public VersionSetTest {
ColumnFamilyData* cfd_{nullptr}; ColumnFamilyData* cfd_{nullptr};
// edits_ must contain and own pointers to heap-alloc VersionEdit objects. // edits_ must contain and own pointers to heap-alloc VersionEdit objects.
autovector<VersionEdit*> edits_; autovector<VersionEdit*> edits_;
private:
const ReadOptions read_options_;
}; };
const std::string VersionSetWithTimestampTest::kNewCfName("new_cf"); const std::string VersionSetWithTimestampTest::kNewCfName("new_cf");
@ -2680,6 +2687,8 @@ class VersionSetTestDropOneCF : public VersionSetTestBase,
// Repeat the test for i = 1, 2, 3 to simulate dropping the first, middle and // Repeat the test for i = 1, 2, 3 to simulate dropping the first, middle and
// last column family in an atomic group. // last column family in an atomic group.
TEST_P(VersionSetTestDropOneCF, HandleDroppedColumnFamilyInAtomicGroup) { TEST_P(VersionSetTestDropOneCF, HandleDroppedColumnFamilyInAtomicGroup) {
const ReadOptions read_options;
std::vector<ColumnFamilyDescriptor> column_families; std::vector<ColumnFamilyDescriptor> column_families;
SequenceNumber last_seqno; SequenceNumber last_seqno;
std::unique_ptr<log::Writer> log_writer; std::unique_ptr<log::Writer> log_writer;
@ -2709,7 +2718,7 @@ TEST_P(VersionSetTestDropOneCF, HandleDroppedColumnFamilyInAtomicGroup) {
mutex_.Lock(); mutex_.Lock();
s = versions_->LogAndApply(cfd_to_drop, s = versions_->LogAndApply(cfd_to_drop,
*cfd_to_drop->GetLatestMutableCFOptions(), *cfd_to_drop->GetLatestMutableCFOptions(),
&drop_cf_edit, &mutex_, nullptr); read_options, &drop_cf_edit, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
ASSERT_OK(s); ASSERT_OK(s);
@ -2758,8 +2767,8 @@ TEST_P(VersionSetTestDropOneCF, HandleDroppedColumnFamilyInAtomicGroup) {
}); });
SyncPoint::GetInstance()->EnableProcessing(); SyncPoint::GetInstance()->EnableProcessing();
mutex_.Lock(); mutex_.Lock();
s = versions_->LogAndApply(cfds, mutable_cf_options_list, edit_lists, &mutex_, s = versions_->LogAndApply(cfds, mutable_cf_options_list, read_options,
nullptr); edit_lists, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
ASSERT_OK(s); ASSERT_OK(s);
ASSERT_EQ(1, called); ASSERT_EQ(1, called);

@ -33,14 +33,15 @@ class OfflineManifestWriter {
/*no_error_if_files_missing*/ true); /*no_error_if_files_missing*/ true);
} }
Status LogAndApply(ColumnFamilyData* cfd, VersionEdit* edit, Status LogAndApply(const ReadOptions& read_options, ColumnFamilyData* cfd,
VersionEdit* edit,
FSDirectory* dir_contains_current_file) { FSDirectory* dir_contains_current_file) {
// Use `mutex` to imitate a locked DB mutex when calling `LogAndApply()`. // Use `mutex` to imitate a locked DB mutex when calling `LogAndApply()`.
InstrumentedMutex mutex; InstrumentedMutex mutex;
mutex.Lock(); mutex.Lock();
Status s = versions_.LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(), Status s = versions_.LogAndApply(
edit, &mutex, dir_contains_current_file, cfd, *cfd->GetLatestMutableCFOptions(), read_options, edit, &mutex,
false /* new_descriptor_log */); dir_contains_current_file, false /* new_descriptor_log */);
mutex.Unlock(); mutex.Unlock();
return s; return s;
} }

@ -2036,6 +2036,7 @@ class MemTableInserter : public WriteBatch::Handler {
// key not found in memtable. Do sst get, update, add // key not found in memtable. Do sst get, update, add
SnapshotImpl read_from_snapshot; SnapshotImpl read_from_snapshot;
read_from_snapshot.number_ = sequence_; read_from_snapshot.number_ = sequence_;
// TODO: plumb Env::IOActivity
ReadOptions ropts; ReadOptions ropts;
// it's going to be overwritten for sure, so no point caching data block // it's going to be overwritten for sure, so no point caching data block
// containing the old version // containing the old version
@ -2480,6 +2481,7 @@ class MemTableInserter : public WriteBatch::Handler {
// operations in the same batch. // operations in the same batch.
SnapshotImpl read_from_snapshot; SnapshotImpl read_from_snapshot;
read_from_snapshot.number_ = sequence_; read_from_snapshot.number_ = sequence_;
// TODO: plumb Env::IOActivity
ReadOptions read_options; ReadOptions read_options;
read_options.snapshot = &read_from_snapshot; read_options.snapshot = &read_from_snapshot;

@ -311,6 +311,8 @@ DECLARE_int32(create_timestamped_snapshot_one_in);
DECLARE_bool(allow_data_in_errors); DECLARE_bool(allow_data_in_errors);
DECLARE_bool(enable_thread_tracking);
// Tiered storage // Tiered storage
DECLARE_bool(enable_tiered_storage); // set last_level_temperature DECLARE_bool(enable_tiered_storage); // set last_level_temperature
DECLARE_int64(preclude_last_level_data_seconds); DECLARE_int64(preclude_last_level_data_seconds);

@ -55,8 +55,7 @@ void ThreadBody(void* v) {
} }
} }
} }
bool RunStressTestImpl(SharedState* shared) {
bool RunStressTest(SharedState* shared) {
SystemClock* clock = db_stress_env->GetSystemClock().get(); SystemClock* clock = db_stress_env->GetSystemClock().get();
StressTest* stress = shared->GetStressTest(); StressTest* stress = shared->GetStressTest();
@ -207,5 +206,11 @@ bool RunStressTest(SharedState* shared) {
} }
return true; return true;
} }
bool RunStressTest(SharedState* shared) {
ThreadStatusUtil::RegisterThread(db_stress_env, ThreadStatus::USER);
bool result = RunStressTestImpl(shared);
ThreadStatusUtil::UnregisterThread();
return result;
}
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE
#endif // GFLAGS #endif // GFLAGS

@ -10,8 +10,30 @@
#ifdef GFLAGS #ifdef GFLAGS
#pragma once #pragma once
#include "db_stress_tool/db_stress_common.h" #include "db_stress_tool/db_stress_common.h"
#include "monitoring/thread_status_util.h"
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {
class DbStressRandomAccessFileWrapper : public FSRandomAccessFileOwnerWrapper {
public:
explicit DbStressRandomAccessFileWrapper(
std::unique_ptr<FSRandomAccessFile>&& target)
: FSRandomAccessFileOwnerWrapper(std::move(target)) {}
IOStatus Read(uint64_t offset, size_t n, const IOOptions& options,
Slice* result, char* scratch,
IODebugContext* dbg) const override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->Read(offset, n, options, result, scratch, dbg);
}
};
class DbStressFSWrapper : public FileSystemWrapper { class DbStressFSWrapper : public FileSystemWrapper {
public: public:
explicit DbStressFSWrapper(const std::shared_ptr<FileSystem>& t) explicit DbStressFSWrapper(const std::shared_ptr<FileSystem>& t)
@ -19,6 +41,18 @@ class DbStressFSWrapper : public FileSystemWrapper {
static const char* kClassName() { return "DbStressFS"; } static const char* kClassName() { return "DbStressFS"; }
const char* Name() const override { return kClassName(); } const char* Name() const override { return kClassName(); }
IOStatus NewRandomAccessFile(const std::string& f,
const FileOptions& file_opts,
std::unique_ptr<FSRandomAccessFile>* r,
IODebugContext* dbg) override {
std::unique_ptr<FSRandomAccessFile> file;
IOStatus s = target()->NewRandomAccessFile(f, file_opts, &file, dbg);
if (s.ok()) {
r->reset(new DbStressRandomAccessFileWrapper(std::move(file)));
}
return s;
}
IOStatus DeleteFile(const std::string& f, const IOOptions& opts, IOStatus DeleteFile(const std::string& f, const IOOptions& opts,
IODebugContext* dbg) override { IODebugContext* dbg) override {
// We determine whether it is a manifest file by searching a string, // We determine whether it is a manifest file by searching a string,

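The `DbStressRandomAccessFileWrapper` added above is a plain owning decorator. It holds the real file, checks the `io_activity` tag on each `Read()` in debug builds, and then forwards the call unchanged. Below is a minimal sketch of that pattern. It assumes a hypothetical `Reader` interface (not RocksDB's `FSRandomAccessFile`) and a pointer `expected` that stands in for the `ThreadStatusUtil` lookup. Mismatches are counted rather than asserted so the behavior can be exercised.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical mirror of Env::IOActivity from this diff.
enum class IOActivity : uint8_t { kFlush = 0, kCompaction = 1, kDBOpen = 2, kUnknown };

// Hypothetical minimal file interface; not RocksDB's FSRandomAccessFile.
struct Reader {
  virtual ~Reader() = default;
  virtual std::string Read(uint64_t offset, IOActivity activity) const = 0;
};

// Trivial in-memory reader used as the wrapped target.
struct StringReader : Reader {
  std::string data;
  explicit StringReader(std::string d) : data(std::move(d)) {}
  std::string Read(uint64_t offset, IOActivity /*activity*/) const override {
    return offset < data.size() ? data.substr(offset) : std::string();
  }
};

// Owning decorator: verify the activity tag, then forward to the target.
class CheckingReader : public Reader {
 public:
  CheckingReader(std::unique_ptr<Reader> target, const IOActivity* expected)
      : target_(std::move(target)), expected_(expected) {}

  std::string Read(uint64_t offset, IOActivity activity) const override {
    // kUnknown means "no expectation", as in the db_stress assertion.
    if (*expected_ != IOActivity::kUnknown && *expected_ != activity) {
      ++violations;
    }
    return target_->Read(offset, activity);
  }

  mutable int violations = 0;

 private:
  std::unique_ptr<Reader> target_;
  const IOActivity* expected_;  // stands in for the thread-status lookup
};
```

Forwarding keeps the wrapper transparent to callers. Only the check is added, so the wrapper can sit under every file the stress test opens.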
@ -1051,6 +1051,11 @@ DEFINE_bool(allow_data_in_errors,
ROCKSDB_NAMESPACE::Options().allow_data_in_errors, ROCKSDB_NAMESPACE::Options().allow_data_in_errors,
"If true, allow logging data, e.g. key, value in LOG files."); "If true, allow logging data, e.g. key, value in LOG files.");
DEFINE_bool(enable_thread_tracking,
ROCKSDB_NAMESPACE::Options().enable_thread_tracking,
"If true, the status of the threads involved in this DB will be "
"tracked and available via GetThreadList() API.");
DEFINE_int32(verify_iterator_with_expected_state_one_in, 0, DEFINE_int32(verify_iterator_with_expected_state_one_in, 0,
"If non-zero, when TestIterate() is to be called, there is a " "If non-zero, when TestIterate() is to be called, there is a "
"1/verify_iterator_with_expected_state_one_in " "1/verify_iterator_with_expected_state_one_in "

@ -3207,6 +3207,8 @@ void InitializeOptionsFromFlags(
} }
options.allow_data_in_errors = FLAGS_allow_data_in_errors; options.allow_data_in_errors = FLAGS_allow_data_in_errors;
options.enable_thread_tracking = FLAGS_enable_thread_tracking;
} }
void InitializeOptionsGeneral( void InitializeOptionsGeneral(

@ -99,13 +99,6 @@ int db_stress_tool(int argc, char** argv) {
env_wrapper_guard = std::make_shared<CompositeEnvWrapper>( env_wrapper_guard = std::make_shared<CompositeEnvWrapper>(
raw_env, std::make_shared<DbStressFSWrapper>(raw_env->GetFileSystem())); raw_env, std::make_shared<DbStressFSWrapper>(raw_env->GetFileSystem()));
if (!env_opts && !FLAGS_use_io_uring) {
// If using the default Env (Posix), wrap DbStressEnvWrapper with the
// legacy EnvWrapper. This is a workaround to prevent MultiGet and scans
// from failing when IO uring is disabled. The EnvWrapper
// has a default implementation of ReadAsync that redirects to Read.
env_wrapper_guard = std::make_shared<EnvWrapper>(env_wrapper_guard);
}
db_stress_env = env_wrapper_guard.get(); db_stress_env = env_wrapper_guard.get();
FLAGS_rep_factory = StringToRepFactory(FLAGS_memtablerep.c_str()); FLAGS_rep_factory = StringToRepFactory(FLAGS_memtablerep.c_str());

@ -1220,7 +1220,8 @@ void MultiOpsTxnsStressTest::VerifyDb(ThreadState* thread) const {
// which can be called before TransactionDB::Open() returns to caller. // which can be called before TransactionDB::Open() returns to caller.
// Therefore, at that time, db_ and txn_db_ may still be nullptr. // Therefore, at that time, db_ and txn_db_ may still be nullptr.
// Caller has to make sure that the race condition does not happen. // Caller has to make sure that the race condition does not happen.
void MultiOpsTxnsStressTest::VerifyPkSkFast(int job_id) { void MultiOpsTxnsStressTest::VerifyPkSkFast(const ReadOptions& read_options,
int job_id) {
DB* const db = db_aptr_.load(std::memory_order_acquire); DB* const db = db_aptr_.load(std::memory_order_acquire);
if (db == nullptr) { if (db == nullptr) {
return; return;
@ -1249,6 +1250,7 @@ void MultiOpsTxnsStressTest::VerifyPkSkFast(int job_id) {
ReadOptions ropts; ReadOptions ropts;
ropts.snapshot = snapshot; ropts.snapshot = snapshot;
ropts.total_order_seek = true; ropts.total_order_seek = true;
ropts.io_activity = read_options.io_activity;
std::unique_ptr<Iterator> it(db_->NewIterator(ropts)); std::unique_ptr<Iterator> it(db_->NewIterator(ropts));
for (it->Seek(start_key); it->Valid(); it->Next()) { for (it->Seek(start_key); it->Valid(); it->Next()) {

@ -288,7 +288,7 @@ class MultiOpsTxnsStressTest : public StressTest {
VerifyDb(thread); VerifyDb(thread);
} }
void VerifyPkSkFast(int job_id); void VerifyPkSkFast(const ReadOptions& read_options, int job_id);
protected: protected:
class Counter { class Counter {
@ -424,7 +424,8 @@ class MultiOpsTxnsStressListener : public EventListener {
(void)db; (void)db;
#endif #endif
assert(info.cf_id == 0); assert(info.cf_id == 0);
stress_test_->VerifyPkSkFast(info.job_id); const ReadOptions read_options(Env::IOActivity::kFlush);
stress_test_->VerifyPkSkFast(read_options, info.job_id);
} }
void OnCompactionCompleted(DB* db, const CompactionJobInfo& info) override { void OnCompactionCompleted(DB* db, const CompactionJobInfo& info) override {
@ -433,7 +434,8 @@ class MultiOpsTxnsStressListener : public EventListener {
(void)db; (void)db;
#endif #endif
assert(info.cf_id == 0); assert(info.cf_id == 0);
stress_test_->VerifyPkSkFast(info.job_id); const ReadOptions read_options(Env::IOActivity::kCompaction);
stress_test_->VerifyPkSkFast(read_options, info.job_id);
} }
private: private:

@ -185,9 +185,9 @@ IOStatus GenerateOneFileChecksum(
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
reader.reset(new RandomAccessFileReader(std::move(r_file), file_path, reader.reset(new RandomAccessFileReader(
nullptr /*Env*/, io_tracer, nullptr, std::move(r_file), file_path, nullptr /*Env*/, io_tracer, nullptr,
0, nullptr, rate_limiter)); Histograms::HISTOGRAM_ENUM_MAX, nullptr, rate_limiter));
} }
// Found that 256 KB readahead size provides the best performance, based on // Found that 256 KB readahead size provides the best performance, based on

@ -80,6 +80,8 @@ inline IOStatus PrepareIOFromReadOptions(const ReadOptions& ro,
} }
opts.rate_limiter_priority = ro.rate_limiter_priority; opts.rate_limiter_priority = ro.rate_limiter_priority;
opts.io_activity = ro.io_activity;
return IOStatus::OK(); return IOStatus::OK();
} }

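The `io_activity` tag moves from `ReadOptions` to `IOOptions` through a single field copy in `PrepareIOFromReadOptions`. `VerifyPkSkFast` shows the other half of the plumbing: a freshly built `ReadOptions` has to copy the activity from its caller, or it silently falls back to `kUnknown`. The sketch below models only that one field, using hypothetical `ReadOpts`/`IOOpts` structs. Everything else in the real option structs is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of Env::IOActivity from this diff.
enum class IOActivity : uint8_t { kFlush = 0, kCompaction = 1, kDBOpen = 2, kUnknown };

// Hypothetical, trimmed-down stand-in for ReadOptions.
struct ReadOpts {
  IOActivity io_activity = IOActivity::kUnknown;
  ReadOpts() = default;
  explicit ReadOpts(IOActivity activity) : io_activity(activity) {}
};

// Hypothetical, trimmed-down stand-in for IOOptions.
struct IOOpts {
  IOActivity io_activity = IOActivity::kUnknown;
};

// Analogue of the line added to PrepareIOFromReadOptions.
IOOpts PrepareIO(const ReadOpts& ro) {
  IOOpts opts;
  opts.io_activity = ro.io_activity;
  return opts;
}

// Analogue of VerifyPkSkFast building its own iterator options:
// the caller's activity must be carried over explicitly.
ReadOpts MakeScanOptions(const ReadOpts& caller) {
  ReadOpts ropts;
  ropts.io_activity = caller.io_activity;
  return ropts;
}
```

The listener side then only has to construct the right tag, for example `ReadOpts(IOActivity::kFlush)` in a flush-completed callback, mirroring `ReadOptions(Env::IOActivity::kFlush)` above.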
@ -22,7 +22,11 @@
#include "util/rate_limiter.h" #include "util/rate_limiter.h"
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {
const std::array<Histograms, std::size_t(Env::IOActivity::kUnknown)>
kReadHistograms{{
FILE_READ_FLUSH_MICROS,
FILE_READ_COMPACTION_MICROS,
}};
inline void RecordIOStats(Statistics* stats, Temperature file_temperature, inline void RecordIOStats(Statistics* stats, Temperature file_temperature,
bool is_last_level, size_t size) { bool is_last_level, size_t size) {
IOSTATS_ADD(bytes_read, size); IOSTATS_ADD(bytes_read, size);
@ -94,6 +98,9 @@ IOStatus RandomAccessFileReader::Read(
uint64_t elapsed = 0; uint64_t elapsed = 0;
{ {
StopWatch sw(clock_, stats_, hist_type_, StopWatch sw(clock_, stats_, hist_type_,
(opts.io_activity != Env::IOActivity::kUnknown)
? kReadHistograms[(std::size_t)(opts.io_activity)]
: Histograms::HISTOGRAM_ENUM_MAX,
(stats_ != nullptr) ? &elapsed : nullptr, true /*overwrite*/, (stats_ != nullptr) ? &elapsed : nullptr, true /*overwrite*/,
true /*delay_enabled*/); true /*delay_enabled*/);
auto prev_perf_level = GetPerfLevel(); auto prev_perf_level = GetPerfLevel();
@ -288,6 +295,9 @@ IOStatus RandomAccessFileReader::MultiRead(
uint64_t elapsed = 0; uint64_t elapsed = 0;
{ {
StopWatch sw(clock_, stats_, hist_type_, StopWatch sw(clock_, stats_, hist_type_,
(opts.io_activity != Env::IOActivity::kUnknown)
? kReadHistograms[(std::size_t)(opts.io_activity)]
: Histograms::HISTOGRAM_ENUM_MAX,
(stats_ != nullptr) ? &elapsed : nullptr, true /*overwrite*/, (stats_ != nullptr) ? &elapsed : nullptr, true /*overwrite*/,
true /*delay_enabled*/); true /*delay_enabled*/);
auto prev_perf_level = GetPerfLevel(); auto prev_perf_level = GetPerfLevel();
@ -425,7 +435,7 @@ IOStatus RandomAccessFileReader::MultiRead(
} }
IOStatus RandomAccessFileReader::PrepareIOOptions(const ReadOptions& ro, IOStatus RandomAccessFileReader::PrepareIOOptions(const ReadOptions& ro,
IOOptions& opts) { IOOptions& opts) const {
if (clock_ != nullptr) { if (clock_ != nullptr) {
return PrepareIOFromReadOptions(ro, clock_, opts); return PrepareIOFromReadOptions(ro, clock_, opts);
} else { } else {
@ -476,13 +486,17 @@ IOStatus RandomAccessFileReader::ReadAsync(
assert(read_async_info->buf_.CurrentSize() == 0); assert(read_async_info->buf_.CurrentSize() == 0);
StopWatch sw(clock_, nullptr /*stats*/, 0 /*hist_type*/, &elapsed, StopWatch sw(clock_, nullptr /*stats*/,
true /*overwrite*/, true /*delay_enabled*/); Histograms::HISTOGRAM_ENUM_MAX /*hist_type*/,
Histograms::HISTOGRAM_ENUM_MAX, &elapsed, true /*overwrite*/,
true /*delay_enabled*/);
s = file_->ReadAsync(aligned_req, opts, read_async_callback, s = file_->ReadAsync(aligned_req, opts, read_async_callback,
read_async_info, io_handle, del_fn, nullptr /*dbg*/); read_async_info, io_handle, del_fn, nullptr /*dbg*/);
} else { } else {
StopWatch sw(clock_, nullptr /*stats*/, 0 /*hist_type*/, &elapsed, StopWatch sw(clock_, nullptr /*stats*/,
true /*overwrite*/, true /*delay_enabled*/); Histograms::HISTOGRAM_ENUM_MAX /*hist_type*/,
Histograms::HISTOGRAM_ENUM_MAX, &elapsed, true /*overwrite*/,
true /*delay_enabled*/);
s = file_->ReadAsync(req, opts, read_async_callback, read_async_info, s = file_->ReadAsync(req, opts, read_async_callback, read_async_info,
io_handle, del_fn, nullptr /*dbg*/); io_handle, del_fn, nullptr /*dbg*/);
} }

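In `RandomAccessFileReader::Read` and `MultiRead`, the second histogram handed to `StopWatch` is chosen by indexing `kReadHistograms` with `opts.io_activity`. For `kUnknown`, it falls back to the `HISTOGRAM_ENUM_MAX` sentinel. The sketch below mirrors that selection using hypothetical enums, not the RocksDB headers.

One detail is worth checking. The array is sized by `kUnknown` (3) but lists only two entries. Under C++ aggregate initialization, the `kDBOpen` slot is therefore value-initialized to 0, which is whichever histogram has enum value 0, rather than the sentinel. As far as this diff shows, a read tagged `kDBOpen` would be recorded there.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of Env::IOActivity from this diff.
enum class IOActivity : uint8_t { kFlush = 0, kCompaction = 1, kDBOpen = 2, kUnknown };

// Hypothetical histogram ids; kHistogramEnumMax plays HISTOGRAM_ENUM_MAX.
enum Histogram : uint32_t {
  kHistZero = 0,  // whatever histogram happens to have value 0
  kFileReadFlushMicros = 1,
  kFileReadCompactionMicros = 2,
  kHistogramEnumMax = 3,  // sentinel: record nothing
};

// Same shape as kReadHistograms: one slot per activity below kUnknown.
// Only two initializers are given, so the kDBOpen slot becomes 0.
constexpr std::array<Histogram, std::size_t(IOActivity::kUnknown)> kReadHistograms{{
    kFileReadFlushMicros,
    kFileReadCompactionMicros,
}};

// Mirrors the ternary passed to StopWatch as its second histogram.
Histogram SecondaryReadHistogram(IOActivity activity) {
  return activity != IOActivity::kUnknown
             ? kReadHistograms[static_cast<std::size_t>(activity)]
             : kHistogramEnumMax;
}
```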
@ -122,7 +122,8 @@ class RandomAccessFileReader {
std::unique_ptr<FSRandomAccessFile>&& raf, const std::string& _file_name, std::unique_ptr<FSRandomAccessFile>&& raf, const std::string& _file_name,
SystemClock* clock = nullptr, SystemClock* clock = nullptr,
const std::shared_ptr<IOTracer>& io_tracer = nullptr, const std::shared_ptr<IOTracer>& io_tracer = nullptr,
Statistics* stats = nullptr, uint32_t hist_type = 0, Statistics* stats = nullptr,
uint32_t hist_type = Histograms::HISTOGRAM_ENUM_MAX,
HistogramImpl* file_read_hist = nullptr, HistogramImpl* file_read_hist = nullptr,
RateLimiter* rate_limiter = nullptr, RateLimiter* rate_limiter = nullptr,
const std::vector<std::shared_ptr<EventListener>>& listeners = {}, const std::vector<std::shared_ptr<EventListener>>& listeners = {},
@ -197,7 +198,7 @@ class RandomAccessFileReader {
bool use_direct_io() const { return file_->use_direct_io(); } bool use_direct_io() const { return file_->use_direct_io(); }
IOStatus PrepareIOOptions(const ReadOptions& ro, IOOptions& opts); IOStatus PrepareIOOptions(const ReadOptions& ro, IOOptions& opts) const;
IOStatus ReadAsync(FSReadRequest& req, const IOOptions& opts, IOStatus ReadAsync(FSReadRequest& req, const IOOptions& opts,
std::function<void(const FSReadRequest&, void*)> cb, std::function<void(const FSReadRequest&, void*)> cb,

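The default `hist_type` changes from `0` to `Histograms::HISTOGRAM_ENUM_MAX`, and the `ReadAsync` stopwatches now pass the same sentinel in both histogram slots. Because 0 is itself a valid histogram id, the sentinel is presumably the only value that unambiguously means "record nothing". The sketch below shows a stopwatch that reports one elapsed time to a primary histogram and an optional secondary one. `DualStopWatch`, `FakeStats` and the explicit timestamps are hypothetical simplifications, not RocksDB's `StopWatch` API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sentinel playing the role of Histograms::HISTOGRAM_ENUM_MAX.
constexpr uint32_t kHistogramEnumMax = 100;

// Hypothetical statistics sink: sample count and sum per histogram id.
struct FakeStats {
  std::map<uint32_t, uint64_t> counts;
  std::map<uint32_t, uint64_t> sums;
  void Record(uint32_t hist, uint64_t micros) {
    ++counts[hist];
    sums[hist] += micros;
  }
};

// Reports one elapsed time to a primary and an optional secondary histogram.
// Timestamps are passed in explicitly to keep the sketch deterministic.
class DualStopWatch {
 public:
  DualStopWatch(FakeStats* stats, uint32_t hist, uint32_t hist2, uint64_t start)
      : stats_(stats), hist_(hist), hist2_(hist2), start_(start) {}

  void Stop(uint64_t now) {
    if (stats_ == nullptr) {
      return;
    }
    const uint64_t elapsed = now - start_;
    if (hist_ != kHistogramEnumMax) {
      stats_->Record(hist_, elapsed);
    }
    if (hist2_ != kHistogramEnumMax) {
      stats_->Record(hist2_, elapsed);
    }
  }

 private:
  FakeStats* stats_;  // not owned; nullptr disables recording
  uint32_t hist_;
  uint32_t hist2_;
  uint64_t start_;
};
```

Under this scheme, each flush or compaction read lands both in the SST histogram and in its per-activity histogram. The per-activity counts should therefore roughly add up to the SST count when no other reads occur.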
@ -436,6 +436,14 @@ class Env : public Customizable {
IO_TOTAL = 4 IO_TOTAL = 4
}; };
// EXPERIMENTAL
enum class IOActivity : uint8_t {
kFlush = 0,
kCompaction = 1,
kDBOpen = 2,
kUnknown,
};
// Arrange to run "(*function)(arg)" once in a background thread, in // Arrange to run "(*function)(arg)" once in a background thread, in
// the thread pool specified by pri. By default, jobs go to the 'LOW' // the thread pool specified by pri. By default, jobs go to the 'LOW'
// priority thread pool. // priority thread pool.

@ -116,6 +116,8 @@ struct IOOptions {
// directories and list only files in GetChildren API. // directories and list only files in GetChildren API.
bool do_not_recurse; bool do_not_recurse;
Env::IOActivity io_activity = Env::IOActivity::kUnknown;
IOOptions() : IOOptions(false) {} IOOptions() : IOOptions(false) {}
explicit IOOptions(bool force_dir_fsync_) explicit IOOptions(bool force_dir_fsync_)

@ -1696,8 +1696,11 @@ struct ReadOptions {
// Default: true // Default: true
bool optimize_multiget_for_io; bool optimize_multiget_for_io;
Env::IOActivity io_activity;
ReadOptions(); ReadOptions();
ReadOptions(bool cksum, bool cache); ReadOptions(bool cksum, bool cache);
explicit ReadOptions(Env::IOActivity io_activity);
}; };
// Options that control write operations // Options that control write operations

@ -472,7 +472,13 @@ enum Histograms : uint32_t {
NUM_FILES_IN_SINGLE_COMPACTION, NUM_FILES_IN_SINGLE_COMPACTION,
DB_SEEK, DB_SEEK,
WRITE_STALL, WRITE_STALL,
// Time spent in reading block-based or plain SST table
SST_READ_MICROS, SST_READ_MICROS,
// Time spent in reading SST table (currently only block-based table) or blob
// file for flush or compaction
FILE_READ_FLUSH_MICROS,
FILE_READ_COMPACTION_MICROS,
// The number of subcompactions actually scheduled during a compaction // The number of subcompactions actually scheduled during a compaction
NUM_SUBCOMPACTIONS_SCHEDULED, NUM_SUBCOMPACTIONS_SCHEDULED,
// Value size distribution in each operation // Value size distribution in each operation

@ -56,6 +56,7 @@ struct ThreadStatus {
OP_UNKNOWN = 0, OP_UNKNOWN = 0,
OP_COMPACTION, OP_COMPACTION,
OP_FLUSH, OP_FLUSH,
OP_DBOPEN,
NUM_OP_TYPES NUM_OP_TYPES
}; };

@ -681,4 +681,3 @@ class Transaction {
}; };
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -5619,6 +5619,10 @@ class HistogramTypeJni {
return 0x38; return 0x38;
case ROCKSDB_NAMESPACE::Histograms::TABLE_OPEN_PREFETCH_TAIL_READ_BYTES: case ROCKSDB_NAMESPACE::Histograms::TABLE_OPEN_PREFETCH_TAIL_READ_BYTES:
return 0x39; return 0x39;
case ROCKSDB_NAMESPACE::Histograms::FILE_READ_FLUSH_MICROS:
return 0x3A;
case ROCKSDB_NAMESPACE::Histograms::FILE_READ_COMPACTION_MICROS:
return 0x3B;
case ROCKSDB_NAMESPACE::Histograms::HISTOGRAM_ENUM_MAX: case ROCKSDB_NAMESPACE::Histograms::HISTOGRAM_ENUM_MAX:
// 0x1F for backwards compatibility on current minor version. // 0x1F for backwards compatibility on current minor version.
return 0x1F; return 0x1F;
@ -5738,6 +5742,10 @@ class HistogramTypeJni {
case 0x39: case 0x39:
return ROCKSDB_NAMESPACE::Histograms:: return ROCKSDB_NAMESPACE::Histograms::
TABLE_OPEN_PREFETCH_TAIL_READ_BYTES; TABLE_OPEN_PREFETCH_TAIL_READ_BYTES;
case 0x3A:
return ROCKSDB_NAMESPACE::Histograms::FILE_READ_FLUSH_MICROS;
case 0x3B:
return ROCKSDB_NAMESPACE::Histograms::FILE_READ_COMPACTION_MICROS;
case 0x1F: case 0x1F:
// 0x1F for backwards compatibility on current minor version. // 0x1F for backwards compatibility on current minor version.
return ROCKSDB_NAMESPACE::Histograms::HISTOGRAM_ENUM_MAX; return ROCKSDB_NAMESPACE::Histograms::HISTOGRAM_ENUM_MAX;
@ -6777,6 +6785,8 @@ class OperationTypeJni {
return 0x1; return 0x1;
case ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_FLUSH: case ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_FLUSH:
return 0x2; return 0x2;
case ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_DBOPEN:
return 0x3;
default: default:
return 0x7F; // undefined return 0x7F; // undefined
} }
@ -6793,6 +6803,8 @@ class OperationTypeJni {
return ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_COMPACTION; return ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_COMPACTION;
case 0x2: case 0x2:
return ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_FLUSH; return ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_FLUSH;
case 0x3:
return ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_DBOPEN;
default: default:
// undefined/default // undefined/default
return ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_UNKNOWN; return ROCKSDB_NAMESPACE::ThreadStatus::OperationType::OP_UNKNOWN;

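The JNI portal and the Java enums must agree byte-for-byte. `OP_DBOPEN` is `0x3` on both sides, and the two new histograms take `0x3A` and `0x3B`. A round-trip check is a cheap way to guard such a table. The sketch below uses a hypothetical mirror enum. The `0x0` value for the unknown operation comes from the Java `OP_UNKNOWN((byte)0x0)` entry, because the C++ side of that case falls outside the hunk.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical mirror of ThreadStatus::OperationType after this change.
enum class OperationType { kUnknown, kCompaction, kFlush, kDBOpen };

// C++ -> Java byte, following the cases in OperationTypeJni.
int8_t ToJavaByte(OperationType op) {
  switch (op) {
    case OperationType::kUnknown:
      return 0x0;
    case OperationType::kCompaction:
      return 0x1;
    case OperationType::kFlush:
      return 0x2;
    case OperationType::kDBOpen:
      return 0x3;
    default:
      return 0x7F;  // undefined
  }
}

// Java byte -> C++; unrecognized bytes map to kUnknown.
OperationType FromJavaByte(int8_t b) {
  switch (b) {
    case 0x1:
      return OperationType::kCompaction;
    case 0x2:
      return OperationType::kFlush;
    case 0x3:
      return OperationType::kDBOpen;
    default:
      return OperationType::kUnknown;
  }
}
```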
@ -169,6 +169,10 @@ public enum HistogramType {
*/ */
TABLE_OPEN_PREFETCH_TAIL_READ_BYTES((byte) 0x39), TABLE_OPEN_PREFETCH_TAIL_READ_BYTES((byte) 0x39),
FILE_READ_FLUSH_MICROS((byte) 0x3A),
FILE_READ_COMPACTION_MICROS((byte) 0x3B),
// 0x1F for backwards compatibility on current minor version. // 0x1F for backwards compatibility on current minor version.
HISTOGRAM_ENUM_MAX((byte) 0x1F); HISTOGRAM_ENUM_MAX((byte) 0x1F);

@ -14,7 +14,8 @@ package org.rocksdb;
public enum OperationType { public enum OperationType {
OP_UNKNOWN((byte)0x0), OP_UNKNOWN((byte)0x0),
OP_COMPACTION((byte)0x1), OP_COMPACTION((byte)0x1),
OP_FLUSH((byte)0x2); OP_FLUSH((byte) 0x2),
OP_DBOPEN((byte) 0x3);
private final byte value; private final byte value;

@ -1548,7 +1548,8 @@ static void RandomAccessFileReaderRead(benchmark::State& state) {
: Temperature::kCold; : Temperature::kCold;
readers.emplace_back(new RandomAccessFileReader( readers.emplace_back(new RandomAccessFileReader(
std::move(f), fname, env->GetSystemClock().get(), nullptr, statistics, std::move(f), fname, env->GetSystemClock().get(), nullptr, statistics,
0, nullptr, nullptr, {}, temperature, rand_num == 1)); Histograms::HISTOGRAM_ENUM_MAX, nullptr, nullptr, {}, temperature,
rand_num == 1));
} }
IOOptions io_options; IOOptions io_options;

@ -243,6 +243,8 @@ const std::vector<std::pair<Histograms, std::string>> HistogramsNameMap = {
{DB_SEEK, "rocksdb.db.seek.micros"}, {DB_SEEK, "rocksdb.db.seek.micros"},
{WRITE_STALL, "rocksdb.db.write.stall"}, {WRITE_STALL, "rocksdb.db.write.stall"},
{SST_READ_MICROS, "rocksdb.sst.read.micros"}, {SST_READ_MICROS, "rocksdb.sst.read.micros"},
{FILE_READ_FLUSH_MICROS, "rocksdb.file.read.flush.micros"},
{FILE_READ_COMPACTION_MICROS, "rocksdb.file.read.compaction.micros"},
{NUM_SUBCOMPACTIONS_SCHEDULED, "rocksdb.num.subcompactions.scheduled"}, {NUM_SUBCOMPACTIONS_SCHEDULED, "rocksdb.num.subcompactions.scheduled"},
{BYTES_PER_READ, "rocksdb.bytes.per.read"}, {BYTES_PER_READ, "rocksdb.bytes.per.read"},
{BYTES_PER_WRITE, "rocksdb.bytes.per.write"}, {BYTES_PER_WRITE, "rocksdb.bytes.per.write"},

@ -47,15 +47,19 @@ void ThreadStatusUpdater::ResetThreadStatus() {
SetColumnFamilyInfoKey(nullptr); SetColumnFamilyInfoKey(nullptr);
} }
void ThreadStatusUpdater::SetEnableTracking(bool enable_tracking) {
auto* data = Get();
if (data == nullptr) {
return;
}
data->enable_tracking.store(enable_tracking, std::memory_order_relaxed);
}
void ThreadStatusUpdater::SetColumnFamilyInfoKey(const void* cf_key) { void ThreadStatusUpdater::SetColumnFamilyInfoKey(const void* cf_key) {
auto* data = Get(); auto* data = Get();
if (data == nullptr) { if (data == nullptr) {
return; return;
} }
// set the tracking flag based on whether cf_key is non-null or not.
// If enable_thread_tracking is set to false, the input cf_key
// would be nullptr.
data->enable_tracking = (cf_key != nullptr);
data->cf_key.store(const_cast<void*>(cf_key), std::memory_order_relaxed); data->cf_key.store(const_cast<void*>(cf_key), std::memory_order_relaxed);
} }
@ -86,6 +90,14 @@ void ThreadStatusUpdater::SetThreadOperation(
} }
} }
ThreadStatus::OperationType ThreadStatusUpdater::GetThreadOperation() {
ThreadStatusData* data = GetLocalThreadStatus();
if (data == nullptr) {
return ThreadStatus::OperationType::OP_UNKNOWN;
}
return data->operation_type.load(std::memory_order_relaxed);
}
void ThreadStatusUpdater::SetThreadOperationProperty(int i, uint64_t value) { void ThreadStatusUpdater::SetThreadOperationProperty(int i, uint64_t value) {
auto* data = GetLocalThreadStatus(); auto* data = GetLocalThreadStatus();
if (data == nullptr) { if (data == nullptr) {
@ -211,9 +223,7 @@ ThreadStatusData* ThreadStatusUpdater::GetLocalThreadStatus() {
if (thread_status_data_ == nullptr) { if (thread_status_data_ == nullptr) {
return nullptr; return nullptr;
} }
if (!thread_status_data_->enable_tracking) { if (!thread_status_data_->enable_tracking.load(std::memory_order_relaxed)) {
assert(thread_status_data_->cf_key.load(std::memory_order_relaxed) ==
nullptr);
return nullptr; return nullptr;
} }
return thread_status_data_; return thread_status_data_;

@ -62,7 +62,8 @@ struct ConstantColumnFamilyInfo {
// status of a thread using a set of atomic pointers. // status of a thread using a set of atomic pointers.
struct ThreadStatusData { struct ThreadStatusData {
#ifdef ROCKSDB_USING_THREAD_STATUS #ifdef ROCKSDB_USING_THREAD_STATUS
explicit ThreadStatusData() : enable_tracking(false) { explicit ThreadStatusData() {
enable_tracking.store(false);
thread_id.store(0); thread_id.store(0);
thread_type.store(ThreadStatus::USER); thread_type.store(ThreadStatus::USER);
cf_key.store(nullptr); cf_key.store(nullptr);
@ -72,13 +73,10 @@ struct ThreadStatusData {
} }
// A flag to indicate whether the thread tracking is enabled // A flag to indicate whether the thread tracking is enabled
// in the current thread. This value will be updated based on whether // in the current thread.
// the associated Options::enable_thread_tracking is set to true
// in ThreadStatusUtil::SetColumnFamily().
//
// If set to false, then SetThreadOperation and SetThreadState // If set to false, then SetThreadOperation and SetThreadState
// will be no-op. // will be no-op.
bool enable_tracking; std::atomic<bool> enable_tracking;
std::atomic<uint64_t> thread_id; std::atomic<uint64_t> thread_id;
std::atomic<ThreadStatus::ThreadType> thread_type; std::atomic<ThreadStatus::ThreadType> thread_type;
@ -119,8 +117,10 @@ class ThreadStatusUpdater {
// Register the current thread for tracking. // Register the current thread for tracking.
void RegisterThread(ThreadStatus::ThreadType ttype, uint64_t thread_id); void RegisterThread(ThreadStatus::ThreadType ttype, uint64_t thread_id);
void SetEnableTracking(bool enable_tracking);
// Update the column-family info of the current thread by setting // Update the column-family info of the current thread by setting
// its thread-local pointer of ThreadStateInfo to the correct entry. // its thread-local pointer of ThreadStatusData to the correct entry.
void SetColumnFamilyInfoKey(const void* cf_key); void SetColumnFamilyInfoKey(const void* cf_key);
// returns the column family info key. // returns the column family info key.
@ -129,6 +129,9 @@ class ThreadStatusUpdater {
// Update the thread operation of the current thread. // Update the thread operation of the current thread.
void SetThreadOperation(const ThreadStatus::OperationType type); void SetThreadOperation(const ThreadStatus::OperationType type);
// Return the thread operation of the current thread.
ThreadStatus::OperationType GetThreadOperation();
// The start time of the current thread operation. It is in the format // The start time of the current thread operation. It is in the format
// of micro-seconds since some fixed point in time. // of micro-seconds since some fixed point in time.
void SetOperationStartTime(const uint64_t start_time); void SetOperationStartTime(const uint64_t start_time);

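After this change, `enable_tracking` is an explicit atomic per-thread flag set through `SetEnableTracking()`. Previously it was derived from whether a column-family key was set. When the flag is false, `GetLocalThreadStatus()` returns `nullptr`. As a result, `SetThreadOperation()` is a no-op and `GetThreadOperation()` reports `OP_UNKNOWN`. The sketch below illustrates those semantics, with a hypothetical `thread_local` record standing in for `ThreadStatusData`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical mirror of ThreadStatus::OperationType after this change.
enum class OperationType { kUnknown, kCompaction, kFlush, kDBOpen };

// Hypothetical per-thread record; the enable flag is atomic, as in the diff.
struct ThreadStatusRecord {
  std::atomic<bool> enable_tracking{false};
  std::atomic<OperationType> operation{OperationType::kUnknown};
};

thread_local ThreadStatusRecord tls_status;

void SetEnableTracking(bool enable) {
  tls_status.enable_tracking.store(enable, std::memory_order_relaxed);
}

// No-op while tracking is disabled for this thread.
void SetOperation(OperationType op) {
  if (!tls_status.enable_tracking.load(std::memory_order_relaxed)) {
    return;
  }
  tls_status.operation.store(op, std::memory_order_relaxed);
}

// Mirrors GetThreadOperation(): untracked threads report kUnknown.
OperationType GetOperation() {
  if (!tls_status.enable_tracking.load(std::memory_order_relaxed)) {
    return OperationType::kUnknown;
  }
  return tls_status.operation.load(std::memory_order_relaxed);
}
```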
@ -33,27 +33,23 @@ void ThreadStatusUtil::UnregisterThread() {
} }
} }
void ThreadStatusUtil::SetColumnFamily(const ColumnFamilyData* cfd, void ThreadStatusUtil::SetEnableTracking(bool enable_tracking) {
const Env* env, if (thread_updater_local_cache_ == nullptr) {
bool enable_thread_tracking) {
if (!MaybeInitThreadLocalUpdater(env)) {
return; return;
} }
assert(thread_updater_local_cache_); thread_updater_local_cache_->SetEnableTracking(enable_tracking);
if (cfd != nullptr && enable_thread_tracking) { }
thread_updater_local_cache_->SetColumnFamilyInfoKey(cfd);
} else { void ThreadStatusUtil::SetColumnFamily(const ColumnFamilyData* cfd) {
// When cfd == nullptr or enable_thread_tracking == false, we set if (thread_updater_local_cache_ == nullptr) {
// ColumnFamilyInfoKey to nullptr, which makes SetThreadOperation return;
// and SetThreadState become no-op.
thread_updater_local_cache_->SetColumnFamilyInfoKey(nullptr);
} }
assert(cfd);
thread_updater_local_cache_->SetColumnFamilyInfoKey(cfd);
} }
void ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType op) { void ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType op) {
if (thread_updater_local_cache_ == nullptr) { if (thread_updater_local_cache_ == nullptr) {
// thread_updater_local_cache_ must be set in SetColumnFamily
// or other ThreadStatusUtil functions.
return; return;
} }
@ -68,6 +64,13 @@ void ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType op) {
thread_updater_local_cache_->SetThreadOperation(op); thread_updater_local_cache_->SetThreadOperation(op);
} }
ThreadStatus::OperationType ThreadStatusUtil::GetThreadOperation() {
if (thread_updater_local_cache_ == nullptr) {
return ThreadStatus::OperationType::OP_UNKNOWN;
}
return thread_updater_local_cache_->GetThreadOperation();
}
ThreadStatus::OperationStage ThreadStatusUtil::SetThreadOperationStage( ThreadStatus::OperationStage ThreadStatusUtil::SetThreadOperationStage(
ThreadStatus::OperationStage stage) { ThreadStatus::OperationStage stage) {
if (thread_updater_local_cache_ == nullptr) { if (thread_updater_local_cache_ == nullptr) {
@ -172,9 +175,7 @@ bool ThreadStatusUtil::MaybeInitThreadLocalUpdater(const Env* /*env*/) {
return false; return false;
} }
void ThreadStatusUtil::SetColumnFamily(const ColumnFamilyData* /*cfd*/, void ThreadStatusUtil::SetColumnFamily(const ColumnFamilyData* /*cfd*/) {}
const Env* /*env*/,
bool /*enable_thread_tracking*/) {}
void ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType /*op*/) {} void ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType /*op*/) {}
@ -189,7 +190,7 @@ void ThreadStatusUtil::SetThreadState(ThreadStatus::StateType /*state*/) {}
void ThreadStatusUtil::NewColumnFamilyInfo(const DB* /*db*/, void ThreadStatusUtil::NewColumnFamilyInfo(const DB* /*db*/,
const ColumnFamilyData* /*cfd*/, const ColumnFamilyData* /*cfd*/,
const std::string& /*cf_name*/, const std::string& /*cf_name*/,
const Env* /*env*/) {} const Env* env) {}
void ThreadStatusUtil::EraseColumnFamilyInfo(const ColumnFamilyData* /*cfd*/) {} void ThreadStatusUtil::EraseColumnFamilyInfo(const ColumnFamilyData* /*cfd*/) {}

@ -52,13 +52,18 @@ class ThreadStatusUtil {
// the current thread does not hold db_mutex. // the current thread does not hold db_mutex.
static void EraseDatabaseInfo(const DB* db); static void EraseDatabaseInfo(const DB* db);
static void SetEnableTracking(bool enable_tracking);
// Update the thread status to indicate the current thread is doing // Update the thread status to indicate the current thread is doing
// something related to the specified column family. // something related to the specified column family.
static void SetColumnFamily(const ColumnFamilyData* cfd, const Env* env, //
bool enable_thread_tracking); // REQUIRES: cfd != nullptr
static void SetColumnFamily(const ColumnFamilyData* cfd);
static void SetThreadOperation(ThreadStatus::OperationType type); static void SetThreadOperation(ThreadStatus::OperationType type);
static ThreadStatus::OperationType GetThreadOperation();
static ThreadStatus::OperationStage SetThreadOperationStage( static ThreadStatus::OperationStage SetThreadOperationStage(
ThreadStatus::OperationStage stage); ThreadStatus::OperationStage stage);
@ -74,6 +79,9 @@ class ThreadStatusUtil {
static void TEST_SetStateDelay(const ThreadStatus::StateType state, static void TEST_SetStateDelay(const ThreadStatus::StateType state,
int micro); int micro);
static void TEST_StateDelay(const ThreadStatus::StateType state); static void TEST_StateDelay(const ThreadStatus::StateType state);
static Env::IOActivity TEST_GetExpectedIOActivity(
ThreadStatus::OperationType thread_op);
#endif #endif
protected: protected:

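The header now exposes `GetThreadOperation()` next to `SetThreadOperation()`, so code running on a thread can ask which operation that thread belongs to. Below is a minimal standalone sketch of the idea, assuming a plain `thread_local` slot. `OperationType`, `ThreadOpTracker` and `ScopedThreadOperation` are hypothetical names. This is not RocksDB's actual thread-status implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for ThreadStatus::OperationType.
enum class OperationType { kUnknown, kCompaction, kFlush, kDBOpen };

// Hypothetical per-thread record of the operation the thread is running.
class ThreadOpTracker {
 public:
  static void SetThreadOperation(OperationType op) { current_ = op; }
  static OperationType GetThreadOperation() { return current_; }

 private:
  // Each thread sees its own copy, so flush and compaction threads
  // never overwrite each other's value.
  static thread_local OperationType current_;
};

thread_local OperationType ThreadOpTracker::current_ = OperationType::kUnknown;

// RAII helper: tag the thread for the lifetime of a scope, then restore
// whatever operation was set before.
class ScopedThreadOperation {
 public:
  explicit ScopedThreadOperation(OperationType op)
      : previous_(ThreadOpTracker::GetThreadOperation()) {
    ThreadOpTracker::SetThreadOperation(op);
  }
  ~ScopedThreadOperation() { ThreadOpTracker::SetThreadOperation(previous_); }
  ScopedThreadOperation(const ScopedThreadOperation&) = delete;
  ScopedThreadOperation& operator=(const ScopedThreadOperation&) = delete;

 private:
  OperationType previous_;
};
```

The scoped helper is one design choice among several. It makes sure the thread is untagged again even if the tagged scope exits early.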
@ -27,6 +27,20 @@ void ThreadStatusUtil::TEST_StateDelay(const ThreadStatus::StateType state) {
} }
} }
Env::IOActivity ThreadStatusUtil::TEST_GetExpectedIOActivity(
ThreadStatus::OperationType thread_op) {
switch (thread_op) {
case ThreadStatus::OperationType::OP_FLUSH:
return Env::IOActivity::kFlush;
case ThreadStatus::OperationType::OP_COMPACTION:
return Env::IOActivity::kCompaction;
case ThreadStatus::OperationType::OP_DBOPEN:
return Env::IOActivity::kDBOpen;
default:
return Env::IOActivity::kUnknown;
}
}
#endif // !NDEBUG #endif // !NDEBUG
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

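`TEST_GetExpectedIOActivity()` lets a debug-only file-system wrapper check that a read's `io_activity` tag matches the operation the calling thread is running. The PR summary says `DbStressFSWrapper` is used this way. The sketch below is standalone and assumes simplified enums. `IOActivity`, `ExpectedIOActivity`, `IOActivityMatches` and `CheckedRead` are hypothetical, and they only mirror the mapping shown above.

```cpp
#include <cassert>

// Hypothetical mirrors of ThreadStatus::OperationType and Env::IOActivity.
enum class OperationType { kUnknown, kCompaction, kFlush, kDBOpen, kOther };
enum class IOActivity { kUnknown, kFlush, kCompaction, kDBOpen };

// Same mapping as TEST_GetExpectedIOActivity: any operation that is not
// explicitly tracked falls back to kUnknown.
IOActivity ExpectedIOActivity(OperationType op) {
  switch (op) {
    case OperationType::kFlush:
      return IOActivity::kFlush;
    case OperationType::kCompaction:
      return IOActivity::kCompaction;
    case OperationType::kDBOpen:
      return IOActivity::kDBOpen;
    default:
      return IOActivity::kUnknown;
  }
}

bool IOActivityMatches(OperationType thread_op, IOActivity tagged) {
  return ExpectedIOActivity(thread_op) == tagged;
}

// Hypothetical wrapper hook: a mismatch means some call path dropped the
// tag, e.g. by building a fresh ReadOptions() instead of forwarding one.
void CheckedRead(OperationType thread_op, IOActivity tagged) {
  const bool ok = IOActivityMatches(thread_op, tagged);
  assert(ok);
  (void)ok;  // keep NDEBUG builds free of unused-variable warnings
  // ... forward the read to the wrapped file system here ...
}
```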
@ -682,8 +682,6 @@ DBOptions* DBOptions::IncreaseParallelism(int total_threads) {
env->SetBackgroundThreads(1, Env::HIGH); env->SetBackgroundThreads(1, Env::HIGH);
return this; return this;
} }
ReadOptions::ReadOptions() ReadOptions::ReadOptions()
: snapshot(nullptr), : snapshot(nullptr),
iterate_lower_bound(nullptr), iterate_lower_bound(nullptr),
@ -708,7 +706,8 @@ ReadOptions::ReadOptions()
value_size_soft_limit(std::numeric_limits<uint64_t>::max()), value_size_soft_limit(std::numeric_limits<uint64_t>::max()),
adaptive_readahead(false), adaptive_readahead(false),
async_io(false), async_io(false),
optimize_multiget_for_io(true) {} optimize_multiget_for_io(true),
io_activity(Env::IOActivity::kUnknown) {}
ReadOptions::ReadOptions(bool cksum, bool cache) ReadOptions::ReadOptions(bool cksum, bool cache)
: snapshot(nullptr), : snapshot(nullptr),
@ -734,6 +733,34 @@ ReadOptions::ReadOptions(bool cksum, bool cache)
value_size_soft_limit(std::numeric_limits<uint64_t>::max()), value_size_soft_limit(std::numeric_limits<uint64_t>::max()),
adaptive_readahead(false), adaptive_readahead(false),
async_io(false), async_io(false),
optimize_multiget_for_io(true) {} optimize_multiget_for_io(true),
io_activity(Env::IOActivity::kUnknown) {}
ReadOptions::ReadOptions(Env::IOActivity _io_activity)
: snapshot(nullptr),
iterate_lower_bound(nullptr),
iterate_upper_bound(nullptr),
readahead_size(0),
max_skippable_internal_keys(0),
read_tier(kReadAllTier),
verify_checksums(true),
fill_cache(true),
tailing(false),
managed(false),
total_order_seek(false),
auto_prefix_mode(false),
prefix_same_as_start(false),
pin_data(false),
background_purge_on_iterator_cleanup(false),
ignore_range_deletions(false),
timestamp(nullptr),
iter_start_ts(nullptr),
deadline(std::chrono::microseconds::zero()),
io_timeout(std::chrono::microseconds::zero()),
value_size_soft_limit(std::numeric_limits<uint64_t>::max()),
adaptive_readahead(false),
async_io(false),
optimize_multiget_for_io(true),
io_activity(_io_activity) {}
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

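After this change, every `ReadOptions` constructor initializes `io_activity`. The two existing constructors default it to `Env::IOActivity::kUnknown`, and the new one takes the activity as an argument. The sketch below is a reduced version of that shape that keeps only two fields. `MiniReadOptions` is a hypothetical name. The sketch marks the tagged constructor `explicit` as its own choice, so that an enum value cannot silently convert into an options object.

```cpp
#include <cassert>

// Hypothetical mirror of Env::IOActivity.
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };

// Reduced stand-in for ReadOptions: only the fields needed for the idea.
struct MiniReadOptions {
  bool verify_checksums;
  IOActivity io_activity;

  // Default: callers that do not know their activity report kUnknown.
  MiniReadOptions()
      : verify_checksums(true), io_activity(IOActivity::kUnknown) {}

  // Tagged: a background job states its activity once, up front.
  explicit MiniReadOptions(IOActivity activity)
      : verify_checksums(true), io_activity(activity) {}
};
```

For example, a flush path would build `MiniReadOptions ro(IOActivity::kFlush);` once and pass `ro` down. It would not default-construct new options at each call site.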
@ -583,6 +583,7 @@ Status BlockBasedTable::Open(
ro.io_timeout = read_options.io_timeout; ro.io_timeout = read_options.io_timeout;
ro.rate_limiter_priority = read_options.rate_limiter_priority; ro.rate_limiter_priority = read_options.rate_limiter_priority;
ro.verify_checksums = read_options.verify_checksums; ro.verify_checksums = read_options.verify_checksums;
ro.io_activity = read_options.io_activity;
// prefetch both index and filters, down to all partitions // prefetch both index and filters, down to all partitions
const bool prefetch_all = prefetch_index_and_filter_in_cache || level == 0; const bool prefetch_all = prefetch_index_and_filter_in_cache || level == 0;
@ -2231,7 +2232,8 @@ Status BlockBasedTable::MultiGetFilter(const ReadOptions& read_options,
return Status::OK(); return Status::OK();
} }
Status BlockBasedTable::Prefetch(const Slice* const begin, Status BlockBasedTable::Prefetch(const ReadOptions& read_options,
const Slice* const begin,
const Slice* const end) { const Slice* const end) {
auto& comparator = rep_->internal_comparator; auto& comparator = rep_->internal_comparator;
UserComparatorWrapper user_comparator(comparator.user_comparator()); UserComparatorWrapper user_comparator(comparator.user_comparator());
@ -2241,7 +2243,7 @@ Status BlockBasedTable::Prefetch(const Slice* const begin,
} }
BlockCacheLookupContext lookup_context{TableReaderCaller::kPrefetch}; BlockCacheLookupContext lookup_context{TableReaderCaller::kPrefetch};
IndexBlockIter iiter_on_stack; IndexBlockIter iiter_on_stack;
auto iiter = NewIndexIterator(ReadOptions(), /*need_upper_bound_check=*/false, auto iiter = NewIndexIterator(read_options, /*need_upper_bound_check=*/false,
&iiter_on_stack, /*get_context=*/nullptr, &iiter_on_stack, /*get_context=*/nullptr,
&lookup_context); &lookup_context);
std::unique_ptr<InternalIteratorBase<IndexValue>> iiter_unique_ptr; std::unique_ptr<InternalIteratorBase<IndexValue>> iiter_unique_ptr;
@ -2278,7 +2280,7 @@ Status BlockBasedTable::Prefetch(const Slice* const begin,
DataBlockIter biter; DataBlockIter biter;
Status tmp_status; Status tmp_status;
NewDataBlockIterator<DataBlockIter>( NewDataBlockIterator<DataBlockIter>(
ReadOptions(), block_handle, &biter, /*type=*/BlockType::kData, read_options, block_handle, &biter, /*type=*/BlockType::kData,
/*get_context=*/nullptr, &lookup_context, /*get_context=*/nullptr, &lookup_context,
/*prefetch_buffer=*/nullptr, /*for_compaction=*/false, /*prefetch_buffer=*/nullptr, /*for_compaction=*/false,
/*async_read=*/false, tmp_status); /*async_read=*/false, tmp_status);
@ -2298,11 +2300,10 @@ Status BlockBasedTable::VerifyChecksum(const ReadOptions& read_options,
// Check Meta blocks // Check Meta blocks
std::unique_ptr<Block> metaindex; std::unique_ptr<Block> metaindex;
std::unique_ptr<InternalIterator> metaindex_iter; std::unique_ptr<InternalIterator> metaindex_iter;
ReadOptions ro; s = ReadMetaIndexBlock(read_options, nullptr /* prefetch buffer */,
s = ReadMetaIndexBlock(ro, nullptr /* prefetch buffer */, &metaindex, &metaindex, &metaindex_iter);
&metaindex_iter);
if (s.ok()) { if (s.ok()) {
s = VerifyChecksumInMetaBlocks(metaindex_iter.get()); s = VerifyChecksumInMetaBlocks(read_options, metaindex_iter.get());
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
@ -2409,7 +2410,7 @@ BlockType BlockBasedTable::GetBlockTypeForMetaBlockByName(
} }
Status BlockBasedTable::VerifyChecksumInMetaBlocks( Status BlockBasedTable::VerifyChecksumInMetaBlocks(
InternalIteratorBase<Slice>* index_iter) { const ReadOptions& read_options, InternalIteratorBase<Slice>* index_iter) {
Status s; Status s;
for (index_iter->SeekToFirst(); index_iter->Valid(); index_iter->Next()) { for (index_iter->SeekToFirst(); index_iter->Valid(); index_iter->Next()) {
s = index_iter->status(); s = index_iter->status();
@ -2425,14 +2426,14 @@ Status BlockBasedTable::VerifyChecksumInMetaBlocks(
// Unfortunate special handling for properties block checksum w/ // Unfortunate special handling for properties block checksum w/
// global seqno // global seqno
std::unique_ptr<TableProperties> table_properties; std::unique_ptr<TableProperties> table_properties;
s = ReadTablePropertiesHelper(ReadOptions(), handle, rep_->file.get(), s = ReadTablePropertiesHelper(read_options, handle, rep_->file.get(),
nullptr /* prefetch_buffer */, rep_->footer, nullptr /* prefetch_buffer */, rep_->footer,
rep_->ioptions, &table_properties, rep_->ioptions, &table_properties,
nullptr /* memory_allocator */); nullptr /* memory_allocator */);
} else { } else {
s = BlockFetcher( s = BlockFetcher(
rep_->file.get(), nullptr /* prefetch buffer */, rep_->footer, rep_->file.get(), nullptr /* prefetch buffer */, rep_->footer,
ReadOptions(), handle, &contents, rep_->ioptions, read_options, handle, &contents, rep_->ioptions,
false /* decompress */, false /*maybe_compressed*/, false /* decompress */, false /*maybe_compressed*/,
GetBlockTypeForMetaBlockByName(meta_block_name), GetBlockTypeForMetaBlockByName(meta_block_name),
UncompressionDict::GetEmptyDict(), rep_->persistent_cache_options) UncompressionDict::GetEmptyDict(), rep_->persistent_cache_options)
@ -2544,7 +2545,8 @@ uint64_t BlockBasedTable::GetApproximateDataSize() {
return rep_->footer.metaindex_handle().offset(); return rep_->footer.metaindex_handle().offset();
} }
uint64_t BlockBasedTable::ApproximateOffsetOf(const Slice& key, uint64_t BlockBasedTable::ApproximateOffsetOf(const ReadOptions& read_options,
const Slice& key,
TableReaderCaller caller) { TableReaderCaller caller) {
uint64_t data_size = GetApproximateDataSize(); uint64_t data_size = GetApproximateDataSize();
if (UNLIKELY(data_size == 0)) { if (UNLIKELY(data_size == 0)) {
@ -2558,6 +2560,7 @@ uint64_t BlockBasedTable::ApproximateOffsetOf(const Slice& key,
IndexBlockIter iiter_on_stack; IndexBlockIter iiter_on_stack;
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
ro.io_activity = read_options.io_activity;
auto index_iter = auto index_iter =
NewIndexIterator(ro, /*disable_prefix_seek=*/true, NewIndexIterator(ro, /*disable_prefix_seek=*/true,
/*input_iter=*/&iiter_on_stack, /*get_context=*/nullptr, /*input_iter=*/&iiter_on_stack, /*get_context=*/nullptr,
@ -2586,7 +2589,8 @@ uint64_t BlockBasedTable::ApproximateOffsetOf(const Slice& key,
static_cast<double>(rep_->file_size)); static_cast<double>(rep_->file_size));
} }
uint64_t BlockBasedTable::ApproximateSize(const Slice& start, const Slice& end, uint64_t BlockBasedTable::ApproximateSize(const ReadOptions& read_options,
const Slice& start, const Slice& end,
TableReaderCaller caller) { TableReaderCaller caller) {
assert(rep_->internal_comparator.Compare(start, end) <= 0); assert(rep_->internal_comparator.Compare(start, end) <= 0);
@ -2603,6 +2607,7 @@ uint64_t BlockBasedTable::ApproximateSize(const Slice& start, const Slice& end,
IndexBlockIter iiter_on_stack; IndexBlockIter iiter_on_stack;
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
ro.io_activity = read_options.io_activity;
auto index_iter = auto index_iter =
NewIndexIterator(ro, /*disable_prefix_seek=*/true, NewIndexIterator(ro, /*disable_prefix_seek=*/true,
/*input_iter=*/&iiter_on_stack, /*get_context=*/nullptr, /*input_iter=*/&iiter_on_stack, /*get_context=*/nullptr,
@ -2654,9 +2659,9 @@ bool BlockBasedTable::TEST_IndexBlockInCache() const {
} }
Status BlockBasedTable::GetKVPairsFromDataBlocks( Status BlockBasedTable::GetKVPairsFromDataBlocks(
std::vector<KVPairBlock>* kv_pair_blocks) { const ReadOptions& read_options, std::vector<KVPairBlock>* kv_pair_blocks) {
std::unique_ptr<InternalIteratorBase<IndexValue>> blockhandles_iter( std::unique_ptr<InternalIteratorBase<IndexValue>> blockhandles_iter(
NewIndexIterator(ReadOptions(), /*need_upper_bound_check=*/false, NewIndexIterator(read_options, /*need_upper_bound_check=*/false,
/*input_iter=*/nullptr, /*get_context=*/nullptr, /*input_iter=*/nullptr, /*get_context=*/nullptr,
/*lookup_contex=*/nullptr)); /*lookup_contex=*/nullptr));
@ -2677,7 +2682,7 @@ Status BlockBasedTable::GetKVPairsFromDataBlocks(
std::unique_ptr<InternalIterator> datablock_iter; std::unique_ptr<InternalIterator> datablock_iter;
Status tmp_status; Status tmp_status;
datablock_iter.reset(NewDataBlockIterator<DataBlockIter>( datablock_iter.reset(NewDataBlockIterator<DataBlockIter>(
ReadOptions(), blockhandles_iter->value().handle, read_options, blockhandles_iter->value().handle,
/*input_iter=*/nullptr, /*type=*/BlockType::kData, /*input_iter=*/nullptr, /*type=*/BlockType::kData,
/*get_context=*/nullptr, /*lookup_context=*/nullptr, /*get_context=*/nullptr, /*lookup_context=*/nullptr,
/*prefetch_buffer=*/nullptr, /*for_compaction=*/false, /*prefetch_buffer=*/nullptr, /*for_compaction=*/false,
@ -2723,7 +2728,8 @@ Status BlockBasedTable::DumpTable(WritableFile* out_file) {
"--------------------------------------\n"; "--------------------------------------\n";
std::unique_ptr<Block> metaindex; std::unique_ptr<Block> metaindex;
std::unique_ptr<InternalIterator> metaindex_iter; std::unique_ptr<InternalIterator> metaindex_iter;
ReadOptions ro; // TODO: plumb Env::IOActivity
const ReadOptions ro;
Status s = ReadMetaIndexBlock(ro, nullptr /* prefetch_buffer */, &metaindex, Status s = ReadMetaIndexBlock(ro, nullptr /* prefetch_buffer */, &metaindex,
&metaindex_iter); &metaindex_iter);
if (s.ok()) { if (s.ok()) {
@ -2779,7 +2785,7 @@ Status BlockBasedTable::DumpTable(WritableFile* out_file) {
if (rep_->uncompression_dict_reader) { if (rep_->uncompression_dict_reader) {
CachableEntry<UncompressionDict> uncompression_dict; CachableEntry<UncompressionDict> uncompression_dict;
s = rep_->uncompression_dict_reader->GetOrReadUncompressionDictionary( s = rep_->uncompression_dict_reader->GetOrReadUncompressionDictionary(
nullptr /* prefetch_buffer */, false /* no_io */, nullptr /* prefetch_buffer */, ro, false /* no_io */,
false, /* verify_checksums */ false, /* verify_checksums */
nullptr /* get_context */, nullptr /* lookup_context */, nullptr /* get_context */, nullptr /* lookup_context */,
&uncompression_dict); &uncompression_dict);
@ -2797,7 +2803,7 @@ Status BlockBasedTable::DumpTable(WritableFile* out_file) {
} }
// Output range deletions block // Output range deletions block
auto* range_del_iter = NewRangeTombstoneIterator(ReadOptions()); auto* range_del_iter = NewRangeTombstoneIterator(ro);
if (range_del_iter != nullptr) { if (range_del_iter != nullptr) {
range_del_iter->SeekToFirst(); range_del_iter->SeekToFirst();
if (range_del_iter->Valid()) { if (range_del_iter->Valid()) {
@ -2827,8 +2833,10 @@ Status BlockBasedTable::DumpTable(WritableFile* out_file) {
Status BlockBasedTable::DumpIndexBlock(std::ostream& out_stream) { Status BlockBasedTable::DumpIndexBlock(std::ostream& out_stream) {
out_stream << "Index Details:\n" out_stream << "Index Details:\n"
"--------------------------------------\n"; "--------------------------------------\n";
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
std::unique_ptr<InternalIteratorBase<IndexValue>> blockhandles_iter( std::unique_ptr<InternalIteratorBase<IndexValue>> blockhandles_iter(
NewIndexIterator(ReadOptions(), /*need_upper_bound_check=*/false, NewIndexIterator(read_options, /*need_upper_bound_check=*/false,
/*input_iter=*/nullptr, /*get_context=*/nullptr, /*input_iter=*/nullptr, /*get_context=*/nullptr,
/*lookup_contex=*/nullptr)); /*lookup_contex=*/nullptr));
Status s = blockhandles_iter->status(); Status s = blockhandles_iter->status();
@ -2876,8 +2884,10 @@ Status BlockBasedTable::DumpIndexBlock(std::ostream& out_stream) {
} }
Status BlockBasedTable::DumpDataBlocks(std::ostream& out_stream) { Status BlockBasedTable::DumpDataBlocks(std::ostream& out_stream) {
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
std::unique_ptr<InternalIteratorBase<IndexValue>> blockhandles_iter( std::unique_ptr<InternalIteratorBase<IndexValue>> blockhandles_iter(
NewIndexIterator(ReadOptions(), /*need_upper_bound_check=*/false, NewIndexIterator(read_options, /*need_upper_bound_check=*/false,
/*input_iter=*/nullptr, /*get_context=*/nullptr, /*input_iter=*/nullptr, /*get_context=*/nullptr,
/*lookup_contex=*/nullptr)); /*lookup_contex=*/nullptr));
Status s = blockhandles_iter->status(); Status s = blockhandles_iter->status();
@ -2911,7 +2921,7 @@ Status BlockBasedTable::DumpDataBlocks(std::ostream& out_stream) {
std::unique_ptr<InternalIterator> datablock_iter; std::unique_ptr<InternalIterator> datablock_iter;
Status tmp_status; Status tmp_status;
datablock_iter.reset(NewDataBlockIterator<DataBlockIter>( datablock_iter.reset(NewDataBlockIterator<DataBlockIter>(
ReadOptions(), blockhandles_iter->value().handle, read_options, blockhandles_iter->value().handle,
/*input_iter=*/nullptr, /*type=*/BlockType::kData, /*input_iter=*/nullptr, /*type=*/BlockType::kData,
/*get_context=*/nullptr, /*lookup_context=*/nullptr, /*get_context=*/nullptr, /*lookup_context=*/nullptr,
/*prefetch_buffer=*/nullptr, /*for_compaction=*/false, /*prefetch_buffer=*/nullptr, /*for_compaction=*/false,

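Most of the `BlockBasedTable` hunks follow one pattern. A call that used to build a throwaway `ReadOptions()` now forwards the caller's `read_options`. Where an internal `ReadOptions ro` is still needed, as in `ApproximateOffsetOf` and `ApproximateSize`, `io_activity` is copied into it explicitly. The standalone sketch below shows why this matters. It uses hypothetical `Opts`, `ReadBlock` and `ApproximateOffset*` helpers in place of the real iterator and fetcher calls.

```cpp
#include <cassert>
#include <vector>

enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };

struct Opts {
  bool total_order_seek = false;
  IOActivity io_activity = IOActivity::kUnknown;
};

// Hypothetical I/O sink that records the tag each read carried, standing
// in for the per-activity statistics a file reader would update.
std::vector<IOActivity> g_recorded;

void ReadBlock(const Opts& opts) { g_recorded.push_back(opts.io_activity); }

// Before: a fresh Opts drops the caller's tag, so a read made during
// compaction is recorded as kUnknown.
void ApproximateOffsetBefore(const Opts& /*caller*/) {
  Opts ro;
  ro.total_order_seek = true;
  ReadBlock(ro);
}

// After: the internal override is still applied, but the tag is forwarded.
void ApproximateOffsetAfter(const Opts& caller) {
  Opts ro;
  ro.total_order_seek = true;
  ro.io_activity = caller.io_activity;
  ReadBlock(ro);
}
```

Paths that are not plumbed yet (`DumpTable`, `DumpIndexBlock`, `DumpDataBlocks`) keep a default-constructed `const ReadOptions` with a `TODO`. Their reads therefore still report `kUnknown`.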
@ -153,7 +153,8 @@ class BlockBasedTable : public TableReader {
// Pre-fetch the disk blocks that correspond to the key range specified by // Pre-fetch the disk blocks that correspond to the key range specified by
// (kbegin, kend). The call will return error status in the event of // (kbegin, kend). The call will return error status in the event of
// IO or iteration error. // IO or iteration error.
Status Prefetch(const Slice* begin, const Slice* end) override; Status Prefetch(const ReadOptions& read_options, const Slice* begin,
const Slice* end) override;
// Given a key, return an approximate byte offset in the file where // Given a key, return an approximate byte offset in the file where
// the data for that key begins (or would begin if the key were // the data for that key begins (or would begin if the key were
@ -161,15 +162,16 @@ class BlockBasedTable : public TableReader {
// bytes, and so includes effects like compression of the underlying data. // bytes, and so includes effects like compression of the underlying data.
// E.g., the approximate offset of the last key in the table will // E.g., the approximate offset of the last key in the table will
// be close to the file length. // be close to the file length.
uint64_t ApproximateOffsetOf(const Slice& key, uint64_t ApproximateOffsetOf(const ReadOptions& read_options,
const Slice& key,
TableReaderCaller caller) override; TableReaderCaller caller) override;
// Given start and end keys, return the approximate data size in the file // Given start and end keys, return the approximate data size in the file
// between the keys. The returned value is in terms of file bytes, and so // between the keys. The returned value is in terms of file bytes, and so
// includes effects like compression of the underlying data. // includes effects like compression of the underlying data.
// The start key must not be greater than the end key. // The start key must not be greater than the end key.
uint64_t ApproximateSize(const Slice& start, const Slice& end, uint64_t ApproximateSize(const ReadOptions& read_options, const Slice& start,
TableReaderCaller caller) override; const Slice& end, TableReaderCaller caller) override;
Status ApproximateKeyAnchors(const ReadOptions& read_options, Status ApproximateKeyAnchors(const ReadOptions& read_options,
std::vector<Anchor>& anchors) override; std::vector<Anchor>& anchors) override;
@ -265,7 +267,8 @@ class BlockBasedTable : public TableReader {
// Retrieve all key value pairs from data blocks in the table. // Retrieve all key value pairs from data blocks in the table.
// The key retrieved are internal keys. // The key retrieved are internal keys.
Status GetKVPairsFromDataBlocks(std::vector<KVPairBlock>* kv_pair_blocks); Status GetKVPairsFromDataBlocks(const ReadOptions& read_options,
std::vector<KVPairBlock>* kv_pair_blocks);
struct Rep; struct Rep;
@ -477,7 +480,8 @@ class BlockBasedTable : public TableReader {
static BlockType GetBlockTypeForMetaBlockByName(const Slice& meta_block_name); static BlockType GetBlockTypeForMetaBlockByName(const Slice& meta_block_name);
Status VerifyChecksumInMetaBlocks(InternalIteratorBase<Slice>* index_iter); Status VerifyChecksumInMetaBlocks(const ReadOptions& read_options,
InternalIteratorBase<Slice>* index_iter);
Status VerifyChecksumInBlocks(const ReadOptions& read_options, Status VerifyChecksumInBlocks(const ReadOptions& read_options,
InternalIteratorBase<IndexValue>* index_iter); InternalIteratorBase<IndexValue>* index_iter);

@ -68,7 +68,7 @@ TBlockIter* BlockBasedTable::NewDataBlockIterator(
// uncompression dict is typically at the end of the file and would // uncompression dict is typically at the end of the file and would
// most likely break the sequentiality of the access pattern. // most likely break the sequentiality of the access pattern.
s = rep_->uncompression_dict_reader->GetOrReadUncompressionDictionary( s = rep_->uncompression_dict_reader->GetOrReadUncompressionDictionary(
ro.async_io ? nullptr : prefetch_buffer, no_io, ro.verify_checksums, ro.async_io ? nullptr : prefetch_buffer, ro, no_io, ro.verify_checksums,
get_context, lookup_context, &uncompression_dict); get_context, lookup_context, &uncompression_dict);
if (!s.ok()) { if (!s.ok()) {
iter->Invalidate(s); iter->Invalidate(s);

@ -421,7 +421,7 @@ DEFINE_SYNC_AND_ASYNC(void, BlockBasedTable::MultiGet)
uncompression_dict_status = uncompression_dict_status =
rep_->uncompression_dict_reader rep_->uncompression_dict_reader
->GetOrReadUncompressionDictionary( ->GetOrReadUncompressionDictionary(
nullptr /* prefetch_buffer */, no_io, nullptr /* prefetch_buffer */, read_options, no_io,
read_options.verify_checksums, get_context, read_options.verify_checksums, get_context,
&metadata_lookup_context, &uncompression_dict); &metadata_lookup_context, &uncompression_dict);
uncompression_dict_inited = true; uncompression_dict_inited = true;

@ -74,17 +74,17 @@ Status HashIndexReader::Create(const BlockBasedTable* table,
// Read contents for the blocks // Read contents for the blocks
BlockContents prefixes_contents; BlockContents prefixes_contents;
BlockFetcher prefixes_block_fetcher( BlockFetcher prefixes_block_fetcher(
file, prefetch_buffer, footer, ReadOptions(), prefixes_handle, file, prefetch_buffer, footer, ro, prefixes_handle, &prefixes_contents,
&prefixes_contents, ioptions, true /*decompress*/, ioptions, true /*decompress*/, true /*maybe_compressed*/,
true /*maybe_compressed*/, BlockType::kHashIndexPrefixes, BlockType::kHashIndexPrefixes, UncompressionDict::GetEmptyDict(),
UncompressionDict::GetEmptyDict(), cache_options, memory_allocator); cache_options, memory_allocator);
s = prefixes_block_fetcher.ReadBlockContents(); s = prefixes_block_fetcher.ReadBlockContents();
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
BlockContents prefixes_meta_contents; BlockContents prefixes_meta_contents;
BlockFetcher prefixes_meta_block_fetcher( BlockFetcher prefixes_meta_block_fetcher(
file, prefetch_buffer, footer, ReadOptions(), prefixes_meta_handle, file, prefetch_buffer, footer, ro, prefixes_meta_handle,
&prefixes_meta_contents, ioptions, true /*decompress*/, &prefixes_meta_contents, ioptions, true /*decompress*/,
true /*maybe_compressed*/, BlockType::kHashIndexMetadata, true /*maybe_compressed*/, BlockType::kHashIndexMetadata,
UncompressionDict::GetEmptyDict(), cache_options, memory_allocator); UncompressionDict::GetEmptyDict(), cache_options, memory_allocator);

@ -85,6 +85,7 @@ InternalIteratorBase<IndexValue>* PartitionIndexReader::NewIterator(
ro.async_io = read_options.async_io; ro.async_io = read_options.async_io;
ro.rate_limiter_priority = read_options.rate_limiter_priority; ro.rate_limiter_priority = read_options.rate_limiter_priority;
ro.verify_checksums = read_options.verify_checksums; ro.verify_checksums = read_options.verify_checksums;
ro.io_activity = read_options.io_activity;
// We don't return pinned data from index blocks, so no need // We don't return pinned data from index blocks, so no need
// to set `block_contents_pinned`. // to set `block_contents_pinned`.

@ -77,8 +77,9 @@ Status UncompressionDictReader::ReadUncompressionDictionary(
} }
Status UncompressionDictReader::GetOrReadUncompressionDictionary( Status UncompressionDictReader::GetOrReadUncompressionDictionary(
FilePrefetchBuffer* prefetch_buffer, bool no_io, bool verify_checksums, FilePrefetchBuffer* prefetch_buffer, const ReadOptions& ro, bool no_io,
GetContext* get_context, BlockCacheLookupContext* lookup_context, bool verify_checksums, GetContext* get_context,
BlockCacheLookupContext* lookup_context,
CachableEntry<UncompressionDict>* uncompression_dict) const { CachableEntry<UncompressionDict>* uncompression_dict) const {
assert(uncompression_dict); assert(uncompression_dict);
@ -92,6 +93,7 @@ Status UncompressionDictReader::GetOrReadUncompressionDictionary(
read_options.read_tier = kBlockCacheTier; read_options.read_tier = kBlockCacheTier;
} }
read_options.verify_checksums = verify_checksums; read_options.verify_checksums = verify_checksums;
read_options.io_activity = ro.io_activity;
return ReadUncompressionDictionary(table_, prefetch_buffer, read_options, return ReadUncompressionDictionary(table_, prefetch_buffer, read_options,
cache_dictionary_blocks(), get_context, cache_dictionary_blocks(), get_context,

@ -32,8 +32,9 @@ class UncompressionDictReader {
std::unique_ptr<UncompressionDictReader>* uncompression_dict_reader); std::unique_ptr<UncompressionDictReader>* uncompression_dict_reader);
Status GetOrReadUncompressionDictionary( Status GetOrReadUncompressionDictionary(
FilePrefetchBuffer* prefetch_buffer, bool no_io, bool verify_checksums, FilePrefetchBuffer* prefetch_buffer, const ReadOptions& ro, bool no_io,
GetContext* get_context, BlockCacheLookupContext* lookup_context, bool verify_checksums, GetContext* get_context,
BlockCacheLookupContext* lookup_context,
CachableEntry<UncompressionDict>* uncompression_dict) const; CachableEntry<UncompressionDict>* uncompression_dict) const;
size_t ApproximateMemoryUsage() const; size_t ApproximateMemoryUsage() const;

@ -70,8 +70,10 @@ class CuckooBuilderTest : public testing::Test {
// Assert Table Properties. // Assert Table Properties.
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
const ReadOptions read_options;
ASSERT_OK(ReadTableProperties(file_reader.get(), read_file_size, ASSERT_OK(ReadTableProperties(file_reader.get(), read_file_size,
kCuckooTableMagicNumber, ioptions, &props)); kCuckooTableMagicNumber, ioptions,
read_options, &props));
// Check unused bucket. // Check unused bucket.
std::string unused_key = std::string unused_key =
props->user_collected_properties[CuckooTablePropertyNames::kEmptyKey]; props->user_collected_properties[CuckooTablePropertyNames::kEmptyKey];
@ -627,4 +629,3 @@ int main(int argc, char** argv) {
::testing::InitGoogleTest(&argc, argv); ::testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS(); return RUN_ALL_TESTS();
} }

@ -59,8 +59,11 @@ CuckooTableReader::CuckooTableReader(
} }
{ {
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
status_ = ReadTableProperties(file_.get(), file_size, // TODO: plumb Env::IOActivity
kCuckooTableMagicNumber, ioptions, &props); const ReadOptions read_options;
status_ =
ReadTableProperties(file_.get(), file_size, kCuckooTableMagicNumber,
ioptions, read_options, &props);
if (!status_.ok()) { if (!status_.ok()) {
return; return;
} }

@ -58,12 +58,14 @@ class CuckooTableReader : public TableReader {
size_t ApproximateMemoryUsage() const override; size_t ApproximateMemoryUsage() const override;
// Following methods are not implemented for Cuckoo Table Reader // Following methods are not implemented for Cuckoo Table Reader
uint64_t ApproximateOffsetOf(const Slice& /*key*/, uint64_t ApproximateOffsetOf(const ReadOptions& /*read_options*/,
const Slice& /*key*/,
TableReaderCaller /*caller*/) override { TableReaderCaller /*caller*/) override {
return 0; return 0;
} }
uint64_t ApproximateSize(const Slice& /*start*/, const Slice& /*end*/, uint64_t ApproximateSize(const ReadOptions& /* read_options */,
const Slice& /*start*/, const Slice& /*end*/,
TableReaderCaller /*caller*/) override { TableReaderCaller /*caller*/) override {
return 0; return 0;
} }

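`ApproximateOffsetOf` and `ApproximateSize` are virtual on `TableReader`, so adding a `ReadOptions` parameter forces every implementation to change. That includes readers such as `CuckooTableReader` that never use it. Those overrides comment out the parameter name, the same convention the thread-status stubs use, which keeps unused-parameter warnings quiet. Below is a minimal sketch with hypothetical `Reader`, `IndexedReader` and `StubReader` classes. The returned size is a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };
struct Opts {
  IOActivity io_activity = IOActivity::kUnknown;
};

// Hypothetical reduced TableReader interface after the signature change.
class Reader {
 public:
  virtual ~Reader() = default;
  virtual uint64_t ApproximateSize(const Opts& read_options,
                                   const std::string& start,
                                   const std::string& end) = 0;
};

// A reader that does I/O would pass the options on to its index read;
// the sketch only remembers which tag that read would have carried.
class IndexedReader : public Reader {
 public:
  uint64_t ApproximateSize(const Opts& read_options,
                           const std::string& /*start*/,
                           const std::string& /*end*/) override {
    last_activity = read_options.io_activity;
    return 4096;  // placeholder estimate
  }
  IOActivity last_activity = IOActivity::kUnknown;
};

// A reader without estimates ignores every argument.
class StubReader : public Reader {
 public:
  uint64_t ApproximateSize(const Opts& /*read_options*/,
                           const std::string& /*start*/,
                           const std::string& /*end*/) override {
    return 0;
  }
};
```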
@ -412,20 +412,22 @@ Status ReadTablePropertiesHelper(
Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size, Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size,
uint64_t table_magic_number, uint64_t table_magic_number,
const ImmutableOptions& ioptions, const ImmutableOptions& ioptions,
const ReadOptions& read_options,
std::unique_ptr<TableProperties>* properties, std::unique_ptr<TableProperties>* properties,
MemoryAllocator* memory_allocator, MemoryAllocator* memory_allocator,
FilePrefetchBuffer* prefetch_buffer) { FilePrefetchBuffer* prefetch_buffer) {
BlockHandle block_handle; BlockHandle block_handle;
Footer footer; Footer footer;
Status s = FindMetaBlockInFile(file, file_size, table_magic_number, ioptions, Status s =
kPropertiesBlockName, &block_handle, FindMetaBlockInFile(file, file_size, table_magic_number, ioptions,
memory_allocator, prefetch_buffer, &footer); read_options, kPropertiesBlockName, &block_handle,
memory_allocator, prefetch_buffer, &footer);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
if (!block_handle.IsNull()) { if (!block_handle.IsNull()) {
s = ReadTablePropertiesHelper(ReadOptions(), block_handle, file, s = ReadTablePropertiesHelper(read_options, block_handle, file,
prefetch_buffer, footer, ioptions, properties, prefetch_buffer, footer, ioptions, properties,
memory_allocator); memory_allocator);
} else { } else {
@ -473,14 +475,20 @@ Status FindMetaBlock(InternalIterator* meta_index_iter,
Status ReadMetaIndexBlockInFile(RandomAccessFileReader* file, Status ReadMetaIndexBlockInFile(RandomAccessFileReader* file,
uint64_t file_size, uint64_t table_magic_number, uint64_t file_size, uint64_t table_magic_number,
const ImmutableOptions& ioptions, const ImmutableOptions& ioptions,
const ReadOptions& read_options,
BlockContents* metaindex_contents, BlockContents* metaindex_contents,
MemoryAllocator* memory_allocator, MemoryAllocator* memory_allocator,
FilePrefetchBuffer* prefetch_buffer, FilePrefetchBuffer* prefetch_buffer,
Footer* footer_out) { Footer* footer_out) {
Footer footer; Footer footer;
IOOptions opts; IOOptions opts;
auto s = ReadFooterFromFile(opts, file, *ioptions.fs, prefetch_buffer, Status s;
file_size, &footer, table_magic_number); s = file->PrepareIOOptions(read_options, opts);
if (!s.ok()) {
return s;
}
s = ReadFooterFromFile(opts, file, *ioptions.fs, prefetch_buffer, file_size,
&footer, table_magic_number);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
@ -489,7 +497,7 @@ Status ReadMetaIndexBlockInFile(RandomAccessFileReader* file,
} }
   auto metaindex_handle = footer.metaindex_handle();
-  return BlockFetcher(file, prefetch_buffer, footer, ReadOptions(),
+  return BlockFetcher(file, prefetch_buffer, footer, read_options,
                       metaindex_handle, metaindex_contents, ioptions,
                       false /* do decompression */, false /*maybe_compressed*/,
                       BlockType::kMetaIndex, UncompressionDict::GetEmptyDict(),
@@ -497,18 +505,16 @@ Status ReadMetaIndexBlockInFile(RandomAccessFileReader* file,
       .ReadBlockContents();
 }
-Status FindMetaBlockInFile(RandomAccessFileReader* file, uint64_t file_size,
-                           uint64_t table_magic_number,
-                           const ImmutableOptions& ioptions,
-                           const std::string& meta_block_name,
-                           BlockHandle* block_handle,
-                           MemoryAllocator* memory_allocator,
-                           FilePrefetchBuffer* prefetch_buffer,
-                           Footer* footer_out) {
+Status FindMetaBlockInFile(
+    RandomAccessFileReader* file, uint64_t file_size,
+    uint64_t table_magic_number, const ImmutableOptions& ioptions,
+    const ReadOptions& read_options, const std::string& meta_block_name,
+    BlockHandle* block_handle, MemoryAllocator* memory_allocator,
+    FilePrefetchBuffer* prefetch_buffer, Footer* footer_out) {
   BlockContents metaindex_contents;
   auto s = ReadMetaIndexBlockInFile(
-      file, file_size, table_magic_number, ioptions, &metaindex_contents,
-      memory_allocator, prefetch_buffer, footer_out);
+      file, file_size, table_magic_number, ioptions, read_options,
+      &metaindex_contents, memory_allocator, prefetch_buffer, footer_out);
   if (!s.ok()) {
     return s;
   }
@@ -526,6 +532,7 @@ Status ReadMetaBlock(RandomAccessFileReader* file,
                      FilePrefetchBuffer* prefetch_buffer, uint64_t file_size,
                      uint64_t table_magic_number,
                      const ImmutableOptions& ioptions,
+                     const ReadOptions& read_options,
                      const std::string& meta_block_name, BlockType block_type,
                      BlockContents* contents,
                      MemoryAllocator* memory_allocator) {
@@ -535,15 +542,16 @@ Status ReadMetaBlock(RandomAccessFileReader* file,
   BlockHandle block_handle;
   Footer footer;
-  Status status = FindMetaBlockInFile(
-      file, file_size, table_magic_number, ioptions, meta_block_name,
-      &block_handle, memory_allocator, prefetch_buffer, &footer);
+  Status status =
+      FindMetaBlockInFile(file, file_size, table_magic_number, ioptions,
+                          read_options, meta_block_name, &block_handle,
+                          memory_allocator, prefetch_buffer, &footer);
   if (!status.ok()) {
     return status;
   }
-  return BlockFetcher(file, prefetch_buffer, footer, ReadOptions(),
-                      block_handle, contents, ioptions, false /* decompress */,
+  return BlockFetcher(file, prefetch_buffer, footer, read_options, block_handle,
+                      contents, ioptions, false /* decompress */,
                       false /*maybe_compressed*/, block_type,
                       UncompressionDict::GetEmptyDict(),
                       PersistentCacheOptions::kEmpty, memory_allocator)
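The hunks above swap each default-constructed `ReadOptions()` for the caller's `read_options`. Per the PR summary, the goal is for `ReadOptions::io_activity` to survive down to where `IOOptions` are prepared for `RandomAccessFileReader`. The following is a minimal sketch of why the old form lost the tag. `IOActivity`, `ReadOptions`, `IOOptions`, `PrepareIOOptions`, `ReadBlockOld`, `ReadBlockNew` and `CheckPlumbing` are hypothetical, reduced stand-ins, not RocksDB's actual definitions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for Env::IOActivity, ReadOptions and IOOptions;
// the real RocksDB types carry many more fields.
enum class IOActivity : uint8_t { kFlush, kCompaction, kDBOpen, kUnknown };

struct ReadOptions {
  IOActivity io_activity = IOActivity::kUnknown;  // untagged by default
};

struct IOOptions {
  IOActivity io_activity = IOActivity::kUnknown;
};

// Copies the activity tag from the read options into the per-IO options.
IOOptions PrepareIOOptions(const ReadOptions& ro) {
  IOOptions opts;
  opts.io_activity = ro.io_activity;
  return opts;
}

// Old shape: the helper builds a fresh ReadOptions(), so the caller's tag
// never reaches the IO layer.
IOActivity ReadBlockOld(const ReadOptions& /*caller_ro*/) {
  return PrepareIOOptions(ReadOptions()).io_activity;
}

// New shape: the caller's ReadOptions is forwarded unchanged.
IOActivity ReadBlockNew(const ReadOptions& caller_ro) {
  return PrepareIOOptions(caller_ro).io_activity;
}

// Usage: a compaction-tagged read keeps its tag only on the new path.
void CheckPlumbing() {
  ReadOptions ro;
  ro.io_activity = IOActivity::kCompaction;
  assert(ReadBlockOld(ro) == IOActivity::kUnknown);
  assert(ReadBlockNew(ro) == IOActivity::kCompaction);
}
```

Forwarding a `const ReadOptions&` costs nothing extra. It is also the only way an intermediate helper such as `FindMetaBlockInFile` can avoid silently resetting the tag.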

@@ -119,6 +119,7 @@ Status ReadTablePropertiesHelper(
 Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size,
                            uint64_t table_magic_number,
                            const ImmutableOptions& ioptions,
+                           const ReadOptions& read_options,
                            std::unique_ptr<TableProperties>* properties,
                            MemoryAllocator* memory_allocator = nullptr,
                            FilePrefetchBuffer* prefetch_buffer = nullptr);
@@ -139,6 +140,7 @@ Status FindMetaBlock(InternalIterator* meta_index_iter,
 Status FindMetaBlockInFile(RandomAccessFileReader* file, uint64_t file_size,
                            uint64_t table_magic_number,
                            const ImmutableOptions& ioptions,
+                           const ReadOptions& read_options,
                            const std::string& meta_block_name,
                            BlockHandle* block_handle,
                            MemoryAllocator* memory_allocator = nullptr,
@@ -149,6 +151,7 @@ Status FindMetaBlockInFile(RandomAccessFileReader* file, uint64_t file_size,
 Status ReadMetaIndexBlockInFile(RandomAccessFileReader* file,
                                 uint64_t file_size, uint64_t table_magic_number,
                                 const ImmutableOptions& ioptions,
+                                const ReadOptions& read_options,
                                 BlockContents* block_contents,
                                 MemoryAllocator* memory_allocator = nullptr,
                                 FilePrefetchBuffer* prefetch_buffer = nullptr,
@@ -161,6 +164,7 @@ Status ReadMetaBlock(RandomAccessFileReader* file,
                      FilePrefetchBuffer* prefetch_buffer, uint64_t file_size,
                      uint64_t table_magic_number,
                      const ImmutableOptions& ioptions,
+                     const ReadOptions& read_options,
                      const std::string& meta_block_name, BlockType block_type,
                      BlockContents* contents,
                      MemoryAllocator* memory_allocator = nullptr);

@@ -41,12 +41,14 @@ class MockTableReader : public TableReader {
              GetContext* get_context, const SliceTransform* prefix_extractor,
              bool skip_filters = false) override;
-  uint64_t ApproximateOffsetOf(const Slice& /*key*/,
+  uint64_t ApproximateOffsetOf(const ReadOptions& /*read_options*/,
+                               const Slice& /*key*/,
                                TableReaderCaller /*caller*/) override {
     return 0;
   }
-  uint64_t ApproximateSize(const Slice& /*start*/, const Slice& /*end*/,
+  uint64_t ApproximateSize(const ReadOptions& /*read_options*/,
+                           const Slice& /*start*/, const Slice& /*end*/,
                            TableReaderCaller /*caller*/) override {
     return 0;
   }
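`ApproximateOffsetOf` and `ApproximateSize` are virtual, so adding the leading `ReadOptions` parameter forces every implementation, test mocks included, to change in lockstep. The following is a small sketch of the idiom the mock relies on. `TableReaderLike`, `MockReader` and `CheckMock` are hypothetical names, `TableReaderCaller` is reduced to a single enumerator, and `std::string` stands in for `Slice`:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins; std::string plays the role of Slice here.
struct ReadOptions {};
enum class TableReaderCaller { kUserApproximateSize };

// Reduced interface with the new leading ReadOptions parameter.
class TableReaderLike {
 public:
  virtual ~TableReaderLike() = default;
  virtual uint64_t ApproximateOffsetOf(const ReadOptions& read_options,
                                       const std::string& key,
                                       TableReaderCaller caller) = 0;
};

// A mock that ignores all of its arguments. Commenting out the parameter
// names keeps -Wunused-parameter quiet, and `override` makes the compiler
// reject this class if its signature drifts from the base declaration.
class MockReader : public TableReaderLike {
 public:
  uint64_t ApproximateOffsetOf(const ReadOptions& /*read_options*/,
                               const std::string& /*key*/,
                               TableReaderCaller /*caller*/) override {
    return 0;
  }
};

// Usage: a call through the base interface dispatches to the mock.
void CheckMock() {
  MockReader mock;
  TableReaderLike& base = mock;
  assert(base.ApproximateOffsetOf(ReadOptions{}, "k",
                                  TableReaderCaller::kUserApproximateSize) == 0);
}
```

Without `override`, a mock left on the old signature would compile as an unrelated overload, and the class would stay abstract or quietly miss the new virtual.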

@@ -126,8 +126,10 @@ Status PlainTableReader::Open(
   }
   std::unique_ptr<TableProperties> props;
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   auto s = ReadTableProperties(file.get(), file_size, kPlainTableMagicNumber,
-                               ioptions, &props);
+                               ioptions, read_options, &props);
   if (!s.ok()) {
     return s;
   }
@@ -297,10 +299,14 @@ Status PlainTableReader::PopulateIndex(TableProperties* props,
   assert(props != nullptr);
   BlockContents index_block_contents;
-  Status s = ReadMetaBlock(file_info_.file.get(), nullptr /* prefetch_buffer */,
-                           file_size_, kPlainTableMagicNumber, ioptions_,
-                           PlainTableIndexBuilder::kPlainTableIndexBlock,
-                           BlockType::kIndex, &index_block_contents);
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
+  Status s =
+      ReadMetaBlock(file_info_.file.get(), nullptr /* prefetch_buffer */,
+                    file_size_, kPlainTableMagicNumber, ioptions_, read_options,
+                    PlainTableIndexBuilder::kPlainTableIndexBlock,
+                    BlockType::kIndex, &index_block_contents);
   bool index_in_file = s.ok();
@@ -310,8 +316,8 @@ Status PlainTableReader::PopulateIndex(TableProperties* props,
   if (index_in_file) {
     s = ReadMetaBlock(file_info_.file.get(), nullptr /* prefetch_buffer */,
                       file_size_, kPlainTableMagicNumber, ioptions_,
-                      BloomBlockBuilder::kBloomBlock, BlockType::kFilter,
-                      &bloom_block_contents);
+                      read_options, BloomBlockBuilder::kBloomBlock,
+                      BlockType::kFilter, &bloom_block_contents);
     bloom_in_file = s.ok() && bloom_block_contents.data.size() > 0;
   }
@@ -614,12 +620,14 @@ Status PlainTableReader::Get(const ReadOptions& /*ro*/, const Slice& target,
   return Status::OK();
 }
-uint64_t PlainTableReader::ApproximateOffsetOf(const Slice& /*key*/,
-                                               TableReaderCaller /*caller*/) {
+uint64_t PlainTableReader::ApproximateOffsetOf(
+    const ReadOptions& /*read_options*/, const Slice& /*key*/,
+    TableReaderCaller /*caller*/) {
   return 0;
 }
-uint64_t PlainTableReader::ApproximateSize(const Slice& /*start*/,
+uint64_t PlainTableReader::ApproximateSize(const ReadOptions& /* read_options*/,
+                                           const Slice& /*start*/,
                                            const Slice& /*end*/,
                                            TableReaderCaller /*caller*/) {
   return 0;

@@ -92,11 +92,12 @@ class PlainTableReader : public TableReader {
              GetContext* get_context, const SliceTransform* prefix_extractor,
              bool skip_filters = false) override;
-  uint64_t ApproximateOffsetOf(const Slice& key,
+  uint64_t ApproximateOffsetOf(const ReadOptions& read_options,
+                               const Slice& key,
                                TableReaderCaller caller) override;
-  uint64_t ApproximateSize(const Slice& start, const Slice& end,
-                           TableReaderCaller caller) override;
+  uint64_t ApproximateSize(const ReadOptions& read_options, const Slice& start,
+                           const Slice& end, TableReaderCaller caller) override;
   uint32_t GetIndexSize() const { return index_.GetIndexSize(); }
   void SetupForCompaction() override;

@@ -355,8 +355,11 @@ Status SstFileDumper::ReadTableProperties(uint64_t table_magic_number,
                                          RandomAccessFileReader* file,
                                          uint64_t file_size,
                                          FilePrefetchBuffer* prefetch_buffer) {
+  // TODO: plumb Env::IOActivity
+  const ReadOptions read_options;
   Status s = ROCKSDB_NAMESPACE::ReadTableProperties(
-      file, file_size, table_magic_number, ioptions_, &table_properties_,
+      file, file_size, table_magic_number, ioptions_, read_options,
+      &table_properties_,
       /* memory_allocator= */ nullptr, prefetch_buffer);
   if (!s.ok()) {
     if (!silent_) {
@@ -514,4 +517,3 @@ Status SstFileDumper::ReadTableProperties(
   return init_result_;
 }
 } // namespace ROCKSDB_NAMESPACE
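`PlainTableReader` and `SstFileDumper` still construct a default `ReadOptions` under `// TODO: plumb Env::IOActivity`, so their reads presumably carry the default, untagged activity. The PR summary says reads are recorded both in `rocksdb.sst.read.micros` and in per-activity flush/compaction stats. The following is a minimal sketch of that routing, assuming histograms reduced to a count and a sum. `IOActivity`, `Stat`, `ReadLatencyRecorder` and `CheckRecorder` are hypothetical; the real code records through `StopWatch` and `RandomAccessFileReader`, not through a map like this:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for Env::IOActivity.
enum class IOActivity : uint8_t { kFlush, kCompaction, kDBOpen, kUnknown };

// Each "histogram" is reduced to a count and a sum of microseconds.
struct Stat {
  uint64_t count = 0;
  uint64_t sum = 0;
};

// Hypothetical recorder: every read lands in the aggregate stat, and flush
// or compaction reads are additionally recorded under their own name.
class ReadLatencyRecorder {
 public:
  void Record(IOActivity activity, uint64_t micros) {
    Add("rocksdb.sst.read.micros", micros);
    switch (activity) {
      case IOActivity::kFlush:
        Add("rocksdb.file.read.flush.micros", micros);
        break;
      case IOActivity::kCompaction:
        Add("rocksdb.file.read.compaction.micros", micros);
        break;
      default:
        // In this sketch, kDBOpen and kUnknown reach only the aggregate.
        break;
    }
  }

  Stat Get(const std::string& name) const {
    auto it = stats_.find(name);
    return it == stats_.end() ? Stat{} : it->second;
  }

 private:
  void Add(const std::string& name, uint64_t micros) {
    Stat& s = stats_[name];
    ++s.count;
    s.sum += micros;
  }

  std::map<std::string, Stat> stats_;
};

// Usage: one flush read, one compaction read and one untagged read.
void CheckRecorder() {
  ReadLatencyRecorder r;
  r.Record(IOActivity::kFlush, 2);
  r.Record(IOActivity::kCompaction, 5);
  r.Record(IOActivity::kUnknown, 7);
  assert(r.Get("rocksdb.sst.read.micros").count == 3);
  assert(r.Get("rocksdb.file.read.flush.micros").count == 1);
  assert(r.Get("rocksdb.file.read.compaction.micros").sum == 5);
}
```

In this sketch, the aggregate count equals the sum of the flush and compaction counts only when every read is tagged. That is the relationship the PR's first db_bench check looks for. Untagged reads, like the ones behind these TODOs, show up only in the aggregate.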

@@ -66,6 +66,7 @@ Status SstFileReader::Open(const std::string& file_path) {
 }
 Iterator* SstFileReader::NewIterator(const ReadOptions& roptions) {
+  assert(roptions.io_activity == Env::IOActivity::kUnknown);
   auto r = rep_.get();
   auto sequence = roptions.snapshot != nullptr
                       ? roptions.snapshot->GetSequenceNumber()
@@ -91,9 +92,9 @@ std::shared_ptr<const TableProperties> SstFileReader::GetTableProperties()
 }
 Status SstFileReader::VerifyChecksum(const ReadOptions& read_options) {
+  assert(read_options.io_activity == Env::IOActivity::kUnknown);
   return rep_->table_reader->VerifyChecksum(read_options,
                                             TableReaderCaller::kSSTFileReader);
 }
 } // namespace ROCKSDB_NAMESPACE
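The new `assert`s in `SstFileReader` treat `io_activity` as an internal field: a caller of the public API is expected to leave it at `Env::IOActivity::kUnknown`. The following is a minimal sketch of that contract. `IOActivity`, `ReadOptions`, `PublicVerifyChecksum` and `MakeInternalReadOptions` are hypothetical stand-ins, not RocksDB's definitions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for Env::IOActivity and ReadOptions.
enum class IOActivity : uint8_t { kFlush, kCompaction, kDBOpen, kUnknown };

struct ReadOptions {
  IOActivity io_activity = IOActivity::kUnknown;
};

// Public entry point in the spirit of SstFileReader::VerifyChecksum: user
// code is not expected to tag reads, so any other value trips the
// debug-only assertion.
bool PublicVerifyChecksum(const ReadOptions& read_options) {
  assert(read_options.io_activity == IOActivity::kUnknown);
  (void)read_options;  // avoid an unused warning when NDEBUG strips assert
  return true;         // stand-in for forwarding to the table reader
}

// Internal path: only code that knows it is running a flush, compaction or
// DB open sets the tag before issuing reads.
ReadOptions MakeInternalReadOptions(IOActivity activity) {
  ReadOptions ro;
  ro.io_activity = activity;
  return ro;
}
```

Because `assert` compiles away under `NDEBUG`, this guards against misuse only in debug builds such as the stress tests mentioned in the PR. Release builds would forward a user-set tag unchecked.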


Loading…
Cancel
Save