From b78ed0460b7f16013e62f72a0253eb98220a0965 Mon Sep 17 00:00:00 2001
From: Andrew Kryczka
Date: Thu, 1 Feb 2018 09:36:01 -0800
Subject: [PATCH] fix ReadaheadRandomAccessFile/iterator prefetch bug

Summary:
`ReadaheadRandomAccessFile` is used by iterators for file reads in several cases, like in compaction when `compaction_readahead_size > 0` or `use_direct_io_for_flush_and_compaction == true`, or in user iterator when `ReadOptions::readahead_size > 0`. `ReadaheadRandomAccessFile` maintains an internal buffer for readahead data. It assumes that, if the buffer's length is less than `ReadaheadRandomAccessFile::readahead_size_`, which is fixed in the constructor, then EOF has been reached so it doesn't try reading further.

Recently, d938226af405681c592f25310f41c0c933bcdb19 started calling `RandomAccessFile::Prefetch` with various lengths: 8KB, 16KB, etc. When the `RandomAccessFile` is a `ReadaheadRandomAccessFile`, it triggers the above condition and incorrectly determines EOF. If a block is partially in the readahead buffer and EOF is incorrectly decided, the result is a truncated data block.

The problem is reproducible:
```
TEST_TMPDIR=/data/compaction_bench ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -block_size=18384 -use_direct_io_for_flush_and_compaction=true
...
put error: Corruption: truncated block read from /data/compaction_bench/dbbench/000014.sst offset 20245, expected 10143 bytes, got 8427
```

Closes https://github.com/facebook/rocksdb/pull/3454

Differential Revision: D6869405

Pulled By: ajkr

fbshipit-source-id: 87001c299e7600a37c0dcccbd0368e0954c929cf
---
 util/file_reader_writer.cc | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/util/file_reader_writer.cc b/util/file_reader_writer.cc
index cf18004d1..448efa7e3 100644
--- a/util/file_reader_writer.cc
+++ b/util/file_reader_writer.cc
@@ -541,6 +541,11 @@ class ReadaheadRandomAccessFile : public RandomAccessFile {
   }
 
   virtual Status Prefetch(uint64_t offset, size_t n) override {
+    if (n < readahead_size_) {
+      // Don't allow smaller prefetches than the configured `readahead_size_`.
+      // `Read()` assumes a smaller prefetch buffer indicates EOF was reached.
+      return Status::OK();
+    }
     size_t prefetch_offset = TruncateToPageBoundary(alignment_, offset);
     if (prefetch_offset == buffer_offset_) {
       return Status::OK();
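
Note: the interaction described in the summary can be illustrated with a toy model. The sketch below is not RocksDB code. `ToyReadaheadFile`, `FillBuffer`, `BytesReadAfterSmallPrefetch`, and the `guard_small_prefetch` flag are hypothetical names invented for illustration, the "file" is an in-memory string, and alignment and direct I/O are ignored. It only assumes what the summary states: `Read()` treats a buffer shorter than `readahead_size_` as a sign of EOF, and the fix makes `Prefetch()` ignore requests smaller than `readahead_size_`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for ReadaheadRandomAccessFile.
class ToyReadaheadFile {
 public:
  ToyReadaheadFile(std::string contents, size_t readahead_size,
                   bool guard_small_prefetch)
      : file_(std::move(contents)),
        readahead_size_(readahead_size),
        guard_(guard_small_prefetch) {}

  // Loads n bytes at offset into the buffer, unless the guard rejects it.
  void Prefetch(size_t offset, size_t n) {
    if (guard_ && n < readahead_size_) {
      return;  // Models the fix: never leave a short buffer behind.
    }
    FillBuffer(offset, n);
  }

  // Returns up to n bytes starting at offset.
  std::string Read(size_t offset, size_t n) {
    assert(n <= readahead_size_);  // Simplification made by this sketch.
    std::string result;
    if (offset >= buffer_offset_ &&
        offset < buffer_offset_ + buffer_.size()) {
      result = buffer_.substr(offset - buffer_offset_, n);
      // EOF heuristic: a buffer shorter than readahead_size_ is taken to
      // mean the file ended, so a partial result is returned as-is.
      if (result.size() == n || buffer_.size() < readahead_size_) {
        return result;
      }
    }
    // Refill starting at the first byte not yet copied, then finish the read.
    FillBuffer(offset + result.size(), readahead_size_);
    result += buffer_.substr(0, n - result.size());
    return result;
  }

 private:
  void FillBuffer(size_t offset, size_t n) {
    buffer_offset_ = offset;
    buffer_ = offset < file_.size() ? file_.substr(offset, n) : std::string();
  }

  std::string file_;
  size_t readahead_size_;
  bool guard_;
  std::string buffer_;
  size_t buffer_offset_ = 0;
};

// Prefetches 8 bytes with readahead_size 16, then reads a 10-byte "block"
// that straddles the end of the prefetched range. Returns the bytes read.
size_t BytesReadAfterSmallPrefetch(bool guard_small_prefetch) {
  ToyReadaheadFile f(std::string(64, 'x'), 16, guard_small_prefetch);
  f.Prefetch(0, 8);
  return f.Read(4, 10).size();
}
```

In this model, `BytesReadAfterSmallPrefetch(false)` returns a short read even though the 64-byte file has plenty of data left, which mirrors the truncated block in the repro. With the guard enabled, the small prefetch becomes a no-op and the read is served by a full-size refill.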