From dbf37c290ae0cbf1532f706942c098380cb9c53a Mon Sep 17 00:00:00 2001 From: anand76 Date: Wed, 21 Dec 2022 09:15:53 -0800 Subject: [PATCH] Fix async prefetch heap use after free (#11049) Summary: This PR fixes a heap use after free bug in the async prefetch code that happens in the following scenario - 1. Scan thread starts 2 async reads for Seek, one for the seek block and one for prefetching 2. Before the first read in https://github.com/facebook/rocksdb/issues/1 completes, another thread reads and loads the block in cache 3. The first scan thread finds the block in cache, continues and the next block cache miss is for a block that spans the boundary of the 2 prefetch buffers, and the 1st read is complete but the 2nd one is not complete yet 4. The scan thread will reallocate (i.e free the old buffer and allocate a new one) the 2nd prefetch buffer, and the in-progress prefetch is orphaned 5. The orphaned prefetch finally completes, resulting in a use after free Also add a few asserts to surface bugs earlier in the crash tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11049 Test Plan: Repro with db_stress and verify the fix Reviewed By: akankshamahajan15 Differential Revision: D42181118 Pulled By: anand1976 fbshipit-source-id: 1ac55d2f64a89ce128c1c574262b8aa7d82eb8cc --- HISTORY.md | 1 + file/file_prefetch_buffer.cc | 15 ++++++++++++++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/HISTORY.md b/HISTORY.md index fd59277f0..de1d720ad 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -13,6 +13,7 @@ * Fixed a BackupEngine bug in which RestoreDBFromLatestBackup would fail if the latest backup was deleted and there is another valid backup available. * Fix L0 file misorder corruption caused by ingesting files of overlapping seqnos with memtable entries' through introducing `epoch_number`. Before the fix, `force_consistency_checks=true` may catch the corruption before it's exposed to readers, in which case writes returning `Status::Corruption` would be expected. Also replace the previous incomplete fix (#5958) to the same corruption with this new and more complete fix. * Fixed a bug in LockWAL() leading to re-locking mutex (#11020). +* Fixed a heap use after free bug in async scan prefetching when the scan thread and another thread try to read and load the same seek block into cache. ## 7.9.0 (11/21/2022) ### Performance Improvements diff --git a/file/file_prefetch_buffer.cc b/file/file_prefetch_buffer.cc index 9ea9129e2..4ac0d0504 100644 --- a/file/file_prefetch_buffer.cc +++ b/file/file_prefetch_buffer.cc @@ -247,10 +247,14 @@ void FilePrefetchBuffer::AbortAllIOs() { // Release io_handles. if (bufs_[curr_].io_handle_ != nullptr && bufs_[curr_].del_fn_ != nullptr) { DestroyAndClearIOHandle(curr_); + } else { + bufs_[curr_].async_read_in_progress_ = false; } if (bufs_[second].io_handle_ != nullptr && bufs_[second].del_fn_ != nullptr) { DestroyAndClearIOHandle(second); + } else { + bufs_[second].async_read_in_progress_ = false; } } @@ -325,7 +329,16 @@ Status FilePrefetchBuffer::HandleOverlappingData( uint64_t& tmp_offset, size_t& tmp_length) { Status s; size_t alignment = reader->file()->GetRequiredBufferAlignment(); - uint32_t second = curr_ ^ 1; + uint32_t second; + + // Check if the first buffer has the required offset and the async read is + // still in progress. This should only happen if a prefetch was initiated + // by Seek, but the next access is at another offset. + if (bufs_[curr_].async_read_in_progress_ && + IsOffsetInBufferWithAsyncProgress(offset, curr_)) { + PollAndUpdateBuffersIfNeeded(offset); + } + second = curr_ ^ 1; // If data is overlapping over two buffers, copy the data from curr_ and // call ReadAsync on curr_.