Fix race condition between file purge and backup/checkpoint (#10187)
Summary: Resolves https://github.com/facebook/rocksdb/issues/10129 I extracted this fix from https://github.com/facebook/rocksdb/issues/7516 since it's also already a bug in main branch, and we want to separate it from the main part of the PR. There can be a race condition between two threads. Thread 1 executes `DBImpl::FindObsoleteFiles()` while thread 2 executes `GetSortedWals()`. ``` Time thread 1 thread 2 | mutex_.lock | read disable_delete_obsolete_files_ | ... | wait on log_sync_cv and release mutex_ | mutex_.lock | ++disable_delete_obsolete_files_ | mutex_.unlock | mutex_.lock | while (pending_purge_obsolete_files > 0) { bg_cv.wait;} | wake up with mutex_ locked | compute WALs tracked by MANIFEST | mutex_.unlock | wake up with mutex_ locked | ++pending_purge_obsolete_files_ | mutex_.unlock | | delete obsolete WAL | WAL missing but tracked in MANIFEST. V ``` The fix proposed eliminates the possibility of the above by increasing `pending_purge_obsolete_files_` before `FindObsoleteFiles()` can possibly release the mutex. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10187 Test Plan: make check Reviewed By: ltamasi Differential Revision: D37214235 Pulled By: riversand963 fbshipit-source-id: 556ab1b58ae6d19150169dfac4db08195c797184main
parent
6061905790
commit
725df120e9
Loading…
Reference in new issue