From ca81b80d83f4344c2b7665e57d5ce188751695b3 Mon Sep 17 00:00:00 2001 From: Andrew Kryczka Date: Tue, 28 Jun 2022 14:27:49 -0700 Subject: [PATCH] Deflake RateLimiting/BackupEngineRateLimitingTestWithParam (#10271) Summary: We saw flakes with the following failure: ``` [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 utilities/backup/backup_engine_test.cc:2667: Failure Expected: (restore_time) > (0.8 * rate_limited_restore_time), actual: 48269 vs 60470.4 terminate called after throwing an instance of 'testing::internal::GoogleTestFailureException' what(): utilities/backup/backup_engine_test.cc:2667: Failure Expected: (restore_time) > (0.8 * rate_limited_restore_time), actual: 48269 vs 60470.4 Received signal 6 (Aborted) t/run-backup_engine_test-RateLimiting-BackupEngineRateLimitingTestWithParam.RateLimiting-1: line 4: 1032887 Aborted (core dumped) TEST_TMPDIR=$d ./backup_engine_test --gtest_filter=RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 ``` Investigation revealed we forgot to use the mock time `SystemClock` for restore rate limiting. Then the test used wall clock time, which made the execution of "GenericRateLimiter::Request:PostTimedWait" non-deterministic as wall clock time might have advanced enough that waiting was not needed. This PR changes restore rate limiting to use mock time, which guarantees we always execute "GenericRateLimiter::Request:PostTimedWait". Then the assertions that rely on times recorded inside that callback should be robust. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10271 Test Plan: Applied the following patch which guaranteed repro before the fix. Verified the test passes after this PR even with that patch applied. ``` diff --git a/util/rate_limiter.cc b/util/rate_limiter.cc index f369e3220..6b3ed82fa 100644 --- a/util/rate_limiter.cc +++ b/util/rate_limiter.cc @@ -158,6 +158,7 @@ void GenericRateLimiter::SetBytesPerSecond(int64_t bytes_per_second) { void GenericRateLimiter::Request(int64_t bytes, const Env::IOPriority pri, Statistics* stats) { + usleep(100000); assert(bytes <= refill_bytes_per_period_.load(std::memory_order_relaxed)); bytes = std::max(static_cast(0), bytes); TEST_SYNC_POINT("GenericRateLimiter::Request"); ``` Reviewed By: hx235 Differential Revision: D37499848 Pulled By: ajkr fbshipit-source-id: fd790d5a192996be8ba13b656751ccc7d8cb8f6e --- utilities/backup/backup_engine_test.cc | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/utilities/backup/backup_engine_test.cc b/utilities/backup/backup_engine_test.cc index 3b939c5ec..1f49bdaf6 100644 --- a/utilities/backup/backup_engine_test.cc +++ b/utilities/backup/backup_engine_test.cc @@ -2655,8 +2655,10 @@ TEST_P(BackupEngineRateLimitingTestWithParam, RateLimiting) { ASSERT_GT(backup_time, 0.8 * rate_limited_backup_time); OpenBackupEngine(); - TEST_SetDefaultRateLimitersClock(backup_engine_.get(), - special_env->GetSystemClock()); + TEST_SetDefaultRateLimitersClock( + backup_engine_.get(), + special_env->GetSystemClock() /* backup_rate_limiter_clock */, + special_env->GetSystemClock() /* restore_rate_limiter_clock */); auto start_restore = special_env->NowMicros(); ASSERT_OK(backup_engine_->RestoreDBFromLatestBackup(dbname_, dbname_));