Summary:
Context:
[Rapid thread creation and deletion](https://github.com/facebook/rocksdb/blob/6.27.fb/utilities/transactions/write_prepared_transaction_test.cc#L439-L444) in `SnapshotConcurrentAccessTest.SnapshotConcurrentAcces` inside a [potentially big loop](https://github.com/facebook/rocksdb/blob/6.27.fb/utilities/transactions/write_prepared_transaction_test.cc#L1238-L1248) can lead to heavy-loading the system with many threads due to delay in actually cleaning up thread's resource in the kernel sometime. We ran into some [flaky failure](https://app.circleci.com/pipelines/github/facebook/rocksdb/10383/workflows/136f1005-80a9-4515-aee9-fe36ac6462a1/jobs/253289) in CI and reproduced it by below:
- Command
```
Added `ROCKSDB_NAMESPACE::port::InstallStackTraceHandler();` like https://github.com/facebook/rocksdb/pull/9276
DEBUG_LEVEL=2 make -j56 write_prepared_transaction_test
GTEST_CATCH_EXCEPTIONS=0 ~/gtest-parallel/gtest-parallel -r 200 -w 200 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
```
- Stack, where `write_prepared_transaction_test.cc:442` in `https://github.com/facebook/rocksdb/issues/9` points to thread creation
```
[ RUN ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
....terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
Received signal 6 (Aborted)
#0 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7fc114f39438]
...
https://github.com/facebook/rocksdb/issues/7 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8e73) [0x7fc1158a5e73] ?? ??:0
https://github.com/facebook/rocksdb/issues/8 ./write_prepared_transaction_test() [0x4ca86c] std:🧵:thread<rocksdb::WritePreparedTransactionTestBase::SnapshotConcurrentAccessTestInternal(rocksdb::WritePreparedTxnDB*, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> const&, rocksdb::WritePreparedTxnDB::CommitEntry&, unsigned long&, unsigned long, unsigned long, unsigned long, unsigned long)::{lambda()https://github.com/facebook/rocksdb/issues/1}>(rocksdb::WritePreparedTransactionTestBase::SnapshotConcurrentAccessTestInternal(rocksdb::WritePreparedTxnDB*, s d::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, rocksdb::WritePreparedTxnDB::CommitEntry&, unsigned long&, unsigned long, unsigned long, unsigned long, unsigned long)::{l mbda()https://github.com/facebook/rocksdb/issues/1}&&) /usr/include/c++/5/thread:137 (discriminator 4)
https://github.com/facebook/rocksdb/issues/9 ./write_prepared_transaction_test() [0x4bb80c] rocksdb::WritePreparedTransactionTestBase::SnapshotConcurrentAccessTestInternal(rocksdb::WritePreparedTxnDB*, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, rocksdb::W itePreparedTxnDB::CommitEntry&, unsigned long&, unsigned long, unsigned long, unsigned long, unsigned long) /home/circleci/project/utilities/transactions/write_prepared_transaction_test.cc:442
https://github.com/facebook/rocksdb/issues/10 ./write_prepared_transaction_test() [0x4407b6] rocksdb::SnapshotConcurrentAccessTest_SnapshotConcurrentAccess_Test::TestBody() /home/circleci/project/utilities/transactions/write_prepared_transaction_test.cc:1244
...
[109/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 returned/aborted with exit code -6 (34462 ms)
```
- Move thread 2's work into current thread to avoid half of the thread creation cuz there is no difference in doing so. We expect this can make the thread-creation error less often, even though we can't gurantee it from happening again. Considering this is a trivial change with positive impact, it's still worth landing and monitor if it's enough to solve the problem in reality.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9281
Test Plan:
Before the change, repeating the test 200 times with 200 workers failed
`~/gtest-parallel/gtest-parallel -r 200 -w 200 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1`
```
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from TwoWriteQueues/SnapshotConcurrentAccessTest
[ RUN ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
..unknown file: Failure
C++ exception with description "Resource temporarily unavailable" thrown in the test body.
[ FAILED ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1, where GetParam() = (false, true, 1, 0, 1, 20) (11882 ms)
[----------] 1 test from TwoWriteQueues/SnapshotConcurrentAccessTest (11882 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (11882 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1, where GetParam() = (false, true, 1, 0, 1, 20)
```
After the change: repeating the test 200 times with 200 workers didn't fail, even with repeating the "repeating" for 10 times like below
`for i in {1..10}; do ~/gtest-parallel/gtest-parallel -r 200 -w 200 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1; done`
```
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
```
It does failed when repeating the test 400 times with 400 workers
`~/project$ ~/gtest-parallel/gtest-parallel -r 400 -w 400 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1`
```
[1/400] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 (2928 ms)
Note: Google Test filter = TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from TwoWriteQueues/SnapshotConcurrentAccessTest
[ RUN ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1
unknown file: Failure
C++ exception with description "std::bad_alloc" thrown in the test body.
[ FAILED ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1, where GetParam() = (false, true, 1, 0, 1, 20) (2597 ms)
[----------] 1 test from TwoWriteQueues/SnapshotConcurrentAccessTest (2597 ms total)
```
Reviewed By: ajkr
Differential Revision: D33026776
Pulled By: hx235
fbshipit-source-id: 509f57126392821e835e48396e5bf224f4f5dcac
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9266
This diff adds a new tag `CommitWithTimestamp`. Currently, there is no API to trigger writing
this tag to WAL, thus it is unavailable to users.
This is an ongoing effort to add user-defined timestamp support to write-committed transactions.
This diff also indicates all column families that may potentially participate in the same
transaction must either disable timestamp or have the same timestamp format, since
`CommitWithTimestamp` tag is followed by a single byte-array denoting the commit
timestamp of the transaction. We will enforce this checking in a future diff. We keep this
diff small.
Reviewed By: ltamasi
Differential Revision: D31721350
fbshipit-source-id: e1450811443647feb6ca01adec4c8aaae270ffc6
Summary:
I'm working on a new format_version=6 to support context
checksum (https://github.com/facebook/rocksdb/issues/9058) and this includes much of the refactoring and test
updates to support that change.
Test coverage data and manual inspection agree on dead code in
block_based_table_reader.cc (removed).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9240
Test Plan:
tests enhanced to cover more cases etc.
Extreme case performance testing indicates small % regression in fillseq (w/ compaction), though CPU profile etc. doesn't suggest any explanation. There is enhanced correctness checking in Footer::DecodeFrom, but this should be negligible.
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 -checksum_type=1 --disable_wal={false,true}
(Each is ops/s averaged over 50 runs, run simultaneously with competing configuration for load fairness)
Before w/ wal: 454512
After w/ wal: 444820 (-2.1%)
Before w/o wal: 1004560
After w/o wal: 998897 (-0.6%)
Since this doesn't modify WAL code, one would expect real effects to be larger in w/o wal case.
This regression will be corrected in a follow-up PR.
Reviewed By: ajkr
Differential Revision: D32813769
Pulled By: pdillinger
fbshipit-source-id: 444a244eabf3825cd329b7d1b150cddce320862f
Summary:
This changes write_prepared_transaction_test under CircleCI to
print a stack trace on unhandled exception, so that we can debug rare
exceptions seen in CircleCI:
[ RUN ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/24
.......unknown file: Failure
C++ exception with description "Resource temporarily unavailable" thrown in the test body.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9276
Test Plan:
manual run test with seeded 'throw', with and without
CIRCLECI=true environment variable
Reviewed By: ajkr, hx235
Differential Revision: D32996993
Pulled By: pdillinger
fbshipit-source-id: e790408ce204b676d3d84a290e41be511b203bfa
Summary:
If ignore_unsupported_options=true, then it is possible for MemTableRepFactory::CreateFromString to succeed without setting a result (result=nullptr). This would cause the original value to be overwritten with null and an error would be raised later when PrepareOptions is invoked.
Added unit test for this condition. Will add (in another PR unless required by reviewers) comparable tests for all of the other Customizable classes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9273
Reviewed By: ltamasi
Differential Revision: D32990365
Pulled By: mrambacher
fbshipit-source-id: b150724c3f5ae7346357b3866244fd93466875c7
Summary:
Previously, the OnErrorRecoveryCompleted callback was called when
RocksDB was able to successfully recover from a retryable error.
However, if the recovery failed and was eventually stopped, there was no
indication of the status. To fix that, a new OnErrorRecoveryEnd callback
is introduced that deprecates the OnErrorRecoveryCompleted callback. The
new callback is called with the original error and the new error status.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9244
Test Plan: Add a new unit test in error_handler_fs_test
Reviewed By: zhichao-cao
Differential Revision: D32922303
Pulled By: anand1976
fbshipit-source-id: f04e77a9cb92c5ea6385590682d3fcf559971b99
Summary:
When table_options.prepopulate_block_cache is set to
BlockBasedTableOptions::PrepopulateBlockCache::kFlushOnly and
table_options.partition_filters is also set true, then there is
segmentation failure when top level filter is fetched because its
entered with wrong type in cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9263
Test Plan:
Updated unit tests;
Ran db_stress: make crash_test -j32
Reviewed By: pdillinger
Differential Revision: D32936566
Pulled By: akankshamahajan15
fbshipit-source-id: 8bd79e53830d3e3c1bb79787e1ffbc3cb46d4426
Summary:
This test case seems to be occasionally failing due to the code hitting
the immediate deletion branch in `DeleteScheduler::DeleteFile`. The
patch increases the allowed trash ratio to a huge value to prevent this
from happening.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9269
Test Plan:
```
gtest-parallel --repeat=10000 ./db_sst_test --gtest_filter=DBSSTTest.DestroyDBWithRateLimitedDelete
```
Reviewed By: akankshamahajan15
Differential Revision: D32956596
Pulled By: ltamasi
fbshipit-source-id: 3945e7c1c19ede76698e03c3f133bc1d9fd61b84
Summary:
Context/Summary:
Uninitialized variable `SequenceNumber old_saved_seqno` causes asan related compilation error/warning below:
```
db_stress_tool/expected_state.cc:308:55: error: ‘old_saved_seqno’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
308 | if (s.ok() && old_saved_seqno != kMaxSequenceNumber &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
```
Fix it by initializing to 0.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9265
Test Plan:
- make clean && COMPILE_WITH_ASAN=1 make -j48 db_stress_tool/expected_state.o
- monitor if same error happens again after merging
Reviewed By: ajkr
Differential Revision: D32939630
Pulled By: hx235
fbshipit-source-id: 41697515fd11ada8427f606b5dceb4e58d12cb80
Summary:
The `Statistics` objects are meant to be shared across translation
units, but this was prevented by declaring them static. We need to
ensure they are defined once in the program. The effect is now
`StressTest::PrintStatistics()` can actually print statistics since it
now sees non-null values when `--statistics=1`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9260
Reviewed By: zhichao-cao
Differential Revision: D32910162
Pulled By: ajkr
fbshipit-source-id: c926d6f556177987bee5fa3cbc87597803b230ee
Summary:
When `--sync_fault_injection` is set, this PR takes a snapshot of the expected values and starts an operation trace when the DB is opened. These files are stored in `--expected_values_dir`. They will be used for recovering the expected state of the DB following a crash where a suffix of unsynced operations are allowed to be lost.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8960
Test Plan: injected crashed at various points in `FileExpectedStateManager` and verified the next run recovers the state/trace file with highest seqno and removes all older/temporary files. Note we don't use sync_fault_injection in CI crash tests yet.
Reviewed By: pdillinger
Differential Revision: D31194941
Pulled By: ajkr
fbshipit-source-id: b0f935a529a0186c5a9c7709fcaa8829de8a84cf
Summary:
Fix a bug that causes file temperature not preserved after DB is restarted, or options.max_manifest_file_size is hit.
Also, pass temperature information to NewRandomAccessFile() to allow users to hack a solution where they don't preserve tiering information.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9242
Test Plan: Add a unit test that would fail without the fix.
Reviewed By: jay-zhuang
Differential Revision: D32818150
fbshipit-source-id: 36aa3f148c60107f7b8e9d65b63b039f9e1a1eec
Summary:
When using the SST file manager, the actual deletion of DB files
potentially occurs in the background. The patch adds another call
to `SstFileManagerImpl::WaitForEmptyTrash` to the test case
`DBSSTTest.DBWithSFMForBlobFilesAtomicFlush` to ensure the deletions
are performed before the test checks the number of deleted files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9241
Test Plan:
```
gtest-parallel --repeat=1000 ./db_sst_test --gtest_filter=DBSSTTest.DBWithSFMForBlobFilesAtomicFlush
```
Reviewed By: akankshamahajan15
Differential Revision: D32811427
Pulled By: ltamasi
fbshipit-source-id: 7f2ad649a22bd2d7900e5f132372034093cfcf47
Summary:
**Context:**
Searching `TableProperties::properties_offsets` across the codebase reveals that internally it is only used to find the external SST file's global seqno offeset. Therefore we can narrow it down and replace this map property with a uint64_t property `external_sst_file_global_seqno_offset` to save memory usage related to table properties.
Note:
- See PR comments for discussion about potential impact on existing external usage of `TableProperties::properties_offsets`
- See PR comments for discussion on keeping external SST file global seqno's offset VS using a simple flag indicating seqno's existence.
**Summary:**
- Replaced `TableProperties::properties_offsets` with `TableProperties::external_sst_file_global_seqno_offset`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9212
Test Plan: - Relied on existing tests should be sufficient since `TableProperties::properties_offsets` existed before and should already be tested.
Reviewed By: ajkr
Differential Revision: D32665941
Pulled By: hx235
fbshipit-source-id: 718e44617346dc4f3b1276ee953e61c196277795
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9205
Update WriteBatch::AssignTimestamp() APIs so that they take an
additional argument, i.e. a function object called `checker` indicating the user-specified logic of performing
checks on timestamp sizes.
WriteBatch is a building block used by multiple other RocksDB components, each of which may track
timestamp information in different data structures. For example, transaction can either write to
`WriteBatchWithIndex` which is a `WriteBatch` with index, or write directly to raw `WriteBatch` if
`Transaction::DisableIndexing()` is called.
`WriteBatchWithIndex` keeps mapping from column family id to comparator, and transaction needs
to keep similar information for the `WriteBatch` if user calls `Transaction::DisableIndexing()` (dynamically)
so that we will know the size of each timestamp later. The bookkeeping info maintained by `WriteBatchWithIndex`
and `Transaction` should not overlap.
When we later call `WriteBatch::AssignTimestamp()`, we need to use these data structures to guarantee
that we do not accidentally assign timestamps for keys from column families that disable timestamp.
Reviewed By: ltamasi
Differential Revision: D31735186
fbshipit-source-id: 8b1709ed880ac72f995aa9e012e5873b290840a7
Summary:
Extend C API to add new function `rocksdb_livefiles_column_family_name`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9232
Reviewed By: akankshamahajan15
Differential Revision: D32736516
Pulled By: ajkr
fbshipit-source-id: a854256a0f4652c903ab5ad8355ded051ac19987
Summary:
This patch fixes an issue that occur when dependencies of plugins are not
installed to the same prefix as librocksdb. Because plugin dependencies are
declared in the `Libs` field of rocksdb.pc, programs that link against
librocksdb with `pkg-config --libs rocksdb` will link with `-L` flag for the
path of librocksdb only. This patch allows plugin dependencies to be declared in
the `Requires` field of rocksdb.pc, so that pkg-config will correctly provide
`-L` flags for dependencies of plugins that are installed in other locations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9198
Reviewed By: akankshamahajan15
Differential Revision: D32596620
Pulled By: ajkr
fbshipit-source-id: e17b2b6452b5f2e955b430140197c57e26a4a518
Summary:
https://github.com/facebook/rocksdb/issues/9026 fixed histogram NUM_FILES_IN_SINGLE_COMPACTION for level compaction, but missed fix for universal compaction.
This PR fixed NUM_FILES_IN_SINGLE_COMPACTION for universal compaction.
Quote from https://github.com/facebook/rocksdb/issues/9026:
> currently histogram `NUM_FILES_IN_SINGLE_COMPACTION` just counted files in first level of compaction input, this fix counts files in all levels of compaction input.
Thanks for ajkr pointed this missed fix!
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9168
Reviewed By: akankshamahajan15
Differential Revision: D32434494
Pulled By: ajkr
fbshipit-source-id: 93ea092af4afbd8dce67898ffb350cf26b065ed2
Summary:
db_stress asserts/seg-faults with below command (on debug and release builds)
```
"rm -rf /tmp/rocksdbtest*; db_stress --ops_per_thread=1000 --reopen=5"
=======================================
Error opening unique id file for append: IO error: No such file or directory:
While open a file for appending: /tmp/rocksdbtest-0/dbstress/.unique_ids:
No such file or directory
Choosing random keys with no overwrite
Creating 2621440 locks
Starting continuous_verification_thread
2021/11/15-08:46:49 Initializing worker threads
2021/11/15-08:46:49 Starting database operations
2021/11/15-08:46:49 Reopening database for the 1th time
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
DB path: [/tmp/rocksdbtest-0/dbstress]
Segmentation fault
=======================================
```
StressTest() constructor deletes the directory "dbstress" because
the option --destroy_db_initially is true by default in db_stress.
This Seg fault happens on a new database, UniqueIdVerifier's constructor
tries to read the ".unique_ids" file, if the file is not present,
ReopenWritableFile() tries to create .unique_ids file, but fails
as the directory db_stress is not available. The data_file_writer_
is set as an invalid(null) pointer and in subsequent calls (~UniqueIdVerifier()
and UniqueIdVerifier::Verify()) it accesses this null pointer and crashes.
This patch creates db_stress directory if it is missing, so the .unique_ids file
is created.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9219
Reviewed By: ajkr
Differential Revision: D32730151
Pulled By: pdillinger
fbshipit-source-id: f47baba56b380d93c3ba5608904756e86bbf14f5
Summary:
1. Fix GetOptionsPtr for Wrapped (Inner() != nullptr) Customizable objects. This allows the inner options to be returned via this method.
2. Allow the option type map to be nullptr. This allows objects to be registered as options (for GetOptionsPtr) but not be used by the configuration methods.
Added tests as appropriate.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9213
Reviewed By: zhichao-cao
Differential Revision: D32718882
Pulled By: mrambacher
fbshipit-source-id: 563203d1f006a2629060feb31c5dff9a233e1e83
Summary:
Added missing include, and cleaned up to make same mistake less
likely in future (minimize conditional compilation)
Fixes https://github.com/facebook/rocksdb/issues/9183
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9209
Test Plan: added to existing test
Reviewed By: mrambacher
Differential Revision: D32631390
Pulled By: pdillinger
fbshipit-source-id: 63a0501855cf5fac9e22ca1e5c4f53725dbf3f93
Summary:
You could easily reproduce the failure by injecting sleep(11)
before `store.Flush()`. Fixed by setting TTL time to approximately test
timeout time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9226
Test Plan: manual
Reviewed By: akankshamahajan15
Differential Revision: D32698105
Pulled By: pdillinger
fbshipit-source-id: 40529af9d9f2389585988b7c81dffb120e2795a2
Summary:
Saw error like this:
`Backup failed -- IO error: No such file or directory: While opening a
file for sequentially reading:
/dev/shm/rocksdb/rocksdb_crashtest_blackbox/004426.log: No such file or
directory`
Unfortunately, GetSortedWalFiles (used by Backups, Checkpoint, etc.)
relies on no file deletions happening while its operating, which
means not only disabling (more) deletions, but ensuring any pending
deletions are completed. Two fixes related to this:
* There was a gap in several places between decrementing
pending_purge_obsolete_files_ and incrementing bg_purge_scheduled_ where
the db mutex would be released and GetSortedWalFiles (and others) could
get false information that no deletions are pending.
* The fix to https://github.com/facebook/rocksdb/issues/8591 (disabling deletions in GetSortedWalFiles) seems
incomplete because it doesn't prevent pending deletions from occuring
during the operation (if deletions not already disabled, the case that
was to be fixed by the change).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9208
Test Plan:
existing tests (it's hard to write a test for interleavings
that are now excluded - this is what stress test is for)
Reviewed By: ajkr
Differential Revision: D32630675
Pulled By: pdillinger
fbshipit-source-id: a121e3da648de130cd24d44c524232f4eb22f178
Summary:
* added missing override specifiers for overriden methods
this fixes compiler warnings emitted by g++ and clang++ when compile option `-Wsuggest-override` is turned on.
* fix compile warning with -Wmaybe-uninitialized
g++-11 warns about a _potentially_ uninitialized variable when using `-Wmaybe_uninitialized`:
```
env/env.cc: In member function ‘virtual rocksdb::Status rocksdb::Env::GetHostNameString(std::string*)’:
env/env.cc:738:66: error: ‘hostname_buf’ may be used uninitialized [-Werror=maybe-uninitialized]
738 | Status s = GetHostName(hostname_buf.data(), hostname_buf.size());
| ^
In file included from /usr/include/c++/11/tuple:39,
from /usr/include/c++/11/functional:54,
from ./include/rocksdb/env.h:22,
from env/env.cc:10:
/usr/include/c++/11/array:176:7: note: by argument 1 of type ‘const std::array<char, 256>*’ to ‘constexpr std::array<_Tp, _Nm>::size_type std::array<_Tp, _Nm>::size() const [with _Tp = char; long unsigned int _Nm = 256]’ declared here
176 | size() const noexcept { return _Nm; }
| ^~~~
env/env.cc:737:37: note: ‘hostname_buf’ declared here
737 | std::array<char, kMaxHostNameLen> hostname_buf;
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9199
Reviewed By: jay-zhuang
Differential Revision: D32630703
Pulled By: pdillinger
fbshipit-source-id: 9ea3010b1105a582548e3c3c0db4475b201e4a10
Summary:
More follow-up to https://github.com/facebook/rocksdb/issues/9193 + https://github.com/facebook/rocksdb/issues/9188
* Even though we need to print ETA updates to avoid hitting the 10min
timeout, we need to avoid printing an update if there's no actual
progress, so that hung tests will timeout after 10 min rather than 5
hours.
* When there is a hung test, it's really annoying to track down which
test is hung, so if no progress is observed for 1 minute, we run ps once
to show what is running.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9201
Test Plan: manual and CircleCI
Reviewed By: jay-zhuang
Differential Revision: D32612028
Pulled By: pdillinger
fbshipit-source-id: 00f8ea70fc5fec9ede28ff74287d90fc73854aad
Summary:
Printing file checksum (usually an integer) in non-hex format is barely useful. To make the matter
worse, it can mess with the output format. If you use `less` to redirect the output of `ldb manifest_dump`,
non-hex file checksum can cause `less` not to function as expected.
Also output some additional fields to json output.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9196
Test Plan: manually test `ldb manifest_dump`.
Reviewed By: ajkr
Differential Revision: D32590253
Pulled By: riversand963
fbshipit-source-id: de434b7e60dd05b0b7cb76eff2240b21f9ae4b32
Summary:
Address some issues with https://github.com/facebook/rocksdb/issues/9188
* Internal CI doesn't render \r as anything, so use \n for "not
connected to terminal" case
* CircleCI apparently uses a pseudo-tty for output and although
rerdirect stdout (because of EAGAIN bug) we don't redirect stderr, so it
is detected as a terminal-connected case. Fix by redirecting stderr
also.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9193
Test Plan: manual, CI
Reviewed By: akankshamahajan15
Differential Revision: D32581128
Pulled By: pdillinger
fbshipit-source-id: 5ae7c3209128d8dbd4153c5b9fdb2b810e6deb2e
Summary:
Disable the QPS verification in test temporally, which causes the test failure due to different system delays.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9190
Test Plan: make check
Reviewed By: siying
Differential Revision: D32576289
Pulled By: zhichao-cao
fbshipit-source-id: 1df972e77dd82eed5af3462e5db5e141aadf8fae
Summary:
Original unit test fail to test the case of multi-cf mode switching to new manifest. The assertion
failure will trigger when the primary instance reopens and secondary continues to tail the
newly-created MANIFEST. Fix the assertion failure and update existing unit tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9143
Test Plan: make check
Reviewed By: ltamasi
Differential Revision: D32574233
Pulled By: riversand963
fbshipit-source-id: 857ddbe994019091276458abebcf8e2b65340468
Summary:
The patch adds a new BlobDB configuration option `blob_compaction_readahead_size`
that can be used to enable prefetching data from blob files during compaction.
This is important when using storage with higher latencies like HDDs or remote filesystems.
If enabled, prefetching is used for all cases when blobs are read during compaction,
namely garbage collection, compaction filters (when the existing value has to be read from
a blob file), and `Merge` (when the value of the base `Put` is stored in a blob file).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9187
Test Plan: Ran `make check` and the stress/crash test.
Reviewed By: riversand963
Differential Revision: D32565512
Pulled By: ltamasi
fbshipit-source-id: 87be9cebc3aa01cc227bec6b5f64d827b8164f5d
Summary:
A bug in https://github.com/facebook/rocksdb/issues/9163 can cause checksum verification to fail if
parsing a properties block fails.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9189
Test Plan:
check_format_compatible.sh (never quite works locally but
this particular case seems fixed using variants of SHORT_TEST=1).
And added new unit test case.
Reviewed By: ajkr
Differential Revision: D32574626
Pulled By: pdillinger
fbshipit-source-id: 6fa5c8595737b71a3c3d011a52daf6d6c08715d7
Summary:
`ReadOptions::iter_start_seqnum` and `DBOptions::preserve_deletes` are
deprecated, please try using user defined timestamp feature instead.
The feature is used to support differential snapshots, but not well
maintained (https://github.com/facebook/rocksdb/issues/6837, https://github.com/facebook/rocksdb/issues/8472) and the interface is not user friendly which
returns an internal key from the iterator. The user defined timestamp
feature is a more flexible feature to support similar usecase, please
switch to that if you have such usecase.
The deprecated feature will be removed in a future release.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9091
Test Plan:
check LOG
Fix https://github.com/facebook/rocksdb/issues/9090
Reviewed By: ajkr
Differential Revision: D32071750
Pulled By: jay-zhuang
fbshipit-source-id: b882c4668dd1bf26ce03c4c192f1bba584bf6104
Summary:
Generating megabytes of successful test output has caused
issues / inconveniences for CI and internal sandcastle runs. This
changes their configuration to only print output from failed tests.
(Successful test output is still available in files under t/.)
This likewise changes default behavior of parallel `make check` as
a quick team poll showed interest in that.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9188
Test Plan:
Seed some test failures and observe
* `make -j24 check` (new behavior)
* `PRINT_PARALLEL_OUTPUTS=1 make -j24 check` (old CI behavior)
* `QUIET_PARALLEL_TESTS=1 make -j24 check` (old manual run behavior)
Reviewed By: siying
Differential Revision: D32567392
Pulled By: pdillinger
fbshipit-source-id: 8d8fb64aebd16bca103b11e3bd1f13c488a69611