Summary:
This PR updates secondary instance testing in stress test by default.
A background thread will be started (disabled by default), running a secondary instance tailing the logs of the primary.
Periodically (every 1 sec), this thread calls `TryCatchUpWithPrimary()` and uses point lookup or range scan
to read some random keys with only very basic verification to make sure no assertion failure is triggered.
Thanks to https://github.com/facebook/rocksdb/issues/10061 , we can enable secondary instance when user-defined timestamp is enabled.
Also removed a less useful test configuration, `secondary_catch_up_one_in`. This is very similar to the periodic
catch-up.
In the last commit, I decided not to enable it now, but just update the tests, since secondary instance does not
work well when the underlying file is renamed by primary, e.g. SstFileManager.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10121
Test Plan:
```
TEST_TMPDIR=/dev/shm/rocksdb make crash_test
TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_ts
TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_atomic_flush
```
Reviewed By: ajkr
Differential Revision: D36939458
Pulled By: riversand963
fbshipit-source-id: 1c065b7efc3690fc341569b9d369a5cbd8ef6b3e
Summary:
After https://github.com/facebook/rocksdb/issues/9357 we began seeing the following error attempting to acquire
locks for file ingestion:
```
FATAL: ThreadSanitizer CHECK failed: /home/engshare/third-party2/llvm-fb/12/src/llvm/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40)
```
The command was using default values for `ingest_external_file_width`
(1000) and `log2_keys_per_lock` (2). The expected number of locks needed
to update those keys is then (1000 / 2^2) = 250, which is above the 0x40 (64)
limit. This PR reduces the default value of `ingest_external_file_width`
to 100 so the expected number of locks is 25, which is within the limit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10131
Reviewed By: ltamasi
Differential Revision: D36986307
Pulled By: ajkr
fbshipit-source-id: e918cdb2fcc39517d585f1e5fd2539e185ada7c1
Summary:
**Summary:**
Add unit tests to verify that the dynamic priority can be passed from compaction to FS. Compaction reads&writes and other DB reads&writes share the same read&write paths to FSRandomAccessFile or FSWritableFile, so a MockTestFileSystem is added to replace the default filesystem from Env to intercept and verify the io_priority. To prepare the compaction input files, use the default filesystem from Env. To test the io priority of the compaction reads and writes, db_options_.fs is set as MockTestFileSystem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10088
Test Plan: Add unit tests.
Reviewed By: anand1976
Differential Revision: D36882528
Pulled By: gitbw95
fbshipit-source-id: 120adc15801966f2b8c9fc45285f590a3fff96d1
Summary:
The default implementation of Close() function in Directory/FSDirectory classes returns `NotSupported` status. However, we don't want operations that worked in older versions to begin failing after upgrading when run on FileSystems that have not implemented Directory::Close() yet. So we require the upper level that calls Close() function should properly handle "NotSupported" status instead of treating it as an error status.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10127
Reviewed By: ajkr
Differential Revision: D36971112
Pulled By: littlepig2013
fbshipit-source-id: 100f0e6ad1191e1acc1ba6458c566a11724cf466
Summary:
As pointed out by [https://github.com/facebook/rocksdb/pull/8351#discussion_r645765422](https://github.com/facebook/rocksdb/pull/8351#discussion_r645765422), check `manual_compaction_paused` and `manual_compaction_canceled` can be reduced by setting `*canceled` to be true in `DisableManualCompaction()` and `*canceled` to be false in the last time calling `EnableManualCompaction()`.
Changed Tests: The origin `DBTest2.PausingManualCompaction1` uses a callback function to increase `manual_compaction_paused` and the origin CompactionJob/CompactionIterator with `manual_compaction_paused` can detect this. I changed the callback function so that it sets `*canceled` as true if `canceled` is not `nullptr` (to notify CompactionJob/CompactionIterator the compaction has been canceled).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10070
Test Plan: This change does not introduce new features, but some slight difference in compaction implementation. Run the same manual compaction unit tests as before (e.g., PausingManualCompaction[1-4], CancelManualCompaction[1-2], CancelManualCompactionWithListener in db_test2, and db_compaction_test).
Reviewed By: ajkr
Differential Revision: D36949133
Pulled By: littlepig2013
fbshipit-source-id: c5dc4c956fbf8f624003a0f5ad2690240063a821
Summary:
With this change, when a given read timestamp is smaller than the column-family's full_history_ts_low, Get(), MultiGet() and iterators APIs will return Status::InValidArgument().
Test plan
```
$COMPILE_WITH_ASAN=1 make -j24 all
$./db_with_timestamp_basic_test --gtest_filter=DBBasicTestWithTimestamp.UpdateFullHistoryTsLow
$ make -j24 check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10109
Reviewed By: riversand963
Differential Revision: D36901126
Pulled By: jowlyzhang
fbshipit-source-id: 255feb1a66195351f06c1d0e42acb1ff74527f86
Summary:
As pointed by anand1976 in his [comment](https://github.com/facebook/rocksdb/pull/10049#pullrequestreview-994255819), previous implementation (adding Close() function in Directory/FSDirectory class) is not backward-compatible. And we mistakenly added the default implementation `return Status::NotSupported("Close")` or `return IOStatus::NotSupported("Close")` in WritableFile class in this [pull request](https://github.com/facebook/rocksdb/pull/10101). This pull request fixes the above issue.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10123
Reviewed By: ajkr
Differential Revision: D36943661
Pulled By: littlepig2013
fbshipit-source-id: 9dc45f4d2ab3a9d51c30bdfde679f1d13c4d5509
Summary:
There was an overflow bug when computing the variance in the HistogramStat class.
This manifests, for instance, when running cache_bench with default arguments. This executes 32M lookups/inserts/deletes in a block cache, and then computes (among other things) the variance of the latencies. The variance is computed as ``variance = (cur_sum_squares * cur_num - cur_sum * cur_sum) / (cur_num * cur_num)``, where ``cum_sum_squares`` is the sum of the squares of the samples, ``cur_num`` is the number of samples, and ``cur_sum`` is the sum of the samples. Because the median latency in a typical run is around 3800 nanoseconds, both the ``cur_sum_squares * cur_num`` and ``cur_sum * cur_sum`` terms overflow as uint64_t.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10100
Test Plan: Added a unit test. Run ``make -j24 histogram_test && ./histogram_test``.
Reviewed By: pdillinger
Differential Revision: D36942738
Pulled By: guidotag
fbshipit-source-id: 0af5fb9e2a297a284e8e74c24e604d302906006e
Summary:
We have three related concepts:
* BlockType: an internal enum conceptually indicating a type of SST file
block
* CacheEntryRole: a user-facing enum for categorizing block cache entries,
which is also involved in associated cache entries with an appropriate
deleter. Can include categories for non-block cache entries (e.g. memory
reservations).
* TBlocklike: a C++ type for the actual type behind a void* cache entry.
We had some existing code ugliness because BlockType did not imply
TBlocklike, because of various kinds of "filter" block. This refactoring
fixes that with new BlockTypes.
More clean-up can come in later work.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10098
Test Plan: existing tests
Reviewed By: akankshamahajan15
Differential Revision: D36897945
Pulled By: pdillinger
fbshipit-source-id: 3ae496b5caa81e0a0ed85e873eb5b525e2d9a295
Summary:
CircleCI runner based benchmarking. A runner is a dedicate machine configured for CircleCI to perform work on. Our work is a repeatable benchmark, the `benchmark-linux` job in `config.yml`
A runner, in CircleCI terminology, is a machine that is managed by the client (us) rather than running on CircleCI resources in the cloud. This means that we define and configure the iron, and that therefore the performance is repeatable and predictable. Which is what we need for performance regression benchmarking.
On a time schedule (or on commit, during branch development) benchmarks are set off on the runner, and then a script is run `benchmark_log_tool.py` which parses the benchmark output and pushes it into a pre-configured OpenSearch document connected to an OpenSearch dashboard. Members of the team can examine benchmark performance changes on the dashboard.
As time progresses we can add different benchmarks to the suite which gets run.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9723
Reviewed By: pdillinger
Differential Revision: D35555626
Pulled By: jay-zhuang
fbshipit-source-id: c6a905ca04494495c3784cfbb991f5ab90c807ee
Summary:
The patch adds some low-level logic that can be used to serialize/deserialize
a sorted vector of wide columns to/from a simple binary searchable string
representation. Currently, there is no user-facing API; this will be implemented in
subsequent stages.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9915
Test Plan: `make check`
Reviewed By: siying
Differential Revision: D35978076
Pulled By: ltamasi
fbshipit-source-id: 33f5f6628ec3bcd8c8beab363b1978ac047a8788
Summary:
If caller specifies a non-null `timestamp` argument in `DB::Get()` or a non-null `timestamps` in `DB::MultiGet()`,
RocksDB will return the timestamps of the point tombstones.
Note: DeleteRange is still unsupported.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10056
Test Plan: make check
Reviewed By: ltamasi
Differential Revision: D36677956
Pulled By: riversand963
fbshipit-source-id: 2d7af02cc7237b1829cd269086ea895a49d501ae
Summary:
**Context:**
https://github.com/facebook/rocksdb/pull/9748 added support to charge table reader memory to block cache. In the test `ChargeTableReaderTest/ChargeTableReaderTest.Basic`, it estimated the table reader memory, calculated the expected number of table reader opened based on this estimation and asserted this number with actual number. The expected number of table reader opened calculated based on estimated table reader memory will not be 100% accurate and should have tolerance for error. It was previously set to 1% and recently encountered an assertion failure that `(opened_table_reader_num) <= (max_table_reader_num_capped_upper_bound), actual: 375 or 376 vs 374` where `opened_table_reader_num` is the actual opened one and `max_table_reader_num_capped_upper_bound` is the estimated opened one (=371 * 1.01). I believe it's safe to increase error tolerance from 1% to 5% hence there is this PR.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10113
Test Plan: - CI again succeeds.
Reviewed By: ajkr
Differential Revision: D36911556
Pulled By: hx235
fbshipit-source-id: 259687dd77b450fea0f5658a5b567a1d31d4b1f7
Summary:
When building RocksDB with getdeps on Windows, `thirdparty.inc` get in the way since `FindXXXX.cmake` are working properly now.
This PR adds an option to skip that file when building RocksDB so we can disable it.
FB: see [D36905191](https://www.internalfb.com/diff/D36905191).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10110
Reviewed By: siying
Differential Revision: D36913882
Pulled By: fanzeyi
fbshipit-source-id: 33d36841dc0d4fe87f51e1d9fd2b158a3adab88f
Summary:
The patch attempts to fix three bugs in `verify_random_db.sh`:
1) https://github.com/facebook/rocksdb/pull/9937 changed the default for
`--try_load_options` to true in the script's use case, so we have to
explicitly set it to false if the corresponding argument of the script
is 0. This should fix the issue we've been seeing with our forward
compatibility tests where 7.3 is unable to open a database created by
the version on main after adding a new configuration option.
2) The script seems to support two "extra parameters"; however,
in practice, if the second one was set, only that one was passed on to
`ldb`. Now both get forwarded.
3) When running the `diff` command, the base DB directory was passed as
the second argument instead of the file containing the `ldb` output
(this actually seems to work, probably accidentally though).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10112
Reviewed By: pdillinger
Differential Revision: D36911363
Pulled By: ltamasi
fbshipit-source-id: fe29db4e28d373cee51a12322c59050fc50e926d
Summary:
Closing https://github.com/facebook/rocksdb/issues/10080
When `SyncWAL()` calls `MarkLogsSynced()`, even if there is only one active WAL file,
this event should still be added to the MANIFEST.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10087
Test Plan: make check
Reviewed By: ajkr
Differential Revision: D36797580
Pulled By: riversand963
fbshipit-source-id: 24184c9dd606b3939a454ed41de6e868d1519999
Summary:
cache_bench can now run with FastLRUCache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10095
Test Plan:
- Temporarily add an ``assert(false)`` in the execution path that sets up the FastLRUCache. Run ``make -j24 cache_bench``. Then test the appropriate code is used by running ``./cache_bench -cache_type=fast_lru_cache`` and checking that the assert is called. Repeat for LRUCache.
- Verify that FastLRUCache (currently a clone of LRUCache) has similar latency distribution than LRUCache, by comparing the outputs of ``./cache_bench -cache_type=fast_lru_cache`` and ``./cache_bench -cache_type=lru_cache``.
Reviewed By: pdillinger
Differential Revision: D36875834
Pulled By: guidotag
fbshipit-source-id: eb2ad0bb32c2717a258a6ac66ed736e06f826cd8
Summary:
As pointed by anand1976 in his [comment](https://github.com/facebook/rocksdb/pull/10049#pullrequestreview-994255819), previous implementation is not backward-compatible. In this implementation, the default implementation `return Status::NotSupported("Close")` or `return IOStatus::NotSupported("Close")` is added for `Close()` function for `*Directory` classes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10101
Test Plan: DBBasicTest.DBCloseAllDirectoryFDs
Reviewed By: anand1976
Differential Revision: D36899346
Pulled By: littlepig2013
fbshipit-source-id: 430624793362f330cbb8837960f0e8712a944ab9
Summary:
db_bench can now run with FastLRUCache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10096
Test Plan:
- Temporarily add an ``assert(false)`` in the execution path that sets up the FastLRUCache. Run ``make -j24 db_bench``. Then test the appropriate code is used by running ``./db_bench -cache_type=fast_lru_cache`` and checking that the assert is called. Repeat for LRUCache.
- Verify that FastLRUCache (currently a clone of LRUCache) produces similar benchmark data than LRUCache, by comparing the outputs of ``./db_bench -benchmarks=fillseq,fillrandom,readseq,readrandom -cache_type=fast_lru_cache`` and ``./db_bench -benchmarks=fillseq,fillrandom,readseq,readrandom -cache_type=lru_cache``.
Reviewed By: gitbw95
Differential Revision: D36898774
Pulled By: guidotag
fbshipit-source-id: f9f6b6f6da124f88b21b3c8dee742fbb04eff773
Summary:
In [close_db_dir](https://github.com/facebook/rocksdb/pull/10049) pull request, some merging conflicts occurred (some comments and one line `s.PermitUncheckedError()` are missing). This pull request aims to put them back.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10093
Reviewed By: ajkr
Differential Revision: D36884117
Pulled By: littlepig2013
fbshipit-source-id: 8c0e2a8793fc52804067c511843bd1ff4912c1c3
Summary:
Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed.
In order to achieve this, we would like to do the following:
- Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions` and extend the related logic)
- Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level`
- Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` )
- Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh`
- Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool)
- Ideally extend the C and Java bindings with the new option
- Update the BlobDB wiki to document the new option.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077
Reviewed By: ltamasi
Differential Revision: D36884156
Pulled By: gangliao
fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d
Summary:
Only used as temperature high bound for current code, may
increase with more temperatures added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10044
Test Plan: ci
Reviewed By: siying
Differential Revision: D36633410
Pulled By: jay-zhuang
fbshipit-source-id: eecdfa7623c31778c31d789902eacf78aad7b482
Summary:
Garbage collection is generally controlled by the BlobDB configuration options `enable_blob_garbage_collection` and `blob_garbage_collection_age_cutoff`. However, there might be use cases where we would want to temporarily override these options while performing a manual compaction. (One use case would be doing a full key-space manual compaction with full=100% garbage collection age cutoff in order to minimize the space occupied by the database.) Our goal here is to make it possible to override the configured GC parameters when using the `CompactRange` API to perform manual compactions. This PR would involve:
- Extending the `CompactRangeOptions` structure so clients can both force-enable and force-disable GC, as well as use a different cutoff than what's currently configured
- Storing whether blob GC should actually be enabled during a certain manual compaction and the cutoff to use in the `Compaction` object (considering the above overrides) and passing it to `CompactionIterator` via `CompactionProxy`
- Updating the BlobDB wiki to document the new options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10073
Test Plan: Adding unit tests and adding the new options to the stress test tool.
Reviewed By: ltamasi
Differential Revision: D36848700
Pulled By: gangliao
fbshipit-source-id: c878ef101d1c612429999f513453c319f75d78e9
Summary:
Currently, the DB directory file descriptor is left open until the deconstruction process (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (line 512, 513, 514, 515 in ldb_cmd.cc) to leak the ``db_'' object, build `ldb` tool and run
```
strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing
```
There is one directory file descriptor that is not closed in the strace log.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049
Test Plan: Add a new unit test DBBasicTest.DBCloseAllDirectoryFDs: Open a database with different WAL directory and three different data directories, and all directory file descriptors should be closed after calling Close(). Explicitly call Close() after a directory file descriptor is not used so that the counter of directory open and close should be equivalent.
Reviewed By: ajkr, hx235
Differential Revision: D36722135
Pulled By: littlepig2013
fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff
Summary:
Stress tests can run with the experimental FastLRUCache. Crash tests randomly choose between LRUCache and FastLRUCache.
Since only LRUCache supports a secondary cache, we validate the `--secondary_cache_uri` and `--cache_type` flags---when `--secondary_cache_uri` is set, the `--cache_type` is set to `lru_cache`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10081
Test Plan:
- To test that the FastLRUCache is used and the stress test runs successfully, run `make -j24 CRASH_TEST_EXT_ARGS=—duration=960 blackbox_crash_test_with_atomic_flush`. The cache type should sometimes be `fast_lru_cache`.
- To test the flag validation, run `make -j24 CRASH_TEST_EXT_ARGS="--duration=960 --secondary_cache_uri=x" blackbox_crash_test_with_atomic_flush` multiple times. The test will always be aborted (which is okay). Check that the cache type is always `lru_cache`.
Reviewed By: anand1976
Differential Revision: D36839908
Pulled By: guidotag
fbshipit-source-id: ebcdfdcd12ec04c96c09ae5b9c9d1e613bdd1725
Summary:
`db_impl.alive_log_files_` is used to track the WAL size in `db_impl.logs_`.
Get the `LogFileNumberSize` obj in `alive_log_files_` the same time as `log_writer` to keep them consistent.
For this issue, it's not safe to do `deque::reverse_iterator::operator*` and `deque::pop_front()` concurrently,
so remove the tail cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10086
Test Plan:
```
# on Windows
gtest-parallel ./db_test --gtest_filter=DBTest.FileCreationRandomFailure -r 1000 -w 100
```
Reviewed By: riversand963
Differential Revision: D36822373
Pulled By: jay-zhuang
fbshipit-source-id: 5e738051dfc7bcf6a15d85ba25e6365df6b6a6af
Summary:
We recently saw a case in crash test in which a WAL file in the
middle of the list of live WALs was not included in the backup, so the
DB was not openable due to missing WAL. We are not sure why, but this
change should at least turn that into a backup-time failure by ensuring
all the WAL files expected by the manifest (according to VersionSet) are
included in `GetSortedWalFiles()` (used by `GetLiveFilesStorageInfo()`,
`BackupEngine`, and `Checkpoint`)
Related: to maximize the effectiveness of
track_and_verify_wals_in_manifest with GetSortedWalFiles() during
checkpoint/backup, we will now sync WAL in GetLiveFilesStorageInfo()
when track_and_verify_wals_in_manifest=true.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10083
Test Plan: added new unit test for the check in GetSortedWalFiles()
Reviewed By: ajkr
Differential Revision: D36791608
Pulled By: pdillinger
fbshipit-source-id: a27bcf0213fc7ab177760fede50d4375d579afa6
Summary:
In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't
flush the data from WAL to L0 for all column families if possible. As a
result, not all column families can increase their log_numbers, and
min_log_number_to_keep won't change.
For transaction DB (.allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions and min_log_number_to_keep won't change.
If we persist a new MANIFEST with
advanced log_numbers for some column families, then during a second
crash after persisting the MANIFEST, RocksDB will see some column
families' log_numbers larger than the corrupted wal, and the "column family inconsistency" error will be hit, causing recovery to fail.
As a solution, RocksDB will persist the new MANIFEST after successfully syncing the new WAL.
If a future recovery starts from the new MANIFEST, then it means the new WAL is successfully synced. Due to the sentinel empty write batch at the beginning, kPointInTimeRecovery of WAL is guaranteed to go after this point.
If future recovery starts from the old MANIFEST, it means the writing the new MANIFEST failed. We won't have the "SST ahead of WAL" error.
Currently, RocksDB DB::Open() may creates and writes to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST after recovery is successful
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9922
Test Plan:
1. Update unit tests to fail without this change
2. make crast_test -j
Branch with unit test and no fix https://github.com/facebook/rocksdb/pull/9942 to keep track of unit test (without fix)
Reviewed By: riversand963
Differential Revision: D36043701
Pulled By: akankshamahajan15
fbshipit-source-id: 5760970db0a0920fb73d3c054a4155733500acd9
Summary:
This variable is actually not being used for anything meaningful, thus remove it.
This can make https://github.com/facebook/rocksdb/issues/7516 slightly simpler by reducing the amount of state that must be made lock-free.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10078
Test Plan: make check
Reviewed By: ajkr
Differential Revision: D36779817
Pulled By: riversand963
fbshipit-source-id: ffb0d9ad6149616917ae5e02bb28102cb90fc406
Summary:
After https://github.com/facebook/rocksdb/issues/9984, BackupEngineTest.Concurrency becomes flaky.
During DB::Open(), someone else can rename/remove the LOG file, causing
this thread's `CreateLoggerFromOptions()` to fail. The reason is that the operation sequence
of "FileExists -> Rename" is not atomic. It's possible that a FileExists() returns OK, but the file
gets deleted before Rename(), causing the latter to return IOError with PathNotFound subcode.
Although it's not encouraged to concurrently modify the contents of the directories managed by
the database instance in this case, we can still perform some simple handling to make DB::Open()
more robust. In this case, we can check if a racing thread has deleted the original LOG file, we can
allow this thread to continue creating a new LOG file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10069
Test Plan: ~/gtest-parallel/gtest-parallel -r 100 ./backup_engine_test --gtest_filter=BackupEngineTest.Concurrency
Reviewed By: jay-zhuang
Differential Revision: D36736913
Pulled By: riversand963
fbshipit-source-id: 3cbe92d77ca175e55e586bdb1a32ac8107217ae6
Summary:
Fix the unittest `ExternalSSTFileBasicTest.StableSnapshotWhileLoggingToManifest` introduced in https://github.com/facebook/rocksdb/issues/10051 that is failing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10066
Test Plan: CI
Reviewed By: ajkr
Differential Revision: D36720669
Pulled By: cbi42
fbshipit-source-id: 47a6d2c161f27b605ede5c62d1776eecaf0d5363
Summary:
TSAN test is slower, for `TransactionStressTest` and
`DeadlockStress`, they're reaching the timeout limit of 600 seconds.
Decreasing the transaction test number.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10063
Test Plan: CI
Reviewed By: ajkr
Differential Revision: D36711727
Pulled By: jay-zhuang
fbshipit-source-id: 600f82a6d32108f52fbe5572fcc7497607b7fe98
Summary:
Tests could hang because of flags are not test and set
atomiclly, so it's waiting for a sync point forever.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10060
Test Plan: CI
Reviewed By: ajkr
Differential Revision: D36706311
Pulled By: jay-zhuang
fbshipit-source-id: d54b8053ce51b2de74162b28f496c048519b6cde
Summary:
For regular db instance and secondary instance, we return error and refuse to open DB if Logger creation fails.
Our current code allows it, but it is really difficult to debug because
there will be no LOG files. The same for OPTIONS file, which will be explored in another PR.
Furthermore, Arena::AllocateAligned(size_t bytes, size_t huge_page_size, Logger* logger) has an
assertion as the following:
```cpp
#ifdef MAP_HUGETLB
if (huge_page_size > 0 && bytes > 0) {
assert(logger != nullptr);
}
#endif
```
It can be removed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9984
Test Plan: make check
Reviewed By: jay-zhuang
Differential Revision: D36347754
Pulled By: riversand963
fbshipit-source-id: 529798c0511d2eaa2f0fd40cf7e61c4cbc6bc57e
Summary:
RocksDB uses the (no longer aptly named) SST file manager (see https://github.com/facebook/rocksdb/wiki/Managing-Disk-Space-Utilization) to track and potentially limit the space used by SST and blob files (as well as to rate-limit the deletion of these data files). The SST file manager tracks the SST and blob file sizes in an in-memory hash map, which has to be rebuilt during DB open. File sizes can be generally obtained by querying the file system; however, there is a performance optimization possibility here since the sizes of SST and blob files are also tracked in the RocksDB MANIFEST, so we can simply pass the file sizes stored there instead of consulting the file system for each file. Currently, this optimization is only implemented for SST files; we would like to extend it to blob files as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10062
Test Plan:
Add unit tests for the change to the test suite
ltamasi riversand963 akankshamahajan15
Reviewed By: ltamasi
Differential Revision: D36726621
Pulled By: gangliao
fbshipit-source-id: 4010dc46ef7306142f1c2e0d1c3bf75b196ef82a
Summary:
This PR adds timestamp support to the secondary DB instance.
With this, these timestamp related APIs are supported:
ReadOptions.timestamp : read should return the latest data visible to this specified timestamp
Iterator::timestamp() : returns the timestamp associated with the key, value
DB:Get(..., std::string* timestamp) : returns the timestamp associated with the key, value in timestamp
Test plan (on devserver):
```
$COMPILE_WITH_ASAN=1 make -j24 all
$./db_secondary_test --gtest_filter=DBSecondaryTestWithTimestamp*
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10061
Reviewed By: riversand963
Differential Revision: D36722915
Pulled By: jowlyzhang
fbshipit-source-id: 644ada39e4e51164a759593478c38285e0c1a666
Summary:
There are currently some preprocessor checks that assume support for Visual Studio versions older than 2015 (i.e., 0 < _MSC_VER < 1900), although we don't support them any more.
We removed all code that only compiles on those older versions, except third-party/ files.
The ROCKSDB_NOEXCEPT symbol is now obsolete, since it now always gets replaced by noexcept. We removed it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10065
Reviewed By: pdillinger
Differential Revision: D36721901
Pulled By: guidotag
fbshipit-source-id: a2892d365ef53cce44a0a7d90dd6b72ee9b5e5f2