Summary:
Although PR https://github.com/facebook/rocksdb/issues/7032 fixes the construction of the `SstFileDumper` in `GetFileDbIdentities` by setting a proper `Env` of the `Options` passed in the constructor, the file path was not corrected accordingly. This actually disables backup engine to use db session ids in the file names since the `db_session_id` is always empty.
Now it is fixed by setting the correct path in the construction of `SstFileDumper`. Furthermore, to preserve the Direct IO property that backup engine already has, parameter `EnvOptions` is added to `GetFileDbIdentities` and `SstFileDumper`.
The `BackupUsingDirectIO` test is updated accordingly.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7104
Test Plan: backupable_db_test and some manual tests.
Reviewed By: ajkr
Differential Revision: D22443245
Pulled By: gg814
fbshipit-source-id: 056a9bb8b82947c5e73d7c3fbb62bfe23af5e562
Summary:
When format_version is high enough to support user-key and
there are index entries for same user key that spans multiple data
blocks then it changes from user-key mode to internal-key mode. But the
flush policy is not reset to point to Block Builder of internal-keys.
After this switch, no entries are added to user key index partition
result, thus it never triggers flushing the block.
Fix: 1. After adding the entry in sub_builder_index_, if there is a switch
from user-key to internal-key, then flush policy is updated to point to
Block Builder of internal-keys index partition.
2. Set sub_builder_index_->seperator_is_key_plus_seq_ = true if
seperator_is_key_plus_seq_ is set to true so that subsequent partitions
can also use internal key mode.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7096
Test Plan: make check -j64
Reviewed By: ajkr
Differential Revision: D22416598
Pulled By: akankshamahajan15
fbshipit-source-id: 01fc2dc07ea1b32f8fb803995ebe6e9a3fbe67ac
Summary:
Delicious copy-pasta from https://github.com/facebook/rocksdb/issues/7039
Also fixing DestroyDir to allow files to go missing while it is operating. This seems to fix failures I got with test plan reproducer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7103
Test Plan:
make blackbox_crash_test_with_atomic_flush for a while with
checkpoint_one_in=100
Reviewed By: siying
Differential Revision: D22435315
Pulled By: pdillinger
fbshipit-source-id: 0ec0538402493887aeda43ecc03f32979cb84ced
Summary:
The fix in PR https://github.com/facebook/rocksdb/issues/7082 is not really successful because there is still a small chance that the test will fail.
In addtion to flushing, we close the DB and then reopen before corrupting a table file in the DB. Specifically, we corrupt a table file before backup takes place as follows.
* Open DB
* Fill DB
* Flush DB (optional, no flushing here also works)
* Close DB
* Reopen DB
* Corrupt a table file in the DB
This should make the test reliable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7102
Test Plan:
`while ./backupable_db_test --gtest_filter=*TableFileCorruptedBeforeBackup*; do true; done`
(kept running for an hour or so :)
Reviewed By: pdillinger
Differential Revision: D22432417
Pulled By: gg814
fbshipit-source-id: d407eee93ff428bb662f80cde1659fbf0149d0cd
Summary:
Currently, `EventListener` in listner.h only have callback functions for file read and write. One may favor extended callback functions for more file I/O operations like flush, sync and close. This PR tries to add those interface and have them called when appropriate throughout the code base.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7055
Test Plan:
Write an experimental listener with those new callback functions with log output in them; run experiments and check logs to see those functions are actually called.
Default test suits `make check` should also be included.
Reviewed By: riversand963
Differential Revision: D22380624
Pulled By: roghnin
fbshipit-source-id: 4121491d45c2c2aae8c255e7998090559a241c6a
Summary:
On some platforms like MacOS, a second 'make check' can lead to
/bin/rm: Argument list too long
This is fixed by replacing with a 'find'. Also, using '-f' for more rm calls
to avoid prompt.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7095
Test Plan: 'make check' on Linux and MacOS
Reviewed By: riversand963
Differential Revision: D22415808
Pulled By: pdillinger
fbshipit-source-id: 0fd1ebae13739c9d81f9e813e99b062715604d6b
Summary:
by tracking and linking against runtime dependent libraries in
Makefile
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7098
Test Plan: look for fix in CircleCI
Reviewed By: riversand963
Differential Revision: D22420860
Pulled By: pdillinger
fbshipit-source-id: d211d709214bf5306db68e43b7a2f18169281022
Summary:
Primarily, this change adds a way to work around a bug limiting the effective output (and therefore debugability) of the Linux builds using parallel make. We would get
make[1]: write error: stdout
probably due to a kernel bug, apparently affecting both available ubuntu 16 machine images (maybe not affecting docker images, less horsepower). https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1814393
Now in the CircleCI config, make output on Ubuntu is piped through a custom 'cat' that ignores EAGAIN errors, which seems to fix the problem.
Significant other changes:
* Add another linux build that combines
* LIB_MODE=shared, to ensure this works with compile and unit test execution
* Alternative rocksdb namespace, to ensure this works (not rely on Travis)
* ASSERT_STATUS_CHECKED=1, but with building all unit tests and running those expected to pass with it
* Run release build with and without gflags. (Was running only without, ignore large swaths of code in a normal release build! Two regressions in this build, only with gflags, in the last week not caught by CI!)
* Use gflags with unity and LITE build, as typical case.
Debugability improvements:
* Use V=1 to show commands being executed (thanks to EAGAIN work-around)
* Print kernel version and compiler versions as part of V=1 output from Makefile
Cosmetic other changes:
* Put more commands on one line, for less clutter in CircleCI output pages
* Remove redundant "all" in "make all check" and put make command options before targets
* Change some recursive "make clean" into dependency on "clean," toward minimizing unnecessary overhead (detect platform, build version, etc.) of extra recursive makes
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7078
Reviewed By: siying
Differential Revision: D22391647
Pulled By: pdillinger
fbshipit-source-id: d446fccf5a8c568b37dc8748621c8a5c546fe135
Summary:
(a) use STRESS_LIBRARY for db_stress and make sure
STRESS_LIBRARY has other stress test dependencies (as in buck build)
(b) fix rpath option to be accepted on MacOS. It still doesn't fully work
for me e.g. to run a LIB_MODE=shared unit test binary from another
directory, as it does on Linux, but the option is now accepted, and running
unit tests from current directory works for me.
Also adding LIB_MODE=shared to Travis. (Later TBD where best to fit in
in CircleCI.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7066
Test Plan: manual
Reviewed By: cheng-chang
Differential Revision: D22364068
Pulled By: pdillinger
fbshipit-source-id: 6fa98a222f89f808ee786474de1100d92c1adec3
Summary:
If the corruption of a table file is done before flushing, then db manifest may record the checksum for the corrupted table, which results in "matching checksums" when backup engine tries to verfiy the checksum, and causes a flaky test.
Fix the issue by adding `Flush()` before trying to corrupt a table file in *db*.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7082
Test Plan:
`buck test`
Without the fix, failed 5 of 100 tests.
Suspected whether the pseudo randomness causes the issue: doubling `keys_iteration` resulted in 2 of 100 tests failed; deterministically corrupting tables file also caused 2 of 100 tests to fail.
With the fix, passed 200 of 200 tests.
Reviewed By: pdillinger
Differential Revision: D22375421
Pulled By: gg814
fbshipit-source-id: 7304618e7520684b6087e42d0b58329c5ad18329
Summary:
This is to fix special logic to run tests inside FB.
Buck test is broken after moving to cpp_unittest(). Move c_test back to the previous approach.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7076
Test Plan: Watch the Sandcastle run
Reviewed By: ajkr
Differential Revision: D22370096
fbshipit-source-id: 4a464d0903f2c76ae2de3a8ad373ffc9bedec64c
Summary:
When table file checksums are enabled and stored in the DB manifest by using the RocksDB default crc32c checksum function, BackupEngine will calculate the crc32c checksum of the file to be copied and compare the calculated result with the one stored in the DB manifest before copying the file to the backup directory.
After copying to the backup directory, BackupEngine will verify the checksum of the copied file with the one calculated before copying. This helps detect some rare corruption events such as bit-flips during the copying process.
No verification with checksums in DB manifest will be performed if the table file checksum function is not the RocksDB default crc32c checksum function.
In addition, If `share_table_files` and `share_files_with_checksum` are true, BackupEngine will compare the checksums computed before and after copying of the table files.
Corresponding tests are added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7015
Test Plan: Passed make check
Reviewed By: pdillinger
Differential Revision: D22165732
Pulled By: gg814
fbshipit-source-id: ee0e8cc397c455eba64545c29380b9d9853588ec
Summary:
After https://github.com/facebook/rocksdb/pull/7036, we still see extra DBTest that can timeout when running 10 or 20 in parallel. Expand skip-fsync mode in whole DBTest. Still preserve other tests from doing this mode to be conservative.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7049
Test Plan: Run all existing files.
Reviewed By: pdillinger
Differential Revision: D22301700
fbshipit-source-id: f9a9e3b3b26ce640665a47cb8bff33ba0c89b565
Summary:
GetFileDbIdentities requires either db_id non-null or db_session_id non-null.
Passing nullptr for db_id or db_session_id in CopyOrCreateFile indicates the caller does not want to obtain the value for db_id or db_session_id.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7063
Test Plan:
USE_CLANG=1 make analyze
backupable_db_test
Reviewed By: pdillinger
Differential Revision: D22338497
Pulled By: gg814
fbshipit-source-id: 2aa2dcc14d156b0f99b07d6cf3c731ee088272cd
Summary:
When format_version is high enough to support user-key and there are index entries for same user key that spans multiple data blocks then it changes from user-key mode to internal-key mode. But the flush policy is not reset to point to Block Builder of internal-keys. After this switch, no entries are added to user key index partition result, thus it never triggers flushing the block.
Fix: After adding the entry in sub_builder_index_, if there is a switch from user-key to internal-key, then flush policy is updated to point to Block Builder of internal-keys index partition.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7022
Test Plan:
1. make check -j64
2. Added one unit test case
Reviewed By: ajkr
Differential Revision: D22197734
Pulled By: akankshamahajan15
fbshipit-source-id: d87e9e46bccab8e896ee6979d6b79c51f73d479e
Summary:
This check is flaky because compaction could run between the `Flush()` and the `TestGetTickerCount()`, which would increase the `BLOCK_CACHE_INDEX_MISS` count beyond what the test expects. Verified by adding a `sleep(1)` between those two lines and observing the counter is too high every time. The solution is just to remove this check as it doesn't have any use anyways. The latter check of index miss is sufficient to conclude the newest L0 file (i.e., the one generated by intra-L0) does not have its index block pinned in cache. It'd be nice to simultaneously check the L0 files generated by flush do have their index blocks pinned in cache, but that's not what the line deleted in this PR was checking..
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7065
Reviewed By: pdillinger
Differential Revision: D22340327
Pulled By: ajkr
fbshipit-source-id: e076b2c7228b7fa763dd0c0cb13828e176c1abee
Summary:
Random memtable layouts could cause random failure,
reproducible with command below running for a while. Test now using
deterministic behavior.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7064
Test Plan: while ./db_test --gtest_filter=*SizesMemTable*; do true; done
Reviewed By: siying
Differential Revision: D22339442
Pulled By: pdillinger
fbshipit-source-id: 8e74e5a9b5e88f7030854045a22c12cf561d5de6
Summary:
Change the linking of tests/tools to be against a library rather than a list of objects. This change substantially reduces the size of the objects produced.
peterd clean repo size: 264M
Before this change, with make all: 40G
After this change, with make all: 28G
With make LIB_MODE=shared all: 7.0G
The list of TESTS was changed from being hard-coded to generated from the test sources variable. Note that there are some test sources that are not built as tests (though the set of tests is identical to the previous version).
Added OBJ_DIR option to Makefile to allow objects to be placed in an alternative location. By default, OBJ_DIR is the same as before ("./").
This change is a precursor to being able to build/run the tests/tools linked against static libraries. Additionally, it should be possible to clean up and merge some of the rules for building tests and the like if so desired.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6660
Reviewed By: riversand963
Differential Revision: D22244463
Pulled By: pdillinger
fbshipit-source-id: db9c6341d81ed62c2270374f4ede02fb9604c754
Summary:
`bool BackupableDBOptions::new_naming_for_backup_files` is updated to `BackupTableNameOption BackupableDBOptions::share_files_with_checksum_naming`, where `BackupTableNameOption` is an `enum` type with two enumerators `kChecksumAndFileSize` and `kChecksumAndFileSize`. This opens up possibilities of extenting the current naming scheme for backup table files. By default, `BackupTableNameOption BackupableDBOptions::share_files_with_checksum_naming` is set to `kChecksumAndDbSessionId`.
Revert `BackupEngine::VerifyBackup` to only check file sizes by default.
Also fix the construction of the `SstFileDumper` in `GetFileDbIdentities` by setting a proper `Env` of the `Options` passed in the constructor.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7032
Test Plan: make check
Reviewed By: ajkr
Differential Revision: D22237763
Pulled By: gg814
fbshipit-source-id: 466902a4e731babd64e30f0e82ca1aa82962e52e
Summary:
With mmap enabled on an uncompressed file, we were previously always doing a heap allocation to obtain the scratch buffer for `RandomAccessFileReader::Read()`. However, that allocation was unnecessary as the underlying file reader returned a pointer into its mapped memory, not the provided scratch buffer. This PR makes passes the `BlockFetcher`'s inline buffer as the scratch buffer if the data block is small enough (less than `kDefaultStackBufferSize` bytes, currently 5000). Ideally we would not pass a scratch buffer at all for an mmap read; however, the `RandomAccessFile::Read()` API guarantees such a buffer is provided, and non-standard implementations may be relying on it even when `Options::allow_mmap_reads == true`. In that case, this PR still works but introduces an extra copy from the inline buffer to a heap buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7043
Reviewed By: cheng-chang
Differential Revision: D22320606
Pulled By: ajkr
fbshipit-source-id: ad964dd23df34e07d979c6032c2dfe5454c98b52
Summary:
The earlier `VersionBuilder` code only cleaned up blob files that were
marked as entirely consisting of garbage using `VersionEdits` with
`BlobFileGarbage`. This covers the cases when table files go through
regular compaction, where we iterate through the KVs and thus have an
opportunity to calculate the amount of garbage (that is, most cases).
However, it does not help when table files are simply dropped (e.g. deletion
compactions or the `DeleteFile` API). To deal with such cases, the patch
adds logic that cleans up all blob files at the head of the list until the first
one with linked SSTs is found. (As an example, let's assume we have blob files
with numbers 1..10, and the first one with any linked SSTs is number 8.
This means that SSTs in the `Version` only rely on blob files with numbers >= 8,
and thus 1..7 are no longer needed.)
The code change itself is pretty small; however, changing the logic like this
necessitated changes to some tests that have been added recently (namely
to the ones that use blob files in isolation, i.e. without any table files referring
to them). Some of these cases were fixed by bypassing `VersionBuilder` altogether
in order to keep the tests simple (which actually makes them more proper unit tests
as well), while the `VersionBuilder` unit tests were fixed by adding dummy table
files to the test cases as needed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7001
Test Plan: `make check`
Reviewed By: riversand963
Differential Revision: D22119474
Pulled By: ltamasi
fbshipit-source-id: c6547141355667d4291d9661d6518eb741e7b54a
Summary:
There are errors like `Transaction put: Operation timed out: Timeout waiting to lock key
terminate called without an active exception`, based on experiment on devserver, increasing timeouts can resolve the issue.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7056
Test Plan: watch stress test with txn.
Reviewed By: anand1976
Differential Revision: D22317265
Pulled By: cheng-chang
fbshipit-source-id: 2dc3352def5e78d2c39a18d7262a3a65ca98bbba
Summary:
WriteCallbackTest.WriteWithCallbackTest has a deep for-loop and in some cases runs very long. Parameterimized it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7037
Test Plan: Run the test and see it passes.
Reviewed By: ltamasi
Differential Revision: D22269259
fbshipit-source-id: a1b6687b5bf4609754833d14cf383d68bc7ab27a
Summary:
We see crash test occassionally fails with "A checkpoint operation failed with: Invalid argument: Directory exists". The suspicious is that the directory fails to be deleted because some trash files. Deep clean the directory after a DestroyDB() call.
Also add more debugging printf in case it fails.
Also, preserve the DB if verification fails.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7039
Test Plan: Run db_stress with low --checkpoint_one_in value
Reviewed By: riversand963
Differential Revision: D22271694
fbshipit-source-id: 6a9b2abb664fc69a4dc666741df4f6b23703cd6d
Summary:
Added compaction filter support for BlobDB non-TTL values. Same as vanilla RocksDB, user compaction filter applies to all k/v pairs of the compaction for non-TTL values. It honors `min_blob_size`, which potentially results value transitions between inlined data and stored-in-blob data when size of value is changed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6850
Reviewed By: siying
Differential Revision: D22263487
Pulled By: ltamasi
fbshipit-source-id: 8fc03f8cde2a5c831e63b436b3dbf1b7f90939e8
Summary:
Fsyncing files is not providing more test coverage in many tests. Provide an option in SpecialEnv to turn it off to speed it up and enable this option in some tests with relatively long run time.
Most of those tests can be divided as parameterized gtest too. This two speed up approaches are orthogonal and we can do both if needed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7036
Test Plan: Run all tests and make sure they pass.
Reviewed By: ltamasi
Differential Revision: D22268084
fbshipit-source-id: 6d4a838a1b7328c13931a2a5d93de57aa02afaab
Summary:
Current implementation of the ```read_options.deadline``` option only checks the deadline for random file reads during point lookups. This PR extends the checks to file opens, prefetches and preloads as part of table open.
The main changes are in the ```BlockBasedTable```, partitioned index and filter readers, and ```TableCache``` to take ReadOptions as an additional parameter. In ```BlockBasedTable::Open```, in order to retain existing behavior w.r.t checksum verification and block cache usage, we filter out most of the options in ```ReadOptions``` except ```deadline```. However, having the ```ReadOptions``` gives us more flexibility to honor other options like verify_checksums, fill_cache etc. in the future.
Additional changes in callsites due to function signature changes in ```NewTableReader()``` and ```FilePrefetchBuffer```.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6982
Test Plan: Add new unit tests in db_basic_test
Reviewed By: riversand963
Differential Revision: D22219515
Pulled By: anand1976
fbshipit-source-id: 8a3b92f4a889808013838603aa3ca35229cd501b
Summary:
VS2019 is covered in CircleCI. The only thing missing there is -DCMAKE_CXX_STANDARD=20 option. Add the option there and remove VS2019 build from Appveyor.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7038
Test Plan: Watch build results.
Reviewed By: pdillinger, ltamasi
Differential Revision: D22270010
fbshipit-source-id: 77d30be49d38b41516fa8a12be45395c27b12761
Summary:
After https://github.com/facebook/rocksdb/issues/6949 , VersionSet::io_status_ can be concurrently accessed by multiple
threads without lock, causing tsan test to fail. For example, a bg flush thread
resets io_status_ before calling LogAndApply(), while another thread already in
the process of LogAndApply() reads io_status_. This is a bug.
We do not have to reset io_status_ each time we call LogAndApply(). io_status_
is part of the state of VersionSet, and it indicates the outcome of preceding
MANIFEST/CURRENT files IO operations. Its value should be updated only when:
1. MANIFEST/CURRENT files IO fail for the first time.
2. MANIFEST/CURRENT files IO succeed as part of recovering from a prior
failure without process restart, e.g. calling Resume().
Test Plan (devserver):
COMPILE_WITH_TSAN=1 make check
COMPILE_WITH_TSAN=1 make db_test2
./db_test2 --gtest_filter=DBTest2.CompactionStall
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7034
Reviewed By: zhichao-cao
Differential Revision: D22247137
Pulled By: riversand963
fbshipit-source-id: 77b83e05390f3ee3cd2d96d3fdd6fe4f225e3216
Summary:
A parameter `verify_with_checksum` is added to `BackupEngine::VerifyBackup`, which is true by default. So now `BackupEngine::VerifyBackup` verifies backup files with checksum AND file size by default. When `verify_with_checksum` is false, `BackupEngine::VerifyBackup` only compares file sizes to verify backup files.
Also add a test for the case when corruption does not change the file size.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7014
Test Plan: Passed backupable_db_test
Reviewed By: zhichao-cao
Differential Revision: D22165590
Pulled By: gg814
fbshipit-source-id: 606a7450714e868bceb38598c89fd356c6004f4f
Summary:
We are still keeping unity build working. So it's a good idea to add to a pre-commit CI.
A latest GCC docker image just to get a little bit more coverage. Fix three small issues to make it pass.
Also make unity_test to run db_basic_test rather than db_test to cut the test time. There is no point to run expensive tests here. It was set to run db_test before db_basic_test was separated out.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7026
Test Plan: watch tests to pass.
Reviewed By: zhichao-cao
Differential Revision: D22223197
fbshipit-source-id: baa3b6cbb623bf359829b63ce35715c75bcb0ed4
Summary:
ASAN run is powerful in finding memory leak bugs. Running it as a part of the pre-merge CI can help contributors avoid to merge some code with bugs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7027
Test Plan: Watch the test result.
Reviewed By: pdillinger
Differential Revision: D22222371
fbshipit-source-id: 92f9ce19e01a94ba5f9b765e154f7bcdece5c2a9