Summary:
`FaultInjectionTest{Env,FS}::ReopenWritableFile()` functions were accidentally deleting WALs from previous `db_stress` runs causing verification to fail. They were operating under the assumption that `ReopenWritableFile()` would delete any existing file. It was a reasonable assumption considering the `{Env,FileSystem}::ReopenWritableFile()` documentation stated that would happen. The only problem was neither the implementations we offer nor the "real" clients in RocksDB code followed that contract. So, this PR updates the contract as well as fixing the fault injection client usage.
The fault injection change exposed that `ExternalSSTFileBasicTest.SyncFailure` was relying on a fault injection `Env` dropping unsynced data written by a regular `Env`. I changed that test to make its `SstFileWriter` use fault injection `Env`, and also implemented `LinkFile()` in fault injection so the unsynced data is tracked under the new name.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8995
Test Plan:
- Verified it fixes the following failure:
```
$ ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --ops_per_thread=1000 --prefixpercent=0 --readpercent=60 --reopen=0 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1
...
$ ./db_stress --avoid_flush_during_recovery=1 --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=10 --key_len_percent_dist=1,30,69 --max_bytes_for_level_base=4194304 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_write_fault_one_in=16 --ops_per_thread=1000 --prefix_size=-1 --prefixpercent=0 --readpercent=50 --sync=1 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1
...
Verification failed for column family 0 key 000000000000001300000000000000857878787878 (1143): Value not found: NotFound:
Crash-recovery verification failed :(
...
```
- `make check -j48`
Reviewed By: ltamasi
Differential Revision: D31495388
Pulled By: ajkr
fbshipit-source-id: 7886ccb6a07cb8b78ad7b6c1c341ccf40bb68385
Summary:
Added "no-elide-constructors to the ASSERT_STATUS_CHECK builds. This flag gives more errors/warnings for some of the Status checks where an inner class checks a Status and later returns it. In this case, without the elide check on, the returned status may not have been checked in the caller, thereby bypassing the checked code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7798
Reviewed By: jay-zhuang
Differential Revision: D25680451
Pulled By: pdillinger
fbshipit-source-id: c3f14ed9e2a13f0a8c54d839d5fb4d1fc1e93917
Summary:
The patch adds support for injecting errors when reading from `RandomAccessFile`
using `FaultInjectionTestEnv`. (This functionality was curiously missing
w/r/t `RandomAccessFile`, even though it was implemented for `RandomRWFile`.)
The patch also fixes up a test case in `blob_db_test` which uses `FaultInjectionTestEnv`
but has so far relied on reads from `RandomAccessFile`s succeeding even after
deactivating the filesystem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7447
Test Plan: `make check`
Reviewed By: zhichao-cao
Differential Revision: D23971740
Pulled By: ltamasi
fbshipit-source-id: 8492736cb64b1ee138c658822535f3ff4fe560c6
Summary:
Cleans up some of the dependencies on test code in the Makefile while building tools:
- Moves the test::RandomString, DBBaseTest::RandomString into Random
- Moves the test::RandomHumanReadableString into Random
- Moves the DestroyDir method into file_utils
- Moves the SetupSyncPointsToMockDirectIO into sync_point.
- Moves the FaultInjection Env and FS classes under env
These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies. By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.
Tested both release and debug builds via Make and CMake for both static and shared libraries.
More work remains to clean up how the tools are built and remove some unnecessary dependencies. There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097
Reviewed By: riversand963
Differential Revision: D22463160
Pulled By: pdillinger
fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
Summary:
The error message is causing problems in the crash tests due to the
error parsing logic in db_crashtest.py.
This is a follow up PR for https://github.com/facebook/rocksdb/pull/6694.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6696
Test Plan: make check
Reviewed By: zhichao-cao
Differential Revision: D21021875
Pulled By: cheng-chang
fbshipit-source-id: 11e3f536df16941a89949ebcd2147cd8dfa3fbe0
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
Differential Revision: D19977691
fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
Summary:
It it not safe to assume application had sync the SST file before ingest it into DB. Also the directory to put the ingested file needs to be fsync, otherwise the file can be lost. For integrity of RocksDB we need to sync the ingested file and directory before apply the change to manifest.
Also syncing after writing global sequence when write_global_seqno=true was removed in https://github.com/facebook/rocksdb/issues/4172. Adding it back.
Fixes https://github.com/facebook/rocksdb/issues/5287.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5435
Test Plan:
Test ingest file with ldb command and observe fsync/fdatasync in strace output. Tried both move_files=true and move_files=false.
https://gist.github.com/yiwu-arbug/650a4023f57979056d83485fa863bef9
More test suggestions are welcome.
Differential Revision: D15941675
Pulled By: riversand963
fbshipit-source-id: 389533f3923065a96df2cdde23ff4724a1810d78
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377
Differential Revision: D15551366
Pulled By: siying
fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
Summary:
Make file ingestion atomic.
as title.
Ingesting external SST files into multiple column families should be atomic. If
a crash occurs and db reopens, either all column families have successfully
ingested the files before the crash, or non of the ingestions have any effect
on the state of the db.
Also add unit tests for atomic ingestion.
Note that the unit test here does not cover the case of incomplete atomic group
in the MANIFEST, which is covered in VersionSetTest already.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4895
Differential Revision: D13718245
Pulled By: riversand963
fbshipit-source-id: 7df97cc483af73ad44dd6993008f99b083852198
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638
Differential Revision: D12934992
Pulled By: sagar0
fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
Summary:
A fix similar to #4410 but on the write path. On IO error on `SelectBlobFile()` we didn't return error code properly, but simply a nullptr of `BlobFile`. The `AppendBlob()` method didn't have null check for the pointer and caused crash. The fix make sure we properly return error code in this case.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4580
Differential Revision: D10513849
Pulled By: yiwu-arbug
fbshipit-source-id: 80bca920d1d7a3541149de981015ad83e0aa14b5
Summary:
The old error message was misleading because it led people to believe the truncation operation failed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4454
Differential Revision: D10203575
Pulled By: riversand963
fbshipit-source-id: c76482a132566635cb55d4c73d45c461f295ec43
Summary:
Fix IO error on read not being handle and crashing the DB. With the fix we properly return the error.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4410
Differential Revision: D9979246
Pulled By: yiwu-arbug
fbshipit-source-id: 111a85675067a29c03cb60e9a34103f4ff636694
Summary:
Currently, if RocksDB encounters errors during a write operation (user requested or BG operations), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of 3 parts -
1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not
2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place
3. Provide an API for the user to clear the error and resume the DB instance
This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors.
Closes https://github.com/facebook/rocksdb/pull/3997
Differential Revision: D8653831
Pulled By: anand1976
fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd
Summary:
Fix the bug when sync log fail, FlushJob::Run() will not be execute and
reference to cfd->current() will not be release.
Closes https://github.com/facebook/rocksdb/pull/1792
Differential Revision: D4441316
Pulled By: yiwu-arbug
fbshipit-source-id: 5523e28
Summary:
Fixes compile error:
In file included from ./util/statistics.h:17:0,
from ./util/stop_watch.h:8,
from ./util/perf_step_timer.h:9,
from ./util/iostats_context_imp.h:8,
from ./util/posix_logger.h:27,
from ./port/util_logger.h:18,
from ./db/auto_roll_logger.h:15,
from db/auto_roll_logger.cc:6:
./util/thread_local.h:65:16: error: 'function' in namespace 'std' does not name a template type
typedef std::function<void(void*, void*)> FoldFunc;
Closes https://github.com/facebook/rocksdb/pull/1656
Differential Revision: D4318702
Pulled By: yiwu-arbug
fbshipit-source-id: 8c5d17a
Summary:
Consider the following WAL with 4 batch entries prefixed with their sequence at time of memtable insert.
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(a)]
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(b)]
[4: COMMIT(a)]
[7: COMMIT(b)]
The first two batches do not consume any sequence numbers so are both prefixed with seq=1.
For 2pc commit, memtable insertion takes place before COMMIT batch is written to WAL.
We can see that sequence number consumption takes place between WAL entries giving us the seemingly sparse sequence prefix for WAL entries.
This is a valid WAL.
Because with 2PC markers one WriteBatch points to another batch containing its inserts a writebatch can consume more or less sequence numbers than the number of sequence consuming entries that it contains.
We can see that, given the entries in the WAL, 6 sequence ids were consumed. Yet on recovery the maximum sequence consumed would be 7 + 3 (the number of sequence numbers consumed by COMMIT(b))
So, now upon recovery we must track the actual consumption of sequence numbers.
In the provided scenario there will be no sequence gaps, but it is possible to produce a sequence gap. This should not be a problem though. correct?
Test Plan: provided test.
Reviewers: sdong
Subscribers: andrewkr, leveldb, dhruba, hermanlee4
Differential Revision: https://reviews.facebook.net/D57645