rocksdb

Commit Graph

Author	SHA1	Message	Date
sdong	692f6a3138	Implement NextAndGetResult() in memtable and level iterator (#7179 ) Summary: NextAndGetResult() is not implemented in memtable and is very simply implemented in level iterator. The result is that for a normal leveled iterator, performance regression will be observed for calling PrepareValue() for most iterator Next(). Mitigate the problem by implementing the function for both iterators. In level iterator, the implementation cannot be perfect as when calling file iterator's SeekToFirst() we don't have information about whether the value is prepared. Fortunately, the first key should not cause a big portion of the CPu. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7179 Test Plan: Run normal crash test for a while. Reviewed By: anand1976 Differential Revision: D22783840 fbshipit-source-id: c19f45cdf21b756190adef97a3b66ccde3936e05	5 years ago
mrambacher	d9d190742c	Make env_test work with ASSERT_STATUS_CHECKED (#7176 ) Summary: Make (most of) the env_test pass when ASSERT_STATUS_CHECKED is enabled. One test that opens a database is currently disabled in this mode, as there are many errors that need revisited for DB tests and status checks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7176 Reviewed By: cheng-chang Differential Revision: D22799278 Pulled By: ajkr fbshipit-source-id: 16d8a02eaeecd6df1060249b6a5811292801f2ed	5 years ago
codingsh	83ea266b43	export stats_persist_period_sec (#7168 ) Summary: fixed - https://github.com/rust-rocksdb/rust-rocksdb/issues/447 - https://github.com/rust-rocksdb/rust-rocksdb/pull/448 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7168 Reviewed By: cheng-chang Differential Revision: D22736013 Pulled By: ajkr fbshipit-source-id: fdd784aa75d26a367b9108b05ffdd94a2ae117d3	5 years ago
Tomas Kolda	cd4592c220	SST Partitioner interface that allows to split SST files (#6957 ) Summary: SST Partitioner interface that allows to split SST files during compactions. It basically instruct compaction to create a new file when needed. When one is using well defined prefixes and prefixed way of defining tables it is good to define also partitioning so that promotion of some SST file does not cover huge key space on next level (worst case complete space). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6957 Reviewed By: ajkr Differential Revision: D22461239 fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e	5 years ago
Jay Zhuang	b0c5ecd6b3	Make max_subcompactions dynamically changeable (#7159 ) Summary: Make `max-subcompactions` dynamically changeable by passing the `DBOption` to Compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7159 Reviewed By: siying Differential Revision: D22671238 Pulled By: jay-zhuang fbshipit-source-id: 311ca9f6bb606965544d8708616d358cfed5be42	5 years ago
Levi Tamasi	0d04a8434a	Sync blob files before closing them (#7160 ) Summary: BlobDB currently syncs each blob file periodically after writing a certain amount of data (as specified by the configuration option `BlobDBOptions::bytes_per_sync`) and all open blob files when the base DB's memtables are flushed. With the patch, in addition to the above, blob files are also synced right before being closed, after the footer has been written. This will be beneficial for the new integrated blob file write path as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7160 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D22672646 Pulled By: ltamasi fbshipit-source-id: 62b34263543a7e74abcbb7adf011daa1e699998f	5 years ago
Cheng Chang	96ce0470a7	Clean snapshot dir before taking snapshot (#7156 ) Summary: `DBTest::SnapshotFiles` runs the tests in a `while` loop. Currently, the snapshot directory is not cleaned up in each loop, so previous snapshot files may remain in the next loop's snapshot. When I'm working on https://github.com/facebook/rocksdb/pull/7129, when checking the tracked WALs in MANIFEST, I find that this test always fails because it reads some unknown WAL. It turns out that the unknown WAL is left from previous loops. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7156 Test Plan: make db_test && ./db_test --gtest_filters=*SnapshotFiles Reviewed By: siying Differential Revision: D22668360 Pulled By: cheng-chang fbshipit-source-id: 69d4aa3506038ba30e218e8ae966357935a99c6c	5 years ago
mrambacher	d44cbc5314	Add hash of key/value checks when paranoid_file_checks=true (#7134 ) Summary: When paraoid_files_checks=true, a rolling key-value hash is generated and compared to what is written to the file. If the values do not match, the SST file is rejected. Code put in place for the check for both flush and compaction jobs. Corresponding test added to corruption_test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7134 Reviewed By: cheng-chang Differential Revision: D22646149 fbshipit-source-id: 8fde1984a1a11edd3bd82a413acffc5ea7aa683f	5 years ago
Haosen Wen	dbc51adbac	Use steady_clock instead of system_clock in FileOperationInfo::TimePoint (#7153 ) Summary: Issue https://github.com/facebook/rocksdb/issues/7133 reported that using `system_clock` in `FileOperationInfo::TimePoint` causes the duration of file flush operation (which can be a noop on MacOS in some scenarios) appears to be 0 and fail an assertion in listener_test. Using `steady_clock` supposedly fixed the problem. `steady_clock` actually fits better into the use cases of `FileOperationInfo::TimePoint` as all usages care about durations but not wall clock time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7153 Test Plan: make check. Reviewed By: riversand963 Differential Revision: D22654136 Pulled By: roghnin fbshipit-source-id: 5980b1080734bdae496a18071a2c2b5887c67d85	5 years ago
sdong	1cf4731dbb	column_family_test: fix a data race related to sleeping task (#7150 ) Summary: TSAN reports warning in one column_family_test: WARNING: ThreadSanitizer: data race (pid=16352) Write of size 8 at 0x7ffcdf042158 by main thread: #0 pthread_cond_destroy <null> (column_family_test+0x471f65) https://github.com/facebook/rocksdb/issues/1 rocksdb::port::CondVar::~CondVar() /home/circleci/project/port/port_posix.cc:101:49 (column_family_test+0x8a627a) https://github.com/facebook/rocksdb/issues/2 rocksdb::test::SleepingBackgroundTask::~SleepingBackgroundTask() /home/circleci/project/./test_util/testutil.h:397:7 (column_family_test+0x54b6e2) https://github.com/facebook/rocksdb/issues/3 rocksdb::ColumnFamilyTest_FlushCloseWALFiles_Test::TestBody() /home/circleci/project/db/column_family_test.cc:3008:1 (column_family_test+0x54b6e2) ...... Previous read of size 8 at 0x7ffcdf042158 by thread T2 (mutexes: write M0): #0 pthread_cond_broadcast <null> (column_family_test+0x471dd2) https://github.com/facebook/rocksdb/issues/1 rocksdb::port::CondVar::SignalAll() /home/circleci/project/port/port_posix.cc:139:28 (column_family_test+0x8a651a) https://github.com/facebook/rocksdb/issues/2 rocksdb::test::SleepingBackgroundTask::DoSleep() /home/circleci/project/./test_util/testutil.h:412:12 (column_family_test+0x58574b) ...... Likely, SleepingBackgroundTask::DoSleep() started to execute after the main thread has finished everything, cancelled and waited for sleeping tasks to finish. At this time, although DoSlee() will not sleep, but it also accesses the mutex, creating a data race with destructor of the test. Fix this bug by waiting for the sleeping task to start sleeping after it is scheduled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7150 Test Plan: Run these modified tests and make sure it doesn't break. Reviewed By: riversand963 Differential Revision: D22630716 fbshipit-source-id: cc5781cf69083685de406490438898238bdfc2d3	5 years ago
sdong	9870704420	Fix a minor data race in stats dumping threads initialization (#7151 ) Summary: https://github.com/facebook/rocksdb/pull/7145 creates a minor data race against the stat creation counter. Turn it to atomic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7151 Test Plan: Run the test. Reviewed By: ajkr Differential Revision: D22631014 fbshipit-source-id: c6fb69ac5b9df7139795dacea5ce9fb9fd3278d7	5 years ago
Zhichao Cao	ed4712fe7e	Remove time out testing cases in error_handler_fs_test (#7141 ) Summary: Remove the 3 testing cases that cause the time out in linux build by https://github.com/facebook/rocksdb/issues/6765 . Will fix them later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7141 Test Plan: make asan_check, buck run Reviewed By: ajkr Differential Revision: D22593831 Pulled By: zhichao-cao fbshipit-source-id: 14956c36476ecc3393f613178c22e13df843126e	5 years ago
Andrew Kryczka	9a83fd21e6	stagger first DumpMallocStats after opening DB (#7145 ) Summary: Previously when running `db_bench` with large value for `num_multi_dbs` and enabled `Options::dump_malloc_stats`, we would see most CPU spent in jemalloc locking. After this PR that no longer shows up at the top of the profile. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7145 Reviewed By: riversand963 Differential Revision: D22593031 Pulled By: ajkr fbshipit-source-id: 3b3fc91f93249c6afee53f59f34c487c3fc5add6	5 years ago
sdong	ca5a069a79	Suppress a TSAN warning (#7126 ) Summary: TSAN shows warning with clang with warning similar to this: WARNING: ThreadSanitizer: data race (pid=10159) Atomic write of size 8 at 0x7b5000002890 by thread T33: #0 __tsan_atomic64_store <null> (db_test+0x4ca2b5) https://github.com/facebook/rocksdb/issues/1 std::__atomic_base<unsigned long>::store(unsigned long, std::memory_order) /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/atomic_base.h:374:2 (db_test+0x774fde) https://github.com/facebook/rocksdb/issues/2 rocksdb::VersionSet::SetLastSequence(unsigned long) /home/circleci/project/./db/version_set.h:1057:20 (db_test+0x774fde) https://github.com/facebook/rocksdb/issues/3 rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch, rocksdb::WriteCallback, unsigned long, unsigned long, bool, unsigned long, unsigned long, rocksdb::PreReleaseCallback) /home/circleci/project/db/db_impl/db_impl_write.cc:449:18 (db_test+0x774fde) ...... Previous read of size 8 at 0x7b5000002890 by thread T5 (mutexes: write M1044689462619020832): #0 rocksdb::DBImpl::ReleaseSnapshot(rocksdb::Snapshot const) /home/circleci/project/db/db_impl/db_impl.cc (db_test+0x6f4ae7) https://github.com/facebook/rocksdb/issues/1 rocksdb::(anonymous namespace)::MTThreadBody(void) /home/circleci/project/db/db_test.cc:2514:13 (db_test+0x56ac59) https://github.com/facebook/rocksdb/issues/2 rocksdb::(anonymous namespace)::StartThreadWrapper(void) /home/circleci/project/env/env_posix.cc:443:3 (db_test+0x88c4cd) It is not limited to ReleaseSnapshot() and rocksdb::DBImpl::MultiCFSnapshot(). While we are not 100% sure it doesn't indicate any correctness violation, we suppress them for now to keep TSAN clean with more tests so that we can cover more bugs with CI. In the gcc runs we have been running, this warning rarely shows up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7126 Test Plan: See the mini-TSAN test to pass with reasonable run time. Reviewed By: ajkr Differential Revision: D22552375 fbshipit-source-id: ebdd3854cb3becec3403970326a1ca961db2ab00	5 years ago
Zhichao Cao	a10f12eda1	Auto resume the DB from Retryable IO Error (#6765 ) Summary: In current codebase, in write path, if Retryable IO Error happens, SetBGError is called. The retryable IO Error is converted to hard error and DB is in read only mode. User or application needs to resume it. In this PR, if Retryable IO Error happens in one DB, SetBGError will create a new thread to call Resume (auto resume). otpions.max_bgerror_resume_count controls if auto resume is enabled or not (if max_bgerror_resume_count<=0, auto resume will not be enabled). options.bgerror_resume_retry_interval controls the time interval to call Resume again if the previous resume fails due to the Retryable IO Error. If non-retryable error happens during resume, auto resume will terminate. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6765 Test Plan: Added the unit test cases in error_handler_fs_test and pass make asan_check Reviewed By: anand1976 Differential Revision: D21916789 Pulled By: zhichao-cao fbshipit-source-id: acb8b5e5dc3167adfa9425a5b7fc104f6b95cb0b	5 years ago
Yanqin Jin	27735dea9a	Report corrupted keys during compaction (#7124 ) Summary: Currently, RocksDB lets compaction to go through even in case of corrupted keys, the number of which is reported in CompactionJobStats. However, RocksDB does not check this value. We should let compaction run in a stricter mode. Temporarily disable two tests that allow corrupted keys in compaction. With this PR, the two tests will assert(false) and terminate. Still need to investigate what is the recommended google-test way of doing it. Death test (EXPECT_DEATH) in gtest has warnings now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7124 Test Plan: make check Reviewed By: ajkr Differential Revision: D22530722 Pulled By: riversand963 fbshipit-source-id: 6a5a6a992028c6d4f92cb74693c92db462ae4ad6	5 years ago
Levi Tamasi	bdf4de6cb9	Remove some dead code from BlobLogWriter (#7125 ) Summary: Periodic syncing of blob files is performed by `WritableFileWriter`; `bytes_per_sync_` and `next_sync_offset_` in `BlobLogWriter` are actually unused (or more precisely, only used by methods that are themselves unused). The patch removes all this dead code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7125 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D22531021 Pulled By: ltamasi fbshipit-source-id: 6b293ad5a79d3e6bf15c5c68f7aedd7ce7a15f10	5 years ago
Yanqin Jin	c628fae6d1	Report corruption on unrecognized value type (#7121 ) Summary: During memtable lookup, an unrecognized value type should be reported as Status::Corruption. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7121 Test Plan: make check Reviewed By: cheng-chang Differential Revision: D22512124 Pulled By: riversand963 fbshipit-source-id: 9b97be7d9b230c5aae9205f96054420e5ea09066	5 years ago
Stanislav Tkach	393e486e3e	Add getters for options to the C API (#7094 ) Summary: Along with https://github.com/facebook/rocksdb/issues/6925 and https://github.com/facebook/rocksdb/issues/6998, this should add getters for all Options fields except several ones with non-trivial interface (for example rocksdb_options_set_min_level_to_compress). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7094 Reviewed By: riversand963 Differential Revision: D22479800 Pulled By: pdillinger fbshipit-source-id: d14f305e12cfe268d07e0fe229d55cef299c792a	5 years ago
wenh	4924a506b9	Reduce `env_->GetChildren()` calls in DBImpl::Recover() (#7044 ) Summary: There currently exist multiple `GetChildren()` calls in `DBImpl::Recover()`, which can be expensive in cases of distributed file systems. This pull request try to call `DBImpl::Recover()` of each necessary directory only _once_ and reuse the results in the places of repeated calls in current code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7044 Test Plan: Run `make check` and use the default test suite. The modified code should be semantically identical to the current code. As a proof of this solution, we may optionally deploy the system onto a (real or simulated) distributed system and expect reduced latency caused by manifest fetching. (WIP) Reviewed By: riversand963 Differential Revision: D22419925 Pulled By: roghnin fbshipit-source-id: d3774fbfbc246c5527101bc16747eb5c90919886	5 years ago
mrambacher	c7c7b07f06	More Makefile Cleanup (#7097 ) Summary: Cleans up some of the dependencies on test code in the Makefile while building tools: - Moves the test::RandomString, DBBaseTest::RandomString into Random - Moves the test::RandomHumanReadableString into Random - Moves the DestroyDir method into file_utils - Moves the SetupSyncPointsToMockDirectIO into sync_point. - Moves the FaultInjection Env and FS classes under env These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies. By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated. Tested both release and debug builds via Make and CMake for both static and shared libraries. More work remains to clean up how the tools are built and remove some unnecessary dependencies. There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097 Reviewed By: riversand963 Differential Revision: D22463160 Pulled By: pdillinger fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2	5 years ago
Yanqin Jin	f70ad03137	Parameterize a few tests in DBWALTest (#7105 ) Summary: As title. The goal is to shorten the execution time of several tests when they are combined together in a single TEST_F. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7105 Test Plan: make db_wal_test ./db_wal_test Reviewed By: ltamasi Differential Revision: D22442705 Pulled By: riversand963 fbshipit-source-id: 0ad49b8f21fa86dcd5a4d3c9a06af313735ac217	5 years ago
Akanksha Mahajan	54f171fe90	Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7096 ) Summary: When format_version is high enough to support user-key and there are index entries for same user key that spans multiple data blocks then it changes from user-key mode to internal-key mode. But the flush policy is not reset to point to Block Builder of internal-keys. After this switch, no entries are added to user key index partition result, thus it never triggers flushing the block. Fix: 1. After adding the entry in sub_builder_index_, if there is a switch from user-key to internal-key, then flush policy is updated to point to Block Builder of internal-keys index partition. 2. Set sub_builder_index_->seperator_is_key_plus_seq_ = true if seperator_is_key_plus_seq_ is set to true so that subsequent partitions can also use internal key mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7096 Test Plan: make check -j64 Reviewed By: ajkr Differential Revision: D22416598 Pulled By: akankshamahajan15 fbshipit-source-id: 01fc2dc07ea1b32f8fb803995ebe6e9a3fbe67ac	5 years ago
rafael-aero	712458fc34	Add RestoreDBFromLatestBackup to C API, add new C# package (#7092 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7092 Reviewed By: riversand963 Differential Revision: D22412323 Pulled By: ajkr fbshipit-source-id: 3fc1c63bb19a8cd2c0ae620800c28f199a7f494b	5 years ago
rockeet	b649d8cb97	Fixed Factory construct just for calling .Name() (#7080 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7080 Reviewed By: riversand963 Differential Revision: D22412352 Pulled By: ajkr fbshipit-source-id: 1d7f4c1621040a0130245139b52c3f4d3deac865	5 years ago
wenh	226d1f9c73	extend listener callback functions to more file I/O operations (#7055 ) Summary: Currently, `EventListener` in listner.h only have callback functions for file read and write. One may favor extended callback functions for more file I/O operations like flush, sync and close. This PR tries to add those interface and have them called when appropriate throughout the code base. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7055 Test Plan: Write an experimental listener with those new callback functions with log output in them; run experiments and check logs to see those functions are actually called. Default test suits `make check` should also be included. Reviewed By: riversand963 Differential Revision: D22380624 Pulled By: roghnin fbshipit-source-id: 4121491d45c2c2aae8c255e7998090559a241c6a	5 years ago
Andrew Kryczka	dd29ad4223	Separate internal and user key comparators in `BlockIter` (#6944 ) Summary: Replace `BlockIter::comparator_` and `IndexBlockIter::user_comparator_wrapper_` with a concrete `UserComparatorWrapper` and `InternalKeyComparator`. The motivation for this change was the inconvenience of not knowing the concrete type of `BlockIter::comparator_`, which prevented calling specialized internal key comparison functions to optimize comparison of keys with global seqno applied. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6944 Test Plan: benchmark setup -- single file DBs, in-memory, no compression. "normal_db" created by regular flush; "ingestion_db" created by ingesting a file. Both DBs have same contents. ``` $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000 $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst \| awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}') $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ ``` benchmark run command: ``` $ TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=$SEEK_NEXT -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=0 -threads=1 -reads=200000000 -mmap_read=1 -verify_checksum=false ``` results: perf improved marginally for ingestion_db and did not change significantly for normal_db: SEEK_NEXT \| DB \| code \| ops/sec \| % change -- \| -- \| -- \| -- \| -- 0 \| normal_db \| master \| 350880 \| 0 \| normal_db \| PR6944 \| 351040 \| 0.0 0 \| ingestion_db \| master \| 343255 \| 0 \| ingestion_db \| PR6944 \| 349424 \| 1.8 10 \| normal_db \| master \| 218711 \| 10 \| normal_db \| PR6944 \| 217892 \| -0.4 10 \| ingestion_db \| master \| 220334 \| 10 \| ingestion_db \| PR6944 \| 226437 \| 2.8 Reviewed By: pdillinger Differential Revision: D21924676 Pulled By: ajkr fbshipit-source-id: ea4288a2eefa8112eb6c651a671c1de18c12e538	5 years ago
Levi Tamasi	a693341604	Move the blob file format related classes to the main namespace, rename reader/writer (#7086 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7086 Test Plan: `make check` Reviewed By: zhichao-cao Differential Revision: D22395420 Pulled By: ltamasi fbshipit-source-id: 088a20097bd6b73b0c433cd79725779f97ec04f2	5 years ago
Peter Dillinger	4b107ceb7e	Improve code comments in EstimateLiveDataSize (#7072 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7072 Reviewed By: ajkr Differential Revision: D22391641 Pulled By: pdillinger fbshipit-source-id: 0ef355576454514263ab684eb1a5c06787f3242a	5 years ago
Jay Zhuang	00de699096	Replace reinterpret_cast with static_cast_with_check (#7067 ) Summary: Replace `reinterpret_cast` with `static_cast_with_check` for `DBImpl` and `ColumnFamilyHandleImpl`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7067 Reviewed By: siying Differential Revision: D22361587 Pulled By: jay-zhuang fbshipit-source-id: dfe9e8f3af39c3d27cc372c55ab9ad905eb0a5a1	5 years ago
Zitan Chen	373d5ac485	BackupEngine verifies table file checksums on creating new backups (#7015 ) Summary: When table file checksums are enabled and stored in the DB manifest by using the RocksDB default crc32c checksum function, BackupEngine will calculate the crc32c checksum of the file to be copied and compare the calculated result with the one stored in the DB manifest before copying the file to the backup directory. After copying to the backup directory, BackupEngine will verify the checksum of the copied file with the one calculated before copying. This helps detect some rare corruption events such as bit-flips during the copying process. No verification with checksums in DB manifest will be performed if the table file checksum function is not the RocksDB default crc32c checksum function. In addition, If `share_table_files` and `share_files_with_checksum` are true, BackupEngine will compare the checksums computed before and after copying of the table files. Corresponding tests are added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7015 Test Plan: Passed make check Reviewed By: pdillinger Differential Revision: D22165732 Pulled By: gg814 fbshipit-source-id: ee0e8cc397c455eba64545c29380b9d9853588ec	5 years ago
Peter Dillinger	a680a7ea37	Un-revert #7049 , revert #7022 (#7071 ) Summary: Even though local bisection gave me a clear signal (and still does) that reverting https://github.com/facebook/rocksdb/issues/7049 would fix the failures in MultiThreadedDBTest, https://github.com/facebook/rocksdb/issues/7022 seems to be the root cause. Reverting https://github.com/facebook/rocksdb/issues/7022 and keeping https://github.com/facebook/rocksdb/issues/7049 seems to fix the issue in local reproducer also. (Had these landed in opposite order, bisection would have found the root cause.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7071 Reviewed By: akankshamahajan15 Differential Revision: D22362857 Pulled By: pdillinger fbshipit-source-id: ed63df3d74e9d4ce1604de8fe43b216166c7a3f0	5 years ago
Peter Dillinger	52d59e0c93	Revert "Whole DBTest to skip fsync (#7049 )" (#7070 ) Summary: This reverts commit `4f1534bdb0`. This commit caused failures and deadlocks in MultiThreadedDBTest.MultiThreaded/69 and others. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7070 Reviewed By: riversand963 Differential Revision: D22358778 Pulled By: pdillinger fbshipit-source-id: faf8f2cb469a7063a113921c8e9c64a9f7610dac	5 years ago
sdong	4f1534bdb0	Whole DBTest to skip fsync (#7049 ) Summary: After https://github.com/facebook/rocksdb/pull/7036, we still see extra DBTest that can timeout when running 10 or 20 in parallel. Expand skip-fsync mode in whole DBTest. Still preserve other tests from doing this mode to be conservative. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7049 Test Plan: Run all existing files. Reviewed By: pdillinger Differential Revision: D22301700 fbshipit-source-id: f9a9e3b3b26ce640665a47cb8bff33ba0c89b565	5 years ago
Akanksha Mahajan	5edfe3a3d8	Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7022 ) Summary: When format_version is high enough to support user-key and there are index entries for same user key that spans multiple data blocks then it changes from user-key mode to internal-key mode. But the flush policy is not reset to point to Block Builder of internal-keys. After this switch, no entries are added to user key index partition result, thus it never triggers flushing the block. Fix: After adding the entry in sub_builder_index_, if there is a switch from user-key to internal-key, then flush policy is updated to point to Block Builder of internal-keys index partition. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7022 Test Plan: 1. make check -j64 2. Added one unit test case Reviewed By: ajkr Differential Revision: D22197734 Pulled By: akankshamahajan15 fbshipit-source-id: d87e9e46bccab8e896ee6979d6b79c51f73d479e	5 years ago
Andrew Kryczka	c25a014792	deflake DBCompactionTestWithParam.IntraL0Compaction test (#7065 ) Summary: This check is flaky because compaction could run between the `Flush()` and the `TestGetTickerCount()`, which would increase the `BLOCK_CACHE_INDEX_MISS` count beyond what the test expects. Verified by adding a `sleep(1)` between those two lines and observing the counter is too high every time. The solution is just to remove this check as it doesn't have any use anyways. The latter check of index miss is sufficient to conclude the newest L0 file (i.e., the one generated by intra-L0) does not have its index block pinned in cache. It'd be nice to simultaneously check the L0 files generated by flush do have their index blocks pinned in cache, but that's not what the line deleted in this PR was checking.. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7065 Reviewed By: pdillinger Differential Revision: D22340327 Pulled By: ajkr fbshipit-source-id: e076b2c7228b7fa763dd0c0cb13828e176c1abee	5 years ago
Peter Dillinger	e2fd501d44	Stabilize DBTest.ApproximateSizesMemTable (#7064 ) Summary: Random memtable layouts could cause random failure, reproducible with command below running for a while. Test now using deterministic behavior. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7064 Test Plan: while ./db_test --gtest_filter=SizesMemTable; do true; done Reviewed By: siying Differential Revision: D22339442 Pulled By: pdillinger fbshipit-source-id: 8e74e5a9b5e88f7030854045a22c12cf561d5de6	5 years ago
mrambacher	80f71b5863	Use Libraries in the RocksDB Makefile Build (#6660 ) Summary: Change the linking of tests/tools to be against a library rather than a list of objects. This change substantially reduces the size of the objects produced. peterd clean repo size: 264M Before this change, with make all: 40G After this change, with make all: 28G With make LIB_MODE=shared all: 7.0G The list of TESTS was changed from being hard-coded to generated from the test sources variable. Note that there are some test sources that are not built as tests (though the set of tests is identical to the previous version). Added OBJ_DIR option to Makefile to allow objects to be placed in an alternative location. By default, OBJ_DIR is the same as before ("./"). This change is a precursor to being able to build/run the tests/tools linked against static libraries. Additionally, it should be possible to clean up and merge some of the rules for building tests and the like if so desired. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6660 Reviewed By: riversand963 Differential Revision: D22244463 Pulled By: pdillinger fbshipit-source-id: db9c6341d81ed62c2270374f4ede02fb9604c754	5 years ago
Levi Tamasi	e367bc7f4b	Clean up blob files based on the linked SST set (#7001 ) Summary: The earlier `VersionBuilder` code only cleaned up blob files that were marked as entirely consisting of garbage using `VersionEdits` with `BlobFileGarbage`. This covers the cases when table files go through regular compaction, where we iterate through the KVs and thus have an opportunity to calculate the amount of garbage (that is, most cases). However, it does not help when table files are simply dropped (e.g. deletion compactions or the `DeleteFile` API). To deal with such cases, the patch adds logic that cleans up all blob files at the head of the list until the first one with linked SSTs is found. (As an example, let's assume we have blob files with numbers 1..10, and the first one with any linked SSTs is number 8. This means that SSTs in the `Version` only rely on blob files with numbers >= 8, and thus 1..7 are no longer needed.) The code change itself is pretty small; however, changing the logic like this necessitated changes to some tests that have been added recently (namely to the ones that use blob files in isolation, i.e. without any table files referring to them). Some of these cases were fixed by bypassing `VersionBuilder` altogether in order to keep the tests simple (which actually makes them more proper unit tests as well), while the `VersionBuilder` unit tests were fixed by adding dummy table files to the test cases as needed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7001 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D22119474 Pulled By: ltamasi fbshipit-source-id: c6547141355667d4291d9661d6518eb741e7b54a	5 years ago
sdong	80b107a0a9	Divide WriteCallbackTest.WriteWithCallbackTest (#7037 ) Summary: WriteCallbackTest.WriteWithCallbackTest has a deep for-loop and in some cases runs very long. Parameterimized it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7037 Test Plan: Run the test and see it passes. Reviewed By: ltamasi Differential Revision: D22269259 fbshipit-source-id: a1b6687b5bf4609754833d14cf383d68bc7ab27a	5 years ago
Burton Li	5be2cb6948	Compaction filter support for BlobDB (#6850 ) Summary: Added compaction filter support for BlobDB non-TTL values. Same as vanilla RocksDB, user compaction filter applies to all k/v pairs of the compaction for non-TTL values. It honors `min_blob_size`, which potentially results value transitions between inlined data and stored-in-blob data when size of value is changed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6850 Reviewed By: siying Differential Revision: D22263487 Pulled By: ltamasi fbshipit-source-id: 8fc03f8cde2a5c831e63b436b3dbf1b7f90939e8	5 years ago
sdong	58547e533b	Disable fsync in some tests to speed them up (#7036 ) Summary: Fsyncing files is not providing more test coverage in many tests. Provide an option in SpecialEnv to turn it off to speed it up and enable this option in some tests with relatively long run time. Most of those tests can be divided as parameterized gtest too. This two speed up approaches are orthogonal and we can do both if needed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7036 Test Plan: Run all tests and make sure they pass. Reviewed By: ltamasi Differential Revision: D22268084 fbshipit-source-id: 6d4a838a1b7328c13931a2a5d93de57aa02afaab	5 years ago
Anand Ananthabhotla	9a5886bd8c	Extend Get/MultiGet deadline support to table open (#6982 ) Summary: Current implementation of the ```read_options.deadline``` option only checks the deadline for random file reads during point lookups. This PR extends the checks to file opens, prefetches and preloads as part of table open. The main changes are in the ```BlockBasedTable```, partitioned index and filter readers, and ```TableCache``` to take ReadOptions as an additional parameter. In ```BlockBasedTable::Open```, in order to retain existing behavior w.r.t checksum verification and block cache usage, we filter out most of the options in ```ReadOptions``` except ```deadline```. However, having the ```ReadOptions``` gives us more flexibility to honor other options like verify_checksums, fill_cache etc. in the future. Additional changes in callsites due to function signature changes in ```NewTableReader()``` and ```FilePrefetchBuffer```. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6982 Test Plan: Add new unit tests in db_basic_test Reviewed By: riversand963 Differential Revision: D22219515 Pulled By: anand1976 fbshipit-source-id: 8a3b92f4a889808013838603aa3ca35229cd501b	5 years ago
Stanislav Tkach	1b85d57cf5	Expose KeyMayExist in the C API (#7021 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7021 Reviewed By: ajkr Differential Revision: D22246297 Pulled By: pdillinger fbshipit-source-id: 81dfd0a49e4d5ce0c9f00772c17cca425757ea24	5 years ago
Yanqin Jin	d47c871190	Fix data race to VersionSet::io_status_ (#7034 ) Summary: After https://github.com/facebook/rocksdb/issues/6949 , VersionSet::io_status_ can be concurrently accessed by multiple threads without lock, causing tsan test to fail. For example, a bg flush thread resets io_status_ before calling LogAndApply(), while another thread already in the process of LogAndApply() reads io_status_. This is a bug. We do not have to reset io_status_ each time we call LogAndApply(). io_status_ is part of the state of VersionSet, and it indicates the outcome of preceding MANIFEST/CURRENT files IO operations. Its value should be updated only when: 1. MANIFEST/CURRENT files IO fail for the first time. 2. MANIFEST/CURRENT files IO succeed as part of recovering from a prior failure without process restart, e.g. calling Resume(). Test Plan (devserver): COMPILE_WITH_TSAN=1 make check COMPILE_WITH_TSAN=1 make db_test2 ./db_test2 --gtest_filter=DBTest2.CompactionStall Pull Request resolved: https://github.com/facebook/rocksdb/pull/7034 Reviewed By: zhichao-cao Differential Revision: D22247137 Pulled By: riversand963 fbshipit-source-id: 77b83e05390f3ee3cd2d96d3fdd6fe4f225e3216	5 years ago
sdong	f9817201af	Add unity build to CircleCI (#7026 ) Summary: We are still keeping unity build working. So it's a good idea to add to a pre-commit CI. A latest GCC docker image just to get a little bit more coverage. Fix three small issues to make it pass. Also make unity_test to run db_basic_test rather than db_test to cut the test time. There is no point to run expensive tests here. It was set to run db_test before db_basic_test was separated out. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7026 Test Plan: watch tests to pass. Reviewed By: zhichao-cao Differential Revision: D22223197 fbshipit-source-id: baa3b6cbb623bf359829b63ce35715c75bcb0ed4	5 years ago
Stanislav Tkach	70b5d95dc7	Add (more) getters for options to the C API (#6998 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6998 Reviewed By: ajkr Differential Revision: D22211700 Pulled By: pdillinger fbshipit-source-id: 1141c20527dee5e13205059bf8e83927063c4c1e	5 years ago
sdong	d64cf0e4ee	Move away from direct TmpDir() call in some tests (#7030 ) Summary: Some tests directly uses TmpDir() as temporary directory without adding any randomize factor. This would cause failures when tests run in parallel. Fix it by moving some of them to test::PerThreadDBPath() Pull Request resolved: https://github.com/facebook/rocksdb/pull/7030 Test Plan: Watch existing tests pass Reviewed By: zhichao-cao Differential Revision: D22224710 fbshipit-source-id: 28c9932fede0a4a64670e5b5fdb08f4fb5dccdd0	5 years ago
Zitan Chen	be41c61f22	Add a new option for BackupEngine to store table files under shared_checksum using DB session id in the backup filenames (#6997 ) Summary: `BackupableDBOptions::new_naming_for_backup_files` is added. This option is false by default. When it is true, backup table filenames under directory shared_checksum are of the form `<file_number>_<crc32c>_<db_session_id>.sst`. Note that when this option is true, it comes into effect only when both `share_files_with_checksum` and `share_table_files` are true. Three new test cases are added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6997 Test Plan: Passed make check. Reviewed By: ajkr Differential Revision: D22098895 Pulled By: gg814 fbshipit-source-id: a1d9145e7fe562d71cde7ac995e17cb24fd42e76	5 years ago
Yanqin Jin	e66199d848	First step towards handling MANIFEST write error (#6949 ) Summary: This PR provides preliminary support for handling IO error during MANIFEST write. File write/sync is not guaranteed to be atomic. If we encounter an IOError while writing/syncing to the MANIFEST file, we cannot be sure about the state of the MANIFEST file. The version edits may or may not have reached the file. During cleanup, if we delete the newly-generated SST files referenced by the pending version edit(s), but the version edit(s) actually are persistent in the MANIFEST, then next recovery attempt will process the version edits(s) and then fail since the SST files have already been deleted. One approach is to truncate the MANIFEST after write/sync error, so that it is safe to delete the SST files. However, file truncation may not be supported on certain file systems. Therefore, we take the following approach. If an IOError is detected during MANIFEST write/sync, we disable file deletions for the faulty database. Depending on whether the IOError is retryable (set by underlying file system), either RocksDB or application can call `DB::Resume()`, or simply shutdown and restart. During `Resume()`, RocksDB will try to switch to a new MANIFEST and write all existing in-memory version storage in the new file. If this succeeds, then RocksDB may proceed. If all recovery is completed, then file deletions will be re-enabled. Note that multiple threads can call `LogAndApply()` at the same time, though only one of them will be going through the process MANIFEST write, possibly batching the version edits of other threads. When the leading MANIFEST writer finishes, all of the MANIFEST writing threads in this batch will have the same IOError. They will all call `ErrorHandler::SetBGError()` in which file deletion will be disabled. Possible future directions: - Add an `ErrorContext` structure so that it is easier to pass more info to `ErrorHandler`. Currently, as in this example, a new `BackgroundErrorReason` has to be added. Test plan (dev server): make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6949 Reviewed By: anand1976 Differential Revision: D22026020 Pulled By: riversand963 fbshipit-source-id: f3c68a2ef45d9b505d0d625c7c5e0c88495b91c8	5 years ago

... 8 9 10 11 12 ...

4542 Commits (f63331ebaf3493fda342da9b0dd86d8ab6c1bbe0)