rocksdb

Commit Graph

Author	SHA1	Message	Date
Maysam Yabandeh	dcb73e7735	WritePrepared: release snapshot equal to max (#4944 ) Summary: WritePrepared maintains a list of snapshots that are <= max_evicted_seq_. Based on this list, old_commit_map_ is updated if an evicted commit entry overlaps with such snapshot. Such lists are garbage collected when the release of snapshot is reported to WritePreparedTxnDB, which is the next time max_evicted_seq_ is updated and yet the snapshot is not found is the list returned from DB. This logic was broken since ReleaseSnapshotInternal was using "< max_evicted_seq_" to cleanup old_commit_map_, which would leave a snapshot uncleaned if it "= max_evicted_seq_". The patch fixes that and adds a unit test to check for the bug. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4944 Differential Revision: D13945000 Pulled By: maysamyabandeh fbshipit-source-id: 0c904294f735911f52348a148bf1f945282fc17c	7 years ago
Alexander Zinoviev	32a6dd9a41	Add a new CPU time counter to compaction report (#4889 ) Summary: Measure CPU time consumed for a compaction and report it in the stats report Enable NowCPUNanos() to work for MacOS Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889 Differential Revision: D13701276 Pulled By: zinoale fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055	7 years ago
Yi Wu	b1ad6ebba8	WritePrepared: fix two versions in compaction see different status for released snapshots (#4890 ) Summary: Fix how CompactionIterator::findEarliestVisibleSnapshots handles released snapshot. It fixing the two scenarios: Scenario 1: key1 has two values v1 and v2. There're two snapshots s1 and s2 taken after v1 and v2 are committed. Right after compaction output v2, s1 is released. Now findEarliestVisibleSnapshot may see s1 being released, and return the next snapshot, which is s2. That's larger than v2's earliest visible snapshot, which was s1. The fix: the only place we check against last snapshot and current key snapshot is when we decide whether to compact out a value if it is hidden by a later value. In the check if we see current snapshot is even larger than last snapshot, we know last snapshot is released, and we are safe to compact out current key. Scenario 2: key1 has two values v1 and v2. there are two snapshots s1 and s2 taken after v1 and v2 are committed. During compaction before we process the key, s1 is released. When compaction process v2, snapshot checker may return kSnapshotReleased, and the earliest visible snapshot for v2 become s2. When compaction process v1, snapshot checker may return kIsInSnapshot (for WritePrepared transaction, it could be because v1 is still in commit cache). The result will become inconsistent here. The fix: remember the set of released snapshots ever reported by snapshot checker, and ignore them when finding result for findEarliestVisibleSnapshot. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4890 Differential Revision: D13705538 Pulled By: maysamyabandeh fbshipit-source-id: e577f0d9ee1ff5a6035f26859e56902ecc85a5a4	7 years ago
Maysam Yabandeh	7fd9813b9f	WritePrepared: commit of delayed prepared entries (#4894 ) Summary: Here is the order of ops in a commit: 1) update commit cache 2) publish seq, 3) RemovePrepared. In case of a delayed prepared, there will be a gap between when the commit is visible to snapshots until delayed_prepared_ is cleaned up. To tell apart this case from a delayed uncommitted txn from, the commit entry of a delayed prepared is also stored in delayed_prepared_commits_, which is updated before publishing the commit. Also logic in GetSnapshotInternal that ensures that each new snapshot is always larger than max_evicted_seq_ is updated to check against the upcoming value of max_evicted_seq_ rather than its current one. This is because AdvanceMaxEvictedSeq gets the list of snapshots lower than the new max, before updating max_evicted_seq_. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4894 Differential Revision: D13726988 Pulled By: maysamyabandeh fbshipit-source-id: 1e70d78061b50c944c9816bf4b6dac405ab4ccd3	7 years ago
tom wang	73ff15c07b	WritePrepared: fix typo in comments Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4891 Differential Revision: D13718016 Pulled By: miasantreble fbshipit-source-id: 90bd372cff453a1c2d104c1cf49731d5dd770c14	7 years ago
Yanqin Jin	dd9eca1c58	Remove unused variable to fix clang compilation err (#4893 ) Summary: as title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4893 Differential Revision: D13716733 Pulled By: riversand963 fbshipit-source-id: 6811d6a99fe2094d5344f854e8939f01238b2adb	7 years ago
Yi Wu	128f532858	WritePrepared: fix issue with snapshot released during compaction (#4858 ) Summary: Compaction iterator keep a copy of list of live snapshots at the beginning of compaction, and then query snapshot checker to verify if values of a sequence number is visible to these snapshots. However when the snapshot is released in the middle of compaction, the snapshot checker implementation (i.e. WritePreparedSnapshotChecker) may remove info with the snapshot and may report incorrect result, which lead to values being compacted out when it shouldn't. This patch conservatively keep the values if snapshot checker determines that the snapshots is released. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4858 Differential Revision: D13617146 Pulled By: maysamyabandeh fbshipit-source-id: cf18a94f6f61a94bcff73c280f117b224af5fbc3	7 years ago
Yi Wu	5d4fddfa52	WritePrepared: Fix visible key compacted out by compaction (#4883 ) Summary: With WritePrepared transaction, flush/compaction can contain uncommitted keys, and those keys can get committed during compaction. If a snapshot is taken before the key is committed, it should not see the key. On the other hand, compaction grab the list of snapshots at its beginning, and only consider those snapshots to dedup keys. Consider the case: ``` seq = 1: put "foo" = "bar" seq = 2: transaction T: delete "foo", prepare seq = 3: compaction start seq = 4: take snapshot S seq = 5: transaction T: commit. ... seq = N: compaction iterator reached key "foo". ``` When compaction start, the list of snapshot is empty. Compaction doesn't take snapshot S into account. When it reached "foo", transaction T is committed. Compaction may think the value "foo=bar" is not visible by any snapshot (which is wrong), and compact the value out. The fix is to explicitly take a snapshot before compaction grabbing the list of snapshots. Compaction will then has to keep keys visible to this snapshot. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4883 Differential Revision: D13668775 Pulled By: maysamyabandeh fbshipit-source-id: 1cab9615f94b7d3e8522cc3d44c3a14c7d4720e4	7 years ago
Maysam Yabandeh	cad99a6031	WritePrepared: snapshot should be larger than max_evicted_seq_ (#4886 ) Summary: The AdvanceMaxEvictedSeq algorithm assumes that new snapshots always have sequence number larger than the last max_evicted_seq_. To enforce this assumption we make two changes: i) max is not advanced beyond the last published seq, with the exception that the evicted commit entry itself is not published yet, which is quite rare. ii) When obtaining the snapshot if the max_evicted_seq_ is not published yet, commit a dummy entry so that it waits for it to be published and also increased the latest published seq by one above the max. To test these non-realistic corner cases we create a commit cache with size 1 so that every single commit results into eviction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4886 Differential Revision: D13685270 Pulled By: maysamyabandeh fbshipit-source-id: 5461bc09c2a9b75798bfcb9853a256c81cdac0b0	7 years ago
Yi Wu	d50c10ed37	WritePrepared: Fix SmallestUnCommittedSeq() doesn't check delayed_prepared (#4867 ) Summary: When prepared_txns_ heap is empty, SmallestUnCommittedSeq() should check delayed_prepared_ set as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4867 Differential Revision: D13632134 Pulled By: maysamyabandeh fbshipit-source-id: b0423bb0a58dc95f1e636d5ed3f6e619df801fb7	7 years ago
Maysam Yabandeh	856ac24484	WritePrepared: fix race condition on GetSnapshotListFromDB (#4872 ) Summary: Fixes a typo that made mutex_ to remain unlocked when GetSnapshotListFromDB called from WritePreparedTxnDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4872 Differential Revision: D13640381 Pulled By: maysamyabandeh fbshipit-source-id: 50f6600568f9092b4b43115f6ebd96e6c7388ad7	7 years ago
Zhongyi Xie	6a4ec41fed	add assert to silence clang warning (#4871 ) Summary: currently clang analyze fails with the following warning: > utilities/transactions/write_prepared_transaction_test.cc:1451:5: warning: Forming reference to null pointer ASSERT_GT(wp_db->max_evicted_seq_, 0); // max after recovery Pull Request resolved: https://github.com/facebook/rocksdb/pull/4871 Differential Revision: D13638053 Pulled By: miasantreble fbshipit-source-id: b192b0c13c411c58defc9e280b34cdfcab3fa8e3	7 years ago
Maysam Yabandeh	f3a99e8a4d	WritePrepared: Report released snapshots in IsInSnapshot (#4856 ) Summary: Previously IsInSnapshot assumed that the snapshot is valid at the time that the function is called. However there are cases where that might not be valid. Example is background compactions where the compaction algorithm operates with a list of snapshots some of which might be released by the time they are being passed to IsInSnapshot. The patch make two changes to enable the caller to tell difference: i) any live snapshot below max is added to max_committed_seq_, which allows IsInSnapshot to confidently tell whether the passed snapshot is invalid if it below max, ii) extends IsInSnapshot API with a "released" variable that is set true when IsInSnapshot find no such snapshot below max and also find no other way to give a certain return value. In such cases the return value is true but the caller should also check the "released" boolean after the call. In short here is the changes in the API: i) If the snapshot is valid, no change is required. ii) If the snapshot might be invalid, a reference to "released" boolean must be passed to IsInSnapshot. ii-a) If snapshot is above max, IsInSnapshot can figure the return valid using the commit cache. ii-b) otherwise if snapshot is in old_commit_map_, IsInSnapshot can use that to tell if value was visible to the snapshot. ii-c) otherwise it sets "released" to true and returns true as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4856 Differential Revision: D13599847 Pulled By: maysamyabandeh fbshipit-source-id: 1752be28667f886a1efec8cae5714b9b7a8f1e0f	7 years ago
Maysam Yabandeh	cd227d74ba	WritePrepared: improve IsInSnapshotEmptyMapTest (#4853 ) Summary: IsInSnapshotEmptyMapTest tests that IsInSnapshot returns correct value for existing data after a recovery, where max is not zero and yet commit cache is empty. The existing test was preliminary which is improved in this patch. It also increases the db sequence after recovery so that there the snapshot immediately taken after recovery would have a sequence number different than that of max_evicted_seq. This simplifies the logic in IsInSnapshot by not having to consider the special case that an old snapshot might be equal to max_evicted_seq and yet not present in old_commit_map. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4853 Differential Revision: D13595223 Pulled By: maysamyabandeh fbshipit-source-id: 77c12ca8a3f61a47479a93bef2038ff502dc3322	7 years ago
Maysam Yabandeh	0ed98bf89e	WritePrepared: fix snapshot sequence in rollback (#4851 ) Summary: The rollback algorithm in WritePrepared transactions requires reading the values before the transaction start. Currently it uses the prepare_seq -1 as the snapshot sequence number for the read. This is not correct since the passed sequence number must be for a valid snapshot. The patch fixes it by passing kMaxSequenceNumber instead. This is fine since all the writes done by the aborted transaction will be skipped during the read anyway. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4851 Differential Revision: D13592773 Pulled By: maysamyabandeh fbshipit-source-id: ff1bf92ea9909d4cccb173bdff49febc0e9eb7a2	7 years ago
Faustin Lammler	7d65bd5ce4	Fix spelling errors (#4827 ) Summary: Hi, Lintian, the Debian package checker complains about spelling error (spelling-error-in-binary). See https://salsa.debian.org/mariadb-team/mariadb-10.3/-/jobs/98380 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4827 Differential Revision: D13566362 Pulled By: riversand963 fbshipit-source-id: cd4e9212133c73b0591030de6cdedaa47575968d	7 years ago
DorianZheng	2670fe8c73	Get `CompactionJobInfo` from CompactFiles Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716 Differential Revision: D13207677 Pulled By: ajkr fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b	7 years ago
Maysam Yabandeh	b878f93c70	Extend Transaction::GetForUpdate with do_validate (#4680 ) Summary: Transaction::GetForUpdate is extended with a do_validate parameter with default value of true. If false it skips validating the snapshot (if there is any) before doing the read. After the read it also returns the latest value (expects the ReadOptions::snapshot to be nullptr). This allows RocksDB applications to use GetForUpdate similarly to how InnoDB does. Similarly ::Merge, ::Put, ::Delete, and ::SingleDelete are extended with assume_exclusive_tracked with default value of false. It true it indicates that call is assumed to be after a ::GetForUpdate(do_validate=false). The Java APIs are accordingly updated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4680 Differential Revision: D13068508 Pulled By: maysamyabandeh fbshipit-source-id: f0b59db28f7f6a078b60844d902057140765e67d	7 years ago
Zhongyi Xie	2f1ca4e838	Revert "BaseDeltaIterator: always check valid() before accessing key(… (#4744 ) Summary: …) (#4702)" This reverts commit `3a18bb3e15`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4744 Differential Revision: D13311869 Pulled By: miasantreble fbshipit-source-id: 6300b12cc34828d8b9274e907a3aef1506d5d553	7 years ago
Zhongyi Xie	3a18bb3e15	BaseDeltaIterator: always check valid() before accessing key() (#4702 ) Summary: Current implementation of `current_over_upper_bound_` fails to take into consideration that keys might be invalid in either base iterator or delta iterator. Calling key() in such scenario will lead to assertion failure and runtime errors. This PR addresses the bug by adding check for valid keys before calling `IsOverUpperBound()`, also added test coverage for iterate_upper_bound usage in BaseDeltaIterator Also recommit https://github.com/facebook/rocksdb/pull/4656 (It was reverted earlier due to bugs) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4702 Differential Revision: D13146643 Pulled By: miasantreble fbshipit-source-id: 6d136929da12d0f2e2a5cea474a8038ec5cdf1d0	7 years ago
Maysam Yabandeh	f1b0841f06	WritePrepared: followup fix for snapshot double release issue (#4734 ) Summary: The fix in #4727 for double snapshot release was incomplete since it does not properly remove the duplicate entires in the snapshot list after finding that a snapshot is still valid. The patch does that and also improves the unit test to show the issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4734 Differential Revision: D13266260 Pulled By: maysamyabandeh fbshipit-source-id: 351e2c40cca45a87b757774c11af74182314911e	7 years ago
Maysam Yabandeh	1a5a93ff74	WritePrepared: Fix double snapshot release issue (#4727 ) Summary: Currently the garbage collection of items in old_commit_map_ was done upon ::ReleaseSnapshot. The assumption behind this method was that the sequence number of snapshots are unique, which is incorrect. In the very rare cases that two consecutive snapshot have the same sequence number this could lead the release of the first snapshot affect the old_commit_map_ that is necessary to service the reads of the second snapshot. The bug would be triggered only if i) two snapshot have the same seq, ii) both of them are very old (older than the last ~4m transactions), and iii) there is commit entry overlapping with the snapshot seq number. It is fixed by doing the cleanup of old_commit_map_ in UpdateSnapshot: the new list of snapshots are compared with the old one and the missing sequence numbers are concluded released. If two snapshots have the same seq number, after the release of one of them, the seq number still appears in the snapshot least and thus not cleaned up prematurely. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4727 Differential Revision: D13246495 Pulled By: maysamyabandeh fbshipit-source-id: 93b87a5042afd8060889df245526d3f5d29de9fe	7 years ago
Zhongyi Xie	a21cb22ee3	Revert "apply ReadOptions.iterate_upper_bound to transaction iterator… (#4705 ) Summary: … (#4656)" This reverts commit `b76398a82b`. Will add test coverage for iterate_upper_bound before re-commit b76398 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4705 Differential Revision: D13148592 Pulled By: miasantreble fbshipit-source-id: 4d1ce0bfd9f7a5359a7688bd780eb06a66f45b1f	7 years ago
Zhongyi Xie	b76398a82b	apply ReadOptions.iterate_upper_bound to transaction iterator (#4656 ) Summary: Currently transaction iterator does not apply `ReadOptions.iterate_upper_bound` when iterating. This PR attempts to fix the problem by having `BaseDeltaIterator` enforcing the upper bound check when iterator state is changed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4656 Differential Revision: D13039257 Pulled By: miasantreble fbshipit-source-id: 909eb9f6b4597a4d80418fb139f32ec82c6ec1d1	7 years ago
Sagar Vemuri	dc3528077a	Update all unique/shared_ptr instances to be qualified with namespace std (#4638 ) Summary: Ran the following commands to recursively change all the files under RocksDB: ``` find . -type f -name ".cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} + ``` Running `make format` updated some formatting on the files touched. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638 Differential Revision: D12934992 Pulled By: sagar0 fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8	7 years ago
Siying Dong	566fc8b994	Black list some valgrind tests (#4642 ) Summary: valgrind tests with 1 thread run too long. To make it shorter, black list some long tests. These are already blacklisted in parallel valgrind tests, but they are not in non-parallel mode Pull Request resolved: https://github.com/facebook/rocksdb/pull/4642 Differential Revision: D12945237 Pulled By: siying fbshipit-source-id: 04cf977d435996480fe87aa09f14b17975b74f7d	7 years ago
Maysam Yabandeh	2b5b7bc795	WritePrepared: Fix bug in searching in non-cached snapshots (#4639 ) Summary: When evicting an entry form the commit_cache, it is verified against the list of old snapshots to see if it overlaps with any. The list of old snapshots is split into two lists: an efficient concurrent cache and an slow vector protected by a lock. The patch fixes a bug that would stop the search in the cache if it finds any and yet would not include the larger snapshots in the slower list. An extra info log entry is also removed. The condition to trigger that although very rare is still feasible and should not spam the LOG when that happens. Fixes #4621 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4639 Differential Revision: D12934989 Pulled By: maysamyabandeh fbshipit-source-id: 4e0fe8147ba292b554ae78e94c21c2ef31e03e2d	7 years ago
Simon Grätzer	ad21b1af52	Set WriteCommitted txn id to commit sequence number (#4565 ) Summary: SetId and GetId are the experimental API that so far being used in WritePrepared and WriteUnPrepared transactions, where the id is assigned at the prepare time. The patch extends the API to WriteCommitted transactions, by setting the id at commit time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4565 Differential Revision: D10557862 Pulled By: maysamyabandeh fbshipit-source-id: 2b27a140682b6185a4988fa88f8152628e0d67af	7 years ago
jsteemann	d1c0d3f358	Small issues (#4564 ) Summary: Couple of very minor improvements (typos in comments, full qualification of class name, reordering members of a struct to make it smaller) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4564 Differential Revision: D10510183 Pulled By: maysamyabandeh fbshipit-source-id: c7ddf9bfbf2db08cd31896c3fd93789d3fa68c8b	7 years ago
Maysam Yabandeh	3f5282268f	Skip concurrency control during recovery of pessimistic txn (#4346 ) Summary: TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This could be as an optimization if the application knows that the transaction would not have any conflict with concurrent transactions. It is currently used during recovery assuming (i) application guarantees no conflict between prepared transactions in the WAL (ii) application guarantees that recovered transactions will be rolled back/commit before new transactions start. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346 Differential Revision: D9759149 Pulled By: maysamyabandeh fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b	7 years ago
cngzhnp	64324e329e	Support pragma once in all header files and cleanup some warnings (#4339 ) Summary: As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited. Besides this, try to fix some warnings about loss of data. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339 Differential Revision: D9654990 Pulled By: ajkr fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848	7 years ago
Mikhail Antonov	927f274939	Avoiding write stall caused by manual flushes (#4297 ) Summary: Basically at the moment it seems it's possible to cause write stall by calling flush (either manually vis DB::Flush(), or from Backup Engine directly calling FlushMemTable() while background flush may be already happening. One of the ways to fix it is that in DBImpl::CompactRange() we already check for possible stall and delay flush if needed before we actually proceed to call FlushMemTable(). We can simply move this delay logic to separate method and call it from FlushMemTable. This is draft patch, for first look; need to check tests/update SyncPoints and most certainly would need to add allow_write_stall method to FlushOptions(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4297 Differential Revision: D9420705 Pulled By: mikhail-antonov fbshipit-source-id: f81d206b55e1d7b39e4dc64242fdfbceeea03fcc	7 years ago
Yi Wu	4f12d49daf	Suppress clang analyzer error (#4299 ) Summary: Suppress multiple clang-analyzer error. All of them are clang false-positive. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4299 Differential Revision: D9430740 Pulled By: yiwu-arbug fbshipit-source-id: fbdd575bdc214d124826d61d35a117995c509279	7 years ago
jsteemann	90f744941d	adds missing PopSavePoint method to Transaction (#4256 ) Summary: Transaction has had methods to deal with SavePoints already, but was missing the PopSavePoint method provided by WriteBatch and WriteBatchWithIndex. This PR adds PopSavePoint to Transaction as well. Having the method on Transaction-level too is useful for applications that repeatedly execute a sequence of operations that normally succeed, but infrequently need to get rolled back. Using SavePoints here is sensible, but as operations normally succeed the application may pile up a lot of useless SavePoints inside a Transaction, leading to slightly increased memory usage for managing the unneeded SavePoints. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4256 Differential Revision: D9326932 Pulled By: yiwu-arbug fbshipit-source-id: 53a0af18a6c7e87feff8a56f1f3eab9df7f371d6	7 years ago
Siying Dong	9c0c8f5ff6	GetAllKeyVersions() to take an extra argument of `max_num_ikeys`. (#4271 ) Summary: Right now, `ldb idump` may have memory out of control if there is a big range of tombstones. Add an option to cut maxinum number of keys in GetAllKeyVersions(), and push down --max_num_ikeys from ldb. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4271 Differential Revision: D9369149 Pulled By: siying fbshipit-source-id: 7cbb797b7d2fa16573495a7e84937456d3ff25bf	7 years ago
Maysam Yabandeh	caf0f53a74	Index value delta encoding (#3983 ) Summary: Given that index value is a BlockHandle, which is basically an <offset, size> pair we can apply delta encoding on the values. The first value at each index restart interval encoded the full BlockHandle but the rest encode only the size. Refer to IndexBlockIter::DecodeCurrentValue for the detail of the encoding. This reduces the index size which helps using the block cache more efficiently. The feature is enabled with using format_version 4. The feature comes with a bit of cpu overhead which should be paid back by the higher cache hits due to smaller index block size. Results with sysbench read-only using 4k blocks and using 16 index restart interval: Format 2: 19585 rocksdb read-only range=100 Format 3: 19569 rocksdb read-only range=100 Format 4: 19352 rocksdb read-only range=100 Pull Request resolved: https://github.com/facebook/rocksdb/pull/3983 Differential Revision: D8361343 Pulled By: maysamyabandeh fbshipit-source-id: f882ee082322acac32b0072e2bdbb0b5f854e651	7 years ago
Manuel Ung	ea212e5316	WriteUnPrepared: Implement unprepared batches for transactions (#4104 ) Summary: This adds support for writing unprepared batches based on size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding methods that modify data (Put/Delete/SingleDelete/Merge) and checking first if write batch size has exceeded threshold. If so, the write batch is written to DB as an unprepared batch. Support for Commit/Rollback for unprepared batch is added as well. This has been done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers either when updating prepare heap, or adding to commit map. For updating the commit map, this logic exists inside `WriteUnpreparedCommitEntryPreReleaseCallback`. A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may be unprepared entries (which act similarly to prepared entries) already when a commit is done without prepare. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104 Differential Revision: D8785717 Pulled By: lth fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91	7 years ago
Maysam Yabandeh	8581a93a6b	Per-thread unique test db names (#4135 ) Summary: The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel. Example: ``` ~/gtest-parallel/gtest-parallel ./table_test``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135 Differential Revision: D8846653 Pulled By: maysamyabandeh fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84	7 years ago
Fosco Marotto	8527012bb6	Converted db/merge_test.cc to use gtest (#4114 ) Summary: Picked up a task to convert this to use the gtest framework. It can't be this simple, can it? It works, but should all the std::cout be removed? ``` [$] ~/git/rocksdb [gft !]: ./merge_test [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. [----------] 2 tests from MergeTest [ RUN ] MergeTest.MergeDbTest Test read-modify-write counters... a: 3 1 2 a: 3 b: 1225 3 Compaction started ... Compaction ended a: 3 b: 1225 Test merge-based counters... a: 3 1 2 a: 3 b: 1225 3 Test merge in memtable... a: 3 1 2 a: 3 b: 1225 3 Test Partial-Merge Test merge-operator not set after reopen [ OK ] MergeTest.MergeDbTest (93 ms) [ RUN ] MergeTest.MergeDbTtlTest Opening database with TTL Test read-modify-write counters... a: 3 1 2 a: 3 b: 1225 3 Compaction started ... Compaction ended a: 3 b: 1225 Test merge-based counters... a: 3 1 2 a: 3 b: 1225 3 Test merge in memtable... Opening database with TTL a: 3 1 2 a: 3 b: 1225 3 Test Partial-Merge Opening database with TTL Opening database with TTL Opening database with TTL Opening database with TTL Test merge-operator not set after reopen [ OK ] MergeTest.MergeDbTtlTest (97 ms) [----------] 2 tests from MergeTest (190 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (190 ms total) [ PASSED ] 2 tests. ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4114 Differential Revision: D8822886 Pulled By: gfosco fbshipit-source-id: c299d008e883c3bb911d2b357a2e9e4423f8e91a	7 years ago
Maysam Yabandeh	537a233941	Exclude StackableDB from transaction stress tests (#4132 ) Summary: The transactions are currently tested with and without using StackableDB. This is mostly to check that the code path is consistent with stackable db as well. Slow, stress tests however do not benefit from being run again with StackableDB. The patch excludes StackableDB from such tests. On a single core it reduced the runtime of transaction_test from 199s to 135s. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4132 Differential Revision: D8841655 Pulled By: maysamyabandeh fbshipit-source-id: 7b9aaba2673b542b195439dfb306cef26bd63b19	7 years ago
Manuel Ung	b9846370e9	WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078 ) Summary: This adds support for recovering WriteUnprepared transactions through the following changes: - The information in `RecoveredTransaction` is extended so that it can reference multiple batches. - `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not. - `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway. Commit/Rollback of live transactions is still unimplemented and will come later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078 Differential Revision: D8703382 Pulled By: lth fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a	7 years ago
Daniel Black	36fa49ceb5	transaction_test: -Wunused-variable with clang-7 (#4074 ) Summary: clang version 7.0.0- (trunk) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/bin clang++-7 -DROCKSDB_USE_RTTI -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wno-unused-parameter -Werror -I. -I./include -std=c++11 -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DGFLAGS=google -DZLIB -DBZIP2 -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -Wshorten-64-to-32 -march=native -DHAVE_SSE42 -DROCKSDB_SUPPORT_THREAD_LOCAL -isystem ./third-party/gtest-1.7.0/fused-src -DTRAVIS -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -c utilities/transactions/transaction_test.cc -o utilities/transactions/transaction_test.o utilities/transactions/transaction_test.cc:2282:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ utilities/transactions/transaction_test.cc:2822:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ utilities/transactions/transaction_test.cc:2928:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ utilities/transactions/transaction_test.cc:3109:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ utilities/transactions/transaction_test.cc:4364:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ Closes https://github.com/facebook/rocksdb/pull/4074 Differential Revision: D8698051 Pulled By: ajkr fbshipit-source-id: 6255618eefdd189962fbea1b02cf1eb5ae501274	7 years ago
Manuel Ung	8ad63a4b86	WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069 ) Summary: This adds a new WAL marker of type kTypeBeginUnprepareXID. Also, DBImpl now contains a field called batch_per_txn (meaning one WriteBatch per transaction, or possibly multiple WriteBatches). This would also indicate that this DB is using WriteUnprepared policy. Recovery code would be able to make use of this extra field on DBImpl in a separate diff. For now, it is just used to determine whether the WAL is compatible or not. Closes https://github.com/facebook/rocksdb/pull/4069 Differential Revision: D8675099 Pulled By: lth fbshipit-source-id: ca27cae1738e46d65f2bb92860fc759deb874749	7 years ago
chouxi	818c84e116	Store timestamp in deadlock detection (#4060 ) Summary: - Summary Add timestamp into the DeadlockInfo to store the timestamp when deadlock detected on the rocksdb side. - Testplan: `make check -j64` Closes https://github.com/facebook/rocksdb/pull/4060 Differential Revision: D8655380 Pulled By: chouxi fbshipit-source-id: f58e1aa5e09eb1d1eed0a181d4e2304aaf01efe8	7 years ago
Manuel Ung	a16e00b7b9	WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955 ) Summary: This is implemented by extending ReadCallback with another function `MaxUnpreparedSequenceNumber` which returns the largest visible sequence number for the current transaction, if there is uncommitted data written to DB. Otherwise, it returns zero, indicating no uncommitted data. There are the places where reads had to be modified. - Get and Seek/Next was just updated to seek to max(snapshot_seq, MaxUnpreparedSequenceNumber()) instead, and iterate until a key was visible. - Prev did not need need updates since it did not use the Seek to sequence number optimization. Assuming that locks were held when writing unprepared keys, and ValidateSnapshot runs, there should only be committed keys and unprepared keys of the current transaction, all of which are visible. Prev will simply iterate to get the last visible key. - Reseeking to skip keys optimization was also disabled for write unprepared, since it's possible to hit the max_skip condition even while reseeking. There needs to be some way to resolve infinite looping in this case. Closes https://github.com/facebook/rocksdb/pull/3955 Differential Revision: D8286688 Pulled By: lth fbshipit-source-id: 25e42f47fdeb5f7accea0f4fd350ef35198caafe	7 years ago
Manuel Ung	ab2254bedf	Fix clang analyze Summary: This fixes the errors as reported here: https://github.com/facebook/rocksdb/pull/3941#issuecomment-394424043 Closes https://github.com/facebook/rocksdb/pull/3950 Differential Revision: D8263086 Pulled By: lth fbshipit-source-id: 5e148d489cab2153e5846d16979a0a1f2d677d57	7 years ago
Manuel Ung	01e3c30def	Extend existing unit tests to run with WriteUnprepared as well Summary: As titled. I have not extended the Compatibility tests because the new WAL markers are still unimplemented. Closes https://github.com/facebook/rocksdb/pull/3941 Differential Revision: D8238394 Pulled By: lth fbshipit-source-id: 980e3d44837bbf2cfa64047f9738f559dfac4b1d	7 years ago
Manuel Ung	aaac6cd16f	Add write unprepared classes by inheriting from write prepared Summary: Closes https://github.com/facebook/rocksdb/pull/3907 Differential Revision: D8218325 Pulled By: lth fbshipit-source-id: ff32d8dab4a159cd2762876cba4b15e3dc51ff3b	7 years ago
Siying Dong	4dd80debd0	Remove tests from ROCKSDB_VALGRIND_RUN Summary: In order to make valgrind check test to pass in a day, remove some tests that run prohibitively slow under valgrind. Closes https://github.com/facebook/rocksdb/pull/3924 Differential Revision: D8210184 Pulled By: siying fbshipit-source-id: 5b06fb08f3cf57571d422d05a0dbddc9f9376f7a	7 years ago
Maysam Yabandeh	66c7aa32fb	Clarify the ownership of root db after TransactionDB::Open Summary: The patch clarifies the ownership of the root db after TransactionDB::Open. If it is a success the ownership if with the TransactionDB, and the root db will be deleted when the destructor of the base class, StackableDB, is called. If it is failure, the temporarily created root db will also be deleted properly. The patch also includes lots of useful formatting changes. Closes https://github.com/facebook/rocksdb/pull/3714 upon which this patch is built. Closes https://github.com/facebook/rocksdb/pull/3806 Differential Revision: D7878010 Pulled By: maysamyabandeh fbshipit-source-id: f54f3942e29434143ae5a2423ceec9c7072cd4c2	8 years ago

... 3 4 5 6 7 ...

427 Commits (5cf6ab6f315e2506171aad2504638a7da9af7d1e)