rocksdb

Commit Graph

Author	SHA1	Message	Date
Michael R. Crusoe	051696bf98	fix some spelling typos (#6464 ) Summary: Found from Debian's "Lintian" program Pull Request resolved: https://github.com/facebook/rocksdb/pull/6464 Differential Revision: D20162862 Pulled By: zhichao-cao fbshipit-source-id: 06941ee2437b038b2b8045becbe9d2c6fbff3e12	5 years ago
Manuel Ung	41535d0218	WriteUnPrepared: Pass in correct subbatch count during rollback (#6463 ) Summary: Today `WriteUnpreparedTxn::RollbackInternal` will write the rollback batch assuming that there is only a single subbatch. However, because untracked_keys_ are currently not deduplicated, it's possible for duplicate keys to exist, and thus split the batch. Also, tracked_keys_ also does not support compators outside of the bytewise comparators, so it's possible for duplicates to occur there as well. To solve this, just pass in the correct subbatch count. Also, removed `WriteUnpreparedRollbackPreReleaseCallback` to unify the Commit/Rollback codepaths some more. Also, fixed a bug in `CommitInternal` where if 1. two_write_queue is true and 2. include_data is true, then `WriteUnpreparedCommitEntryPreReleaseCallback` ends up calling `AddCommitted` on the commit time write batch a second time on the second write. To fix, `WriteUnpreparedCommitEntryPreReleaseCallback` is re-initialized. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6463 Differential Revision: D20150153 Pulled By: lth fbshipit-source-id: df0b42d39406c75af73df995aa1138f0db539cd1	5 years ago
sdong	fdf882ded2	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 ) Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e	5 years ago
Manuel Ung	dc23c125c3	WriteUnPrepared: Untracked keys (#6404 ) Summary: For write unprepared, some applications may bypass the transaction api, and write keys directly into the write batch. However, since they are not tracked, rollbacks (both for savepoint and transaction) are not aware that these keys have to be rolled back. The fix is to track them in `WriteUnpreparedTxn::untracked_keys_`. This is populated whenever we flush unprepared batches into the DB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6404 Differential Revision: D19842023 Pulled By: lth fbshipit-source-id: a9edfc643d5c905fc89da9a9a9094d30c9b70108	5 years ago
wolfkdy	29e24434fe	refine code (#6420 ) Summary: I create a new branch from the branch new upsteram/master and "git merge --squash". Maybe it will fix everything. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6420 Differential Revision: D19897152 Pulled By: zhichao-cao fbshipit-source-id: 6575d9e3b23e360f42ee1480b43028b5fcc20136	5 years ago
Manuel Ung	fb571509a7	WriteUnPrepared: Enable WAL during crash recovery (#6418 ) Summary: Unfortunately, it seems like mysqld reuses xids across machine restarts. When that happens, we could have something like the following happening: ``` BEGIN_PREPARE(unprepared) Put(a) END_PREPARE(xid = 1) -- crash and recover with Put(a) rolled back as it was not prepared BEGIN_PREPARE(prepared) Put(b) END_PREPARE(xid = 1) COMMIT(xid = 1) -- crash and recover with both a, b ``` To solve this, we will have to log the rollback batch into the WAL during recovery. WritePrepared already logs the rollback batch into the WAL, if a rollback happens after prepare, so there is no problem there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6418 Differential Revision: D19896151 Pulled By: lth fbshipit-source-id: 2ff65ddc5fe75efd57736fed4b7cd7a109d26609	5 years ago
sdong	ac8e89a443	Should flush and sync WAL when writing it in DB::Open() (#6417 ) Summary: A recent fix related to 2pc https://github.com/facebook/rocksdb/pull/6313/ writes something to WAL, but does not flush or sync. This causes assertion failure "impl->TEST_WALBufferIsEmpty()" if manual_wal_flush = true. We should fsync the entry to make sure a second power reset can recover. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6417 Test Plan: Add manual_wal_flush=true case in TransactionTest.DoubleCrashInRecovery and fix a bug in the test so that the bug can be reproduced. It passes with the fix. Differential Revision: D19894537 fbshipit-source-id: f1e84e49e2269f583c6019743118292cd8b6598e	5 years ago
Yanqin Jin	f2fbc5d668	Shorten certain test names to avoid infra failure (#6352 ) Summary: Unit test names, together with other components, are used to create log files during some internal testing. Overly long names cause infra failure due to file names being too long. Look for internal tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6352 Differential Revision: D19649307 Pulled By: riversand963 fbshipit-source-id: 6f29de096e33c0eaa87d9c8702f810eda50059e7	5 years ago
Maysam Yabandeh	2f973ca96e	Double Crash in kPointInTimeRecovery with TransactionDB (#6313 ) Summary: In WritePrepared there could be gap in sequence numbers. This breaks the trick we use in kPointInTimeRecovery which assume the first seq in the log right after the corrupted log is one larger than the last seq we read from the logs. To let this trick keep working, we add a dummy entry with the expected sequence to the first log right after recovery. Also in WriteCommitted, if the log right after the corrupted log is empty, since it has no sequence number to let the sequential trick work, it is assumed as unexpected behavior. This is however expected to happen if we close the db after recovering from a corruption and before writing anything new to it. To remedy that, we apply the same technique by writing a dummy entry to the log that is created after the corrupted log. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6313 Differential Revision: D19458291 Pulled By: maysamyabandeh fbshipit-source-id: 09bc49e574690085df45b034ca863ff315937e2d	5 years ago
Maysam Yabandeh	eff5e076f5	unordered_write incompatible with max_successive_merges (#6284 ) Summary: unordered_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led to stress tests to fail with this combination is tried in ::SetOptions. The patch fixes that and also reverts the changes performed by https://github.com/facebook/rocksdb/pull/6254, in which max_successive_merges was mistakenly declared incompatible with unordered_write. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6284 Differential Revision: D19356115 Pulled By: maysamyabandeh fbshipit-source-id: f06dadec777622bd75f267361c022735cf8cecb6	5 years ago
Maysam Yabandeh	5709e97a74	Skip CancelAllBackgroundWork if DBImpl is already closed (#6268 ) Summary: WritePreparedTxnDB calls CancelAllBackgroundWork in its destructor to avoid dangling references to it from background job's SnapshotChecker callback. However, if the DBImpl is already closed, the info log might be closed with it, which causes memory leak when CancelAllBackgroundWork tries to print to the info log. The patch fixes that by calling CancelAllBackgroundWork only if the db is not closed already. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6268 Differential Revision: D19303439 Pulled By: maysamyabandeh fbshipit-source-id: 4228a6be7e78d43c90630347baa89b008200bd15	5 years ago
wolfkdy	1ab1231acf	parallel occ (#6240 ) Summary: This is a continuation of https://github.com/facebook/rocksdb/pull/5320/files I open a new mr for these purposes, half a year has past since the old mr is posted so it's almost impossible to fulfill some points below on the old mr, especially 5) 1) add validation modes for optimistic txns 2) modify unittests to test both modes 3) make format 4) refine hash functor 5) push to master Pull Request resolved: https://github.com/facebook/rocksdb/pull/6240 Differential Revision: D19301296 fbshipit-source-id: 5b5b3cbd39558f43947f7d2dec6cd31a06386edb	5 years ago
Maysam Yabandeh	48a678b7c9	Prevent an incompatible combination of options (#6254 ) Summary: allow_concurrent_memtable_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led to stress tests to fail with this combination is tried in ::SetOptions. The patch fixes that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6254 Differential Revision: D19265819 Pulled By: maysamyabandeh fbshipit-source-id: 47f2e2dc26fe0972c7152f4da15dadb9703f1179	5 years ago
anand1976	1be48cb895	Fix crash in Transaction::MultiGet() when num_keys > 32 Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6192 Test Plan: Add a unit test that fails without the fix and passes now make check Differential Revision: D19124781 Pulled By: anand1976 fbshipit-source-id: 8c8cb6fa16c3fc23ec011e168561a13f76bbd783	5 years ago
Adam Retter	6d58ea901d	Fix compilation under MSVC VS2015 (#6081 ) Summary: NOTE: this also needs to be back-ported to 6.4.6 and possibly older branches if further releases from them is envisaged. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6081 Differential Revision: D18710107 Pulled By: zhichao-cao fbshipit-source-id: 03260f9316566e2bfc12c7d702d6338bb7941e01	6 years ago
Maysam Yabandeh	0058daef7b	Disable SmallestUnCommittedSeq in Valgrind run (#6035 ) Summary: SmallestUnCommittedSeq sometimes takes too long when run under Valgrind. The patch disables it when the tests are run under Valgrind. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6035 Differential Revision: D18509198 Pulled By: maysamyabandeh fbshipit-source-id: 1191443b9fedb6b9c50d6b76f5c92371f5030230	6 years ago
Sergei Petrunia	230bcae7b6	Add a limited support for iteration bounds into BaseDeltaIterator (#5403 ) Summary: For MDEV-19670: MyRocks: key lookups into deleted data are very slow BaseDeltaIterator remembers iterate_upper_bound and will not let delta_iterator_ walk above the iterate_upper_bound if base_iterator_ is not valid anymore. == Rationale == The most straightforward way would be to make the delta_iterator (which is a rocksdb::WBWIIterator) to support iterator bounds. But checking for bounds has an extra CPU overhead. So we put the check into BaseDeltaIterator, and only make it when base_iterator_ is not valid. (note: We could take it even further, and move the check a few lines down, and only check iterator bounds ourselves if base_iterator_ is not valid AND delta_iterator_ hit a tombstone). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5403 Differential Revision: D15863092 Pulled By: maysamyabandeh fbshipit-source-id: 8da458e7b9af95ff49356666f69664b4a6ccf49b	6 years ago
Maysam Yabandeh	52733b4498	WritePrepared: Fix flaky test MaxCatchupWithNewSnapshot (#5850 ) Summary: MaxCatchupWithNewSnapshot tests that the snapshot sequence number will be larger than the max sequence number when the snapshot was taken. However since the test does not have access to the max sequence number when the snapshot was taken, it uses max sequence number after that, which could have advanced the snapshot by then, thus making the test flaky. The fix is to compare with max sequence number before the snapshot was taken, which is a lower bound for the value when the snapshot was taken. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5850 Test Plan: ~/gtest-parallel/gtest-parallel --repeat=12800 ./write_prepared_transaction_test --gtest_filter="MaxCatchupWithNewSnapshot" Differential Revision: D17608926 Pulled By: maysamyabandeh fbshipit-source-id: b122ae5a27f982b290bd60da852e28d3c5eb0136	6 years ago
Peter Dillinger	ca7ccbe2ea	Misc hashing updates / upgrades (#5909 ) Summary: - Updated our included xxhash implementation to version 0.7.2 (== the latest dev version as of 2019-10-09). - Using XXH_NAMESPACE (like other fb projects) to avoid potential name collisions. - Added fastrange64, and unit tests for it and fastrange32. These are faster alternatives to hash % range. - Use preview version of XXH3 instead of MurmurHash64A for NPHash64 -- Had to update cache_test to increase probability of passing for any given hash function. - Use fastrange64 instead of % with uses of NPHash64 -- Had to fix WritePreparedTransactionTest.CommitOfDelayedPrepared to avoid deadlock apparently caused by new hash collision. - Set default seed for NPHash64 because specifying a seed rarely makes sense for it. - Removed unnecessary include xxhash.h in a popular .h file - Rename preview version of XXH3 to XXH3p for clarity and to ease backward compatibility in case final version of XXH3 is integrated. Relying on existing unit tests for NPHash64-related changes. Each new implementation of fastrange64 passed unit tests when manipulating my local build to select it. I haven't done any integration performance tests, but I consider the improved performance of the pieces being swapped in to be well established. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5909 Differential Revision: D18125196 Pulled By: pdillinger fbshipit-source-id: f6bf83d49d20cbb2549926adf454fd035f0ecc0d	6 years ago
jsteemann	da3b2840cb	save a few redundant container lookups (#5875 ) Summary: This PR eliminates repeated lookups in associative or ordered containers when a single lookup suffices. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5875 Differential Revision: D17753172 Pulled By: anand1976 fbshipit-source-id: 796b02b760082521d8c42a1cb65a76bf0e6c1b8e	6 years ago
sdong	e8263dbdaa	Apply formatter to recent 200+ commits. (#5830 ) Summary: Further apply formatter to more recent commits. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830 Test Plan: Run all existing tests. Differential Revision: D17488031 fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab	6 years ago
sdong	c06b54d0c6	Apply formatter on recent 45 commits. (#5827 ) Summary: Some recent commits might not have passed through the formatter. I formatted recent 45 commits. The script hangs for more commits so I stopped there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5827 Test Plan: Run all existing tests. Differential Revision: D17483727 fbshipit-source-id: af23113ee63015d8a43d89a3bc2c1056189afe8f	6 years ago
anand76	83a6a614e9	Refactor ArenaWrappedDBIter into separate files (#5801 ) Summary: Move definition and implementation for ArenaWrappedDBIter into its own .h/.cc files. Also, change inlining of functions to better comply with the Google C++ style guide. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5801 Test Plan: make check Differential Revision: D17371012 Pulled By: anand1976 fbshipit-source-id: c1361abc2851575111e357a63d88be3b3d6cb341	6 years ago
Shylock Hg	9eb3e1f77d	Use delete to disable automatic generated methods. (#5009 ) Summary: Use delete to disable automatic generated methods instead of private, and put the constructor together for more clear.This modification cause the unused field warning, so add unused attribute to disable this warning. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5009 Differential Revision: D17288733 fbshipit-source-id: 8a767ce096f185f1db01bd28fc88fef1cdd921f3	6 years ago
Wilfried Goesgens	fbab9913e2	upgrade gtest 1.7.0 => 1.8.1 for json result writing Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5332 Differential Revision: D17242232 fbshipit-source-id: c0d4646556a1335e51ac7382b986ca7f6ced7b64	6 years ago
Maysam Yabandeh	78b8cfc7ec	WriteUnPrepared: Split ReadYourOwnWriteStress to three (#5776 ) Summary: ReadYourOwnWriteStress occasionally times out on some platforms. The patch splits it to three. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5776 Differential Revision: D17231743 Pulled By: maysamyabandeh fbshipit-source-id: d42eeaf22f61a48d50f9c404d98b1081ae8dac94	6 years ago
Manuel Ung	2208cc0196	Fix build break in TransactionBaseImpl::TrackKey (#5771 ) Summary: Fix build broken in https://github.com/facebook/rocksdb/pull/5696. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5771 Differential Revision: D17217665 Pulled By: lth fbshipit-source-id: 7aa84a2a9b4feb7a3ab1cab174e09276430fe042	6 years ago
jsteemann	19e8c9b64f	use c++17's try_emplace if available (#5696 ) Summary: This avoids rehashing the key in TrackKey() in case the key is not already in the map of tracked keys, which will happen at least once per key used in a transaction. Additionally fix two typos. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5696 Differential Revision: D17210178 Pulled By: lth fbshipit-source-id: 7e2c28e9e505c1d1c1535d435250cf2b191a6fdf	6 years ago
Maysam Yabandeh	f9fb9f1421	Add a unit test to detect infinite loops with reseek optimizations (#5727 ) Summary: Iterators reseek to the target key after iterating over max_sequential_skip_in_iterations invalid values. The logic is susceptible to an infinite loop bug, which has been present with WritePrepared Transactions up until 6.2 release. Although the bug is not present on master, the patch adds a unit test to prevent it from resurfacing again. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5727 Differential Revision: D16952759 Pulled By: maysamyabandeh fbshipit-source-id: d0d973dddc8dfabd5a794931232aa4c862c74f51	6 years ago
Pratik Dhandharia	1b4c104a67	replace some reinterpret_cast with static_cast_with_check (#5740 ) Summary: This PR focuses on replacing some of the reinterpret_cast<DBImpl*> to static_cast_with_check<DBImpl, DB>. Files impacted: ./db/db_impl/db_impl_compaction_flush.cc ./db/write_batch.cc ./utilities/blob_db/blob_db_impl.cc ./utilities/transactions/pessimistic_transaction_db.cc ./utilities/transactions/transaction_base.cc ./utilities/transactions/write_prepared_txn_db.cc ./utilities/transactions/write_unprepared_txn_db.cc Pull Request resolved: https://github.com/facebook/rocksdb/pull/5740 Differential Revision: D17055691 Pulled By: pdhandharia fbshipit-source-id: 0f8034d1b32eade56e37d59c04b7bf236a81d8e8	6 years ago
Zhongyi Xie	2f41ecfe75	Refactor trimming logic for immutable memtables (#5022 ) Summary: MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory. We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one. The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming. In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022 Differential Revision: D14394062 Pulled By: miasantreble fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5	6 years ago
sdong	e0515607bc	Blacklist TransactionTest.GetWithoutSnapshot from valgrind_test (#5715 ) Summary: In valgrind_test, TransactionTest.GetWithoutSnapshot ran 2 hours and still didn't finish. Black list from valgrind_test to prevent timeout. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5715 Test Plan: run "make valgrind_test" and see whether the test is still generated. Differential Revision: D16866009 fbshipit-source-id: 92c78049b0bc1c2b9a0dfc1b7c8a9206b36f02f0	6 years ago
Manuel Ung	7785f61132	WriteUnPrepared: Fix bug in savepoints (#5703 ) Summary: Fix a bug in write unprepared savepoints. When flushing the write batch according to savepoint boundaries, we were forgetting to flush the last write batch after the last savepoint, meaning that some data was not written to DB. Also, add a small optimization where we avoid flushing empty batches. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5703 Differential Revision: D16811996 Pulled By: lth fbshipit-source-id: 600c7e0e520ad7a8fad32d77e11d932453e68e3f	6 years ago
Manuel Ung	4c70cb7306	WriteUnPrepared: support iterating while writing to transaction (#5699 ) Summary: In MyRocks, there are cases where we write while iterating through keys. This currently breaks WBWIIterator, because if a write batch flushes during iteration, the delta iterator would point to invalid memory. For now, fix by disallowing flush if there are active iterators. In the future, we will loop through all the iterators on a transaction, and refresh the iterators when a write batch is flushed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5699 Differential Revision: D16794157 Pulled By: lth fbshipit-source-id: 5d5bf70688bd68fe58e8a766475ae88fd1be3190	6 years ago
Zhongyi Xie	90cd6c2bb1	Fix double deletion in transaction_test (#5700 ) Summary: Fix the following clang analyze failures: ``` In file included from utilities/transactions/transaction_test.cc:8: ./utilities/transactions/transaction_test.h:174:14: warning: Attempt to delete released memory delete root_db; ^ ``` The destructor of StackableDB already deletes the root db and there is no need to delete the db separately. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5700 Test Plan: USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j24 analyze Differential Revision: D16800579 Pulled By: maysamyabandeh fbshipit-source-id: 64c2d70f23e07e6a15242add97c744902ea33be5	6 years ago
Manuel Ung	8a678a50ba	WriteUnPrepared: Relax restriction on iterators and writes with no snapshot (#5697 ) Summary: Currently, if a write is done without a snapshot, then `largest_validated_seq_` is set to `kMaxSequenceNumber`. This is too aggressive, because an iterator with a snapshot created after this write should be valid. Set `largest_validated_seq_` to `GetLastPublishedSequence` instead. The variable means that no keys in the current tracked key set has changed by other transactions since `largest_validated_seq_`. Also, do some extra cleanup in Clear() for safety. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5697 Differential Revision: D16788613 Pulled By: lth fbshipit-source-id: f2aa40b8b12e0c0cf9e38c940fecc8f1cc0d2385	6 years ago
Maysam Yabandeh	64855979ae	WriteUnPrepared: Pass snap_released to the callback (#5691 ) Summary: With changes made in https://github.com/facebook/rocksdb/pull/5664 we meant to pass snap_released parameter of ::IsInSnapshot from the read callbacks. Although the variable was defined, passing it to the callback in WritePreparedTxnReadCallback was missing, which is fixed in this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5691 Differential Revision: D16767310 Pulled By: maysamyabandeh fbshipit-source-id: 3bf53f5964a2756a66ceef7c8f6b3ac75f102f48	6 years ago
Manuel Ung	6f0f82de87	WriteUnPrepared: increase test coverage in transaction_test (#5658 ) Summary: The changes transaction_test to set `txn_db_options.default_write_batch_flush_threshold = 1` in order to give better test coverage for WriteUnprepared. As part of the change, some tests had to be updated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5658 Differential Revision: D16740468 Pulled By: lth fbshipit-source-id: 3821eec20baf13917c8c1fab444332f75a509de9	6 years ago
Maysam Yabandeh	12eaacb71d	WritePrepared: Fix SmallestUnCommittedSeq bug (#5683 ) Summary: SmallestUnCommittedSeq reads two data structures, prepared_txns_ and delayed_prepared_. These two are updated in CheckPreparedAgainstMax when max_evicted_seq_ advances some prepared entires. To avoid the cost of acquiring a mutex, the read from them in SmallestUnCommittedSeq is not atomic. This creates a potential race condition. The fix is to read the two data structures in the reverse order of their update. CheckPreparedAgainstMax copies the prepared entry to delayed_prepared_ before removing it from prepared_txns_ and SmallestUnCommittedSeq looks into prepared_txns_ before reading delayed_prepared_. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5683 Differential Revision: D16744699 Pulled By: maysamyabandeh fbshipit-source-id: b1bdb134018beb0b9de58827f512662bea35cad0	6 years ago
Vijay Nadimpalli	d150e01474	New API to get all merge operands for a Key (#5604 ) Summary: This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases: 1. Update subset of columns and read subset of columns - Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU. 2. Updating very few attributes in a value which is a JSON-like document - Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge. ---------------------------------------------------------------------------------------------------- API : Status GetMergeOperands( const ReadOptions& options, ColumnFamilyHandle* column_family, const Slice& key, PinnableSlice* merge_operands, GetMergeOperandsOptions* get_merge_operands_options, int* number_of_operands) Example usage : int size = 100; int number_of_operands = 0; std::vector<PinnableSlice> values(size); GetMergeOperandsOptions merge_operands_info; db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands); Description : Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion. merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604 Test Plan: Added unit test and perf test in db_bench that can be run using the command: ./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist Differential Revision: D16657366 Pulled By: vjnadimpalli fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf	6 years ago
Maysam Yabandeh	208556ee13	WritePrepared: fix Get without snapshot (#5664 ) Summary: if read_options.snapshot is not set, ::Get will take the last sequence number after taking a super-version and uses that as the sequence number. Theoretically max_eviceted_seq_ could advance this sequence number. This could lead ::IsInSnapshot that will be invoked by the ReadCallback to notice the absence of the snapshot. In this case, the ReadCallback should have passed a non-value to snap_released so that it could be set by the ::IsInSnapshot. The patch does that, and adds a unit test to verify it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5664 Differential Revision: D16614033 Pulled By: maysamyabandeh fbshipit-source-id: 06fb3fd4aacd75806ed1a1acec7961f5d02486f2	6 years ago
Maysam Yabandeh	e579e32eaa	Disable ReadYourOwnWriteStress when run under Valgrind (#5671 ) Summary: It sometimes times out when run under valgrind taking around 20m. The patch skips the test under Valgrind. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5671 Differential Revision: D16652382 Pulled By: maysamyabandeh fbshipit-source-id: 0f6f4f76d37337d56226b689e01b14523dd07aae	6 years ago
Manuel Ung	f622ca2c7c	WriteUnPrepared: savepoint support (#5627 ) Summary: Add savepoint support when the current transaction has flushed unprepared batches. Rolling back to savepoint is similar to rolling back a transaction. It requires the set of keys that have changed since the savepoint, re-reading the keys at the snapshot at that savepoint, and the restoring the old keys by writing out another unprepared batch. For this strategy to work though, we must be capable of reading keys at a savepoint. This does not work if keys were written out using the same sequence number before and after a savepoint. Therefore, when we flush out unprepared batches, we must split the batch by savepoint if any savepoints exist. eg. If we have the following: ``` Put(A) Put(B) Put(C) SetSavePoint() Put(D) Put(E) SetSavePoint() Put(F) ``` Then we will write out 3 separate unprepared batches: ``` Put(A) 1 Put(B) 1 Put(C) 1 Put(D) 2 Put(E) 2 Put(F) 3 ``` This is so that when we rollback to eg. the first savepoint, we can just read keys at snapshot_seq = 1. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5627 Differential Revision: D16584130 Pulled By: lth fbshipit-source-id: 6d100dd548fb20c4b76661bd0f8a2647e64477fa	6 years ago
Manuel Ung	d599135a03	WriteUnPrepared: use WriteUnpreparedTxnReadCallback for ValidateSnapshot (#5657 ) Summary: In DeferSnapshotSavePointTest, writes were failing with snapshot validation error because the key with the latest sequence number was an unprepared key from the current transaction. Fix this by passing down the correct read callback. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5657 Differential Revision: D16582466 Pulled By: lth fbshipit-source-id: 11645dac0e7c1374d917ef5fdf757d13c1d1108d	6 years ago
Manuel Ung	399f477818	WriteUnPrepared: Use WriteUnpreparedTxnReadCallback for MultiGet (#5634 ) Summary: The `TransactionTest.MultiGetBatchedTest` were failing with unprepared batches because we were not using the correct callbacks. Override MultiGet to pass down the correct ReadCallback. A similar problem is also fixed in WritePrepared. This PR also fixes an issue similar to (https://github.com/facebook/rocksdb/pull/5147), but for MultiGet instead of Get. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5634 Differential Revision: D16552674 Pulled By: lth fbshipit-source-id: 736eaf8e919c6b13d5f5655b1c0d36b57ad04804	6 years ago
Manuel Ung	80d7067cb2	Use int64_t instead of ssize_t (#5638 ) Summary: The ssize_t type was introduced in https://github.com/facebook/rocksdb/pull/5633, but it seems like it's a POSIX specific type. I just need a signed type to represent number of bytes, so use int64_t instead. It seems like we have a typedef from SSIZE_T for Windows, but it doesn't seem like we ever include "port/port.h" in our public header files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5638 Differential Revision: D16526269 Pulled By: lth fbshipit-source-id: 8d3a5c41003951b74b29bc5f1d949b2b22da0cee	6 years ago
Manuel Ung	41df734830	WriteUnPrepared: Add new variable write_batch_flush_threshold (#5633 ) Summary: Instead of reusing `TransactionOptions::max_write_batch_size` for determining when to flush a write batch for write unprepared, add a new variable called `write_batch_flush_threshold` for this use case instead. Also add `TransactionDBOptions::default_write_batch_flush_threshold` which sets the default value if `TransactionOptions::write_batch_flush_threshold` is unspecified. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5633 Differential Revision: D16520364 Pulled By: lth fbshipit-source-id: d75ae5a2141ce7708982d5069dc3f0b58d250e8c	6 years ago
Manuel Ung	230b909da8	Fix PopSavePoint to merge info into the previous savepoint (#5628 ) Summary: Transaction::RollbackToSavePoint undos the modification made since the SavePoint beginning, and also unlocks the corresponding keys, which are tracked in the last SavePoint. Currently ::PopSavePoint simply discard these tracked keys, leaving them locked in the lock manager. This breaks a subsequent ::RollbackToSavePoint behavior as it loses track of such keys, and thus cannot unlock them. The patch fixes ::PopSavePoint by passing on the track key information to the previous SavePoint. Fixes https://github.com/facebook/rocksdb/issues/5618 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5628 Differential Revision: D16505325 Pulled By: lth fbshipit-source-id: 2bc3b30963ab4d36d996d1f66543c93abf358980	6 years ago
Manuel Ung	66b524a911	Simplify WriteUnpreparedTxnReadCallback and fix some comments (#5621 ) Summary: Simplify WriteUnpreparedTxnReadCallback so we just have one function `CalcMaxVisibleSeq`. Also, there's no need for the read callback to hold onto the transaction any more, so just hold the set of unprep_seqs, reducing about of indirection in `IsVisibleFullCheck`. Also, some comments about using transaction snapshot were out of date, so remove them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5621 Differential Revision: D16459883 Pulled By: lth fbshipit-source-id: cd581323fd18982e817d99af57b6eaba59e599bb	6 years ago
Manuel Ung	eae832740b	WriteUnPrepared: improve read your own write functionality (#5573 ) Summary: There are a number of fixes in this PR (with most bugs found via the added stress tests): 1. Re-enable reseek optimization. This was initially disabled to avoid infinite loops in https://github.com/facebook/rocksdb/pull/3955 but this can be resolved by remembering not to reseek after a reseek has already been done. This problem only affects forward iteration in `DBIter::FindNextUserEntryInternal`, as we already disable reseeking in `DBIter::FindValueForCurrentKeyUsingSeek`. 2. Verify that ReadOption.snapshot can be safely used for iterator creation. Some snapshots would not give correct results because snaphsot validation would not be enforced, breaking some assumptions in Prev() iteration. 3. In the non-snapshot Get() case, reads done at `LastPublishedSequence` may not be enough, because unprepared sequence numbers are not published. Use `std::max(published_seq, max_visible_seq)` to do lookups instead. 4. Add stress test to test reading own writes. 5. Minor bug in the allow_concurrent_memtable_write case where we forgot to pass in batch_per_txn_. 6. Minor performance optimization in `CalcMaxUnpreparedSequenceNumber` by assigning by reference instead of value. 7. Add some more comments everywhere. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5573 Differential Revision: D16276089 Pulled By: lth fbshipit-source-id: 18029c944eb427a90a87dee76ac1b23f37ec1ccb	6 years ago

1 2 3 4 5 ...

372 Commits (aeb913dd01b0dc4ccdf5f9e05316802a94c1d6f1)