Summary:
The WriteBatch::Iterate will try with a larger sequence number if the memtable reports a duplicate. This status is specified with TryAgain status. So far the assumption was that the last entry in the batch will never return TryAgain, which is correct when WAL is created via WritePrepared since it always appends a batch separator if a natural one does not exist. However when reading a WAL generated by WriteCommitted this batch separator might not exist. Although WritePrepared is not supposed to be able to read the WAL generated by WriteCommitted we should avoid confusing scenarios in which the behavior becomes unpredictable. The path fixes that by allowing TryAgain even for the last entry of the write batch.
Closes https://github.com/facebook/rocksdb/pull/3747
Differential Revision: D7708391
Pulled By: maysamyabandeh
fbshipit-source-id: bfaddaa9b14a4cdaff6977f6f63c789a6ab1ee0d
Summary:
This is a hack as temporary fix of MyRocks with rollbacking the merge operands. The way MyRocks uses merge operands is without protection of locks, which violates the assumption behind the rollback algorithm. They are ok with not being rolled back as it would just create a gap in the autoincrement column. The hack add an option to disable the rollback of merge operands by default and only enables it to let the unit test pass.
Closes https://github.com/facebook/rocksdb/pull/3711
Differential Revision: D7597177
Pulled By: maysamyabandeh
fbshipit-source-id: 544be0f666c7e7abb7f651ec8b23124e05056728
Summary:
Fix the following bugs:
- During recovery a duplicate key was inserted twice into the write batch of the recovery transaction,
once when the memtable returns false (because it was duplicates) and once for the 2nd attempt. This would result into different SubBatch count measured when the recovered transactions is committing.
- If a cf is flushed during recovery the memtable is not available to assist in detecting the duplicate key. This could result into not advancing the sequence number when iterating over duplicate keys of a flushed cf and hence inserting the next key with the wrong sequence number.
- SubBacthCounter would reset the comparator to default comparator after the first duplicate key. The 2nd duplicate key hence would have gone through a wrong comparator and not being detected.
Closes https://github.com/facebook/rocksdb/pull/3562
Differential Revision: D7149440
Pulled By: maysamyabandeh
fbshipit-source-id: 91ec317b165f363f5d11ff8b8c47c81cebb8ed77
Summary:
TransactionDB::Write can receive some optimization hints from the user. One is to skip the concurrency control mechanism. WritePreparedTxnDB is currently ignoring such hints. This patch optimizes WritePreparedTxnDB::Write for skip_concurrency_control and skip_duplicate_key_check hints.
Closes https://github.com/facebook/rocksdb/pull/3496
Differential Revision: D6971784
Pulled By: maysamyabandeh
fbshipit-source-id: cbab10ad538fa2b8bcb47e37c77724afe6e30f03
Summary:
Without this patch, ubsan_check is currently failing with this error:
```
utilities/transactions/write_prepared_transaction_test.cc:369:63: runtime error: member call on address 0x0000051649f8 which does not point to an object of type 'WithParamInterface'
0x0000051649f8: note: object has invalid vptr
```
Tested by `COMPILE_WITH_UBSAN=1 make -j32 transaction_test` and running `./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccessTest1/0`
Closes https://github.com/facebook/rocksdb/pull/3444
Differential Revision: D6850087
Pulled By: maysamyabandeh
fbshipit-source-id: 5b254da8504b8757f7aec8a820ad464154da1a1d
Summary:
SnapshotConcurrentAccessTest sometimes times out when running on the test infra. This patch splits the test into smaller sub-tests to avoid the timeout. It also benefits from lower run-time of each sub-test and increases the coverage of the test. The overall run-time of each final sub-test is at most half of the original test so we should no longer see a timeout.
Closes https://github.com/facebook/rocksdb/pull/3435
Differential Revision: D6839427
Pulled By: maysamyabandeh
fbshipit-source-id: d53fdb157109e2438ca7fe447d0cf4b71f304bd8
Summary:
DestroyDB that is used in tests loops over the files returned by ::GetChildren and delete them one by one. Such files might be already deleted in the file system (during DeleteObsoleteFileImpl for example) but will get actually deleted with a delay sometimes before ::DeleteFile is called on the file name. We have some test failures where FaultInjectionTestEnv::DeleteFile fails on assert(s.ok()) during DestroyDB. This patch removes the assert statement to fix that.
Closes https://github.com/facebook/rocksdb/pull/3324
Differential Revision: D6659545
Pulled By: maysamyabandeh
fbshipit-source-id: 4c9552fbcd494dcf3e61d475c11fc965c4388b2c
Summary:
Currently non-2pc writes do the 2nd dummy write to actually commit the transaction. This was necessary to ensure that publishing the commit sequence number will be done only from one queue (the queue that does not write to memtable). This is however not necessary when we have only one write queue, which is actually the setup that would be used by non-2pc writes. This patch eliminates the 2nd write when two_write_queues are disabled by updating the commit map in the 1st write.
Closes https://github.com/facebook/rocksdb/pull/3277
Differential Revision: D6575392
Pulled By: maysamyabandeh
fbshipit-source-id: 8ab458f7ca506905962f9166026b2ec81e749c46
Summary:
Add PreReleaseCallback to be called at the end of WriteImpl but before publishing the sequence number. The callback is used in WritePrepareTxn to i) update the commit map, ii) update the last published sequence number in the 2nd write queue. It also ensures that all the commits will go to the 2nd queue.
These changes will ensure that the commit map is updated before the sequence number is published and used by reading snapshots. If we use two write queues, the snapshots will use the seq number published by the 2nd queue. If we use one write queue (the default, the snapshots will use the last seq number in the memtable, which also indicates the last published seq number.
Closes https://github.com/facebook/rocksdb/pull/3205
Differential Revision: D6438959
Pulled By: maysamyabandeh
fbshipit-source-id: f8b6c434e94bc5f5ab9cb696879d4c23e2577ab9
Summary:
The sequence number was not properly advanced after a rollback marker. The patch extends the existing unit tests to detect the bug and also fixes it.
Closes https://github.com/facebook/rocksdb/pull/3157
Differential Revision: D6304291
Pulled By: maysamyabandeh
fbshipit-source-id: 1b519c44a5371b802da49c9e32bd00087a8da401
Summary:
Add tests to ensure that WritePrepared and WriteCommitted policies are cross compatible when the db WAL is empty. This is important when the admin want to switch between the policies. In such case, before the switch the admin needs to empty the WAL by i) committing/rollbacking all the pending transactions, ii) FlushMemTables
Closes https://github.com/facebook/rocksdb/pull/3118
Differential Revision: D6227247
Pulled By: maysamyabandeh
fbshipit-source-id: bcde3d92c1e89cda3b9cfa69f6a20af5d8993db7
Summary:
Enable concurrent_prepare flag for WritePrepared transactions and extend the existing transaction tests with this config.
Closes https://github.com/facebook/rocksdb/pull/3046
Differential Revision: D6106534
Pulled By: maysamyabandeh
fbshipit-source-id: 88c8d21d45bc492beb0a131caea84a2ac5e7d38c
Summary:
Recover txns from the WAL. Also added some unit tests.
Closes https://github.com/facebook/rocksdb/pull/2901
Differential Revision: D5859596
Pulled By: maysamyabandeh
fbshipit-source-id: 6424967b231388093b4effffe0a3b1b7ec8caeb0