WritePrepared: Clarify the need for two_write_queues in unordered_write (#5313)

Summary:
WritePrepared transactions configured with two_write_queues=true offer higher throughput with the unordered_write feature without compromising RocksDB's guarantees. This is because they order writes in a second step that is not tied to memtable write speed. With 2PC, the commit phase naturally provides this second step, since it also performs the ordering. Without 2PC, the second step exists only when two_write_queues=true: after performing the writes, WritePrepared uses the second queue to assign an order to them.
The patch clarifies the need for two_write_queues=true in HISTORY.md and in the inline comments of unordered_write. It also extends the WritePrepared stress tests to unordered_write.
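The two-step argument above can be illustrated with a small, single-threaded toy model. This is a sketch under simplifying assumptions, not RocksDB code: `ToyWritePreparedDB` and its methods are hypothetical, one shared counter hands out both prepare and commit sequence numbers, and a plain `std::map` stands in for the commit bookkeeping WritePrepared keeps. The property it shows is that a snapshot only sees writes whose commit (the second, ordering step) was published before the snapshot was taken, so the memtable insertions themselves may finish in any order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model, not RocksDB code.
class ToyWritePreparedDB {
 public:
  // Step 1: insert into the "memtable" under a prepare sequence number.
  // Across writers this step may complete in any order.
  uint64_t Prepare(const std::string& key, const std::string& value) {
    uint64_t prepare_seq = ++last_seq_;
    memtable_[key].push_back({prepare_seq, value});
    return prepare_seq;
  }

  // Step 2: ordering. Called only after the memtable insert has finished,
  // so a published commit sequence never refers to data still in flight.
  void Commit(uint64_t prepare_seq) {
    uint64_t commit_seq = ++last_seq_;
    commit_map_[prepare_seq] = commit_seq;
    last_published_ = commit_seq;
  }

  uint64_t GetSnapshot() const { return last_published_; }

  // Returns the value with the largest commit sequence <= snapshot.
  std::optional<std::string> Get(const std::string& key,
                                 uint64_t snapshot) const {
    auto it = memtable_.find(key);
    if (it == memtable_.end()) return std::nullopt;
    std::optional<std::string> result;
    uint64_t best = 0;
    for (const auto& [prepare_seq, value] : it->second) {
      auto c = commit_map_.find(prepare_seq);
      if (c == commit_map_.end()) continue;  // not committed yet
      if (c->second > snapshot) continue;    // committed after the snapshot
      if (!result || c->second > best) {
        best = c->second;
        result = value;
      }
    }
    return result;
  }

 private:
  struct Version {
    uint64_t prepare_seq;
    std::string value;
  };
  uint64_t last_seq_ = 0;
  uint64_t last_published_ = 0;
  std::map<std::string, std::vector<Version>> memtable_;
  std::map<uint64_t, uint64_t> commit_map_;  // prepare_seq -> commit_seq
};
```

For example, if two writes to the same key are prepared, the second one commits, and a snapshot is then taken, that snapshot keeps returning the second value even after the first write commits later.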
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5313

Differential Revision: D15379977

Pulled By: maysamyabandeh

fbshipit-source-id: 5b6f05b9b59285dcbf3b0532215ba9fe7d926e00
Maysam Yabandeh 6 years ago committed by Facebook Github Bot
parent fb4c6a31ce
commit 5c0e304170
Changed files (lines changed):
1. HISTORY.md (2)
2. db/db_impl_write.cc (5)
3. include/rocksdb/options.h (5)
4. utilities/transactions/pessimistic_transaction_db.cc (6)
5. utilities/transactions/transaction_test.cc (6)
6. utilities/transactions/write_prepared_transaction_test.cc (38)

--- a/HISTORY.md
+++ b/HISTORY.md
@@ -5,7 +5,7 @@
 ### New Features
 * Add an option `snap_refresh_nanos` (default to 0.1s) to periodically refresh the snapshot list in compaction jobs. Assign to 0 to disable the feature.
-* Add an option `unordered_write` which trades snapshot guarantees with higher write throughput. When used with WRITE_PREPARED transactions, it offers higher throughput with however no compromise on guarantees.
+* Add an option `unordered_write` which trades snapshot guarantees with higher write throughput. When used with WRITE_PREPARED transactions with two_write_queues=true, it offers higher throughput with however no compromise on guarantees.
 * Allow DBImplSecondary to remove memtables with obsolete data after replaying MANIFEST and WAL.
 ### Performance Improvements

--- a/db/db_impl_write.cc
+++ b/db/db_impl_write.cc
@@ -605,6 +605,11 @@ Status DBImpl::UnorderedWriteMemtable(const WriteOptions& write_options,
 size_t pending_cnt = pending_memtable_writes_.fetch_sub(1) - 1;
 if (pending_cnt == 0) {
+  // switch_cv_ waits until pending_memtable_writes_ = 0. Locking its mutex
+  // before notify ensures that cv is in waiting state when it is notified
+  // thus not missing the update to pending_memtable_writes_ even though it is
+  // not modified under the mutex.
+  std::lock_guard<std::mutex> lck(switch_mutex_);
   switch_cv_.notify_all();
 }
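The comment added in `UnorderedWriteMemtable` describes a standard way to avoid a lost wakeup when the waited-on counter is an atomic updated outside the mutex. The sketch below reproduces that pattern with the standard library only; `PendingWrites` and `RunDemo` are hypothetical names standing in for `pending_memtable_writes_`, `switch_mutex_`, and `switch_cv_`, not RocksDB's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the counter / mutex / cv trio in DBImpl.
struct PendingWrites {
  std::atomic<std::size_t> pending{0};
  std::mutex mu;
  std::condition_variable cv;

  // Called by each writer when its memtable insert is done.
  void Finish() {
    std::size_t left = pending.fetch_sub(1) - 1;
    if (left == 0) {
      // Taking the mutex before notifying means the waiter is either before
      // its predicate check (and will see 0) or already blocked in wait().
      std::lock_guard<std::mutex> lock(mu);
      cv.notify_all();
    }
  }

  // Blocks until every pending writer has called Finish().
  void WaitForZero() {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [this] { return pending.load() == 0; });
  }
};

// Starts `writers` threads that each finish once, then waits for all of them.
bool RunDemo(std::size_t writers) {
  PendingWrites pw;
  pw.pending.store(writers);
  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < writers; ++i) {
    threads.emplace_back([&pw] { pw.Finish(); });
  }
  pw.WaitForZero();
  for (auto& t : threads) t.join();
  return pw.pending.load() == 0;
}
```

Without the `lock_guard`, the waiter could evaluate its predicate, see a nonzero count, and then block after the final `notify_all` has already fired. Holding the mutex while notifying closes that window, because the waiter holds the mutex from its predicate check until `wait` atomically releases it.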

--- a/include/rocksdb/options.h
+++ b/include/rocksdb/options.h
@@ -899,8 +899,9 @@ struct DBOptions {
 // ::MultiGet and Iterator's consistent-point-in-time view property.
 // If the application cannot tolerate the relaxed guarantees, it can implement
 // its own mechanisms to work around that and yet benefit from the higher
-// throughput. Using TransactionDB with WRITE_PREPARED write policy is one way
-// to achieve immutable snapshots despite unordered_write.
+// throughput. Using TransactionDB with WRITE_PREPARED write policy and
+// two_write_queues=true is one way to achieve immutable snapshots despite
+// unordered_write.
 //
 // By default, i.e., when it is false, rocksdb does not advance the sequence
 // number for new snapshots unless all the writes with lower sequence numbers
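The relaxed snapshot guarantee described in this comment can be reproduced with another hypothetical toy model. As a simplification, it assumes that with `unordered_write` a new snapshot may cover every allocated sequence number, including writes whose memtable insertion has not finished, while in the default mode the snapshot advances only over a contiguous prefix of finished writes. `ToySeqDB` and its methods are illustrative names, not RocksDB APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical toy model, not RocksDB code: a write gets its sequence
// number first and is inserted into the memtable later.
class ToySeqDB {
 public:
  explicit ToySeqDB(bool unordered_write) : unordered_write_(unordered_write) {}

  uint64_t BeginWrite() { return ++last_allocated_; }

  void FinishWrite(uint64_t seq, const std::string& key,
                   const std::string& value) {
    memtable_[key][seq] = value;
    finished_.insert(seq);
    // Default mode: advance only while all lower sequence numbers are done.
    while (finished_.count(last_visible_ + 1)) ++last_visible_;
  }

  uint64_t GetSnapshot() const {
    // With unordered_write the snapshot may cover writes still in flight.
    return unordered_write_ ? last_allocated_ : last_visible_;
  }

  // Latest version with sequence number <= snapshot, if any.
  std::optional<std::string> Get(const std::string& key,
                                 uint64_t snapshot) const {
    auto it = memtable_.find(key);
    if (it == memtable_.end()) return std::nullopt;
    auto v = it->second.upper_bound(snapshot);  // first seq > snapshot
    if (v == it->second.begin()) return std::nullopt;
    return std::prev(v)->second;
  }

 private:
  bool unordered_write_;
  uint64_t last_allocated_ = 0;
  uint64_t last_visible_ = 0;
  std::set<uint64_t> finished_;
  std::map<std::string, std::map<uint64_t, std::string>> memtable_;
};
```

With `unordered_write` set, the same snapshot returns nothing before `FinishWrite` and a value after it, which is the non-repeatable `Get` the comment warns about; in the default mode the snapshot stays empty.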

--- a/utilities/transactions/pessimistic_transaction_db.cc
+++ b/utilities/transactions/pessimistic_transaction_db.cc
@@ -232,6 +232,12 @@ Status TransactionDB::Open(
   return Status::NotSupported(
       "WRITE_UNPREPARED is currently incompatible with unordered_writes");
 }
+if (txn_db_options.write_policy == WRITE_PREPARED &&
+    db_options.unordered_write && !db_options.two_write_queues) {
+  return Status::NotSupported(
+      "WRITE_PREPARED is incompatible with unordered_writes if "
+      "two_write_queues is not enabled.");
+}
 std::vector<ColumnFamilyDescriptor> column_families_copy = column_families;
 std::vector<size_t> compaction_enabled_cf_indices;
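The option compatibility enforced in `TransactionDB::Open` can be summarized as a small standalone check. The sketch mirrors only the two conditions visible in this hunk; `WritePolicy`, `ToyOptions`, and `CheckUnorderedWrite` are hypothetical stand-ins for the RocksDB types, and WRITE_COMMITTED is deliberately left unchecked because its handling is not shown here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins, not the RocksDB types.
enum class WritePolicy { kWriteCommitted, kWritePrepared, kWriteUnprepared };

struct ToyOptions {
  bool unordered_write = false;
  bool two_write_queues = false;
};

// Returns an empty string if the combination passes the two checks shown in
// the hunk, otherwise the reason it is rejected. Other policies pass through.
std::string CheckUnorderedWrite(WritePolicy policy, const ToyOptions& opts) {
  if (!opts.unordered_write) return "";
  if (policy == WritePolicy::kWriteUnprepared) {
    return "WRITE_UNPREPARED is currently incompatible with unordered_write";
  }
  if (policy == WritePolicy::kWritePrepared && !opts.two_write_queues) {
    // Without the second queue there is no separate ordering step.
    return "WRITE_PREPARED requires two_write_queues with unordered_write";
  }
  return "";
}
```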

--- a/utilities/transactions/transaction_test.cc
+++ b/utilities/transactions/transaction_test.cc
@@ -47,7 +47,6 @@ INSTANTIATE_TEST_CASE_P(
     std::make_tuple(false, true, WRITE_COMMITTED, kOrderedWrite),
     std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite),
     std::make_tuple(false, true, WRITE_PREPARED, kOrderedWrite),
-    std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite),
     std::make_tuple(false, true, WRITE_PREPARED, kUnorderedWrite),
     std::make_tuple(false, false, WRITE_UNPREPARED, kOrderedWrite),
     std::make_tuple(false, true, WRITE_UNPREPARED, kOrderedWrite)));
@@ -58,7 +57,6 @@ INSTANTIATE_TEST_CASE_P(
     std::make_tuple(false, true, WRITE_COMMITTED, kOrderedWrite),
     std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite),
     std::make_tuple(false, true, WRITE_PREPARED, kOrderedWrite),
-    std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite),
     std::make_tuple(false, true, WRITE_PREPARED, kUnorderedWrite),
     std::make_tuple(false, false, WRITE_UNPREPARED, kOrderedWrite),
     std::make_tuple(false, true, WRITE_UNPREPARED, kOrderedWrite)));
@@ -79,7 +77,9 @@ INSTANTIATE_TEST_CASE_P(
     std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, false),
     std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, true),
     std::make_tuple(false, true, WRITE_PREPARED, kOrderedWrite, false),
-    std::make_tuple(false, true, WRITE_PREPARED, kOrderedWrite, true)));
+    std::make_tuple(false, true, WRITE_PREPARED, kOrderedWrite, true),
+    std::make_tuple(false, true, WRITE_PREPARED, kUnorderedWrite, false),
+    std::make_tuple(false, true, WRITE_PREPARED, kUnorderedWrite, true)));
 #endif // ROCKSDB_VALGRIND_RUN
 TEST_P(TransactionTest, DoubleEmptyWrite) {

--- a/utilities/transactions/write_prepared_transaction_test.cc
+++ b/utilities/transactions/write_prepared_transaction_test.cc
@@ -573,7 +573,6 @@ INSTANTIATE_TEST_CASE_P(
     ::testing::Values(
         std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite),
         std::make_tuple(false, true, WRITE_PREPARED, kOrderedWrite),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite),
         std::make_tuple(false, true, WRITE_PREPARED, kUnorderedWrite)));
 #ifndef ROCKSDB_VALGRIND_RUN
@@ -644,29 +643,7 @@ INSTANTIATE_TEST_CASE_P(
         std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 16, 20),
         std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 17, 20),
         std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 18, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 19, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 0, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 1, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 2, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 3, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 4, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 5, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 6, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 7, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 8, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 9, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 10, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 11, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 12, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 13, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 14, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 15, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 16, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 17, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 18, 20),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 19,
-                        20)));
+        std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 19, 20)));
 INSTANTIATE_TEST_CASE_P(
     TwoWriteQueues, SeqAdvanceConcurrentTest,
@@ -704,18 +681,7 @@ INSTANTIATE_TEST_CASE_P(
         std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 6, 10),
         std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 7, 10),
         std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 8, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 9, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 0, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 1, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 2, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 3, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 4, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 5, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 6, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 7, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 8, 10),
-        std::make_tuple(false, false, WRITE_PREPARED, kUnorderedWrite, 9, 10)));
+        std::make_tuple(false, false, WRITE_PREPARED, kOrderedWrite, 9, 10)));
 #endif // ROCKSDB_VALGRIND_RUN
 TEST_P(WritePreparedTransactionTest, CommitMapTest) {
