Revise APIs related to user-defined timestamp (#8946)

Summary:
ajkr reminded me that we have a rule of not including per-kv related data in `WriteOptions`.
Namely, `WriteOptions` should not include information about "what-to-write", but should just
include information about "how-to-write".

According to this rule, `WriteOptions::timestamp` (experimental) is clearly a violation. Therefore,
this PR removes `WriteOptions::timestamp` for compliance.
After the removal, we need to pass timestamp info via another set of APIs. This PR proposes a set
of overloaded functions `Put(write_opts, key, value, ts)`, `Delete(write_opts, key, ts)`, and
`SingleDelete(write_opts, key, ts)`. Planned to add `Write(write_opts, batch, ts)`, but its complexity
made me reconsider doing it in another PR (maybe).

For better checking and returning error early, we also add a new set of APIs to `WriteBatch` that take
extra `timestamp` information when writing to `WriteBatch`es.
These set of APIs in `WriteBatchWithIndex` are currently not supported, and are on our TODO list.

Removed `WriteBatch::AssignTimestamps()` and renamed `WriteBatch::AssignTimestamp()` to
`WriteBatch::UpdateTimestamps()` since this method require that all keys have space for timestamps
allocated already and multiple timestamps can be updated.

The constructor of `WriteBatch` now takes a fourth argument `default_cf_ts_sz` which is the timestamp
size of the default column family. This will be used to allocate space when calling APIs that do not
specify a column family handle.

Also, updated `DB::Get()`, `DB::MultiGet()`, `DB::NewIterator()`, `DB::NewIterators()` methods, replacing
some assertions about timestamp to returning Status code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8946

Test Plan:
make check
./db_bench -benchmarks=fillseq,fillrandom,readrandom,readseq,deleterandom -user_timestamp_size=8
./db_stress --user_timestamp_size=8 -nooverwritepercent=0 -test_secondary=0 -secondary_catch_up_one_in=0 -continuous_verification_interval=0

Make sure there is no perf regression by running the following
```
./db_bench_opt -db=/dev/shm/rocksdb -use_existing_db=0 -level0_stop_writes_trigger=256 -level0_slowdown_writes_trigger=256 -level0_file_num_compaction_trigger=256 -disable_wal=1 -duration=10 -benchmarks=fillrandom
```

Before this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom   :       1.831 micros/op 546235 ops/sec;   60.4 MB/s
```
After this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom   :       1.820 micros/op 549404 ops/sec;   60.8 MB/s
```

Reviewed By: ltamasi

Differential Revision: D33721359

Pulled By: riversand963

fbshipit-source-id: c131561534272c120ffb80711d42748d21badf09
main
Yanqin Jin 3 years ago committed by Facebook GitHub Bot
parent 920386f2b7
commit 3122cb4358
  1. 1
      HISTORY.md
  2. 11
      db/db_impl/compacted_db_impl.cc
  3. 129
      db/db_impl/db_impl.cc
  4. 81
      db/db_impl/db_impl.h
  5. 31
      db/db_impl/db_impl_readonly.cc
  6. 31
      db/db_impl/db_impl_secondary.cc
  7. 187
      db/db_impl/db_impl_write.cc
  8. 2
      db/db_kv_checksum_test.cc
  9. 13
      db/db_test.cc
  10. 9
      db/db_test2.cc
  11. 574
      db/db_with_timestamp_basic_test.cc
  12. 15
      db/db_with_timestamp_compaction_test.cc
  13. 350
      db/write_batch.cc
  14. 152
      db/write_batch_internal.h
  15. 108
      db/write_batch_test.cc
  16. 6
      db_stress_tool/batched_ops_stress.cc
  17. 6
      db_stress_tool/db_stress_test_base.cc
  18. 28
      db_stress_tool/no_batched_ops_stress.cc
  19. 35
      include/rocksdb/db.h
  20. 14
      include/rocksdb/options.h
  21. 20
      include/rocksdb/utilities/stackable_db.h
  22. 1
      include/rocksdb/utilities/transaction_db.h
  23. 18
      include/rocksdb/utilities/write_batch_with_index.h
  24. 88
      include/rocksdb/write_batch.h
  25. 11
      include/rocksdb/write_batch_base.h
  26. 88
      tools/db_bench_tool.cc
  27. 1
      utilities/blob_db/blob_db.h
  28. 1
      utilities/blob_db/blob_db_impl.h
  29. 1
      utilities/transactions/optimistic_transaction_db_impl.h
  30. 1
      utilities/transactions/write_prepared_txn_db.cc
  31. 29
      utilities/write_batch_with_index/write_batch_with_index.cc

@ -21,6 +21,7 @@
* Remove ReadOptions::iter_start_seqnum which has been deprecated. * Remove ReadOptions::iter_start_seqnum which has been deprecated.
* Remove DBOptions::preserved_deletes and DB::SetPreserveDeletesSequenceNumber(). * Remove DBOptions::preserved_deletes and DB::SetPreserveDeletesSequenceNumber().
* Remove deprecated API AdvancedColumnFamilyOptions::rate_limit_delay_max_milliseconds. * Remove deprecated API AdvancedColumnFamilyOptions::rate_limit_delay_max_milliseconds.
* Removed timestamp from WriteOptions. Accordingly, added to DB APIs Put, Delete, SingleDelete, etc. accepting an additional argument 'timestamp'. Added Put, Delete, SingleDelete, etc to WriteBatch accepting an additional argument 'timestamp'. Removed WriteBatch::AssignTimestamps(vector<Slice>) API. Renamed WriteBatch::AssignTimestamp() to WriteBatch::UpdateTimestamps() with clarified comments.
### Behavior Changes ### Behavior Changes
* Disallow the combination of DBOptions.use_direct_io_for_flush_and_compaction == true and DBOptions.writable_file_max_buffer_size == 0. This combination can cause WritableFileWriter::Append() to loop forever, and it does not make much sense in direct IO. * Disallow the combination of DBOptions.use_direct_io_for_flush_and_compaction == true and DBOptions.writable_file_max_buffer_size == 0. This combination can cause WritableFileWriter::Append() to loop forever, and it does not make much sense in direct IO.

@ -40,6 +40,11 @@ size_t CompactedDBImpl::FindFile(const Slice& key) {
Status CompactedDBImpl::Get(const ReadOptions& options, ColumnFamilyHandle*, Status CompactedDBImpl::Get(const ReadOptions& options, ColumnFamilyHandle*,
const Slice& key, PinnableSlice* value) { const Slice& key, PinnableSlice* value) {
assert(user_comparator_);
if (options.timestamp || user_comparator_->timestamp_size()) {
// TODO: support timestamp
return Status::NotSupported();
}
GetContext get_context(user_comparator_, nullptr, nullptr, nullptr, GetContext get_context(user_comparator_, nullptr, nullptr, nullptr,
GetContext::kNotFound, key, value, nullptr, nullptr, GetContext::kNotFound, key, value, nullptr, nullptr,
nullptr, true, nullptr, nullptr); nullptr, true, nullptr, nullptr);
@ -58,6 +63,11 @@ Status CompactedDBImpl::Get(const ReadOptions& options, ColumnFamilyHandle*,
std::vector<Status> CompactedDBImpl::MultiGet(const ReadOptions& options, std::vector<Status> CompactedDBImpl::MultiGet(const ReadOptions& options,
const std::vector<ColumnFamilyHandle*>&, const std::vector<ColumnFamilyHandle*>&,
const std::vector<Slice>& keys, std::vector<std::string>* values) { const std::vector<Slice>& keys, std::vector<std::string>* values) {
assert(user_comparator_);
if (user_comparator_->timestamp_size() || options.timestamp) {
// TODO: support timestamp
return std::vector<Status>(keys.size(), Status::NotSupported());
}
autovector<TableReader*, 16> reader_list; autovector<TableReader*, 16> reader_list;
for (const auto& key : keys) { for (const auto& key : keys) {
const FdWithKeyRange& f = files_.files[FindFile(key)]; const FdWithKeyRange& f = files_.files[FindFile(key)];
@ -69,6 +79,7 @@ std::vector<Status> CompactedDBImpl::MultiGet(const ReadOptions& options,
reader_list.push_back(f.fd.table_reader); reader_list.push_back(f.fd.table_reader);
} }
} }
std::vector<Status> statuses(keys.size(), Status::NotFound()); std::vector<Status> statuses(keys.size(), Status::NotFound());
values->resize(keys.size()); values->resize(keys.size());
int idx = 0; int idx = 0;

@ -1730,19 +1730,21 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
get_impl_options.merge_operands != nullptr); get_impl_options.merge_operands != nullptr);
assert(get_impl_options.column_family); assert(get_impl_options.column_family);
const Comparator* ucmp = get_impl_options.column_family->GetComparator();
assert(ucmp);
size_t ts_sz = ucmp->timestamp_size();
GetWithTimestampReadCallback read_cb(0); // Will call Refresh
#ifndef NDEBUG if (read_options.timestamp) {
if (ts_sz > 0) { const Status s = FailIfTsSizesMismatch(get_impl_options.column_family,
assert(read_options.timestamp); *(read_options.timestamp));
assert(read_options.timestamp->size() == ts_sz); if (!s.ok()) {
return s;
}
} else { } else {
assert(!read_options.timestamp); const Status s = FailIfCfHasTs(get_impl_options.column_family);
if (!s.ok()) {
return s;
}
} }
#endif // NDEBUG
GetWithTimestampReadCallback read_cb(0); // Will call Refresh
PERF_CPU_TIMER_GUARD(get_cpu_nanos, immutable_db_options_.clock); PERF_CPU_TIMER_GUARD(get_cpu_nanos, immutable_db_options_.clock);
StopWatch sw(immutable_db_options_.clock, stats_, DB_GET); StopWatch sw(immutable_db_options_.clock, stats_, DB_GET);
@ -1811,7 +1813,9 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
// only if t <= read_opts.timestamp and s <= snapshot. // only if t <= read_opts.timestamp and s <= snapshot.
// HACK: temporarily overwrite input struct field but restore // HACK: temporarily overwrite input struct field but restore
SaveAndRestore<ReadCallback*> restore_callback(&get_impl_options.callback); SaveAndRestore<ReadCallback*> restore_callback(&get_impl_options.callback);
if (ts_sz > 0) { const Comparator* ucmp = get_impl_options.column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() > 0) {
assert(!get_impl_options assert(!get_impl_options
.callback); // timestamp with callback is not supported .callback); // timestamp with callback is not supported
read_cb.Refresh(snapshot); read_cb.Refresh(snapshot);
@ -1834,7 +1838,8 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
bool skip_memtable = (read_options.read_tier == kPersistedTier && bool skip_memtable = (read_options.read_tier == kPersistedTier &&
has_unpersisted_data_.load(std::memory_order_relaxed)); has_unpersisted_data_.load(std::memory_order_relaxed));
bool done = false; bool done = false;
std::string* timestamp = ts_sz > 0 ? get_impl_options.timestamp : nullptr; std::string* timestamp =
ucmp->timestamp_size() > 0 ? get_impl_options.timestamp : nullptr;
if (!skip_memtable) { if (!skip_memtable) {
// Get value associated with key // Get value associated with key
if (get_impl_options.get_value) { if (get_impl_options.get_value) {
@ -1941,19 +1946,36 @@ std::vector<Status> DBImpl::MultiGet(
StopWatch sw(immutable_db_options_.clock, stats_, DB_MULTIGET); StopWatch sw(immutable_db_options_.clock, stats_, DB_MULTIGET);
PERF_TIMER_GUARD(get_snapshot_time); PERF_TIMER_GUARD(get_snapshot_time);
#ifndef NDEBUG size_t num_keys = keys.size();
for (const auto* cfh : column_family) { assert(column_family.size() == num_keys);
assert(cfh); std::vector<Status> stat_list(num_keys);
const Comparator* const ucmp = cfh->GetComparator();
assert(ucmp); bool should_fail = false;
if (ucmp->timestamp_size() > 0) { for (size_t i = 0; i < num_keys; ++i) {
assert(read_options.timestamp); assert(column_family[i]);
assert(ucmp->timestamp_size() == read_options.timestamp->size()); if (read_options.timestamp) {
stat_list[i] =
FailIfTsSizesMismatch(column_family[i], *(read_options.timestamp));
if (!stat_list[i].ok()) {
should_fail = true;
}
} else { } else {
assert(!read_options.timestamp); stat_list[i] = FailIfCfHasTs(column_family[i]);
if (!stat_list[i].ok()) {
should_fail = true;
}
} }
} }
#endif // NDEBUG
if (should_fail) {
for (auto& s : stat_list) {
if (s.ok()) {
s = Status::Incomplete(
"DB not queried due to invalid argument(s) in the same MultiGet");
}
}
return stat_list;
}
if (tracer_) { if (tracer_) {
// TODO: This mutex should be removed later, to improve performance when // TODO: This mutex should be removed later, to improve performance when
@ -1996,8 +2018,6 @@ std::vector<Status> DBImpl::MultiGet(
MergeContext merge_context; MergeContext merge_context;
// Note: this always resizes the values array // Note: this always resizes the values array
size_t num_keys = keys.size();
std::vector<Status> stat_list(num_keys);
values->resize(num_keys); values->resize(num_keys);
if (timestamps) { if (timestamps) {
timestamps->resize(num_keys); timestamps->resize(num_keys);
@ -2262,20 +2282,31 @@ void DBImpl::MultiGet(const ReadOptions& read_options, const size_t num_keys,
return; return;
} }
#ifndef NDEBUG bool should_fail = false;
for (size_t i = 0; i < num_keys; ++i) { for (size_t i = 0; i < num_keys; ++i) {
ColumnFamilyHandle* cfh = column_families[i]; ColumnFamilyHandle* cfh = column_families[i];
assert(cfh); assert(cfh);
const Comparator* const ucmp = cfh->GetComparator(); if (read_options.timestamp) {
assert(ucmp); statuses[i] = FailIfTsSizesMismatch(cfh, *(read_options.timestamp));
if (ucmp->timestamp_size() > 0) { if (!statuses[i].ok()) {
assert(read_options.timestamp); should_fail = true;
assert(read_options.timestamp->size() == ucmp->timestamp_size()); }
} else { } else {
assert(!read_options.timestamp); statuses[i] = FailIfCfHasTs(cfh);
if (!statuses[i].ok()) {
should_fail = true;
}
} }
} }
#endif // NDEBUG if (should_fail) {
for (size_t i = 0; i < num_keys; ++i) {
if (statuses[i].ok()) {
statuses[i] = Status::Incomplete(
"DB not queried due to invalid argument(s) in the same MultiGet");
}
}
return;
}
if (tracer_) { if (tracer_) {
// TODO: This mutex should be removed later, to improve performance when // TODO: This mutex should be removed later, to improve performance when
@ -2903,6 +2934,21 @@ Iterator* DBImpl::NewIterator(const ReadOptions& read_options,
"ReadTier::kPersistedData is not yet supported in iterators.")); "ReadTier::kPersistedData is not yet supported in iterators."));
} }
assert(column_family);
if (read_options.timestamp) {
const Status s =
FailIfTsSizesMismatch(column_family, *(read_options.timestamp));
if (!s.ok()) {
return NewErrorIterator(s);
}
} else {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return NewErrorIterator(s);
}
}
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family); auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
ColumnFamilyData* cfd = cfh->cfd(); ColumnFamilyData* cfd = cfh->cfd();
assert(cfd != nullptr); assert(cfd != nullptr);
@ -3029,6 +3075,25 @@ Status DBImpl::NewIterators(
return Status::NotSupported( return Status::NotSupported(
"ReadTier::kPersistedData is not yet supported in iterators."); "ReadTier::kPersistedData is not yet supported in iterators.");
} }
if (read_options.timestamp) {
for (auto* cf : column_families) {
assert(cf);
const Status s = FailIfTsSizesMismatch(cf, *(read_options.timestamp));
if (!s.ok()) {
return s;
}
}
} else {
for (auto* cf : column_families) {
assert(cf);
const Status s = FailIfCfHasTs(cf);
if (!s.ok()) {
return s;
}
}
}
ReadCallback* read_callback = nullptr; // No read callback provided. ReadCallback* read_callback = nullptr; // No read callback provided.
iterators->clear(); iterators->clear();
iterators->reserve(column_families.size()); iterators->reserve(column_families.size());

@ -146,24 +146,36 @@ class DBImpl : public DB {
// ---- Implementations of the DB interface ---- // ---- Implementations of the DB interface ----
using DB::Resume; using DB::Resume;
virtual Status Resume() override; Status Resume() override;
using DB::Put; using DB::Put;
virtual Status Put(const WriteOptions& options, Status Put(const WriteOptions& options, ColumnFamilyHandle* column_family,
ColumnFamilyHandle* column_family, const Slice& key, const Slice& key, const Slice& value) override;
const Slice& value) override; Status Put(const WriteOptions& options, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts, const Slice& value) override;
using DB::Merge; using DB::Merge;
virtual Status Merge(const WriteOptions& options, Status Merge(const WriteOptions& options, ColumnFamilyHandle* column_family,
ColumnFamilyHandle* column_family, const Slice& key, const Slice& key, const Slice& value) override;
const Slice& value) override;
using DB::Delete; using DB::Delete;
virtual Status Delete(const WriteOptions& options, Status Delete(const WriteOptions& options, ColumnFamilyHandle* column_family,
ColumnFamilyHandle* column_family, const Slice& key) override;
const Slice& key) override; Status Delete(const WriteOptions& options, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) override;
using DB::SingleDelete; using DB::SingleDelete;
virtual Status SingleDelete(const WriteOptions& options, Status SingleDelete(const WriteOptions& options,
ColumnFamilyHandle* column_family, ColumnFamilyHandle* column_family,
const Slice& key) override; const Slice& key) override;
Status SingleDelete(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
using DB::DeleteRange;
Status DeleteRange(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& begin_key,
const Slice& end_key) override;
using DB::Write; using DB::Write;
virtual Status Write(const WriteOptions& options, virtual Status Write(const WriteOptions& options,
WriteBatch* updates) override; WriteBatch* updates) override;
@ -1368,6 +1380,10 @@ class DBImpl : public DB {
// to ensure that db_session_id_ gets updated every time the DB is opened // to ensure that db_session_id_ gets updated every time the DB is opened
void SetDbSessionId(); void SetDbSessionId();
Status FailIfCfHasTs(const ColumnFamilyHandle* column_family) const;
Status FailIfTsSizesMismatch(const ColumnFamilyHandle* column_family,
const Slice& ts) const;
private: private:
friend class DB; friend class DB;
friend class ErrorHandler; friend class ErrorHandler;
@ -2396,4 +2412,43 @@ static void ClipToRange(T* ptr, V minvalue, V maxvalue) {
if (static_cast<V>(*ptr) < minvalue) *ptr = minvalue; if (static_cast<V>(*ptr) < minvalue) *ptr = minvalue;
} }
inline Status DBImpl::FailIfCfHasTs(
const ColumnFamilyHandle* column_family) const {
column_family = column_family ? column_family : DefaultColumnFamily();
assert(column_family);
const Comparator* const ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() > 0) {
std::ostringstream oss;
oss << "cannot call this method on column family "
<< column_family->GetName() << " that enables timestamp";
return Status::InvalidArgument(oss.str());
}
return Status::OK();
}
inline Status DBImpl::FailIfTsSizesMismatch(
const ColumnFamilyHandle* column_family, const Slice& ts) const {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be null");
}
assert(column_family);
const Comparator* const ucmp = column_family->GetComparator();
assert(ucmp);
if (0 == ucmp->timestamp_size()) {
std::stringstream oss;
oss << "cannot call this method on column family "
<< column_family->GetName() << " that does not enable timestamp";
return Status::InvalidArgument(oss.str());
}
const size_t ts_sz = ts.size();
if (ts_sz != ucmp->timestamp_size()) {
std::stringstream oss;
oss << "Timestamp sizes mismatch: expect " << ucmp->timestamp_size() << ", "
<< ts_sz << " given";
return Status::InvalidArgument(oss.str());
}
return Status::OK();
}
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -36,6 +36,15 @@ Status DBImplReadOnly::Get(const ReadOptions& read_options,
assert(pinnable_val != nullptr); assert(pinnable_val != nullptr);
// TODO: stopwatch DB_GET needed?, perf timer needed? // TODO: stopwatch DB_GET needed?, perf timer needed?
PERF_TIMER_GUARD(get_snapshot_time); PERF_TIMER_GUARD(get_snapshot_time);
assert(column_family);
const Comparator* ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() || read_options.timestamp) {
// TODO: support timestamp
return Status::NotSupported();
}
Status s; Status s;
SequenceNumber snapshot = versions_->LastSequence(); SequenceNumber snapshot = versions_->LastSequence();
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family); auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
@ -73,6 +82,13 @@ Status DBImplReadOnly::Get(const ReadOptions& read_options,
Iterator* DBImplReadOnly::NewIterator(const ReadOptions& read_options, Iterator* DBImplReadOnly::NewIterator(const ReadOptions& read_options,
ColumnFamilyHandle* column_family) { ColumnFamilyHandle* column_family) {
assert(column_family);
const Comparator* ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() || read_options.timestamp) {
// TODO: support timestamp
return NewErrorIterator(Status::NotSupported());
}
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family); auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
auto cfd = cfh->cfd(); auto cfd = cfh->cfd();
SuperVersion* super_version = cfd->GetSuperVersion()->Ref(); SuperVersion* super_version = cfd->GetSuperVersion()->Ref();
@ -100,6 +116,21 @@ Status DBImplReadOnly::NewIterators(
const ReadOptions& read_options, const ReadOptions& read_options,
const std::vector<ColumnFamilyHandle*>& column_families, const std::vector<ColumnFamilyHandle*>& column_families,
std::vector<Iterator*>* iterators) { std::vector<Iterator*>* iterators) {
if (read_options.timestamp) {
// TODO: support timestamp
return Status::NotSupported();
} else {
for (auto* cf : column_families) {
assert(cf);
const Comparator* ucmp = cf->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size()) {
// TODO: support timestamp
return Status::NotSupported();
}
}
}
ReadCallback* read_callback = nullptr; // No read callback provided. ReadCallback* read_callback = nullptr; // No read callback provided.
if (iterators == nullptr) { if (iterators == nullptr) {
return Status::InvalidArgument("iterators not allowed to be nullptr"); return Status::InvalidArgument("iterators not allowed to be nullptr");

@ -339,6 +339,14 @@ Status DBImplSecondary::GetImpl(const ReadOptions& read_options,
StopWatch sw(immutable_db_options_.clock, stats_, DB_GET); StopWatch sw(immutable_db_options_.clock, stats_, DB_GET);
PERF_TIMER_GUARD(get_snapshot_time); PERF_TIMER_GUARD(get_snapshot_time);
assert(column_family);
const Comparator* ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() || read_options.timestamp) {
// TODO: support timestamp
return Status::NotSupported();
}
auto cfh = static_cast<ColumnFamilyHandleImpl*>(column_family); auto cfh = static_cast<ColumnFamilyHandleImpl*>(column_family);
ColumnFamilyData* cfd = cfh->cfd(); ColumnFamilyData* cfd = cfh->cfd();
if (tracer_) { if (tracer_) {
@ -404,6 +412,15 @@ Iterator* DBImplSecondary::NewIterator(const ReadOptions& read_options,
return NewErrorIterator(Status::NotSupported( return NewErrorIterator(Status::NotSupported(
"ReadTier::kPersistedData is not yet supported in iterators.")); "ReadTier::kPersistedData is not yet supported in iterators."));
} }
assert(column_family);
const Comparator* ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() || read_options.timestamp) {
// TODO: support timestamp
return NewErrorIterator(Status::NotSupported());
}
Iterator* result = nullptr; Iterator* result = nullptr;
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family); auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
auto cfd = cfh->cfd(); auto cfd = cfh->cfd();
@ -460,6 +477,20 @@ Status DBImplSecondary::NewIterators(
if (iterators == nullptr) { if (iterators == nullptr) {
return Status::InvalidArgument("iterators not allowed to be nullptr"); return Status::InvalidArgument("iterators not allowed to be nullptr");
} }
if (read_options.timestamp) {
// TODO: support timestamp
return Status::NotSupported();
} else {
for (auto* cf : column_families) {
assert(cf);
const Comparator* ucmp = cf->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size()) {
// TODO: support timestamp
return Status::NotSupported();
}
}
}
iterators->clear(); iterators->clear();
iterators->reserve(column_families.size()); iterators->reserve(column_families.size());
if (read_options.tailing) { if (read_options.tailing) {

@ -21,11 +21,28 @@ namespace ROCKSDB_NAMESPACE {
// Convenience methods // Convenience methods
Status DBImpl::Put(const WriteOptions& o, ColumnFamilyHandle* column_family, Status DBImpl::Put(const WriteOptions& o, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& val) { const Slice& key, const Slice& val) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
return DB::Put(o, column_family, key, val); return DB::Put(o, column_family, key, val);
} }
Status DBImpl::Put(const WriteOptions& o, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts, const Slice& val) {
const Status s = FailIfTsSizesMismatch(column_family, ts);
if (!s.ok()) {
return s;
}
return DB::Put(o, column_family, key, ts, val);
}
Status DBImpl::Merge(const WriteOptions& o, ColumnFamilyHandle* column_family, Status DBImpl::Merge(const WriteOptions& o, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& val) { const Slice& key, const Slice& val) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family); auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
if (!cfh->cfd()->ioptions()->merge_operator) { if (!cfh->cfd()->ioptions()->merge_operator) {
return Status::NotSupported("Provide a merge_operator when opening DB"); return Status::NotSupported("Provide a merge_operator when opening DB");
@ -36,22 +53,61 @@ Status DBImpl::Merge(const WriteOptions& o, ColumnFamilyHandle* column_family,
Status DBImpl::Delete(const WriteOptions& write_options, Status DBImpl::Delete(const WriteOptions& write_options,
ColumnFamilyHandle* column_family, const Slice& key) { ColumnFamilyHandle* column_family, const Slice& key) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
return DB::Delete(write_options, column_family, key); return DB::Delete(write_options, column_family, key);
} }
Status DBImpl::Delete(const WriteOptions& write_options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) {
const Status s = FailIfTsSizesMismatch(column_family, ts);
if (!s.ok()) {
return s;
}
return DB::Delete(write_options, column_family, key, ts);
}
Status DBImpl::SingleDelete(const WriteOptions& write_options, Status DBImpl::SingleDelete(const WriteOptions& write_options,
ColumnFamilyHandle* column_family, ColumnFamilyHandle* column_family,
const Slice& key) { const Slice& key) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
return DB::SingleDelete(write_options, column_family, key); return DB::SingleDelete(write_options, column_family, key);
} }
Status DBImpl::SingleDelete(const WriteOptions& write_options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) {
const Status s = FailIfTsSizesMismatch(column_family, ts);
if (!s.ok()) {
return s;
}
return DB::SingleDelete(write_options, column_family, key, ts);
}
Status DBImpl::DeleteRange(const WriteOptions& write_options,
ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
return DB::DeleteRange(write_options, column_family, begin_key, end_key);
}
void DBImpl::SetRecoverableStatePreReleaseCallback( void DBImpl::SetRecoverableStatePreReleaseCallback(
PreReleaseCallback* callback) { PreReleaseCallback* callback) {
recoverable_state_pre_release_callback_.reset(callback); recoverable_state_pre_release_callback_.reset(callback);
} }
Status DBImpl::Write(const WriteOptions& write_options, WriteBatch* my_batch) { Status DBImpl::Write(const WriteOptions& write_options, WriteBatch* my_batch) {
return WriteImpl(write_options, my_batch, nullptr, nullptr); return WriteImpl(write_options, my_batch, /*callback=*/nullptr,
/*log_used=*/nullptr);
} }
#ifndef ROCKSDB_LITE #ifndef ROCKSDB_LITE
@ -74,6 +130,8 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
assert(!seq_per_batch_ || batch_cnt != 0); assert(!seq_per_batch_ || batch_cnt != 0);
if (my_batch == nullptr) { if (my_batch == nullptr) {
return Status::Corruption("Batch is nullptr!"); return Status::Corruption("Batch is nullptr!");
} else if (WriteBatchInternal::TimestampsUpdateNeeded(*my_batch)) {
return Status::InvalidArgument("write batch must have timestamp(s) set");
} }
// TODO: this use of operator bool on `tracer_` can avoid unnecessary lock // TODO: this use of operator bool on `tracer_` can avoid unnecessary lock
// grabs but does not seem thread-safe. // grabs but does not seem thread-safe.
@ -283,6 +341,7 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
size_t total_byte_size = 0; size_t total_byte_size = 0;
size_t pre_release_callback_cnt = 0; size_t pre_release_callback_cnt = 0;
for (auto* writer : write_group) { for (auto* writer : write_group) {
assert(writer);
if (writer->CheckCallback(this)) { if (writer->CheckCallback(this)) {
valid_batches += writer->batch_cnt; valid_batches += writer->batch_cnt;
if (writer->ShouldWriteToMemtable()) { if (writer->ShouldWriteToMemtable()) {
@ -487,7 +546,8 @@ Status DBImpl::PipelinedWriteImpl(const WriteOptions& write_options,
WriteContext write_context; WriteContext write_context;
WriteThread::Writer w(write_options, my_batch, callback, log_ref, WriteThread::Writer w(write_options, my_batch, callback, log_ref,
disable_memtable); disable_memtable, /*_batch_cnt=*/0,
/*_pre_release_callback=*/nullptr);
write_thread_.JoinBatchGroup(&w); write_thread_.JoinBatchGroup(&w);
TEST_SYNC_POINT("DBImplWrite::PipelinedWriteImpl:AfterJoinBatchGroup"); TEST_SYNC_POINT("DBImplWrite::PipelinedWriteImpl:AfterJoinBatchGroup");
if (w.state == WriteThread::STATE_GROUP_LEADER) { if (w.state == WriteThread::STATE_GROUP_LEADER) {
@ -526,7 +586,8 @@ Status DBImpl::PipelinedWriteImpl(const WriteOptions& write_options,
} }
} }
SequenceNumber next_sequence = current_sequence; SequenceNumber next_sequence = current_sequence;
for (auto writer : wal_write_group) { for (auto* writer : wal_write_group) {
assert(writer);
if (writer->CheckCallback(this)) { if (writer->CheckCallback(this)) {
if (writer->ShouldWriteToMemtable()) { if (writer->ShouldWriteToMemtable()) {
writer->sequence = next_sequence; writer->sequence = next_sequence;
@ -764,6 +825,7 @@ Status DBImpl::WriteImplWALOnly(
size_t pre_release_callback_cnt = 0; size_t pre_release_callback_cnt = 0;
size_t total_byte_size = 0; size_t total_byte_size = 0;
for (auto* writer : write_group) { for (auto* writer : write_group) {
assert(writer);
if (writer->CheckCallback(this)) { if (writer->CheckCallback(this)) {
total_byte_size = WriteBatchInternal::AppendedByteSize( total_byte_size = WriteBatchInternal::AppendedByteSize(
total_byte_size, WriteBatchInternal::ByteSize(writer->batch)); total_byte_size, WriteBatchInternal::ByteSize(writer->batch));
@ -2019,34 +2081,25 @@ size_t DBImpl::GetWalPreallocateBlockSize(uint64_t write_buffer_size) const {
// can call if they wish // can call if they wish
Status DB::Put(const WriteOptions& opt, ColumnFamilyHandle* column_family, Status DB::Put(const WriteOptions& opt, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& value) { const Slice& key, const Slice& value) {
if (nullptr == opt.timestamp) { // Pre-allocate size of write batch conservatively.
// Pre-allocate size of write batch conservatively. // 8 bytes are taken by header, 4 bytes for count, 1 byte for type,
// 8 bytes are taken by header, 4 bytes for count, 1 byte for type, // and we allocate 11 extra bytes for key length, as well as value length.
// and we allocate 11 extra bytes for key length, as well as value length. WriteBatch batch(key.size() + value.size() + 24);
WriteBatch batch(key.size() + value.size() + 24); Status s = batch.Put(column_family, key, value);
Status s = batch.Put(column_family, key, value); if (!s.ok()) {
if (!s.ok()) { return s;
return s;
}
return Write(opt, &batch);
}
const Slice* ts = opt.timestamp;
assert(nullptr != ts);
size_t ts_sz = ts->size();
assert(column_family->GetComparator());
assert(ts_sz == column_family->GetComparator()->timestamp_size());
WriteBatch batch;
Status s;
if (key.data() + key.size() == ts->data()) {
Slice key_with_ts = Slice(key.data(), key.size() + ts_sz);
s = batch.Put(column_family, key_with_ts, value);
} else {
std::array<Slice, 2> key_with_ts_slices{{key, *ts}};
SliceParts key_with_ts(key_with_ts_slices.data(), 2);
std::array<Slice, 1> value_slices{{value}};
SliceParts values(value_slices.data(), 1);
s = batch.Put(column_family, key_with_ts, values);
} }
return Write(opt, &batch);
}
Status DB::Put(const WriteOptions& opt, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts, const Slice& value) {
ColumnFamilyHandle* default_cf = DefaultColumnFamily();
assert(default_cf);
const Comparator* const default_cf_ucmp = default_cf->GetComparator();
assert(default_cf_ucmp);
WriteBatch batch(0, 0, 0, default_cf_ucmp->timestamp_size());
Status s = batch.Put(column_family, key, ts, value);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
@ -2055,29 +2108,22 @@ Status DB::Put(const WriteOptions& opt, ColumnFamilyHandle* column_family,
Status DB::Delete(const WriteOptions& opt, ColumnFamilyHandle* column_family, Status DB::Delete(const WriteOptions& opt, ColumnFamilyHandle* column_family,
const Slice& key) { const Slice& key) {
if (nullptr == opt.timestamp) {
WriteBatch batch;
Status s = batch.Delete(column_family, key);
if (!s.ok()) {
return s;
}
return Write(opt, &batch);
}
const Slice* ts = opt.timestamp;
assert(ts != nullptr);
size_t ts_sz = ts->size();
assert(column_family->GetComparator());
assert(ts_sz == column_family->GetComparator()->timestamp_size());
WriteBatch batch; WriteBatch batch;
Status s; Status s = batch.Delete(column_family, key);
if (key.data() + key.size() == ts->data()) { if (!s.ok()) {
Slice key_with_ts = Slice(key.data(), key.size() + ts_sz); return s;
s = batch.Delete(column_family, key_with_ts);
} else {
std::array<Slice, 2> key_with_ts_slices{{key, *ts}};
SliceParts key_with_ts(key_with_ts_slices.data(), 2);
s = batch.Delete(column_family, key_with_ts);
} }
return Write(opt, &batch);
}
Status DB::Delete(const WriteOptions& opt, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) {
ColumnFamilyHandle* default_cf = DefaultColumnFamily();
assert(default_cf);
const Comparator* const default_cf_ucmp = default_cf->GetComparator();
assert(default_cf_ucmp);
WriteBatch batch(0, 0, 0, default_cf_ucmp->timestamp_size());
Status s = batch.Delete(column_family, key, ts);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
@ -2086,36 +2132,27 @@ Status DB::Delete(const WriteOptions& opt, ColumnFamilyHandle* column_family,
Status DB::SingleDelete(const WriteOptions& opt, Status DB::SingleDelete(const WriteOptions& opt,
ColumnFamilyHandle* column_family, const Slice& key) { ColumnFamilyHandle* column_family, const Slice& key) {
Status s; WriteBatch batch;
if (opt.timestamp == nullptr) { Status s = batch.SingleDelete(column_family, key);
WriteBatch batch; if (!s.ok()) {
s = batch.SingleDelete(column_family, key);
if (!s.ok()) {
return s;
}
s = Write(opt, &batch);
return s; return s;
} }
return Write(opt, &batch);
}
const Slice* ts = opt.timestamp; Status DB::SingleDelete(const WriteOptions& opt,
assert(ts != nullptr); ColumnFamilyHandle* column_family, const Slice& key,
size_t ts_sz = ts->size(); const Slice& ts) {
assert(column_family->GetComparator()); ColumnFamilyHandle* default_cf = DefaultColumnFamily();
assert(ts_sz == column_family->GetComparator()->timestamp_size()); assert(default_cf);
WriteBatch batch; const Comparator* const default_cf_ucmp = default_cf->GetComparator();
if (key.data() + key.size() == ts->data()) { assert(default_cf_ucmp);
Slice key_with_ts = Slice(key.data(), key.size() + ts_sz); WriteBatch batch(0, 0, 0, default_cf_ucmp->timestamp_size());
s = batch.SingleDelete(column_family, key_with_ts); Status s = batch.SingleDelete(column_family, key, ts);
} else {
std::array<Slice, 2> key_with_ts_slices{{key, *ts}};
SliceParts key_with_ts(key_with_ts_slices.data(), 2);
s = batch.SingleDelete(column_family, key_with_ts);
}
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
s = Write(opt, &batch); return Write(opt, &batch);
return s;
} }
Status DB::DeleteRange(const WriteOptions& opt, Status DB::DeleteRange(const WriteOptions& opt,

@ -37,7 +37,7 @@ class DbKvChecksumTest
std::pair<WriteBatch, Status> GetWriteBatch(ColumnFamilyHandle* cf_handle) { std::pair<WriteBatch, Status> GetWriteBatch(ColumnFamilyHandle* cf_handle) {
Status s; Status s;
WriteBatch wb(0 /* reserved_bytes */, 0 /* max_bytes */, WriteBatch wb(0 /* reserved_bytes */, 0 /* max_bytes */,
8 /* protection_bytes_per_entry */); 8 /* protection_bytes_per_entry */, 0 /* default_cf_ts_sz */);
switch (op_type_) { switch (op_type_) {
case WriteBatchOpType::kPut: case WriteBatchOpType::kPut:
s = wb.Put(cf_handle, "key", "val"); s = wb.Put(cf_handle, "key", "val");

@ -2862,6 +2862,11 @@ class ModelDB : public DB {
} }
return Write(o, &batch); return Write(o, &batch);
} }
Status Put(const WriteOptions& /*o*/, ColumnFamilyHandle* /*cf*/,
const Slice& /*k*/, const Slice& /*ts*/,
const Slice& /*v*/) override {
return Status::NotSupported();
}
using DB::Close; using DB::Close;
Status Close() override { return Status::OK(); } Status Close() override { return Status::OK(); }
using DB::Delete; using DB::Delete;
@ -2874,6 +2879,10 @@ class ModelDB : public DB {
} }
return Write(o, &batch); return Write(o, &batch);
} }
Status Delete(const WriteOptions& /*o*/, ColumnFamilyHandle* /*cf*/,
const Slice& /*key*/, const Slice& /*ts*/) override {
return Status::NotSupported();
}
using DB::SingleDelete; using DB::SingleDelete;
Status SingleDelete(const WriteOptions& o, ColumnFamilyHandle* cf, Status SingleDelete(const WriteOptions& o, ColumnFamilyHandle* cf,
const Slice& key) override { const Slice& key) override {
@ -2884,6 +2893,10 @@ class ModelDB : public DB {
} }
return Write(o, &batch); return Write(o, &batch);
} }
Status SingleDelete(const WriteOptions& /*o*/, ColumnFamilyHandle* /*cf*/,
const Slice& /*key*/, const Slice& /*ts*/) override {
return Status::NotSupported();
}
using DB::Merge; using DB::Merge;
Status Merge(const WriteOptions& o, ColumnFamilyHandle* cf, const Slice& k, Status Merge(const WriteOptions& o, ColumnFamilyHandle* cf, const Slice& k,
const Slice& v) override { const Slice& v) override {

@ -6845,16 +6845,13 @@ TEST_F(DBTest2, GetLatestSeqAndTsForKey) {
constexpr uint64_t kTsU64Value = 12; constexpr uint64_t kTsU64Value = 12;
for (uint64_t key = 0; key < 100; ++key) { for (uint64_t key = 0; key < 100; ++key) {
std::string ts_str; std::string ts;
PutFixed64(&ts_str, kTsU64Value); PutFixed64(&ts, kTsU64Value);
Slice ts = ts_str;
WriteOptions write_opts;
write_opts.timestamp = &ts;
std::string key_str; std::string key_str;
PutFixed64(&key_str, key); PutFixed64(&key_str, key);
std::reverse(key_str.begin(), key_str.end()); std::reverse(key_str.begin(), key_str.end());
ASSERT_OK(Put(key_str, "value", write_opts)); ASSERT_OK(db_->Put(WriteOptions(), key_str, ts, "value"));
} }
ASSERT_OK(Flush()); ASSERT_OK(Flush());

File diff suppressed because it is too large Load Diff

@ -78,9 +78,8 @@ TEST_F(TimestampCompatibleCompactionTest, UserKeyCrossFileBoundary) {
WriteOptions write_opts; WriteOptions write_opts;
for (; key < kNumKeysPerFile - 1; ++key, ++ts) { for (; key < kNumKeysPerFile - 1; ++key, ++ts) {
std::string ts_str = Timestamp(ts); std::string ts_str = Timestamp(ts);
Slice ts_slice = ts_str; ASSERT_OK(
write_opts.timestamp = &ts_slice; db_->Put(write_opts, Key1(key), ts_str, "foo_" + std::to_string(key)));
ASSERT_OK(db_->Put(write_opts, Key1(key), "foo_" + std::to_string(key)));
} }
// Write another L0 with keys 99 with newer ts. // Write another L0 with keys 99 with newer ts.
ASSERT_OK(Flush()); ASSERT_OK(Flush());
@ -88,18 +87,16 @@ TEST_F(TimestampCompatibleCompactionTest, UserKeyCrossFileBoundary) {
key = 99; key = 99;
for (int i = 0; i < 4; ++i, ++ts) { for (int i = 0; i < 4; ++i, ++ts) {
std::string ts_str = Timestamp(ts); std::string ts_str = Timestamp(ts);
Slice ts_slice = ts_str; ASSERT_OK(
write_opts.timestamp = &ts_slice; db_->Put(write_opts, Key1(key), ts_str, "bar_" + std::to_string(key)));
ASSERT_OK(db_->Put(write_opts, Key1(key), "bar_" + std::to_string(key)));
} }
ASSERT_OK(Flush()); ASSERT_OK(Flush());
uint64_t saved_read_ts2 = ts++; uint64_t saved_read_ts2 = ts++;
// Write another L0 with keys 99, 100, 101, ..., 150 // Write another L0 with keys 99, 100, 101, ..., 150
for (; key <= 150; ++key, ++ts) { for (; key <= 150; ++key, ++ts) {
std::string ts_str = Timestamp(ts); std::string ts_str = Timestamp(ts);
Slice ts_slice = ts_str; ASSERT_OK(
write_opts.timestamp = &ts_slice; db_->Put(write_opts, Key1(key), ts_str, "foo1_" + std::to_string(key)));
ASSERT_OK(db_->Put(write_opts, Key1(key), "foo1_" + std::to_string(key)));
} }
ASSERT_OK(Flush()); ASSERT_OK(Flush());
// Wait for compaction to finish // Wait for compaction to finish

@ -160,8 +160,11 @@ WriteBatch::WriteBatch(size_t reserved_bytes, size_t max_bytes)
} }
WriteBatch::WriteBatch(size_t reserved_bytes, size_t max_bytes, WriteBatch::WriteBatch(size_t reserved_bytes, size_t max_bytes,
size_t protection_bytes_per_key) size_t protection_bytes_per_key, size_t default_cf_ts_sz)
: content_flags_(0), max_bytes_(max_bytes), rep_() { : content_flags_(0),
max_bytes_(max_bytes),
default_cf_ts_sz_(default_cf_ts_sz),
rep_() {
// Currently `protection_bytes_per_key` can only be enabled at 8 bytes per // Currently `protection_bytes_per_key` can only be enabled at 8 bytes per
// entry. // entry.
assert(protection_bytes_per_key == 0 || protection_bytes_per_key == 8); assert(protection_bytes_per_key == 0 || protection_bytes_per_key == 8);
@ -186,6 +189,7 @@ WriteBatch::WriteBatch(const WriteBatch& src)
: wal_term_point_(src.wal_term_point_), : wal_term_point_(src.wal_term_point_),
content_flags_(src.content_flags_.load(std::memory_order_relaxed)), content_flags_(src.content_flags_.load(std::memory_order_relaxed)),
max_bytes_(src.max_bytes_), max_bytes_(src.max_bytes_),
default_cf_ts_sz_(src.default_cf_ts_sz_),
rep_(src.rep_) { rep_(src.rep_) {
if (src.save_points_ != nullptr) { if (src.save_points_ != nullptr) {
save_points_.reset(new SavePoints()); save_points_.reset(new SavePoints());
@ -203,6 +207,7 @@ WriteBatch::WriteBatch(WriteBatch&& src) noexcept
content_flags_(src.content_flags_.load(std::memory_order_relaxed)), content_flags_(src.content_flags_.load(std::memory_order_relaxed)),
max_bytes_(src.max_bytes_), max_bytes_(src.max_bytes_),
prot_info_(std::move(src.prot_info_)), prot_info_(std::move(src.prot_info_)),
default_cf_ts_sz_(src.default_cf_ts_sz_),
rep_(std::move(src.rep_)) {} rep_(std::move(src.rep_)) {}
WriteBatch& WriteBatch::operator=(const WriteBatch& src) { WriteBatch& WriteBatch::operator=(const WriteBatch& src) {
@ -250,6 +255,7 @@ void WriteBatch::Clear() {
prot_info_->entries_.clear(); prot_info_->entries_.clear();
} }
wal_term_point_.clear(); wal_term_point_.clear();
default_cf_ts_sz_ = 0;
} }
uint32_t WriteBatch::Count() const { return WriteBatchInternal::Count(this); } uint32_t WriteBatch::Count() const { return WriteBatchInternal::Count(this); }
@ -700,6 +706,45 @@ size_t WriteBatchInternal::GetFirstOffset(WriteBatch* /*b*/) {
return WriteBatchInternal::kHeader; return WriteBatchInternal::kHeader;
} }
std::tuple<Status, uint32_t, size_t>
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(
WriteBatch* b, ColumnFamilyHandle* column_family) {
uint32_t cf_id = GetColumnFamilyID(column_family);
size_t ts_sz = 0;
Status s;
if (column_family) {
const Comparator* const ucmp = column_family->GetComparator();
if (ucmp) {
ts_sz = ucmp->timestamp_size();
if (0 == cf_id && b->default_cf_ts_sz_ != ts_sz) {
s = Status::InvalidArgument("Default cf timestamp size mismatch");
}
}
} else if (b->default_cf_ts_sz_ > 0) {
ts_sz = b->default_cf_ts_sz_;
}
return std::make_tuple(s, cf_id, ts_sz);
}
namespace {
Status CheckColumnFamilyTimestampSize(ColumnFamilyHandle* column_family,
const Slice& ts) {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be null");
}
const Comparator* const ucmp = column_family->GetComparator();
assert(ucmp);
size_t cf_ts_sz = ucmp->timestamp_size();
if (0 == cf_ts_sz) {
return Status::InvalidArgument("timestamp disabled");
}
if (cf_ts_sz != ts.size()) {
return Status::InvalidArgument("timestamp size mismatch");
}
return Status::OK();
}
} // namespace
Status WriteBatchInternal::Put(WriteBatch* b, uint32_t column_family_id, Status WriteBatchInternal::Put(WriteBatch* b, uint32_t column_family_id,
const Slice& key, const Slice& value) { const Slice& key, const Slice& value) {
if (key.size() > size_t{port::kMaxUint32}) { if (key.size() > size_t{port::kMaxUint32}) {
@ -738,8 +783,40 @@ Status WriteBatchInternal::Put(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Put(ColumnFamilyHandle* column_family, const Slice& key, Status WriteBatch::Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) { const Slice& value) {
return WriteBatchInternal::Put(this, GetColumnFamilyID(column_family), key, size_t ts_sz = 0;
value); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Put(this, cf_id, key, value);
}
needs_in_place_update_ts_ = true;
std::string dummy_ts(ts_sz, '\0');
std::array<Slice, 2> key_with_ts{{key, dummy_ts}};
return WriteBatchInternal::Put(this, cf_id, SliceParts(key_with_ts.data(), 2),
SliceParts(&value, 1));
}
Status WriteBatch::Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) {
const Status s = CheckColumnFamilyTimestampSize(column_family, ts);
if (!s.ok()) {
return s;
}
assert(column_family);
uint32_t cf_id = column_family->GetID();
std::array<Slice, 2> key_with_ts{{key, ts}};
return WriteBatchInternal::Put(this, cf_id, SliceParts(key_with_ts.data(), 2),
SliceParts(&value, 1));
} }
Status WriteBatchInternal::CheckSlicePartsLength(const SliceParts& key, Status WriteBatchInternal::CheckSlicePartsLength(const SliceParts& key,
@ -794,8 +871,24 @@ Status WriteBatchInternal::Put(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Put(ColumnFamilyHandle* column_family, const SliceParts& key, Status WriteBatch::Put(ColumnFamilyHandle* column_family, const SliceParts& key,
const SliceParts& value) { const SliceParts& value) {
return WriteBatchInternal::Put(this, GetColumnFamilyID(column_family), key, size_t ts_sz = 0;
value); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (ts_sz == 0) {
return WriteBatchInternal::Put(this, cf_id, key, value);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
} }
Status WriteBatchInternal::InsertNoop(WriteBatch* b) { Status WriteBatchInternal::InsertNoop(WriteBatch* b) {
@ -892,8 +985,40 @@ Status WriteBatchInternal::Delete(WriteBatch* b, uint32_t column_family_id,
} }
Status WriteBatch::Delete(ColumnFamilyHandle* column_family, const Slice& key) { Status WriteBatch::Delete(ColumnFamilyHandle* column_family, const Slice& key) {
return WriteBatchInternal::Delete(this, GetColumnFamilyID(column_family), size_t ts_sz = 0;
key); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Delete(this, cf_id, key);
}
needs_in_place_update_ts_ = true;
std::string dummy_ts(ts_sz, '\0');
std::array<Slice, 2> key_with_ts{{key, dummy_ts}};
return WriteBatchInternal::Delete(this, cf_id,
SliceParts(key_with_ts.data(), 2));
}
Status WriteBatch::Delete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) {
const Status s = CheckColumnFamilyTimestampSize(column_family, ts);
if (!s.ok()) {
return s;
}
assert(column_family);
uint32_t cf_id = column_family->GetID();
std::array<Slice, 2> key_with_ts{{key, ts}};
return WriteBatchInternal::Delete(this, cf_id,
SliceParts(key_with_ts.data(), 2));
} }
Status WriteBatchInternal::Delete(WriteBatch* b, uint32_t column_family_id, Status WriteBatchInternal::Delete(WriteBatch* b, uint32_t column_family_id,
@ -925,8 +1050,24 @@ Status WriteBatchInternal::Delete(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Delete(ColumnFamilyHandle* column_family, Status WriteBatch::Delete(ColumnFamilyHandle* column_family,
const SliceParts& key) { const SliceParts& key) {
return WriteBatchInternal::Delete(this, GetColumnFamilyID(column_family), size_t ts_sz = 0;
key); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Delete(this, cf_id, key);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
} }
Status WriteBatchInternal::SingleDelete(WriteBatch* b, Status WriteBatchInternal::SingleDelete(WriteBatch* b,
@ -957,8 +1098,40 @@ Status WriteBatchInternal::SingleDelete(WriteBatch* b,
Status WriteBatch::SingleDelete(ColumnFamilyHandle* column_family, Status WriteBatch::SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key) { const Slice& key) {
return WriteBatchInternal::SingleDelete( size_t ts_sz = 0;
this, GetColumnFamilyID(column_family), key); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::SingleDelete(this, cf_id, key);
}
needs_in_place_update_ts_ = true;
std::string dummy_ts(ts_sz, '\0');
std::array<Slice, 2> key_with_ts{{key, dummy_ts}};
return WriteBatchInternal::SingleDelete(this, cf_id,
SliceParts(key_with_ts.data(), 2));
}
Status WriteBatch::SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) {
const Status s = CheckColumnFamilyTimestampSize(column_family, ts);
if (!s.ok()) {
return s;
}
assert(column_family);
uint32_t cf_id = column_family->GetID();
std::array<Slice, 2> key_with_ts{{key, ts}};
return WriteBatchInternal::SingleDelete(this, cf_id,
SliceParts(key_with_ts.data(), 2));
} }
Status WriteBatchInternal::SingleDelete(WriteBatch* b, Status WriteBatchInternal::SingleDelete(WriteBatch* b,
@ -992,8 +1165,24 @@ Status WriteBatchInternal::SingleDelete(WriteBatch* b,
Status WriteBatch::SingleDelete(ColumnFamilyHandle* column_family, Status WriteBatch::SingleDelete(ColumnFamilyHandle* column_family,
const SliceParts& key) { const SliceParts& key) {
return WriteBatchInternal::SingleDelete( size_t ts_sz = 0;
this, GetColumnFamilyID(column_family), key); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::SingleDelete(this, cf_id, key);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
} }
Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id, Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id,
@ -1026,8 +1215,24 @@ Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::DeleteRange(ColumnFamilyHandle* column_family, Status WriteBatch::DeleteRange(ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key) { const Slice& begin_key, const Slice& end_key) {
return WriteBatchInternal::DeleteRange(this, GetColumnFamilyID(column_family), size_t ts_sz = 0;
begin_key, end_key); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::DeleteRange(this, cf_id, begin_key, end_key);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
} }
Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id, Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id,
@ -1061,8 +1266,24 @@ Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::DeleteRange(ColumnFamilyHandle* column_family, Status WriteBatch::DeleteRange(ColumnFamilyHandle* column_family,
const SliceParts& begin_key, const SliceParts& begin_key,
const SliceParts& end_key) { const SliceParts& end_key) {
return WriteBatchInternal::DeleteRange(this, GetColumnFamilyID(column_family), size_t ts_sz = 0;
begin_key, end_key); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::DeleteRange(this, cf_id, begin_key, end_key);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
} }
Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id, Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id,
@ -1099,8 +1320,24 @@ Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Merge(ColumnFamilyHandle* column_family, const Slice& key, Status WriteBatch::Merge(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) { const Slice& value) {
return WriteBatchInternal::Merge(this, GetColumnFamilyID(column_family), key, size_t ts_sz = 0;
value); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Merge(this, cf_id, key, value);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
} }
Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id, Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id,
@ -1136,8 +1373,24 @@ Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Merge(ColumnFamilyHandle* column_family, Status WriteBatch::Merge(ColumnFamilyHandle* column_family,
const SliceParts& key, const SliceParts& value) { const SliceParts& key, const SliceParts& value) {
return WriteBatchInternal::Merge(this, GetColumnFamilyID(column_family), key, size_t ts_sz = 0;
value); uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Merge(this, cf_id, key, value);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
} }
Status WriteBatchInternal::PutBlobIndex(WriteBatch* b, Status WriteBatchInternal::PutBlobIndex(WriteBatch* b,
@ -1223,18 +1476,15 @@ Status WriteBatch::PopSavePoint() {
return Status::OK(); return Status::OK();
} }
Status WriteBatch::AssignTimestamp( Status WriteBatch::UpdateTimestamps(
const Slice& ts, std::function<Status(uint32_t, size_t&)> checker) { const Slice& ts, std::function<size_t(uint32_t)> ts_sz_func) {
TimestampAssigner ts_assigner(prot_info_.get(), std::move(checker), ts); TimestampUpdater<decltype(ts_sz_func)> ts_updater(prot_info_.get(),
return Iterate(&ts_assigner); std::move(ts_sz_func), ts);
} const Status s = Iterate(&ts_updater);
if (s.ok()) {
Status WriteBatch::AssignTimestamps( needs_in_place_update_ts_ = false;
const std::vector<Slice>& ts_list, }
std::function<Status(uint32_t, size_t&)> checker) { return s;
SimpleListTimestampAssigner ts_assigner(prot_info_.get(), std::move(checker),
ts_list);
return Iterate(&ts_assigner);
} }
class MemTableInserter : public WriteBatch::Handler { class MemTableInserter : public WriteBatch::Handler {
@ -2189,24 +2439,20 @@ class MemTableInserter : public WriteBatch::Handler {
const auto& batch_info = trx->batches_.begin()->second; const auto& batch_info = trx->batches_.begin()->second;
// all inserts must reference this trx log number // all inserts must reference this trx log number
log_number_ref_ = batch_info.log_number_; log_number_ref_ = batch_info.log_number_;
const auto checker = [this](uint32_t cf, size_t& ts_sz) {
assert(db_); s = batch_info.batch_->UpdateTimestamps(
VersionSet* const vset = db_->GetVersionSet(); commit_ts, [this](uint32_t cf) {
assert(vset); assert(db_);
ColumnFamilySet* const cf_set = vset->GetColumnFamilySet(); VersionSet* const vset = db_->GetVersionSet();
assert(cf_set); assert(vset);
ColumnFamilyData* cfd = cf_set->GetColumnFamily(cf); ColumnFamilySet* const cf_set = vset->GetColumnFamilySet();
assert(cfd); assert(cf_set);
const auto* const ucmp = cfd->user_comparator(); ColumnFamilyData* cfd = cf_set->GetColumnFamily(cf);
assert(ucmp); assert(cfd);
if (ucmp->timestamp_size() == 0) { const auto* const ucmp = cfd->user_comparator();
ts_sz = 0; assert(ucmp);
} else if (ucmp->timestamp_size() != ts_sz) { return ucmp->timestamp_size();
return Status::InvalidArgument("Timestamp size mismatch"); });
}
return Status::OK();
};
s = batch_info.batch_->AssignTimestamp(commit_ts, checker);
if (s.ok()) { if (s.ok()) {
s = batch_info.batch_->Iterate(this); s = batch_info.batch_->Iterate(this);
log_number_ref_ = 0; log_number_ref_ = 0;

@ -79,7 +79,7 @@ class WriteBatchInternal {
public: public:
// WriteBatch header has an 8-byte sequence number followed by a 4-byte count. // WriteBatch header has an 8-byte sequence number followed by a 4-byte count.
static const size_t kHeader = 12; static constexpr size_t kHeader = 12;
// WriteBatch methods with column_family_id instead of ColumnFamilyHandle* // WriteBatch methods with column_family_id instead of ColumnFamilyHandle*
static Status Put(WriteBatch* batch, uint32_t column_family_id, static Status Put(WriteBatch* batch, uint32_t column_family_id,
@ -221,6 +221,13 @@ class WriteBatchInternal {
// state meant to be used only during recovery. // state meant to be used only during recovery.
static void SetAsLatestPersistentState(WriteBatch* b); static void SetAsLatestPersistentState(WriteBatch* b);
static bool IsLatestPersistentState(const WriteBatch* b); static bool IsLatestPersistentState(const WriteBatch* b);
static std::tuple<Status, uint32_t, size_t> GetColumnFamilyIdAndTimestampSize(
WriteBatch* b, ColumnFamilyHandle* column_family);
static bool TimestampsUpdateNeeded(const WriteBatch& wb) {
return wb.needs_in_place_update_ts_;
}
}; };
// LocalSavePoint is similar to a scope guard // LocalSavePoint is similar to a scope guard
@ -265,39 +272,42 @@ class LocalSavePoint {
#endif #endif
}; };
template <typename Derived> template <typename TimestampSizeFuncType>
class TimestampAssignerBase : public WriteBatch::Handler { class TimestampUpdater : public WriteBatch::Handler {
public: public:
explicit TimestampAssignerBase( explicit TimestampUpdater(WriteBatch::ProtectionInfo* prot_info,
WriteBatch::ProtectionInfo* prot_info, TimestampSizeFuncType&& ts_sz_func, const Slice& ts)
std::function<Status(uint32_t, size_t&)>&& checker) : prot_info_(prot_info),
: prot_info_(prot_info), checker_(std::move(checker)) {} ts_sz_func_(std::move(ts_sz_func)),
timestamp_(ts) {
assert(!timestamp_.empty());
}
~TimestampAssignerBase() override {} ~TimestampUpdater() override {}
Status PutCF(uint32_t cf, const Slice& key, const Slice&) override { Status PutCF(uint32_t cf, const Slice& key, const Slice&) override {
return AssignTimestamp(cf, key); return UpdateTimestamp(cf, key);
} }
Status DeleteCF(uint32_t cf, const Slice& key) override { Status DeleteCF(uint32_t cf, const Slice& key) override {
return AssignTimestamp(cf, key); return UpdateTimestamp(cf, key);
} }
Status SingleDeleteCF(uint32_t cf, const Slice& key) override { Status SingleDeleteCF(uint32_t cf, const Slice& key) override {
return AssignTimestamp(cf, key); return UpdateTimestamp(cf, key);
} }
Status DeleteRangeCF(uint32_t cf, const Slice& begin_key, Status DeleteRangeCF(uint32_t cf, const Slice& begin_key,
const Slice&) override { const Slice&) override {
return AssignTimestamp(cf, begin_key); return UpdateTimestamp(cf, begin_key);
} }
Status MergeCF(uint32_t cf, const Slice& key, const Slice&) override { Status MergeCF(uint32_t cf, const Slice& key, const Slice&) override {
return AssignTimestamp(cf, key); return UpdateTimestamp(cf, key);
} }
Status PutBlobIndexCF(uint32_t cf, const Slice& key, const Slice&) override { Status PutBlobIndexCF(uint32_t cf, const Slice& key, const Slice&) override {
return AssignTimestamp(cf, key); return UpdateTimestamp(cf, key);
} }
Status MarkBeginPrepare(bool) override { return Status::OK(); } Status MarkBeginPrepare(bool) override { return Status::OK(); }
@ -314,25 +324,32 @@ class TimestampAssignerBase : public WriteBatch::Handler {
Status MarkNoop(bool /*empty_batch*/) override { return Status::OK(); } Status MarkNoop(bool /*empty_batch*/) override { return Status::OK(); }
protected: private:
Status AssignTimestamp(uint32_t cf, const Slice& key) { Status UpdateTimestamp(uint32_t cf, const Slice& key) {
Status s = static_cast_with_check<Derived>(this)->AssignTimestampImpl( Status s = UpdateTimestampImpl(cf, key, idx_);
cf, key, idx_);
++idx_; ++idx_;
return s; return s;
} }
Status CheckTimestampSize(uint32_t cf, size_t& ts_sz) { Status UpdateTimestampImpl(uint32_t cf, const Slice& key, size_t /*idx*/) {
return checker_(cf, ts_sz); if (timestamp_.empty()) {
} return Status::InvalidArgument("Timestamp is empty");
}
Status UpdateTimestampIfNeeded(size_t ts_sz, const Slice& key, size_t cf_ts_sz = ts_sz_func_(cf);
const Slice& ts) { if (0 == cf_ts_sz) {
if (ts_sz > 0) { // Skip this column family.
assert(ts_sz == ts.size()); return Status::OK();
UpdateProtectionInformationIfNeeded(key, ts); } else if (std::numeric_limits<size_t>::max() == cf_ts_sz) {
UpdateTimestamp(key, ts); // Column family timestamp info not found.
return Status::NotFound();
} else if (cf_ts_sz != timestamp_.size()) {
return Status::InvalidArgument("timestamp size mismatch");
} }
UpdateProtectionInformationIfNeeded(key, timestamp_);
char* ptr = const_cast<char*>(key.data() + key.size() - cf_ts_sz);
assert(ptr);
memcpy(ptr, timestamp_.data(), timestamp_.size());
return Status::OK(); return Status::OK();
} }
@ -347,83 +364,16 @@ class TimestampAssignerBase : public WriteBatch::Handler {
} }
} }
void UpdateTimestamp(const Slice& key, const Slice& ts) {
const size_t ts_sz = ts.size();
char* ptr = const_cast<char*>(key.data() + key.size() - ts_sz);
assert(ptr);
memcpy(ptr, ts.data(), ts_sz);
}
// No copy or move. // No copy or move.
TimestampAssignerBase(const TimestampAssignerBase&) = delete; TimestampUpdater(const TimestampUpdater&) = delete;
TimestampAssignerBase(TimestampAssignerBase&&) = delete; TimestampUpdater(TimestampUpdater&&) = delete;
TimestampAssignerBase& operator=(const TimestampAssignerBase&) = delete; TimestampUpdater& operator=(const TimestampUpdater&) = delete;
TimestampAssignerBase& operator=(TimestampAssignerBase&&) = delete; TimestampUpdater& operator=(TimestampUpdater&&) = delete;
WriteBatch::ProtectionInfo* const prot_info_ = nullptr; WriteBatch::ProtectionInfo* const prot_info_ = nullptr;
const std::function<Status(uint32_t, size_t&)> checker_{}; const TimestampSizeFuncType ts_sz_func_{};
size_t idx_ = 0;
};
class SimpleListTimestampAssigner
: public TimestampAssignerBase<SimpleListTimestampAssigner> {
public:
explicit SimpleListTimestampAssigner(
WriteBatch::ProtectionInfo* prot_info,
std::function<Status(uint32_t, size_t&)>&& checker,
const std::vector<Slice>& timestamps)
: TimestampAssignerBase<SimpleListTimestampAssigner>(prot_info,
std::move(checker)),
timestamps_(timestamps) {}
~SimpleListTimestampAssigner() override {}
private:
friend class TimestampAssignerBase<SimpleListTimestampAssigner>;
Status AssignTimestampImpl(uint32_t cf, const Slice& key, size_t idx) {
if (idx >= timestamps_.size()) {
return Status::InvalidArgument("Need more timestamps for the assignment");
}
const Slice& ts = timestamps_[idx];
size_t ts_sz = ts.size();
const Status s = this->CheckTimestampSize(cf, ts_sz);
if (!s.ok()) {
return s;
}
return this->UpdateTimestampIfNeeded(ts_sz, key, ts);
}
const std::vector<Slice>& timestamps_;
};
class TimestampAssigner : public TimestampAssignerBase<TimestampAssigner> {
public:
explicit TimestampAssigner(WriteBatch::ProtectionInfo* prot_info,
std::function<Status(uint32_t, size_t&)>&& checker,
const Slice& ts)
: TimestampAssignerBase<TimestampAssigner>(prot_info, std::move(checker)),
timestamp_(ts) {
assert(!timestamp_.empty());
}
~TimestampAssigner() override {}
private:
friend class TimestampAssignerBase<TimestampAssigner>;
Status AssignTimestampImpl(uint32_t cf, const Slice& key, size_t /*idx*/) {
if (timestamp_.empty()) {
return Status::InvalidArgument("Timestamp is empty");
}
size_t ts_sz = timestamp_.size();
const Status s = this->CheckTimestampSize(cf, ts_sz);
if (!s.ok()) {
return s;
}
return this->UpdateTimestampIfNeeded(ts_sz, key, timestamp_);
}
const Slice timestamp_; const Slice timestamp_;
size_t idx_ = 0;
}; };
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -949,7 +949,48 @@ Status CheckTimestampsInWriteBatch(
} }
} // namespace } // namespace
TEST_F(WriteBatchTest, AssignTimestamps) { TEST_F(WriteBatchTest, SanityChecks) {
ColumnFamilyHandleImplDummy cf0(0, test::ComparatorWithU64Ts());
ColumnFamilyHandleImplDummy cf4(4);
WriteBatch wb(0, 0, 0, /*default_cf_ts_sz=*/sizeof(uint64_t));
// Sanity checks for the new WriteBatch APIs with extra 'ts' arg.
ASSERT_TRUE(wb.Put(nullptr, "key", "ts", "value").IsInvalidArgument());
ASSERT_TRUE(wb.Delete(nullptr, "key", "ts").IsInvalidArgument());
ASSERT_TRUE(wb.SingleDelete(nullptr, "key", "ts").IsInvalidArgument());
ASSERT_TRUE(wb.Merge(nullptr, "key", "ts", "value").IsNotSupported());
ASSERT_TRUE(
wb.DeleteRange(nullptr, "begin_key", "end_key", "ts").IsNotSupported());
ASSERT_TRUE(wb.Put(&cf4, "key", "ts", "value").IsInvalidArgument());
ASSERT_TRUE(wb.Delete(&cf4, "key", "ts").IsInvalidArgument());
ASSERT_TRUE(wb.SingleDelete(&cf4, "key", "ts").IsInvalidArgument());
ASSERT_TRUE(wb.Merge(&cf4, "key", "ts", "value").IsNotSupported());
ASSERT_TRUE(
wb.DeleteRange(&cf4, "begin_key", "end_key", "ts").IsNotSupported());
constexpr size_t wrong_ts_sz = 1 + sizeof(uint64_t);
std::string ts(wrong_ts_sz, '\0');
ASSERT_TRUE(wb.Put(&cf0, "key", ts, "value").IsInvalidArgument());
ASSERT_TRUE(wb.Delete(&cf0, "key", ts).IsInvalidArgument());
ASSERT_TRUE(wb.SingleDelete(&cf0, "key", ts).IsInvalidArgument());
ASSERT_TRUE(wb.Merge(&cf0, "key", ts, "value").IsNotSupported());
ASSERT_TRUE(
wb.DeleteRange(&cf0, "begin_key", "end_key", ts).IsNotSupported());
// Sanity checks for the new WriteBatch APIs without extra 'ts' arg.
WriteBatch wb1(0, 0, 0, wrong_ts_sz);
ASSERT_TRUE(wb1.Put(&cf0, "key", "value").IsInvalidArgument());
ASSERT_TRUE(wb1.Delete(&cf0, "key").IsInvalidArgument());
ASSERT_TRUE(wb1.SingleDelete(&cf0, "key").IsInvalidArgument());
ASSERT_TRUE(wb1.Merge(&cf0, "key", "value").IsInvalidArgument());
ASSERT_TRUE(
wb1.DeleteRange(&cf0, "begin_key", "end_key").IsInvalidArgument());
}
TEST_F(WriteBatchTest, UpdateTimestamps) {
// We assume the last eight bytes of each key is reserved for timestamps. // We assume the last eight bytes of each key is reserved for timestamps.
// Therefore, we must make sure each key is longer than eight bytes. // Therefore, we must make sure each key is longer than eight bytes.
constexpr size_t key_size = 16; constexpr size_t key_size = 16;
@ -974,21 +1015,17 @@ TEST_F(WriteBatchTest, AssignTimestamps) {
} }
static constexpr size_t timestamp_size = sizeof(uint64_t); static constexpr size_t timestamp_size = sizeof(uint64_t);
const auto checker1 = [](uint32_t cf, size_t& ts_sz) { const auto checker1 = [](uint32_t cf) {
if (cf == 4 || cf == 5) { if (cf == 4 || cf == 5) {
if (ts_sz != timestamp_size) { return timestamp_size;
return Status::InvalidArgument("Timestamp size mismatch");
}
} else if (cf == 0) { } else if (cf == 0) {
ts_sz = 0; return static_cast<size_t>(0);
return Status::OK();
} else { } else {
return Status::Corruption("Invalid cf"); return std::numeric_limits<size_t>::max();
} }
return Status::OK();
}; };
ASSERT_OK( ASSERT_OK(
batch.AssignTimestamp(std::string(timestamp_size, '\xfe'), checker1)); batch.UpdateTimestamps(std::string(timestamp_size, '\xfe'), checker1));
ASSERT_OK(CheckTimestampsInWriteBatch( ASSERT_OK(CheckTimestampsInWriteBatch(
batch, std::string(timestamp_size, '\xfe'), cf_to_ucmps)); batch, std::string(timestamp_size, '\xfe'), cf_to_ucmps));
@ -1001,65 +1038,30 @@ TEST_F(WriteBatchTest, AssignTimestamps) {
// mapping from cf to user comparators. If indexing is disabled, a transaction // mapping from cf to user comparators. If indexing is disabled, a transaction
// writes directly to the underlying raw WriteBatch. We will need to track the // writes directly to the underlying raw WriteBatch. We will need to track the
// comparator information for the column families to which un-indexed writes // comparator information for the column families to which un-indexed writes
// are performed. When calling AssignTimestamp(s) API of WriteBatch, we need // are performed. When calling UpdateTimestamp API of WriteBatch, we need
// indexed_cf_to_ucmps, non_indexed_cfs_with_ts, and timestamp_size to perform // indexed_cf_to_ucmps, non_indexed_cfs_with_ts, and timestamp_size to perform
// checking. // checking.
std::unordered_map<uint32_t, const Comparator*> indexed_cf_to_ucmps = { std::unordered_map<uint32_t, const Comparator*> indexed_cf_to_ucmps = {
{0, cf0.GetComparator()}, {4, cf4.GetComparator()}}; {0, cf0.GetComparator()}, {4, cf4.GetComparator()}};
std::unordered_set<uint32_t> non_indexed_cfs_with_ts = {cf5.GetID()}; std::unordered_set<uint32_t> non_indexed_cfs_with_ts = {cf5.GetID()};
const auto checker2 = [&indexed_cf_to_ucmps, &non_indexed_cfs_with_ts]( const auto checker2 = [&indexed_cf_to_ucmps,
uint32_t cf, size_t& ts_sz) { &non_indexed_cfs_with_ts](uint32_t cf) {
if (non_indexed_cfs_with_ts.count(cf) > 0) { if (non_indexed_cfs_with_ts.count(cf) > 0) {
if (ts_sz != timestamp_size) { return timestamp_size;
return Status::InvalidArgument("Timestamp size mismatch");
}
return Status::OK();
} }
auto cf_iter = indexed_cf_to_ucmps.find(cf); auto cf_iter = indexed_cf_to_ucmps.find(cf);
if (cf_iter == indexed_cf_to_ucmps.end()) { if (cf_iter == indexed_cf_to_ucmps.end()) {
return Status::Corruption("Unknown cf"); assert(false);
return std::numeric_limits<size_t>::max();
} }
const Comparator* const ucmp = cf_iter->second; const Comparator* const ucmp = cf_iter->second;
assert(ucmp); assert(ucmp);
if (ucmp->timestamp_size() == 0) { return ucmp->timestamp_size();
ts_sz = 0;
} else if (ts_sz != ucmp->timestamp_size()) {
return Status::InvalidArgument("Timestamp size mismatch");
}
return Status::OK();
}; };
ASSERT_OK( ASSERT_OK(
batch.AssignTimestamp(std::string(timestamp_size, '\xef'), checker2)); batch.UpdateTimestamps(std::string(timestamp_size, '\xef'), checker2));
ASSERT_OK(CheckTimestampsInWriteBatch( ASSERT_OK(CheckTimestampsInWriteBatch(
batch, std::string(timestamp_size, '\xef'), cf_to_ucmps)); batch, std::string(timestamp_size, '\xef'), cf_to_ucmps));
std::vector<std::string> ts_strs;
for (size_t i = 0; i < 3 * key_strs.size(); ++i) {
if (0 == (i % 3)) {
ts_strs.emplace_back();
} else {
ts_strs.emplace_back(std::string(timestamp_size, '\xee'));
}
}
std::vector<Slice> ts_vec(ts_strs.size());
for (size_t i = 0; i < ts_vec.size(); ++i) {
ts_vec[i] = ts_strs[i];
}
const auto checker3 = [&cf_to_ucmps](uint32_t cf, size_t& ts_sz) {
auto cf_iter = cf_to_ucmps.find(cf);
if (cf_iter == cf_to_ucmps.end()) {
return Status::Corruption("Invalid cf");
}
const Comparator* const ucmp = cf_iter->second;
assert(ucmp);
if (ucmp->timestamp_size() != ts_sz) {
return Status::InvalidArgument("Timestamp size mismatch");
}
return Status::OK();
};
ASSERT_OK(batch.AssignTimestamps(ts_vec, checker3));
ASSERT_OK(CheckTimestampsInWriteBatch(
batch, std::string(timestamp_size, '\xee'), cf_to_ucmps));
} }
TEST_F(WriteBatchTest, CommitWithTimestamp) { TEST_F(WriteBatchTest, CommitWithTimestamp) {

@ -34,7 +34,8 @@ class BatchedOpsStressTest : public StressTest {
std::string values[10] = {"9", "8", "7", "6", "5", "4", "3", "2", "1", "0"}; std::string values[10] = {"9", "8", "7", "6", "5", "4", "3", "2", "1", "0"};
Slice value_slices[10]; Slice value_slices[10];
WriteBatch batch(0 /* reserved_bytes */, 0 /* max_bytes */, WriteBatch batch(0 /* reserved_bytes */, 0 /* max_bytes */,
FLAGS_batch_protection_bytes_per_key); FLAGS_batch_protection_bytes_per_key,
FLAGS_user_timestamp_size);
Status s; Status s;
auto cfh = column_families_[rand_column_families[0]]; auto cfh = column_families_[rand_column_families[0]];
std::string key_str = Key(rand_keys[0]); std::string key_str = Key(rand_keys[0]);
@ -70,7 +71,8 @@ class BatchedOpsStressTest : public StressTest {
std::string keys[10] = {"9", "7", "5", "3", "1", "8", "6", "4", "2", "0"}; std::string keys[10] = {"9", "7", "5", "3", "1", "8", "6", "4", "2", "0"};
WriteBatch batch(0 /* reserved_bytes */, 0 /* max_bytes */, WriteBatch batch(0 /* reserved_bytes */, 0 /* max_bytes */,
FLAGS_batch_protection_bytes_per_key); FLAGS_batch_protection_bytes_per_key,
FLAGS_user_timestamp_size);
Status s; Status s;
auto cfh = column_families_[rand_column_families[0]]; auto cfh = column_families_[rand_column_families[0]];
std::string key_str = Key(rand_keys[0]); std::string key_str = Key(rand_keys[0]);

@ -513,9 +513,10 @@ void StressTest::PreloadDbAndReopenAsReadOnly(int64_t number_of_keys,
if (FLAGS_user_timestamp_size > 0) { if (FLAGS_user_timestamp_size > 0) {
ts_str = NowNanosStr(); ts_str = NowNanosStr();
ts = ts_str; ts = ts_str;
write_opts.timestamp = &ts; s = db_->Put(write_opts, cfh, key, ts, v);
} else {
s = db_->Put(write_opts, cfh, key, v);
} }
s = db_->Put(write_opts, cfh, key, v);
} else { } else {
#ifndef ROCKSDB_LITE #ifndef ROCKSDB_LITE
Transaction* txn; Transaction* txn;
@ -878,7 +879,6 @@ void StressTest::OperateDb(ThreadState* thread) {
read_opts.timestamp = &read_ts; read_opts.timestamp = &read_ts;
write_ts_str = NowNanosStr(); write_ts_str = NowNanosStr();
write_ts = write_ts_str; write_ts = write_ts_str;
write_opts.timestamp = &write_ts;
} }
int prob_op = thread->rand.Uniform(100); int prob_op = thread->rand.Uniform(100);

@ -500,9 +500,12 @@ class NonBatchedOpsStressTest : public StressTest {
if (FLAGS_user_timestamp_size > 0) { if (FLAGS_user_timestamp_size > 0) {
write_ts_str = NowNanosStr(); write_ts_str = NowNanosStr();
write_ts = write_ts_str; write_ts = write_ts_str;
write_opts.timestamp = &write_ts;
} }
} }
if (write_ts.size() == 0 && FLAGS_user_timestamp_size) {
write_ts_str = NowNanosStr();
write_ts = write_ts_str;
}
std::string key_str = Key(rand_key); std::string key_str = Key(rand_key);
Slice key = key_str; Slice key = key_str;
@ -540,7 +543,11 @@ class NonBatchedOpsStressTest : public StressTest {
} }
} else { } else {
if (!FLAGS_use_txn) { if (!FLAGS_use_txn) {
s = db_->Put(write_opts, cfh, key, v); if (FLAGS_user_timestamp_size == 0) {
s = db_->Put(write_opts, cfh, key, v);
} else {
s = db_->Put(write_opts, cfh, key, write_ts, v);
}
} else { } else {
#ifndef ROCKSDB_LITE #ifndef ROCKSDB_LITE
Transaction* txn; Transaction* txn;
@ -599,9 +606,12 @@ class NonBatchedOpsStressTest : public StressTest {
if (FLAGS_user_timestamp_size > 0) { if (FLAGS_user_timestamp_size > 0) {
write_ts_str = NowNanosStr(); write_ts_str = NowNanosStr();
write_ts = write_ts_str; write_ts = write_ts_str;
write_opts.timestamp = &write_ts;
} }
} }
if (write_ts.size() == 0 && FLAGS_user_timestamp_size) {
write_ts_str = NowNanosStr();
write_ts = write_ts_str;
}
std::string key_str = Key(rand_key); std::string key_str = Key(rand_key);
Slice key = key_str; Slice key = key_str;
@ -613,7 +623,11 @@ class NonBatchedOpsStressTest : public StressTest {
if (shared->AllowsOverwrite(rand_key)) { if (shared->AllowsOverwrite(rand_key)) {
shared->Delete(rand_column_family, rand_key, true /* pending */); shared->Delete(rand_column_family, rand_key, true /* pending */);
if (!FLAGS_use_txn) { if (!FLAGS_use_txn) {
s = db_->Delete(write_opts, cfh, key); if (FLAGS_user_timestamp_size == 0) {
s = db_->Delete(write_opts, cfh, key);
} else {
s = db_->Delete(write_opts, cfh, key, write_ts);
}
} else { } else {
#ifndef ROCKSDB_LITE #ifndef ROCKSDB_LITE
Transaction* txn; Transaction* txn;
@ -646,7 +660,11 @@ class NonBatchedOpsStressTest : public StressTest {
} else { } else {
shared->SingleDelete(rand_column_family, rand_key, true /* pending */); shared->SingleDelete(rand_column_family, rand_key, true /* pending */);
if (!FLAGS_use_txn) { if (!FLAGS_use_txn) {
s = db_->SingleDelete(write_opts, cfh, key); if (FLAGS_user_timestamp_size == 0) {
s = db_->SingleDelete(write_opts, cfh, key);
} else {
s = db_->SingleDelete(write_opts, cfh, key, write_ts);
}
} else { } else {
#ifndef ROCKSDB_LITE #ifndef ROCKSDB_LITE
Transaction* txn; Transaction* txn;

@ -356,10 +356,17 @@ class DB {
virtual Status Put(const WriteOptions& options, virtual Status Put(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key, ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) = 0; const Slice& value) = 0;
virtual Status Put(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) = 0;
virtual Status Put(const WriteOptions& options, const Slice& key, virtual Status Put(const WriteOptions& options, const Slice& key,
const Slice& value) { const Slice& value) {
return Put(options, DefaultColumnFamily(), key, value); return Put(options, DefaultColumnFamily(), key, value);
} }
virtual Status Put(const WriteOptions& options, const Slice& key,
const Slice& ts, const Slice& value) {
return Put(options, DefaultColumnFamily(), key, ts, value);
}
// Remove the database entry (if any) for "key". Returns OK on // Remove the database entry (if any) for "key". Returns OK on
// success, and a non-OK status on error. It is not an error if "key" // success, and a non-OK status on error. It is not an error if "key"
@ -368,9 +375,16 @@ class DB {
virtual Status Delete(const WriteOptions& options, virtual Status Delete(const WriteOptions& options,
ColumnFamilyHandle* column_family, ColumnFamilyHandle* column_family,
const Slice& key) = 0; const Slice& key) = 0;
virtual Status Delete(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) = 0;
virtual Status Delete(const WriteOptions& options, const Slice& key) { virtual Status Delete(const WriteOptions& options, const Slice& key) {
return Delete(options, DefaultColumnFamily(), key); return Delete(options, DefaultColumnFamily(), key);
} }
virtual Status Delete(const WriteOptions& options, const Slice& key,
const Slice& ts) {
return Delete(options, DefaultColumnFamily(), key, ts);
}
// Remove the database entry for "key". Requires that the key exists // Remove the database entry for "key". Requires that the key exists
// and was not overwritten. Returns OK on success, and a non-OK status // and was not overwritten. Returns OK on success, and a non-OK status
@ -391,9 +405,16 @@ class DB {
virtual Status SingleDelete(const WriteOptions& options, virtual Status SingleDelete(const WriteOptions& options,
ColumnFamilyHandle* column_family, ColumnFamilyHandle* column_family,
const Slice& key) = 0; const Slice& key) = 0;
virtual Status SingleDelete(const WriteOptions& options,
ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) = 0;
virtual Status SingleDelete(const WriteOptions& options, const Slice& key) { virtual Status SingleDelete(const WriteOptions& options, const Slice& key) {
return SingleDelete(options, DefaultColumnFamily(), key); return SingleDelete(options, DefaultColumnFamily(), key);
} }
virtual Status SingleDelete(const WriteOptions& options, const Slice& key,
const Slice& ts) {
return SingleDelete(options, DefaultColumnFamily(), key, ts);
}
// Removes the database entries in the range ["begin_key", "end_key"), i.e., // Removes the database entries in the range ["begin_key", "end_key"), i.e.,
// including "begin_key" and excluding "end_key". Returns OK on success, and // including "begin_key" and excluding "end_key". Returns OK on success, and
@ -416,6 +437,13 @@ class DB {
virtual Status DeleteRange(const WriteOptions& options, virtual Status DeleteRange(const WriteOptions& options,
ColumnFamilyHandle* column_family, ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key); const Slice& begin_key, const Slice& end_key);
virtual Status DeleteRange(const WriteOptions& /*options*/,
ColumnFamilyHandle* /*column_family*/,
const Slice& /*begin_key*/,
const Slice& /*end_key*/, const Slice& /*ts*/) {
return Status::NotSupported(
"DeleteRange does not support user-defined timestamp yet");
}
// Merge the database entry for "key" with "value". Returns OK on success, // Merge the database entry for "key" with "value". Returns OK on success,
// and a non-OK status on error. The semantics of this operation is // and a non-OK status on error. The semantics of this operation is
@ -428,6 +456,13 @@ class DB {
const Slice& value) { const Slice& value) {
return Merge(options, DefaultColumnFamily(), key, value); return Merge(options, DefaultColumnFamily(), key, value);
} }
virtual Status Merge(const WriteOptions& /*options*/,
ColumnFamilyHandle* /*column_family*/,
const Slice& /*key*/, const Slice& /*ts*/,
const Slice& /*value*/) {
return Status::NotSupported(
"Merge does not support user-defined timestamp yet");
}
// Apply the specified updates to the database. // Apply the specified updates to the database.
// If `updates` contains no update, WAL will still be synced if // If `updates` contains no update, WAL will still be synced if

@ -1661,25 +1661,13 @@ struct WriteOptions {
// Default: false // Default: false
bool memtable_insert_hint_per_batch; bool memtable_insert_hint_per_batch;
// Timestamp of write operation, e.g. Put. All timestamps of the same
// database must share the same length and format. The user is also
// responsible for providing a customized compare function via Comparator to
// order <key, timestamp> tuples. If the user wants to enable timestamp, then
// all write operations must be associated with timestamp because RocksDB, as
// a single-node storage engine currently has no knowledge of global time,
// thus has to rely on the application.
// The user-specified timestamp feature is still under active development,
// and the API is subject to change.
const Slice* timestamp;
WriteOptions() WriteOptions()
: sync(false), : sync(false),
disableWAL(false), disableWAL(false),
ignore_missing_column_families(false), ignore_missing_column_families(false),
no_slowdown(false), no_slowdown(false),
low_pri(false), low_pri(false),
memtable_insert_hint_per_batch(false), memtable_insert_hint_per_batch(false) {}
timestamp(nullptr) {}
}; };
// Options that control flush operations // Options that control flush operations

@ -80,6 +80,10 @@ class StackableDB : public DB {
const Slice& val) override { const Slice& val) override {
return db_->Put(options, column_family, key, val); return db_->Put(options, column_family, key, val);
} }
Status Put(const WriteOptions& options, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts, const Slice& val) override {
return db_->Put(options, column_family, key, ts, val);
}
using DB::Get; using DB::Get;
virtual Status Get(const ReadOptions& options, virtual Status Get(const ReadOptions& options,
@ -166,6 +170,10 @@ class StackableDB : public DB {
const Slice& key) override { const Slice& key) override {
return db_->Delete(wopts, column_family, key); return db_->Delete(wopts, column_family, key);
} }
Status Delete(const WriteOptions& wopts, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) override {
return db_->Delete(wopts, column_family, key, ts);
}
using DB::SingleDelete; using DB::SingleDelete;
virtual Status SingleDelete(const WriteOptions& wopts, virtual Status SingleDelete(const WriteOptions& wopts,
@ -173,6 +181,18 @@ class StackableDB : public DB {
const Slice& key) override { const Slice& key) override {
return db_->SingleDelete(wopts, column_family, key); return db_->SingleDelete(wopts, column_family, key);
} }
Status SingleDelete(const WriteOptions& wopts,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override {
return db_->SingleDelete(wopts, column_family, key, ts);
}
using DB::DeleteRange;
Status DeleteRange(const WriteOptions& wopts,
ColumnFamilyHandle* column_family, const Slice& start_key,
const Slice& end_key) override {
return db_->DeleteRange(wopts, column_family, start_key, end_key);
}
using DB::Merge; using DB::Merge;
virtual Status Merge(const WriteOptions& options, virtual Status Merge(const WriteOptions& options,

@ -371,6 +371,7 @@ class TransactionDB : public StackableDB {
// used and `skip_concurrency_control` must be set. When using either // used and `skip_concurrency_control` must be set. When using either
// WRITE_PREPARED or WRITE_UNPREPARED , `skip_duplicate_key_check` must // WRITE_PREPARED or WRITE_UNPREPARED , `skip_duplicate_key_check` must
// additionally be set. // additionally be set.
using StackableDB::DeleteRange;
virtual Status DeleteRange(const WriteOptions&, ColumnFamilyHandle*, virtual Status DeleteRange(const WriteOptions&, ColumnFamilyHandle*,
const Slice&, const Slice&) override { const Slice&, const Slice&) override {
return Status::NotSupported(); return Status::NotSupported();

@ -110,20 +110,32 @@ class WriteBatchWithIndex : public WriteBatchBase {
Status Put(const Slice& key, const Slice& value) override; Status Put(const Slice& key, const Slice& value) override;
Status Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) override;
using WriteBatchBase::Merge; using WriteBatchBase::Merge;
Status Merge(ColumnFamilyHandle* column_family, const Slice& key, Status Merge(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) override; const Slice& value) override;
Status Merge(const Slice& key, const Slice& value) override; Status Merge(const Slice& key, const Slice& value) override;
Status Merge(ColumnFamilyHandle* /*column_family*/, const Slice& /*key*/,
const Slice& /*ts*/, const Slice& /*value*/) override {
return Status::NotSupported(
"Merge does not support user-defined timestamp");
}
using WriteBatchBase::Delete; using WriteBatchBase::Delete;
Status Delete(ColumnFamilyHandle* column_family, const Slice& key) override; Status Delete(ColumnFamilyHandle* column_family, const Slice& key) override;
Status Delete(const Slice& key) override; Status Delete(const Slice& key) override;
Status Delete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
using WriteBatchBase::SingleDelete; using WriteBatchBase::SingleDelete;
Status SingleDelete(ColumnFamilyHandle* column_family, Status SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key) override; const Slice& key) override;
Status SingleDelete(const Slice& key) override; Status SingleDelete(const Slice& key) override;
Status SingleDelete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
using WriteBatchBase::DeleteRange; using WriteBatchBase::DeleteRange;
Status DeleteRange(ColumnFamilyHandle* /* column_family */, Status DeleteRange(ColumnFamilyHandle* /* column_family */,
@ -137,6 +149,12 @@ class WriteBatchWithIndex : public WriteBatchBase {
return Status::NotSupported( return Status::NotSupported(
"DeleteRange unsupported in WriteBatchWithIndex"); "DeleteRange unsupported in WriteBatchWithIndex");
} }
Status DeleteRange(ColumnFamilyHandle* /*column_family*/,
const Slice& /*begin_key*/, const Slice& /*end_key*/,
const Slice& /*ts*/) override {
return Status::NotSupported(
"DeleteRange unsupported in WriteBatchWithIndex");
}
using WriteBatchBase::PutLogData; using WriteBatchBase::PutLogData;
Status PutLogData(const Slice& blob) override; Status PutLogData(const Slice& blob) override;

@ -68,7 +68,7 @@ class WriteBatch : public WriteBatchBase {
// protection information for each key entry. Currently supported values are // protection information for each key entry. Currently supported values are
// zero (disabled) and eight. // zero (disabled) and eight.
explicit WriteBatch(size_t reserved_bytes, size_t max_bytes, explicit WriteBatch(size_t reserved_bytes, size_t max_bytes,
size_t protection_bytes_per_key); size_t protection_bytes_per_key, size_t default_cf_ts_sz);
~WriteBatch() override; ~WriteBatch() override;
using WriteBatchBase::Put; using WriteBatchBase::Put;
@ -82,6 +82,8 @@ class WriteBatch : public WriteBatchBase {
Status Put(const Slice& key, const Slice& value) override { Status Put(const Slice& key, const Slice& value) override {
return Put(nullptr, key, value); return Put(nullptr, key, value);
} }
Status Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) override;
// Variant of Put() that gathers output like writev(2). The key and value // Variant of Put() that gathers output like writev(2). The key and value
// that will be written to the database are concatenations of arrays of // that will be written to the database are concatenations of arrays of
@ -104,6 +106,8 @@ class WriteBatch : public WriteBatchBase {
// up the memory buffer pointed to by `key`. // up the memory buffer pointed to by `key`.
Status Delete(ColumnFamilyHandle* column_family, const Slice& key) override; Status Delete(ColumnFamilyHandle* column_family, const Slice& key) override;
Status Delete(const Slice& key) override { return Delete(nullptr, key); } Status Delete(const Slice& key) override { return Delete(nullptr, key); }
Status Delete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
// variant that takes SliceParts // variant that takes SliceParts
// These two variants of Delete(..., const SliceParts& key) can be used when // These two variants of Delete(..., const SliceParts& key) can be used when
@ -121,6 +125,8 @@ class WriteBatch : public WriteBatchBase {
Status SingleDelete(const Slice& key) override { Status SingleDelete(const Slice& key) override {
return SingleDelete(nullptr, key); return SingleDelete(nullptr, key);
} }
Status SingleDelete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
// variant that takes SliceParts // variant that takes SliceParts
Status SingleDelete(ColumnFamilyHandle* column_family, Status SingleDelete(ColumnFamilyHandle* column_family,
@ -136,6 +142,12 @@ class WriteBatch : public WriteBatchBase {
Status DeleteRange(const Slice& begin_key, const Slice& end_key) override { Status DeleteRange(const Slice& begin_key, const Slice& end_key) override {
return DeleteRange(nullptr, begin_key, end_key); return DeleteRange(nullptr, begin_key, end_key);
} }
Status DeleteRange(ColumnFamilyHandle* /*column_family*/,
const Slice& /*begin_key*/, const Slice& /*end_key*/,
const Slice& /*ts*/) override {
return Status::NotSupported(
"DeleteRange does not support user-defined timestamp");
}
// variant that takes SliceParts // variant that takes SliceParts
Status DeleteRange(ColumnFamilyHandle* column_family, Status DeleteRange(ColumnFamilyHandle* column_family,
@ -154,6 +166,11 @@ class WriteBatch : public WriteBatchBase {
Status Merge(const Slice& key, const Slice& value) override { Status Merge(const Slice& key, const Slice& value) override {
return Merge(nullptr, key, value); return Merge(nullptr, key, value);
} }
Status Merge(ColumnFamilyHandle* /*column_family*/, const Slice& /*key*/,
const Slice& /*ts*/, const Slice& /*value*/) override {
return Status::NotSupported(
"Merge does not support user-defined timestamp");
}
// variant that takes SliceParts // variant that takes SliceParts
Status Merge(ColumnFamilyHandle* column_family, const SliceParts& key, Status Merge(ColumnFamilyHandle* column_family, const SliceParts& key,
@ -343,55 +360,25 @@ class WriteBatch : public WriteBatchBase {
bool HasRollback() const; bool HasRollback() const;
// Experimental. // Experimental.
// Assign timestamp to write batch. //
// Update timestamps of existing entries in the write batch if
// applicable. If a key is intended for a column family that disables
// timestamp, then this API won't set the timestamp for this key.
// This requires that all keys, if enable timestamp, (possibly from multiple // This requires that all keys, if enable timestamp, (possibly from multiple
// column families) in the write batch have timestamps of the same format. // column families) in the write batch have timestamps of the same format.
// //
// checker: callable object to check the timestamp sizes of column families. // ts_sz_func: callable object to obtain the timestamp sizes of column
// families. If ts_sz_func() accesses data structures, then the caller of this
// API must guarantee thread-safety. Like other parts of RocksDB, this API is
// not exception-safe. Therefore, ts_sz_func() must not throw.
// //
// in: cf, the column family id. // in: cf, the column family id.
// in/out: ts_sz. Input as the expected timestamp size of the column // ret: timestamp size of the given column family. Return
// family, output as the actual timestamp size of the column family. // std::numeric_limits<size_t>::max() indicating "dont know or column
// ret: OK if assignment succeeds. // family info not found", this will cause UpdateTimestamps() to fail.
// Status checker(uint32_t cf, size_t& ts_sz); // size_t ts_sz_func(uint32_t cf);
// Status UpdateTimestamps(const Slice& ts,
// User can call checker(uint32_t cf, size_t& ts_sz) which does the std::function<size_t(uint32_t /*cf*/)> ts_sz_func);
// following:
// 1. find out the timestamp size of the column family whose id equals `cf`.
// 2. if cf's timestamp size is 0, then set ts_sz to 0 and return OK.
// 3. otherwise, compare ts_sz with cf's timestamp size and return
// Status::InvalidArgument() if different.
Status AssignTimestamp(
const Slice& ts,
std::function<Status(uint32_t, size_t&)> checker =
[](uint32_t /*cf*/, size_t& /*ts_sz*/) { return Status::OK(); });
// Experimental.
// Assign timestamps to write batch.
// This API allows the write batch to include keys from multiple column
// families whose timestamps' formats can differ. For example, some column
// families can enable timestamp, while others disable the feature.
// If key does not have timestamp, then put an empty Slice in ts_list as
// a placeholder.
//
// checker: callable object specified by caller to check the timestamp sizes
// of column families.
//
// in: cf, the column family id.
// in/out: ts_sz. Input as the expected timestamp size of the column
// family, output as the actual timestamp size of the column family.
// ret: OK if assignment succeeds.
// Status checker(uint32_t cf, size_t& ts_sz);
//
// User can call checker(uint32_t cf, size_t& ts_sz) which does the
// following:
// 1. find out the timestamp size of the column family whose id equals `cf`.
// 2. compare ts_sz with cf's timestamp size and return
// Status::InvalidArgument() if different.
Status AssignTimestamps(
const std::vector<Slice>& ts_list,
std::function<Status(uint32_t, size_t&)> checker =
[](uint32_t /*cf*/, size_t& /*ts_sz*/) { return Status::OK(); });
using WriteBatchBase::GetWriteBatch; using WriteBatchBase::GetWriteBatch;
WriteBatch* GetWriteBatch() override { return this; } WriteBatch* GetWriteBatch() override { return this; }
@ -446,6 +433,17 @@ class WriteBatch : public WriteBatchBase {
std::unique_ptr<ProtectionInfo> prot_info_; std::unique_ptr<ProtectionInfo> prot_info_;
size_t default_cf_ts_sz_ = 0;
// False if all keys are from column families that disable user-defined
// timestamp OR UpdateTimestamps() has been called at least once.
// This flag will be set to true if any of the above Put(), Delete(),
// SingleDelete(), etc. APIs are called at least once.
// Calling Put(ts), Delete(ts), SingleDelete(ts), etc. will not set this flag
// to true because the assumption is that these APIs have already set the
// timestamps to desired values.
bool needs_in_place_update_ts_ = false;
protected: protected:
std::string rep_; // See comment in write_batch.cc for the format of rep_ std::string rep_; // See comment in write_batch.cc for the format of rep_
}; };

@ -31,6 +31,8 @@ class WriteBatchBase {
virtual Status Put(ColumnFamilyHandle* column_family, const Slice& key, virtual Status Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) = 0; const Slice& value) = 0;
virtual Status Put(const Slice& key, const Slice& value) = 0; virtual Status Put(const Slice& key, const Slice& value) = 0;
virtual Status Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) = 0;
// Variant of Put() that gathers output like writev(2). The key and value // Variant of Put() that gathers output like writev(2). The key and value
// that will be written to the database are concatenations of arrays of // that will be written to the database are concatenations of arrays of
@ -44,6 +46,8 @@ class WriteBatchBase {
virtual Status Merge(ColumnFamilyHandle* column_family, const Slice& key, virtual Status Merge(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) = 0; const Slice& value) = 0;
virtual Status Merge(const Slice& key, const Slice& value) = 0; virtual Status Merge(const Slice& key, const Slice& value) = 0;
virtual Status Merge(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) = 0;
// variant that takes SliceParts // variant that takes SliceParts
virtual Status Merge(ColumnFamilyHandle* column_family, const SliceParts& key, virtual Status Merge(ColumnFamilyHandle* column_family, const SliceParts& key,
@ -54,6 +58,8 @@ class WriteBatchBase {
virtual Status Delete(ColumnFamilyHandle* column_family, virtual Status Delete(ColumnFamilyHandle* column_family,
const Slice& key) = 0; const Slice& key) = 0;
virtual Status Delete(const Slice& key) = 0; virtual Status Delete(const Slice& key) = 0;
virtual Status Delete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) = 0;
// variant that takes SliceParts // variant that takes SliceParts
virtual Status Delete(ColumnFamilyHandle* column_family, virtual Status Delete(ColumnFamilyHandle* column_family,
@ -65,6 +71,8 @@ class WriteBatchBase {
virtual Status SingleDelete(ColumnFamilyHandle* column_family, virtual Status SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key) = 0; const Slice& key) = 0;
virtual Status SingleDelete(const Slice& key) = 0; virtual Status SingleDelete(const Slice& key) = 0;
virtual Status SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) = 0;
// variant that takes SliceParts // variant that takes SliceParts
virtual Status SingleDelete(ColumnFamilyHandle* column_family, virtual Status SingleDelete(ColumnFamilyHandle* column_family,
@ -76,6 +84,9 @@ class WriteBatchBase {
virtual Status DeleteRange(ColumnFamilyHandle* column_family, virtual Status DeleteRange(ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key) = 0; const Slice& begin_key, const Slice& end_key) = 0;
virtual Status DeleteRange(const Slice& begin_key, const Slice& end_key) = 0; virtual Status DeleteRange(const Slice& begin_key, const Slice& end_key) = 0;
virtual Status DeleteRange(ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key,
const Slice& ts) = 0;
// variant that takes SliceParts // variant that takes SliceParts
virtual Status DeleteRange(ColumnFamilyHandle* column_family, virtual Status DeleteRange(ColumnFamilyHandle* column_family,

@ -4794,7 +4794,7 @@ class Benchmark {
RandomGenerator gen; RandomGenerator gen;
WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0, WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0,
user_timestamp_size_); /*protection_bytes_per_key=*/0, user_timestamp_size_);
Status s; Status s;
int64_t bytes = 0; int64_t bytes = 0;
@ -5122,7 +5122,8 @@ class Benchmark {
} }
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
Slice user_ts = mock_app_clock_->Allocate(ts_guard.get()); Slice user_ts = mock_app_clock_->Allocate(ts_guard.get());
s = batch.AssignTimestamp(user_ts); s = batch.UpdateTimestamps(
user_ts, [this](uint32_t) { return user_timestamp_size_; });
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "assign timestamp to write batch: %s\n", fprintf(stderr, "assign timestamp to write batch: %s\n",
s.ToString().c_str()); s.ToString().c_str());
@ -6512,7 +6513,7 @@ class Benchmark {
void DoDelete(ThreadState* thread, bool seq) { void DoDelete(ThreadState* thread, bool seq) {
WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0, WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0,
user_timestamp_size_); /*protection_bytes_per_key=*/0, user_timestamp_size_);
Duration duration(seq ? 0 : FLAGS_duration, deletes_); Duration duration(seq ? 0 : FLAGS_duration, deletes_);
int64_t i = 0; int64_t i = 0;
std::unique_ptr<const char[]> key_guard; std::unique_ptr<const char[]> key_guard;
@ -6534,7 +6535,8 @@ class Benchmark {
Status s; Status s;
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get()); ts = mock_app_clock_->Allocate(ts_guard.get());
s = batch.AssignTimestamp(ts); s = batch.UpdateTimestamps(
ts, [this](uint32_t) { return user_timestamp_size_; });
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "assign timestamp: %s\n", s.ToString().c_str()); fprintf(stderr, "assign timestamp: %s\n", s.ToString().c_str());
ErrorExit(); ErrorExit();
@ -6628,17 +6630,17 @@ class Benchmark {
Slice ts; Slice ts;
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get()); ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts;
} }
if (write_merge == kWrite) { if (write_merge == kWrite) {
s = db->Put(write_options_, key, val); if (user_timestamp_size_ == 0) {
s = db->Put(write_options_, key, val);
} else {
s = db->Put(write_options_, key, ts, val);
}
} else { } else {
s = db->Merge(write_options_, key, val); s = db->Merge(write_options_, key, val);
} }
// Restore write_options_ // Restore write_options_
if (user_timestamp_size_ > 0) {
write_options_.timestamp = nullptr;
}
written++; written++;
if (!s.ok()) { if (!s.ok()) {
@ -6711,7 +6713,7 @@ class Benchmark {
std::string keys[3]; std::string keys[3];
WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0, WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0,
user_timestamp_size_); /*protection_bytes_per_key=*/0, user_timestamp_size_);
Status s; Status s;
for (int i = 0; i < 3; i++) { for (int i = 0; i < 3; i++) {
keys[i] = key.ToString() + suffixes[i]; keys[i] = key.ToString() + suffixes[i];
@ -6722,7 +6724,8 @@ class Benchmark {
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts_guard.reset(new char[user_timestamp_size_]); ts_guard.reset(new char[user_timestamp_size_]);
Slice ts = mock_app_clock_->Allocate(ts_guard.get()); Slice ts = mock_app_clock_->Allocate(ts_guard.get());
s = batch.AssignTimestamp(ts); s = batch.UpdateTimestamps(
ts, [this](uint32_t) { return user_timestamp_size_; });
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "assign timestamp to batch: %s\n", fprintf(stderr, "assign timestamp to batch: %s\n",
s.ToString().c_str()); s.ToString().c_str());
@ -6742,7 +6745,8 @@ class Benchmark {
std::string suffixes[3] = {"1", "2", "0"}; std::string suffixes[3] = {"1", "2", "0"};
std::string keys[3]; std::string keys[3];
WriteBatch batch(0, 0, user_timestamp_size_); WriteBatch batch(0, 0, /*protection_bytes_per_key=*/0,
user_timestamp_size_);
Status s; Status s;
for (int i = 0; i < 3; i++) { for (int i = 0; i < 3; i++) {
keys[i] = key.ToString() + suffixes[i]; keys[i] = key.ToString() + suffixes[i];
@ -6753,7 +6757,8 @@ class Benchmark {
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts_guard.reset(new char[user_timestamp_size_]); ts_guard.reset(new char[user_timestamp_size_]);
Slice ts = mock_app_clock_->Allocate(ts_guard.get()); Slice ts = mock_app_clock_->Allocate(ts_guard.get());
s = batch.AssignTimestamp(ts); s = batch.UpdateTimestamps(
ts, [this](uint32_t) { return user_timestamp_size_; });
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "assign timestamp to batch: %s\n", fprintf(stderr, "assign timestamp to batch: %s\n",
s.ToString().c_str()); s.ToString().c_str());
@ -6940,12 +6945,13 @@ class Benchmark {
} else if (put_weight > 0) { } else if (put_weight > 0) {
// then do all the corresponding number of puts // then do all the corresponding number of puts
// for all the gets we have done earlier // for all the gets we have done earlier
Slice ts; Status s;
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get()); Slice ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts; s = db->Put(write_options_, key, ts, gen.Generate());
} else {
s = db->Put(write_options_, key, gen.Generate());
} }
Status s = db->Put(write_options_, key, gen.Generate());
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "put error: %s\n", s.ToString().c_str()); fprintf(stderr, "put error: %s\n", s.ToString().c_str());
ErrorExit(); ErrorExit();
@ -7006,11 +7012,13 @@ class Benchmark {
} }
Slice val = gen.Generate(); Slice val = gen.Generate();
Status s;
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get()); ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts; s = db->Put(write_options_, key, ts, val);
} else {
s = db->Put(write_options_, key, val);
} }
Status s = db->Put(write_options_, key, val);
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "put error: %s\n", s.ToString().c_str()); fprintf(stderr, "put error: %s\n", s.ToString().c_str());
exit(1); exit(1);
@ -7073,12 +7081,13 @@ class Benchmark {
xor_operator.XOR(nullptr, value, &new_value); xor_operator.XOR(nullptr, value, &new_value);
} }
Status s;
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get()); ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts; s = db->Put(write_options_, key, ts, Slice(new_value));
} else {
s = db->Put(write_options_, key, Slice(new_value));
} }
Status s = db->Put(write_options_, key, Slice(new_value));
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "put error: %s\n", s.ToString().c_str()); fprintf(stderr, "put error: %s\n", s.ToString().c_str());
ErrorExit(); ErrorExit();
@ -7139,13 +7148,14 @@ class Benchmark {
} }
value.append(operand.data(), operand.size()); value.append(operand.data(), operand.size());
Status s;
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get()); ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts; s = db->Put(write_options_, key, ts, value);
} else {
// Write back to the database
s = db->Put(write_options_, key, value);
} }
// Write back to the database
Status s = db->Put(write_options_, key, value);
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "put error: %s\n", s.ToString().c_str()); fprintf(stderr, "put error: %s\n", s.ToString().c_str());
ErrorExit(); ErrorExit();
@ -7521,12 +7531,12 @@ class Benchmark {
DB* db = SelectDB(thread); DB* db = SelectDB(thread);
for (int64_t i = 0; i < FLAGS_numdistinct; i++) { for (int64_t i = 0; i < FLAGS_numdistinct; i++) {
GenerateKeyFromInt(i * max_counter, FLAGS_num, &key); GenerateKeyFromInt(i * max_counter, FLAGS_num, &key);
Slice ts;
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get()); Slice ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts; s = db->Put(write_options_, key, ts, gen.Generate());
} else {
s = db->Put(write_options_, key, gen.Generate());
} }
s = db->Put(write_options_, key, gen.Generate());
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "Operation failed: %s\n", s.ToString().c_str()); fprintf(stderr, "Operation failed: %s\n", s.ToString().c_str());
exit(1); exit(1);
@ -7545,22 +7555,24 @@ class Benchmark {
static_cast<int64_t>(0)); static_cast<int64_t>(0));
GenerateKeyFromInt(key_id * max_counter + counters[key_id], FLAGS_num, GenerateKeyFromInt(key_id * max_counter + counters[key_id], FLAGS_num,
&key); &key);
Slice ts;
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get()); Slice ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts; s = FLAGS_use_single_deletes ? db->SingleDelete(write_options_, key, ts)
: db->Delete(write_options_, key, ts);
} else {
s = FLAGS_use_single_deletes ? db->SingleDelete(write_options_, key)
: db->Delete(write_options_, key);
} }
s = FLAGS_use_single_deletes ? db->SingleDelete(write_options_, key)
: db->Delete(write_options_, key);
if (s.ok()) { if (s.ok()) {
counters[key_id] = (counters[key_id] + 1) % max_counter; counters[key_id] = (counters[key_id] + 1) % max_counter;
GenerateKeyFromInt(key_id * max_counter + counters[key_id], FLAGS_num, GenerateKeyFromInt(key_id * max_counter + counters[key_id], FLAGS_num,
&key); &key);
if (user_timestamp_size_ > 0) { if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get()); Slice ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts; s = db->Put(write_options_, key, ts, Slice());
} else {
s = db->Put(write_options_, key, Slice());
} }
s = db->Put(write_options_, key, Slice());
} }
if (!s.ok()) { if (!s.ok()) {

@ -200,6 +200,7 @@ class BlobDB : public StackableDB {
virtual Status Write(const WriteOptions& opts, virtual Status Write(const WriteOptions& opts,
WriteBatch* updates) override = 0; WriteBatch* updates) override = 0;
using ROCKSDB_NAMESPACE::StackableDB::NewIterator; using ROCKSDB_NAMESPACE::StackableDB::NewIterator;
virtual Iterator* NewIterator(const ReadOptions& options) override = 0; virtual Iterator* NewIterator(const ReadOptions& options) override = 0;
virtual Iterator* NewIterator(const ReadOptions& options, virtual Iterator* NewIterator(const ReadOptions& options,

@ -128,6 +128,7 @@ class BlobDBImpl : public BlobDB {
const std::vector<Slice>& keys, const std::vector<Slice>& keys,
std::vector<std::string>* values) override; std::vector<std::string>* values) override;
using BlobDB::Write;
virtual Status Write(const WriteOptions& opts, WriteBatch* updates) override; virtual Status Write(const WriteOptions& opts, WriteBatch* updates) override;
virtual Status Close() override; virtual Status Close() override;

@ -47,6 +47,7 @@ class OptimisticTransactionDBImpl : public OptimisticTransactionDB {
Transaction* old_txn) override; Transaction* old_txn) override;
// Transactional `DeleteRange()` is not yet supported. // Transactional `DeleteRange()` is not yet supported.
using StackableDB::DeleteRange;
virtual Status DeleteRange(const WriteOptions&, ColumnFamilyHandle*, virtual Status DeleteRange(const WriteOptions&, ColumnFamilyHandle*,
const Slice&, const Slice&) override { const Slice&, const Slice&) override {
return Status::NotSupported(); return Status::NotSupported();

@ -246,6 +246,7 @@ Status WritePreparedTxnDB::Get(const ReadOptions& options,
backed_by_snapshot))) { backed_by_snapshot))) {
return res; return res;
} else { } else {
res.PermitUncheckedError();
WPRecordTick(TXN_GET_TRY_AGAIN); WPRecordTick(TXN_GET_TRY_AGAIN);
return Status::TryAgain(); return Status::TryAgain();
} }

@ -322,6 +322,16 @@ Status WriteBatchWithIndex::Put(const Slice& key, const Slice& value) {
return s; return s;
} }
Status WriteBatchWithIndex::Put(ColumnFamilyHandle* column_family,
const Slice& /*key*/, const Slice& /*ts*/,
const Slice& /*value*/) {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be nullptr");
}
// TODO: support WBWI::Put() with timestamp.
return Status::NotSupported();
}
Status WriteBatchWithIndex::Delete(ColumnFamilyHandle* column_family, Status WriteBatchWithIndex::Delete(ColumnFamilyHandle* column_family,
const Slice& key) { const Slice& key) {
rep->SetLastEntryOffset(); rep->SetLastEntryOffset();
@ -341,6 +351,15 @@ Status WriteBatchWithIndex::Delete(const Slice& key) {
return s; return s;
} }
Status WriteBatchWithIndex::Delete(ColumnFamilyHandle* column_family,
const Slice& /*key*/, const Slice& /*ts*/) {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be nullptr");
}
// TODO: support WBWI::Delete() with timestamp.
return Status::NotSupported();
}
Status WriteBatchWithIndex::SingleDelete(ColumnFamilyHandle* column_family, Status WriteBatchWithIndex::SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key) { const Slice& key) {
rep->SetLastEntryOffset(); rep->SetLastEntryOffset();
@ -360,6 +379,16 @@ Status WriteBatchWithIndex::SingleDelete(const Slice& key) {
return s; return s;
} }
Status WriteBatchWithIndex::SingleDelete(ColumnFamilyHandle* column_family,
const Slice& /*key*/,
const Slice& /*ts*/) {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be nullptr");
}
// TODO: support WBWI::SingleDelete() with timestamp.
return Status::NotSupported();
}
Status WriteBatchWithIndex::Merge(ColumnFamilyHandle* column_family, Status WriteBatchWithIndex::Merge(ColumnFamilyHandle* column_family,
const Slice& key, const Slice& value) { const Slice& key, const Slice& value) {
rep->SetLastEntryOffset(); rep->SetLastEntryOffset();

Loading…
Cancel
Save