Revise APIs related to user-defined timestamp (#8946)

Summary:
ajkr reminded me that we have a rule of not including per-kv related data in `WriteOptions`.
Namely, `WriteOptions` should not include information about "what-to-write", but should just
include information about "how-to-write".

According to this rule, `WriteOptions::timestamp` (experimental) is clearly a violation. Therefore,
this PR removes `WriteOptions::timestamp` for compliance.
After the removal, we need to pass timestamp info via another set of APIs. This PR proposes a set
of overloaded functions `Put(write_opts, key, value, ts)`, `Delete(write_opts, key, ts)`, and
`SingleDelete(write_opts, key, ts)`. Planned to add `Write(write_opts, batch, ts)`, but its complexity
made me reconsider doing it in another PR (maybe).

For better checking and returning error early, we also add a new set of APIs to `WriteBatch` that take
extra `timestamp` information when writing to `WriteBatch`es.
These set of APIs in `WriteBatchWithIndex` are currently not supported, and are on our TODO list.

Removed `WriteBatch::AssignTimestamps()` and renamed `WriteBatch::AssignTimestamp()` to
`WriteBatch::UpdateTimestamps()` since this method require that all keys have space for timestamps
allocated already and multiple timestamps can be updated.

The constructor of `WriteBatch` now takes a fourth argument `default_cf_ts_sz` which is the timestamp
size of the default column family. This will be used to allocate space when calling APIs that do not
specify a column family handle.

Also, updated `DB::Get()`, `DB::MultiGet()`, `DB::NewIterator()`, `DB::NewIterators()` methods, replacing
some assertions about timestamp to returning Status code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8946

Test Plan:
make check
./db_bench -benchmarks=fillseq,fillrandom,readrandom,readseq,deleterandom -user_timestamp_size=8
./db_stress --user_timestamp_size=8 -nooverwritepercent=0 -test_secondary=0 -secondary_catch_up_one_in=0 -continuous_verification_interval=0

Make sure there is no perf regression by running the following
```
./db_bench_opt -db=/dev/shm/rocksdb -use_existing_db=0 -level0_stop_writes_trigger=256 -level0_slowdown_writes_trigger=256 -level0_file_num_compaction_trigger=256 -disable_wal=1 -duration=10 -benchmarks=fillrandom
```

Before this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom   :       1.831 micros/op 546235 ops/sec;   60.4 MB/s
```
After this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom   :       1.820 micros/op 549404 ops/sec;   60.8 MB/s
```

Reviewed By: ltamasi

Differential Revision: D33721359

Pulled By: riversand963

fbshipit-source-id: c131561534272c120ffb80711d42748d21badf09
main
Yanqin Jin 2 years ago committed by Facebook GitHub Bot
parent 920386f2b7
commit 3122cb4358
  1. 1
      HISTORY.md
  2. 11
      db/db_impl/compacted_db_impl.cc
  3. 129
      db/db_impl/db_impl.cc
  4. 81
      db/db_impl/db_impl.h
  5. 31
      db/db_impl/db_impl_readonly.cc
  6. 31
      db/db_impl/db_impl_secondary.cc
  7. 187
      db/db_impl/db_impl_write.cc
  8. 2
      db/db_kv_checksum_test.cc
  9. 13
      db/db_test.cc
  10. 9
      db/db_test2.cc
  11. 574
      db/db_with_timestamp_basic_test.cc
  12. 15
      db/db_with_timestamp_compaction_test.cc
  13. 350
      db/write_batch.cc
  14. 152
      db/write_batch_internal.h
  15. 108
      db/write_batch_test.cc
  16. 6
      db_stress_tool/batched_ops_stress.cc
  17. 6
      db_stress_tool/db_stress_test_base.cc
  18. 28
      db_stress_tool/no_batched_ops_stress.cc
  19. 35
      include/rocksdb/db.h
  20. 14
      include/rocksdb/options.h
  21. 20
      include/rocksdb/utilities/stackable_db.h
  22. 1
      include/rocksdb/utilities/transaction_db.h
  23. 18
      include/rocksdb/utilities/write_batch_with_index.h
  24. 88
      include/rocksdb/write_batch.h
  25. 11
      include/rocksdb/write_batch_base.h
  26. 88
      tools/db_bench_tool.cc
  27. 1
      utilities/blob_db/blob_db.h
  28. 1
      utilities/blob_db/blob_db_impl.h
  29. 1
      utilities/transactions/optimistic_transaction_db_impl.h
  30. 1
      utilities/transactions/write_prepared_txn_db.cc
  31. 29
      utilities/write_batch_with_index/write_batch_with_index.cc

@ -21,6 +21,7 @@
* Remove ReadOptions::iter_start_seqnum which has been deprecated.
* Remove DBOptions::preserved_deletes and DB::SetPreserveDeletesSequenceNumber().
* Remove deprecated API AdvancedColumnFamilyOptions::rate_limit_delay_max_milliseconds.
* Removed timestamp from WriteOptions. Accordingly, added to DB APIs Put, Delete, SingleDelete, etc. accepting an additional argument 'timestamp'. Added Put, Delete, SingleDelete, etc to WriteBatch accepting an additional argument 'timestamp'. Removed WriteBatch::AssignTimestamps(vector<Slice>) API. Renamed WriteBatch::AssignTimestamp() to WriteBatch::UpdateTimestamps() with clarified comments.
### Behavior Changes
* Disallow the combination of DBOptions.use_direct_io_for_flush_and_compaction == true and DBOptions.writable_file_max_buffer_size == 0. This combination can cause WritableFileWriter::Append() to loop forever, and it does not make much sense in direct IO.

@ -40,6 +40,11 @@ size_t CompactedDBImpl::FindFile(const Slice& key) {
Status CompactedDBImpl::Get(const ReadOptions& options, ColumnFamilyHandle*,
const Slice& key, PinnableSlice* value) {
assert(user_comparator_);
if (options.timestamp || user_comparator_->timestamp_size()) {
// TODO: support timestamp
return Status::NotSupported();
}
GetContext get_context(user_comparator_, nullptr, nullptr, nullptr,
GetContext::kNotFound, key, value, nullptr, nullptr,
nullptr, true, nullptr, nullptr);
@ -58,6 +63,11 @@ Status CompactedDBImpl::Get(const ReadOptions& options, ColumnFamilyHandle*,
std::vector<Status> CompactedDBImpl::MultiGet(const ReadOptions& options,
const std::vector<ColumnFamilyHandle*>&,
const std::vector<Slice>& keys, std::vector<std::string>* values) {
assert(user_comparator_);
if (user_comparator_->timestamp_size() || options.timestamp) {
// TODO: support timestamp
return std::vector<Status>(keys.size(), Status::NotSupported());
}
autovector<TableReader*, 16> reader_list;
for (const auto& key : keys) {
const FdWithKeyRange& f = files_.files[FindFile(key)];
@ -69,6 +79,7 @@ std::vector<Status> CompactedDBImpl::MultiGet(const ReadOptions& options,
reader_list.push_back(f.fd.table_reader);
}
}
std::vector<Status> statuses(keys.size(), Status::NotFound());
values->resize(keys.size());
int idx = 0;

@ -1730,19 +1730,21 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
get_impl_options.merge_operands != nullptr);
assert(get_impl_options.column_family);
const Comparator* ucmp = get_impl_options.column_family->GetComparator();
assert(ucmp);
size_t ts_sz = ucmp->timestamp_size();
GetWithTimestampReadCallback read_cb(0); // Will call Refresh
#ifndef NDEBUG
if (ts_sz > 0) {
assert(read_options.timestamp);
assert(read_options.timestamp->size() == ts_sz);
if (read_options.timestamp) {
const Status s = FailIfTsSizesMismatch(get_impl_options.column_family,
*(read_options.timestamp));
if (!s.ok()) {
return s;
}
} else {
assert(!read_options.timestamp);
const Status s = FailIfCfHasTs(get_impl_options.column_family);
if (!s.ok()) {
return s;
}
}
#endif // NDEBUG
GetWithTimestampReadCallback read_cb(0); // Will call Refresh
PERF_CPU_TIMER_GUARD(get_cpu_nanos, immutable_db_options_.clock);
StopWatch sw(immutable_db_options_.clock, stats_, DB_GET);
@ -1811,7 +1813,9 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
// only if t <= read_opts.timestamp and s <= snapshot.
// HACK: temporarily overwrite input struct field but restore
SaveAndRestore<ReadCallback*> restore_callback(&get_impl_options.callback);
if (ts_sz > 0) {
const Comparator* ucmp = get_impl_options.column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() > 0) {
assert(!get_impl_options
.callback); // timestamp with callback is not supported
read_cb.Refresh(snapshot);
@ -1834,7 +1838,8 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
bool skip_memtable = (read_options.read_tier == kPersistedTier &&
has_unpersisted_data_.load(std::memory_order_relaxed));
bool done = false;
std::string* timestamp = ts_sz > 0 ? get_impl_options.timestamp : nullptr;
std::string* timestamp =
ucmp->timestamp_size() > 0 ? get_impl_options.timestamp : nullptr;
if (!skip_memtable) {
// Get value associated with key
if (get_impl_options.get_value) {
@ -1941,19 +1946,36 @@ std::vector<Status> DBImpl::MultiGet(
StopWatch sw(immutable_db_options_.clock, stats_, DB_MULTIGET);
PERF_TIMER_GUARD(get_snapshot_time);
#ifndef NDEBUG
for (const auto* cfh : column_family) {
assert(cfh);
const Comparator* const ucmp = cfh->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() > 0) {
assert(read_options.timestamp);
assert(ucmp->timestamp_size() == read_options.timestamp->size());
size_t num_keys = keys.size();
assert(column_family.size() == num_keys);
std::vector<Status> stat_list(num_keys);
bool should_fail = false;
for (size_t i = 0; i < num_keys; ++i) {
assert(column_family[i]);
if (read_options.timestamp) {
stat_list[i] =
FailIfTsSizesMismatch(column_family[i], *(read_options.timestamp));
if (!stat_list[i].ok()) {
should_fail = true;
}
} else {
assert(!read_options.timestamp);
stat_list[i] = FailIfCfHasTs(column_family[i]);
if (!stat_list[i].ok()) {
should_fail = true;
}
}
}
#endif // NDEBUG
if (should_fail) {
for (auto& s : stat_list) {
if (s.ok()) {
s = Status::Incomplete(
"DB not queried due to invalid argument(s) in the same MultiGet");
}
}
return stat_list;
}
if (tracer_) {
// TODO: This mutex should be removed later, to improve performance when
@ -1996,8 +2018,6 @@ std::vector<Status> DBImpl::MultiGet(
MergeContext merge_context;
// Note: this always resizes the values array
size_t num_keys = keys.size();
std::vector<Status> stat_list(num_keys);
values->resize(num_keys);
if (timestamps) {
timestamps->resize(num_keys);
@ -2262,20 +2282,31 @@ void DBImpl::MultiGet(const ReadOptions& read_options, const size_t num_keys,
return;
}
#ifndef NDEBUG
bool should_fail = false;
for (size_t i = 0; i < num_keys; ++i) {
ColumnFamilyHandle* cfh = column_families[i];
assert(cfh);
const Comparator* const ucmp = cfh->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() > 0) {
assert(read_options.timestamp);
assert(read_options.timestamp->size() == ucmp->timestamp_size());
if (read_options.timestamp) {
statuses[i] = FailIfTsSizesMismatch(cfh, *(read_options.timestamp));
if (!statuses[i].ok()) {
should_fail = true;
}
} else {
assert(!read_options.timestamp);
statuses[i] = FailIfCfHasTs(cfh);
if (!statuses[i].ok()) {
should_fail = true;
}
}
}
#endif // NDEBUG
if (should_fail) {
for (size_t i = 0; i < num_keys; ++i) {
if (statuses[i].ok()) {
statuses[i] = Status::Incomplete(
"DB not queried due to invalid argument(s) in the same MultiGet");
}
}
return;
}
if (tracer_) {
// TODO: This mutex should be removed later, to improve performance when
@ -2903,6 +2934,21 @@ Iterator* DBImpl::NewIterator(const ReadOptions& read_options,
"ReadTier::kPersistedData is not yet supported in iterators."));
}
assert(column_family);
if (read_options.timestamp) {
const Status s =
FailIfTsSizesMismatch(column_family, *(read_options.timestamp));
if (!s.ok()) {
return NewErrorIterator(s);
}
} else {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return NewErrorIterator(s);
}
}
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
ColumnFamilyData* cfd = cfh->cfd();
assert(cfd != nullptr);
@ -3029,6 +3075,25 @@ Status DBImpl::NewIterators(
return Status::NotSupported(
"ReadTier::kPersistedData is not yet supported in iterators.");
}
if (read_options.timestamp) {
for (auto* cf : column_families) {
assert(cf);
const Status s = FailIfTsSizesMismatch(cf, *(read_options.timestamp));
if (!s.ok()) {
return s;
}
}
} else {
for (auto* cf : column_families) {
assert(cf);
const Status s = FailIfCfHasTs(cf);
if (!s.ok()) {
return s;
}
}
}
ReadCallback* read_callback = nullptr; // No read callback provided.
iterators->clear();
iterators->reserve(column_families.size());

@ -146,24 +146,36 @@ class DBImpl : public DB {
// ---- Implementations of the DB interface ----
using DB::Resume;
virtual Status Resume() override;
Status Resume() override;
using DB::Put;
virtual Status Put(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) override;
Status Put(const WriteOptions& options, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& value) override;
Status Put(const WriteOptions& options, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts, const Slice& value) override;
using DB::Merge;
virtual Status Merge(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) override;
Status Merge(const WriteOptions& options, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& value) override;
using DB::Delete;
virtual Status Delete(const WriteOptions& options,
ColumnFamilyHandle* column_family,
const Slice& key) override;
Status Delete(const WriteOptions& options, ColumnFamilyHandle* column_family,
const Slice& key) override;
Status Delete(const WriteOptions& options, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) override;
using DB::SingleDelete;
virtual Status SingleDelete(const WriteOptions& options,
ColumnFamilyHandle* column_family,
const Slice& key) override;
Status SingleDelete(const WriteOptions& options,
ColumnFamilyHandle* column_family,
const Slice& key) override;
Status SingleDelete(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
using DB::DeleteRange;
Status DeleteRange(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& begin_key,
const Slice& end_key) override;
using DB::Write;
virtual Status Write(const WriteOptions& options,
WriteBatch* updates) override;
@ -1368,6 +1380,10 @@ class DBImpl : public DB {
// to ensure that db_session_id_ gets updated every time the DB is opened
void SetDbSessionId();
Status FailIfCfHasTs(const ColumnFamilyHandle* column_family) const;
Status FailIfTsSizesMismatch(const ColumnFamilyHandle* column_family,
const Slice& ts) const;
private:
friend class DB;
friend class ErrorHandler;
@ -2396,4 +2412,43 @@ static void ClipToRange(T* ptr, V minvalue, V maxvalue) {
if (static_cast<V>(*ptr) < minvalue) *ptr = minvalue;
}
inline Status DBImpl::FailIfCfHasTs(
const ColumnFamilyHandle* column_family) const {
column_family = column_family ? column_family : DefaultColumnFamily();
assert(column_family);
const Comparator* const ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() > 0) {
std::ostringstream oss;
oss << "cannot call this method on column family "
<< column_family->GetName() << " that enables timestamp";
return Status::InvalidArgument(oss.str());
}
return Status::OK();
}
inline Status DBImpl::FailIfTsSizesMismatch(
const ColumnFamilyHandle* column_family, const Slice& ts) const {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be null");
}
assert(column_family);
const Comparator* const ucmp = column_family->GetComparator();
assert(ucmp);
if (0 == ucmp->timestamp_size()) {
std::stringstream oss;
oss << "cannot call this method on column family "
<< column_family->GetName() << " that does not enable timestamp";
return Status::InvalidArgument(oss.str());
}
const size_t ts_sz = ts.size();
if (ts_sz != ucmp->timestamp_size()) {
std::stringstream oss;
oss << "Timestamp sizes mismatch: expect " << ucmp->timestamp_size() << ", "
<< ts_sz << " given";
return Status::InvalidArgument(oss.str());
}
return Status::OK();
}
} // namespace ROCKSDB_NAMESPACE

@ -36,6 +36,15 @@ Status DBImplReadOnly::Get(const ReadOptions& read_options,
assert(pinnable_val != nullptr);
// TODO: stopwatch DB_GET needed?, perf timer needed?
PERF_TIMER_GUARD(get_snapshot_time);
assert(column_family);
const Comparator* ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() || read_options.timestamp) {
// TODO: support timestamp
return Status::NotSupported();
}
Status s;
SequenceNumber snapshot = versions_->LastSequence();
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
@ -73,6 +82,13 @@ Status DBImplReadOnly::Get(const ReadOptions& read_options,
Iterator* DBImplReadOnly::NewIterator(const ReadOptions& read_options,
ColumnFamilyHandle* column_family) {
assert(column_family);
const Comparator* ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() || read_options.timestamp) {
// TODO: support timestamp
return NewErrorIterator(Status::NotSupported());
}
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
auto cfd = cfh->cfd();
SuperVersion* super_version = cfd->GetSuperVersion()->Ref();
@ -100,6 +116,21 @@ Status DBImplReadOnly::NewIterators(
const ReadOptions& read_options,
const std::vector<ColumnFamilyHandle*>& column_families,
std::vector<Iterator*>* iterators) {
if (read_options.timestamp) {
// TODO: support timestamp
return Status::NotSupported();
} else {
for (auto* cf : column_families) {
assert(cf);
const Comparator* ucmp = cf->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size()) {
// TODO: support timestamp
return Status::NotSupported();
}
}
}
ReadCallback* read_callback = nullptr; // No read callback provided.
if (iterators == nullptr) {
return Status::InvalidArgument("iterators not allowed to be nullptr");

@ -339,6 +339,14 @@ Status DBImplSecondary::GetImpl(const ReadOptions& read_options,
StopWatch sw(immutable_db_options_.clock, stats_, DB_GET);
PERF_TIMER_GUARD(get_snapshot_time);
assert(column_family);
const Comparator* ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() || read_options.timestamp) {
// TODO: support timestamp
return Status::NotSupported();
}
auto cfh = static_cast<ColumnFamilyHandleImpl*>(column_family);
ColumnFamilyData* cfd = cfh->cfd();
if (tracer_) {
@ -404,6 +412,15 @@ Iterator* DBImplSecondary::NewIterator(const ReadOptions& read_options,
return NewErrorIterator(Status::NotSupported(
"ReadTier::kPersistedData is not yet supported in iterators."));
}
assert(column_family);
const Comparator* ucmp = column_family->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size() || read_options.timestamp) {
// TODO: support timestamp
return NewErrorIterator(Status::NotSupported());
}
Iterator* result = nullptr;
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
auto cfd = cfh->cfd();
@ -460,6 +477,20 @@ Status DBImplSecondary::NewIterators(
if (iterators == nullptr) {
return Status::InvalidArgument("iterators not allowed to be nullptr");
}
if (read_options.timestamp) {
// TODO: support timestamp
return Status::NotSupported();
} else {
for (auto* cf : column_families) {
assert(cf);
const Comparator* ucmp = cf->GetComparator();
assert(ucmp);
if (ucmp->timestamp_size()) {
// TODO: support timestamp
return Status::NotSupported();
}
}
}
iterators->clear();
iterators->reserve(column_families.size());
if (read_options.tailing) {

@ -21,11 +21,28 @@ namespace ROCKSDB_NAMESPACE {
// Convenience methods
Status DBImpl::Put(const WriteOptions& o, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& val) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
return DB::Put(o, column_family, key, val);
}
Status DBImpl::Put(const WriteOptions& o, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts, const Slice& val) {
const Status s = FailIfTsSizesMismatch(column_family, ts);
if (!s.ok()) {
return s;
}
return DB::Put(o, column_family, key, ts, val);
}
Status DBImpl::Merge(const WriteOptions& o, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& val) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
if (!cfh->cfd()->ioptions()->merge_operator) {
return Status::NotSupported("Provide a merge_operator when opening DB");
@ -36,22 +53,61 @@ Status DBImpl::Merge(const WriteOptions& o, ColumnFamilyHandle* column_family,
Status DBImpl::Delete(const WriteOptions& write_options,
ColumnFamilyHandle* column_family, const Slice& key) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
return DB::Delete(write_options, column_family, key);
}
Status DBImpl::Delete(const WriteOptions& write_options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) {
const Status s = FailIfTsSizesMismatch(column_family, ts);
if (!s.ok()) {
return s;
}
return DB::Delete(write_options, column_family, key, ts);
}
Status DBImpl::SingleDelete(const WriteOptions& write_options,
ColumnFamilyHandle* column_family,
const Slice& key) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
return DB::SingleDelete(write_options, column_family, key);
}
Status DBImpl::SingleDelete(const WriteOptions& write_options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) {
const Status s = FailIfTsSizesMismatch(column_family, ts);
if (!s.ok()) {
return s;
}
return DB::SingleDelete(write_options, column_family, key, ts);
}
Status DBImpl::DeleteRange(const WriteOptions& write_options,
ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key) {
const Status s = FailIfCfHasTs(column_family);
if (!s.ok()) {
return s;
}
return DB::DeleteRange(write_options, column_family, begin_key, end_key);
}
void DBImpl::SetRecoverableStatePreReleaseCallback(
PreReleaseCallback* callback) {
recoverable_state_pre_release_callback_.reset(callback);
}
Status DBImpl::Write(const WriteOptions& write_options, WriteBatch* my_batch) {
return WriteImpl(write_options, my_batch, nullptr, nullptr);
return WriteImpl(write_options, my_batch, /*callback=*/nullptr,
/*log_used=*/nullptr);
}
#ifndef ROCKSDB_LITE
@ -74,6 +130,8 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
assert(!seq_per_batch_ || batch_cnt != 0);
if (my_batch == nullptr) {
return Status::Corruption("Batch is nullptr!");
} else if (WriteBatchInternal::TimestampsUpdateNeeded(*my_batch)) {
return Status::InvalidArgument("write batch must have timestamp(s) set");
}
// TODO: this use of operator bool on `tracer_` can avoid unnecessary lock
// grabs but does not seem thread-safe.
@ -283,6 +341,7 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
size_t total_byte_size = 0;
size_t pre_release_callback_cnt = 0;
for (auto* writer : write_group) {
assert(writer);
if (writer->CheckCallback(this)) {
valid_batches += writer->batch_cnt;
if (writer->ShouldWriteToMemtable()) {
@ -487,7 +546,8 @@ Status DBImpl::PipelinedWriteImpl(const WriteOptions& write_options,
WriteContext write_context;
WriteThread::Writer w(write_options, my_batch, callback, log_ref,
disable_memtable);
disable_memtable, /*_batch_cnt=*/0,
/*_pre_release_callback=*/nullptr);
write_thread_.JoinBatchGroup(&w);
TEST_SYNC_POINT("DBImplWrite::PipelinedWriteImpl:AfterJoinBatchGroup");
if (w.state == WriteThread::STATE_GROUP_LEADER) {
@ -526,7 +586,8 @@ Status DBImpl::PipelinedWriteImpl(const WriteOptions& write_options,
}
}
SequenceNumber next_sequence = current_sequence;
for (auto writer : wal_write_group) {
for (auto* writer : wal_write_group) {
assert(writer);
if (writer->CheckCallback(this)) {
if (writer->ShouldWriteToMemtable()) {
writer->sequence = next_sequence;
@ -764,6 +825,7 @@ Status DBImpl::WriteImplWALOnly(
size_t pre_release_callback_cnt = 0;
size_t total_byte_size = 0;
for (auto* writer : write_group) {
assert(writer);
if (writer->CheckCallback(this)) {
total_byte_size = WriteBatchInternal::AppendedByteSize(
total_byte_size, WriteBatchInternal::ByteSize(writer->batch));
@ -2019,34 +2081,25 @@ size_t DBImpl::GetWalPreallocateBlockSize(uint64_t write_buffer_size) const {
// can call if they wish
Status DB::Put(const WriteOptions& opt, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& value) {
if (nullptr == opt.timestamp) {
// Pre-allocate size of write batch conservatively.
// 8 bytes are taken by header, 4 bytes for count, 1 byte for type,
// and we allocate 11 extra bytes for key length, as well as value length.
WriteBatch batch(key.size() + value.size() + 24);
Status s = batch.Put(column_family, key, value);
if (!s.ok()) {
return s;
}
return Write(opt, &batch);
}
const Slice* ts = opt.timestamp;
assert(nullptr != ts);
size_t ts_sz = ts->size();
assert(column_family->GetComparator());
assert(ts_sz == column_family->GetComparator()->timestamp_size());
WriteBatch batch;
Status s;
if (key.data() + key.size() == ts->data()) {
Slice key_with_ts = Slice(key.data(), key.size() + ts_sz);
s = batch.Put(column_family, key_with_ts, value);
} else {
std::array<Slice, 2> key_with_ts_slices{{key, *ts}};
SliceParts key_with_ts(key_with_ts_slices.data(), 2);
std::array<Slice, 1> value_slices{{value}};
SliceParts values(value_slices.data(), 1);
s = batch.Put(column_family, key_with_ts, values);
// Pre-allocate size of write batch conservatively.
// 8 bytes are taken by header, 4 bytes for count, 1 byte for type,
// and we allocate 11 extra bytes for key length, as well as value length.
WriteBatch batch(key.size() + value.size() + 24);
Status s = batch.Put(column_family, key, value);
if (!s.ok()) {
return s;
}
return Write(opt, &batch);
}
Status DB::Put(const WriteOptions& opt, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts, const Slice& value) {
ColumnFamilyHandle* default_cf = DefaultColumnFamily();
assert(default_cf);
const Comparator* const default_cf_ucmp = default_cf->GetComparator();
assert(default_cf_ucmp);
WriteBatch batch(0, 0, 0, default_cf_ucmp->timestamp_size());
Status s = batch.Put(column_family, key, ts, value);
if (!s.ok()) {
return s;
}
@ -2055,29 +2108,22 @@ Status DB::Put(const WriteOptions& opt, ColumnFamilyHandle* column_family,
Status DB::Delete(const WriteOptions& opt, ColumnFamilyHandle* column_family,
const Slice& key) {
if (nullptr == opt.timestamp) {
WriteBatch batch;
Status s = batch.Delete(column_family, key);
if (!s.ok()) {
return s;
}
return Write(opt, &batch);
}
const Slice* ts = opt.timestamp;
assert(ts != nullptr);
size_t ts_sz = ts->size();
assert(column_family->GetComparator());
assert(ts_sz == column_family->GetComparator()->timestamp_size());
WriteBatch batch;
Status s;
if (key.data() + key.size() == ts->data()) {
Slice key_with_ts = Slice(key.data(), key.size() + ts_sz);
s = batch.Delete(column_family, key_with_ts);
} else {
std::array<Slice, 2> key_with_ts_slices{{key, *ts}};
SliceParts key_with_ts(key_with_ts_slices.data(), 2);
s = batch.Delete(column_family, key_with_ts);
Status s = batch.Delete(column_family, key);
if (!s.ok()) {
return s;
}
return Write(opt, &batch);
}
Status DB::Delete(const WriteOptions& opt, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) {
ColumnFamilyHandle* default_cf = DefaultColumnFamily();
assert(default_cf);
const Comparator* const default_cf_ucmp = default_cf->GetComparator();
assert(default_cf_ucmp);
WriteBatch batch(0, 0, 0, default_cf_ucmp->timestamp_size());
Status s = batch.Delete(column_family, key, ts);
if (!s.ok()) {
return s;
}
@ -2086,36 +2132,27 @@ Status DB::Delete(const WriteOptions& opt, ColumnFamilyHandle* column_family,
Status DB::SingleDelete(const WriteOptions& opt,
ColumnFamilyHandle* column_family, const Slice& key) {
Status s;
if (opt.timestamp == nullptr) {
WriteBatch batch;
s = batch.SingleDelete(column_family, key);
if (!s.ok()) {
return s;
}
s = Write(opt, &batch);
WriteBatch batch;
Status s = batch.SingleDelete(column_family, key);
if (!s.ok()) {
return s;
}
return Write(opt, &batch);
}
const Slice* ts = opt.timestamp;
assert(ts != nullptr);
size_t ts_sz = ts->size();
assert(column_family->GetComparator());
assert(ts_sz == column_family->GetComparator()->timestamp_size());
WriteBatch batch;
if (key.data() + key.size() == ts->data()) {
Slice key_with_ts = Slice(key.data(), key.size() + ts_sz);
s = batch.SingleDelete(column_family, key_with_ts);
} else {
std::array<Slice, 2> key_with_ts_slices{{key, *ts}};
SliceParts key_with_ts(key_with_ts_slices.data(), 2);
s = batch.SingleDelete(column_family, key_with_ts);
}
Status DB::SingleDelete(const WriteOptions& opt,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) {
ColumnFamilyHandle* default_cf = DefaultColumnFamily();
assert(default_cf);
const Comparator* const default_cf_ucmp = default_cf->GetComparator();
assert(default_cf_ucmp);
WriteBatch batch(0, 0, 0, default_cf_ucmp->timestamp_size());
Status s = batch.SingleDelete(column_family, key, ts);
if (!s.ok()) {
return s;
}
s = Write(opt, &batch);
return s;
return Write(opt, &batch);
}
Status DB::DeleteRange(const WriteOptions& opt,

@ -37,7 +37,7 @@ class DbKvChecksumTest
std::pair<WriteBatch, Status> GetWriteBatch(ColumnFamilyHandle* cf_handle) {
Status s;
WriteBatch wb(0 /* reserved_bytes */, 0 /* max_bytes */,
8 /* protection_bytes_per_entry */);
8 /* protection_bytes_per_entry */, 0 /* default_cf_ts_sz */);
switch (op_type_) {
case WriteBatchOpType::kPut:
s = wb.Put(cf_handle, "key", "val");

@ -2862,6 +2862,11 @@ class ModelDB : public DB {
}
return Write(o, &batch);
}
Status Put(const WriteOptions& /*o*/, ColumnFamilyHandle* /*cf*/,
const Slice& /*k*/, const Slice& /*ts*/,
const Slice& /*v*/) override {
return Status::NotSupported();
}
using DB::Close;
Status Close() override { return Status::OK(); }
using DB::Delete;
@ -2874,6 +2879,10 @@ class ModelDB : public DB {
}
return Write(o, &batch);
}
Status Delete(const WriteOptions& /*o*/, ColumnFamilyHandle* /*cf*/,
const Slice& /*key*/, const Slice& /*ts*/) override {
return Status::NotSupported();
}
using DB::SingleDelete;
Status SingleDelete(const WriteOptions& o, ColumnFamilyHandle* cf,
const Slice& key) override {
@ -2884,6 +2893,10 @@ class ModelDB : public DB {
}
return Write(o, &batch);
}
Status SingleDelete(const WriteOptions& /*o*/, ColumnFamilyHandle* /*cf*/,
const Slice& /*key*/, const Slice& /*ts*/) override {
return Status::NotSupported();
}
using DB::Merge;
Status Merge(const WriteOptions& o, ColumnFamilyHandle* cf, const Slice& k,
const Slice& v) override {

@ -6845,16 +6845,13 @@ TEST_F(DBTest2, GetLatestSeqAndTsForKey) {
constexpr uint64_t kTsU64Value = 12;
for (uint64_t key = 0; key < 100; ++key) {
std::string ts_str;
PutFixed64(&ts_str, kTsU64Value);
Slice ts = ts_str;
WriteOptions write_opts;
write_opts.timestamp = &ts;
std::string ts;
PutFixed64(&ts, kTsU64Value);
std::string key_str;
PutFixed64(&key_str, key);
std::reverse(key_str.begin(), key_str.end());
ASSERT_OK(Put(key_str, "value", write_opts));
ASSERT_OK(db_->Put(WriteOptions(), key_str, ts, "value"));
}
ASSERT_OK(Flush());

File diff suppressed because it is too large Load Diff

@ -78,9 +78,8 @@ TEST_F(TimestampCompatibleCompactionTest, UserKeyCrossFileBoundary) {
WriteOptions write_opts;
for (; key < kNumKeysPerFile - 1; ++key, ++ts) {
std::string ts_str = Timestamp(ts);
Slice ts_slice = ts_str;
write_opts.timestamp = &ts_slice;
ASSERT_OK(db_->Put(write_opts, Key1(key), "foo_" + std::to_string(key)));
ASSERT_OK(
db_->Put(write_opts, Key1(key), ts_str, "foo_" + std::to_string(key)));
}
// Write another L0 with keys 99 with newer ts.
ASSERT_OK(Flush());
@ -88,18 +87,16 @@ TEST_F(TimestampCompatibleCompactionTest, UserKeyCrossFileBoundary) {
key = 99;
for (int i = 0; i < 4; ++i, ++ts) {
std::string ts_str = Timestamp(ts);
Slice ts_slice = ts_str;
write_opts.timestamp = &ts_slice;
ASSERT_OK(db_->Put(write_opts, Key1(key), "bar_" + std::to_string(key)));
ASSERT_OK(
db_->Put(write_opts, Key1(key), ts_str, "bar_" + std::to_string(key)));
}
ASSERT_OK(Flush());
uint64_t saved_read_ts2 = ts++;
// Write another L0 with keys 99, 100, 101, ..., 150
for (; key <= 150; ++key, ++ts) {
std::string ts_str = Timestamp(ts);
Slice ts_slice = ts_str;
write_opts.timestamp = &ts_slice;
ASSERT_OK(db_->Put(write_opts, Key1(key), "foo1_" + std::to_string(key)));
ASSERT_OK(
db_->Put(write_opts, Key1(key), ts_str, "foo1_" + std::to_string(key)));
}
ASSERT_OK(Flush());
// Wait for compaction to finish

@ -160,8 +160,11 @@ WriteBatch::WriteBatch(size_t reserved_bytes, size_t max_bytes)
}
WriteBatch::WriteBatch(size_t reserved_bytes, size_t max_bytes,
size_t protection_bytes_per_key)
: content_flags_(0), max_bytes_(max_bytes), rep_() {
size_t protection_bytes_per_key, size_t default_cf_ts_sz)
: content_flags_(0),
max_bytes_(max_bytes),
default_cf_ts_sz_(default_cf_ts_sz),
rep_() {
// Currently `protection_bytes_per_key` can only be enabled at 8 bytes per
// entry.
assert(protection_bytes_per_key == 0 || protection_bytes_per_key == 8);
@ -186,6 +189,7 @@ WriteBatch::WriteBatch(const WriteBatch& src)
: wal_term_point_(src.wal_term_point_),
content_flags_(src.content_flags_.load(std::memory_order_relaxed)),
max_bytes_(src.max_bytes_),
default_cf_ts_sz_(src.default_cf_ts_sz_),
rep_(src.rep_) {
if (src.save_points_ != nullptr) {
save_points_.reset(new SavePoints());
@ -203,6 +207,7 @@ WriteBatch::WriteBatch(WriteBatch&& src) noexcept
content_flags_(src.content_flags_.load(std::memory_order_relaxed)),
max_bytes_(src.max_bytes_),
prot_info_(std::move(src.prot_info_)),
default_cf_ts_sz_(src.default_cf_ts_sz_),
rep_(std::move(src.rep_)) {}
WriteBatch& WriteBatch::operator=(const WriteBatch& src) {
@ -250,6 +255,7 @@ void WriteBatch::Clear() {
prot_info_->entries_.clear();
}
wal_term_point_.clear();
default_cf_ts_sz_ = 0;
}
uint32_t WriteBatch::Count() const { return WriteBatchInternal::Count(this); }
@ -700,6 +706,45 @@ size_t WriteBatchInternal::GetFirstOffset(WriteBatch* /*b*/) {
return WriteBatchInternal::kHeader;
}
std::tuple<Status, uint32_t, size_t>
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(
WriteBatch* b, ColumnFamilyHandle* column_family) {
uint32_t cf_id = GetColumnFamilyID(column_family);
size_t ts_sz = 0;
Status s;
if (column_family) {
const Comparator* const ucmp = column_family->GetComparator();
if (ucmp) {
ts_sz = ucmp->timestamp_size();
if (0 == cf_id && b->default_cf_ts_sz_ != ts_sz) {
s = Status::InvalidArgument("Default cf timestamp size mismatch");
}
}
} else if (b->default_cf_ts_sz_ > 0) {
ts_sz = b->default_cf_ts_sz_;
}
return std::make_tuple(s, cf_id, ts_sz);
}
namespace {
Status CheckColumnFamilyTimestampSize(ColumnFamilyHandle* column_family,
const Slice& ts) {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be null");
}
const Comparator* const ucmp = column_family->GetComparator();
assert(ucmp);
size_t cf_ts_sz = ucmp->timestamp_size();
if (0 == cf_ts_sz) {
return Status::InvalidArgument("timestamp disabled");
}
if (cf_ts_sz != ts.size()) {
return Status::InvalidArgument("timestamp size mismatch");
}
return Status::OK();
}
} // namespace
Status WriteBatchInternal::Put(WriteBatch* b, uint32_t column_family_id,
const Slice& key, const Slice& value) {
if (key.size() > size_t{port::kMaxUint32}) {
@ -738,8 +783,40 @@ Status WriteBatchInternal::Put(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) {
return WriteBatchInternal::Put(this, GetColumnFamilyID(column_family), key,
value);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Put(this, cf_id, key, value);
}
needs_in_place_update_ts_ = true;
std::string dummy_ts(ts_sz, '\0');
std::array<Slice, 2> key_with_ts{{key, dummy_ts}};
return WriteBatchInternal::Put(this, cf_id, SliceParts(key_with_ts.data(), 2),
SliceParts(&value, 1));
}
Status WriteBatch::Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) {
const Status s = CheckColumnFamilyTimestampSize(column_family, ts);
if (!s.ok()) {
return s;
}
assert(column_family);
uint32_t cf_id = column_family->GetID();
std::array<Slice, 2> key_with_ts{{key, ts}};
return WriteBatchInternal::Put(this, cf_id, SliceParts(key_with_ts.data(), 2),
SliceParts(&value, 1));
}
Status WriteBatchInternal::CheckSlicePartsLength(const SliceParts& key,
@ -794,8 +871,24 @@ Status WriteBatchInternal::Put(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Put(ColumnFamilyHandle* column_family, const SliceParts& key,
const SliceParts& value) {
return WriteBatchInternal::Put(this, GetColumnFamilyID(column_family), key,
value);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (ts_sz == 0) {
return WriteBatchInternal::Put(this, cf_id, key, value);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
}
Status WriteBatchInternal::InsertNoop(WriteBatch* b) {
@ -892,8 +985,40 @@ Status WriteBatchInternal::Delete(WriteBatch* b, uint32_t column_family_id,
}
Status WriteBatch::Delete(ColumnFamilyHandle* column_family, const Slice& key) {
return WriteBatchInternal::Delete(this, GetColumnFamilyID(column_family),
key);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Delete(this, cf_id, key);
}
needs_in_place_update_ts_ = true;
std::string dummy_ts(ts_sz, '\0');
std::array<Slice, 2> key_with_ts{{key, dummy_ts}};
return WriteBatchInternal::Delete(this, cf_id,
SliceParts(key_with_ts.data(), 2));
}
Status WriteBatch::Delete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) {
const Status s = CheckColumnFamilyTimestampSize(column_family, ts);
if (!s.ok()) {
return s;
}
assert(column_family);
uint32_t cf_id = column_family->GetID();
std::array<Slice, 2> key_with_ts{{key, ts}};
return WriteBatchInternal::Delete(this, cf_id,
SliceParts(key_with_ts.data(), 2));
}
Status WriteBatchInternal::Delete(WriteBatch* b, uint32_t column_family_id,
@ -925,8 +1050,24 @@ Status WriteBatchInternal::Delete(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Delete(ColumnFamilyHandle* column_family,
const SliceParts& key) {
return WriteBatchInternal::Delete(this, GetColumnFamilyID(column_family),
key);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Delete(this, cf_id, key);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
}
Status WriteBatchInternal::SingleDelete(WriteBatch* b,
@ -957,8 +1098,40 @@ Status WriteBatchInternal::SingleDelete(WriteBatch* b,
Status WriteBatch::SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key) {
return WriteBatchInternal::SingleDelete(
this, GetColumnFamilyID(column_family), key);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::SingleDelete(this, cf_id, key);
}
needs_in_place_update_ts_ = true;
std::string dummy_ts(ts_sz, '\0');
std::array<Slice, 2> key_with_ts{{key, dummy_ts}};
return WriteBatchInternal::SingleDelete(this, cf_id,
SliceParts(key_with_ts.data(), 2));
}
Status WriteBatch::SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) {
const Status s = CheckColumnFamilyTimestampSize(column_family, ts);
if (!s.ok()) {
return s;
}
assert(column_family);
uint32_t cf_id = column_family->GetID();
std::array<Slice, 2> key_with_ts{{key, ts}};
return WriteBatchInternal::SingleDelete(this, cf_id,
SliceParts(key_with_ts.data(), 2));
}
Status WriteBatchInternal::SingleDelete(WriteBatch* b,
@ -992,8 +1165,24 @@ Status WriteBatchInternal::SingleDelete(WriteBatch* b,
Status WriteBatch::SingleDelete(ColumnFamilyHandle* column_family,
const SliceParts& key) {
return WriteBatchInternal::SingleDelete(
this, GetColumnFamilyID(column_family), key);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::SingleDelete(this, cf_id, key);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
}
Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id,
@ -1026,8 +1215,24 @@ Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::DeleteRange(ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key) {
return WriteBatchInternal::DeleteRange(this, GetColumnFamilyID(column_family),
begin_key, end_key);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::DeleteRange(this, cf_id, begin_key, end_key);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
}
Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id,
@ -1061,8 +1266,24 @@ Status WriteBatchInternal::DeleteRange(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::DeleteRange(ColumnFamilyHandle* column_family,
const SliceParts& begin_key,
const SliceParts& end_key) {
return WriteBatchInternal::DeleteRange(this, GetColumnFamilyID(column_family),
begin_key, end_key);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::DeleteRange(this, cf_id, begin_key, end_key);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
}
Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id,
@ -1099,8 +1320,24 @@ Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Merge(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) {
return WriteBatchInternal::Merge(this, GetColumnFamilyID(column_family), key,
value);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Merge(this, cf_id, key, value);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
}
Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id,
@ -1136,8 +1373,24 @@ Status WriteBatchInternal::Merge(WriteBatch* b, uint32_t column_family_id,
Status WriteBatch::Merge(ColumnFamilyHandle* column_family,
const SliceParts& key, const SliceParts& value) {
return WriteBatchInternal::Merge(this, GetColumnFamilyID(column_family), key,
value);
size_t ts_sz = 0;
uint32_t cf_id = 0;
Status s;
std::tie(s, cf_id, ts_sz) =
WriteBatchInternal::GetColumnFamilyIdAndTimestampSize(this,
column_family);
if (!s.ok()) {
return s;
}
if (0 == ts_sz) {
return WriteBatchInternal::Merge(this, cf_id, key, value);
}
return Status::InvalidArgument(
"Cannot call this method on column family enabling timestamp");
}
Status WriteBatchInternal::PutBlobIndex(WriteBatch* b,
@ -1223,18 +1476,15 @@ Status WriteBatch::PopSavePoint() {
return Status::OK();
}
Status WriteBatch::AssignTimestamp(
const Slice& ts, std::function<Status(uint32_t, size_t&)> checker) {
TimestampAssigner ts_assigner(prot_info_.get(), std::move(checker), ts);
return Iterate(&ts_assigner);
}
Status WriteBatch::AssignTimestamps(
const std::vector<Slice>& ts_list,
std::function<Status(uint32_t, size_t&)> checker) {
SimpleListTimestampAssigner ts_assigner(prot_info_.get(), std::move(checker),
ts_list);
return Iterate(&ts_assigner);
Status WriteBatch::UpdateTimestamps(
const Slice& ts, std::function<size_t(uint32_t)> ts_sz_func) {
TimestampUpdater<decltype(ts_sz_func)> ts_updater(prot_info_.get(),
std::move(ts_sz_func), ts);
const Status s = Iterate(&ts_updater);
if (s.ok()) {
needs_in_place_update_ts_ = false;
}
return s;
}
class MemTableInserter : public WriteBatch::Handler {
@ -2189,24 +2439,20 @@ class MemTableInserter : public WriteBatch::Handler {
const auto& batch_info = trx->batches_.begin()->second;
// all inserts must reference this trx log number
log_number_ref_ = batch_info.log_number_;
const auto checker = [this](uint32_t cf, size_t& ts_sz) {
assert(db_);
VersionSet* const vset = db_->GetVersionSet();
assert(vset);
ColumnFamilySet* const cf_set = vset->GetColumnFamilySet();
assert(cf_set);
ColumnFamilyData* cfd = cf_set->GetColumnFamily(cf);
assert(cfd);
const auto* const ucmp = cfd->user_comparator();
assert(ucmp);
if (ucmp->timestamp_size() == 0) {
ts_sz = 0;
} else if (ucmp->timestamp_size() != ts_sz) {
return Status::InvalidArgument("Timestamp size mismatch");
}
return Status::OK();
};
s = batch_info.batch_->AssignTimestamp(commit_ts, checker);
s = batch_info.batch_->UpdateTimestamps(
commit_ts, [this](uint32_t cf) {
assert(db_);
VersionSet* const vset = db_->GetVersionSet();
assert(vset);
ColumnFamilySet* const cf_set = vset->GetColumnFamilySet();
assert(cf_set);
ColumnFamilyData* cfd = cf_set->GetColumnFamily(cf);
assert(cfd);
const auto* const ucmp = cfd->user_comparator();
assert(ucmp);
return ucmp->timestamp_size();
});
if (s.ok()) {
s = batch_info.batch_->Iterate(this);
log_number_ref_ = 0;

@ -79,7 +79,7 @@ class WriteBatchInternal {
public:
// WriteBatch header has an 8-byte sequence number followed by a 4-byte count.
static const size_t kHeader = 12;
static constexpr size_t kHeader = 12;
// WriteBatch methods with column_family_id instead of ColumnFamilyHandle*
static Status Put(WriteBatch* batch, uint32_t column_family_id,
@ -221,6 +221,13 @@ class WriteBatchInternal {
// state meant to be used only during recovery.
static void SetAsLatestPersistentState(WriteBatch* b);
static bool IsLatestPersistentState(const WriteBatch* b);
static std::tuple<Status, uint32_t, size_t> GetColumnFamilyIdAndTimestampSize(
WriteBatch* b, ColumnFamilyHandle* column_family);
static bool TimestampsUpdateNeeded(const WriteBatch& wb) {
return wb.needs_in_place_update_ts_;
}
};
// LocalSavePoint is similar to a scope guard
@ -265,39 +272,42 @@ class LocalSavePoint {
#endif
};
template <typename Derived>
class TimestampAssignerBase : public WriteBatch::Handler {
template <typename TimestampSizeFuncType>
class TimestampUpdater : public WriteBatch::Handler {
public:
explicit TimestampAssignerBase(
WriteBatch::ProtectionInfo* prot_info,
std::function<Status(uint32_t, size_t&)>&& checker)
: prot_info_(prot_info), checker_(std::move(checker)) {}
explicit TimestampUpdater(WriteBatch::ProtectionInfo* prot_info,
TimestampSizeFuncType&& ts_sz_func, const Slice& ts)
: prot_info_(prot_info),
ts_sz_func_(std::move(ts_sz_func)),
timestamp_(ts) {
assert(!timestamp_.empty());
}
~TimestampAssignerBase() override {}
~TimestampUpdater() override {}
Status PutCF(uint32_t cf, const Slice& key, const Slice&) override {
return AssignTimestamp(cf, key);
return UpdateTimestamp(cf, key);
}
Status DeleteCF(uint32_t cf, const Slice& key) override {
return AssignTimestamp(cf, key);
return UpdateTimestamp(cf, key);
}
Status SingleDeleteCF(uint32_t cf, const Slice& key) override {
return AssignTimestamp(cf, key);
return UpdateTimestamp(cf, key);
}
Status DeleteRangeCF(uint32_t cf, const Slice& begin_key,
const Slice&) override {
return AssignTimestamp(cf, begin_key);
return UpdateTimestamp(cf, begin_key);
}
Status MergeCF(uint32_t cf, const Slice& key, const Slice&) override {
return AssignTimestamp(cf, key);
return UpdateTimestamp(cf, key);
}
Status PutBlobIndexCF(uint32_t cf, const Slice& key, const Slice&) override {
return AssignTimestamp(cf, key);
return UpdateTimestamp(cf, key);
}
Status MarkBeginPrepare(bool) override { return Status::OK(); }
@ -314,25 +324,32 @@ class TimestampAssignerBase : public WriteBatch::Handler {
Status MarkNoop(bool /*empty_batch*/) override { return Status::OK(); }
protected:
Status AssignTimestamp(uint32_t cf, const Slice& key) {
Status s = static_cast_with_check<Derived>(this)->AssignTimestampImpl(
cf, key, idx_);
private:
Status UpdateTimestamp(uint32_t cf, const Slice& key) {
Status s = UpdateTimestampImpl(cf, key, idx_);
++idx_;
return s;
}
Status CheckTimestampSize(uint32_t cf, size_t& ts_sz) {
return checker_(cf, ts_sz);
}
Status UpdateTimestampIfNeeded(size_t ts_sz, const Slice& key,
const Slice& ts) {
if (ts_sz > 0) {
assert(ts_sz == ts.size());
UpdateProtectionInformationIfNeeded(key, ts);
UpdateTimestamp(key, ts);
Status UpdateTimestampImpl(uint32_t cf, const Slice& key, size_t /*idx*/) {
if (timestamp_.empty()) {
return Status::InvalidArgument("Timestamp is empty");
}
size_t cf_ts_sz = ts_sz_func_(cf);
if (0 == cf_ts_sz) {
// Skip this column family.
return Status::OK();
} else if (std::numeric_limits<size_t>::max() == cf_ts_sz) {
// Column family timestamp info not found.
return Status::NotFound();
} else if (cf_ts_sz != timestamp_.size()) {
return Status::InvalidArgument("timestamp size mismatch");
}
UpdateProtectionInformationIfNeeded(key, timestamp_);
char* ptr = const_cast<char*>(key.data() + key.size() - cf_ts_sz);
assert(ptr);
memcpy(ptr, timestamp_.data(), timestamp_.size());
return Status::OK();
}
@ -347,83 +364,16 @@ class TimestampAssignerBase : public WriteBatch::Handler {
}
}
void UpdateTimestamp(const Slice& key, const Slice& ts) {
const size_t ts_sz = ts.size();
char* ptr = const_cast<char*>(key.data() + key.size() - ts_sz);
assert(ptr);
memcpy(ptr, ts.data(), ts_sz);
}
// No copy or move.
TimestampAssignerBase(const TimestampAssignerBase&) = delete;
TimestampAssignerBase(TimestampAssignerBase&&) = delete;
TimestampAssignerBase& operator=(const TimestampAssignerBase&) = delete;
TimestampAssignerBase& operator=(TimestampAssignerBase&&) = delete;
TimestampUpdater(const TimestampUpdater&) = delete;
TimestampUpdater(TimestampUpdater&&) = delete;
TimestampUpdater& operator=(const TimestampUpdater&) = delete;
TimestampUpdater& operator=(TimestampUpdater&&) = delete;
WriteBatch::ProtectionInfo* const prot_info_ = nullptr;
const std::function<Status(uint32_t, size_t&)> checker_{};
size_t idx_ = 0;
};
class SimpleListTimestampAssigner
: public TimestampAssignerBase<SimpleListTimestampAssigner> {
public:
explicit SimpleListTimestampAssigner(
WriteBatch::ProtectionInfo* prot_info,
std::function<Status(uint32_t, size_t&)>&& checker,
const std::vector<Slice>& timestamps)
: TimestampAssignerBase<SimpleListTimestampAssigner>(prot_info,
std::move(checker)),
timestamps_(timestamps) {}
~SimpleListTimestampAssigner() override {}
private:
friend class TimestampAssignerBase<SimpleListTimestampAssigner>;
Status AssignTimestampImpl(uint32_t cf, const Slice& key, size_t idx) {
if (idx >= timestamps_.size()) {
return Status::InvalidArgument("Need more timestamps for the assignment");
}
const Slice& ts = timestamps_[idx];
size_t ts_sz = ts.size();
const Status s = this->CheckTimestampSize(cf, ts_sz);
if (!s.ok()) {
return s;
}
return this->UpdateTimestampIfNeeded(ts_sz, key, ts);
}
const std::vector<Slice>& timestamps_;
};
class TimestampAssigner : public TimestampAssignerBase<TimestampAssigner> {
public:
explicit TimestampAssigner(WriteBatch::ProtectionInfo* prot_info,
std::function<Status(uint32_t, size_t&)>&& checker,
const Slice& ts)
: TimestampAssignerBase<TimestampAssigner>(prot_info, std::move(checker)),
timestamp_(ts) {
assert(!timestamp_.empty());
}
~TimestampAssigner() override {}
private:
friend class TimestampAssignerBase<TimestampAssigner>;
Status AssignTimestampImpl(uint32_t cf, const Slice& key, size_t /*idx*/) {
if (timestamp_.empty()) {
return Status::InvalidArgument("Timestamp is empty");
}
size_t ts_sz = timestamp_.size();
const Status s = this->CheckTimestampSize(cf, ts_sz);
if (!s.ok()) {
return s;
}
return this->UpdateTimestampIfNeeded(ts_sz, key, timestamp_);
}
const TimestampSizeFuncType ts_sz_func_{};
const Slice timestamp_;
size_t idx_ = 0;
};
} // namespace ROCKSDB_NAMESPACE

@ -949,7 +949,48 @@ Status CheckTimestampsInWriteBatch(
}
} // namespace
TEST_F(WriteBatchTest, AssignTimestamps) {
TEST_F(WriteBatchTest, SanityChecks) {
ColumnFamilyHandleImplDummy cf0(0, test::ComparatorWithU64Ts());
ColumnFamilyHandleImplDummy cf4(4);
WriteBatch wb(0, 0, 0, /*default_cf_ts_sz=*/sizeof(uint64_t));
// Sanity checks for the new WriteBatch APIs with extra 'ts' arg.
ASSERT_TRUE(wb.Put(nullptr, "key", "ts", "value").IsInvalidArgument());
ASSERT_TRUE(wb.Delete(nullptr, "key", "ts").IsInvalidArgument());
ASSERT_TRUE(wb.SingleDelete(nullptr, "key", "ts").IsInvalidArgument());
ASSERT_TRUE(wb.Merge(nullptr, "key", "ts", "value").IsNotSupported());
ASSERT_TRUE(
wb.DeleteRange(nullptr, "begin_key", "end_key", "ts").IsNotSupported());
ASSERT_TRUE(wb.Put(&cf4, "key", "ts", "value").IsInvalidArgument());
ASSERT_TRUE(wb.Delete(&cf4, "key", "ts").IsInvalidArgument());
ASSERT_TRUE(wb.SingleDelete(&cf4, "key", "ts").IsInvalidArgument());
ASSERT_TRUE(wb.Merge(&cf4, "key", "ts", "value").IsNotSupported());
ASSERT_TRUE(
wb.DeleteRange(&cf4, "begin_key", "end_key", "ts").IsNotSupported());
constexpr size_t wrong_ts_sz = 1 + sizeof(uint64_t);
std::string ts(wrong_ts_sz, '\0');
ASSERT_TRUE(wb.Put(&cf0, "key", ts, "value").IsInvalidArgument());
ASSERT_TRUE(wb.Delete(&cf0, "key", ts).IsInvalidArgument());
ASSERT_TRUE(wb.SingleDelete(&cf0, "key", ts).IsInvalidArgument());
ASSERT_TRUE(wb.Merge(&cf0, "key", ts, "value").IsNotSupported());
ASSERT_TRUE(
wb.DeleteRange(&cf0, "begin_key", "end_key", ts).IsNotSupported());
// Sanity checks for the new WriteBatch APIs without extra 'ts' arg.
WriteBatch wb1(0, 0, 0, wrong_ts_sz);
ASSERT_TRUE(wb1.Put(&cf0, "key", "value").IsInvalidArgument());
ASSERT_TRUE(wb1.Delete(&cf0, "key").IsInvalidArgument());
ASSERT_TRUE(wb1.SingleDelete(&cf0, "key").IsInvalidArgument());
ASSERT_TRUE(wb1.Merge(&cf0, "key", "value").IsInvalidArgument());
ASSERT_TRUE(
wb1.DeleteRange(&cf0, "begin_key", "end_key").IsInvalidArgument());
}
TEST_F(WriteBatchTest, UpdateTimestamps) {
// We assume the last eight bytes of each key is reserved for timestamps.
// Therefore, we must make sure each key is longer than eight bytes.
constexpr size_t key_size = 16;
@ -974,21 +1015,17 @@ TEST_F(WriteBatchTest, AssignTimestamps) {
}
static constexpr size_t timestamp_size = sizeof(uint64_t);
const auto checker1 = [](uint32_t cf, size_t& ts_sz) {
const auto checker1 = [](uint32_t cf) {
if (cf == 4 || cf == 5) {
if (ts_sz != timestamp_size) {
return Status::InvalidArgument("Timestamp size mismatch");
}
return timestamp_size;
} else if (cf == 0) {
ts_sz = 0;
return Status::OK();
return static_cast<size_t>(0);
} else {
return Status::Corruption("Invalid cf");
return std::numeric_limits<size_t>::max();
}
return Status::OK();
};
ASSERT_OK(
batch.AssignTimestamp(std::string(timestamp_size, '\xfe'), checker1));
batch.UpdateTimestamps(std::string(timestamp_size, '\xfe'), checker1));
ASSERT_OK(CheckTimestampsInWriteBatch(
batch, std::string(timestamp_size, '\xfe'), cf_to_ucmps));
@ -1001,65 +1038,30 @@ TEST_F(WriteBatchTest, AssignTimestamps) {
// mapping from cf to user comparators. If indexing is disabled, a transaction
// writes directly to the underlying raw WriteBatch. We will need to track the
// comparator information for the column families to which un-indexed writes
// are performed. When calling AssignTimestamp(s) API of WriteBatch, we need
// are performed. When calling UpdateTimestamp API of WriteBatch, we need
// indexed_cf_to_ucmps, non_indexed_cfs_with_ts, and timestamp_size to perform
// checking.
std::unordered_map<uint32_t, const Comparator*> indexed_cf_to_ucmps = {
{0, cf0.GetComparator()}, {4, cf4.GetComparator()}};
std::unordered_set<uint32_t> non_indexed_cfs_with_ts = {cf5.GetID()};
const auto checker2 = [&indexed_cf_to_ucmps, &non_indexed_cfs_with_ts](
uint32_t cf, size_t& ts_sz) {
const auto checker2 = [&indexed_cf_to_ucmps,
&non_indexed_cfs_with_ts](uint32_t cf) {
if (non_indexed_cfs_with_ts.count(cf) > 0) {
if (ts_sz != timestamp_size) {
return Status::InvalidArgument("Timestamp size mismatch");
}
return Status::OK();
return timestamp_size;
}
auto cf_iter = indexed_cf_to_ucmps.find(cf);
if (cf_iter == indexed_cf_to_ucmps.end()) {
return Status::Corruption("Unknown cf");
assert(false);
return std::numeric_limits<size_t>::max();
}
const Comparator* const ucmp = cf_iter->second;
assert(ucmp);
if (ucmp->timestamp_size() == 0) {
ts_sz = 0;
} else if (ts_sz != ucmp->timestamp_size()) {
return Status::InvalidArgument("Timestamp size mismatch");
}
return Status::OK();
return ucmp->timestamp_size();
};
ASSERT_OK(
batch.AssignTimestamp(std::string(timestamp_size, '\xef'), checker2));
batch.UpdateTimestamps(std::string(timestamp_size, '\xef'), checker2));
ASSERT_OK(CheckTimestampsInWriteBatch(
batch, std::string(timestamp_size, '\xef'), cf_to_ucmps));
std::vector<std::string> ts_strs;
for (size_t i = 0; i < 3 * key_strs.size(); ++i) {
if (0 == (i % 3)) {
ts_strs.emplace_back();
} else {
ts_strs.emplace_back(std::string(timestamp_size, '\xee'));
}
}
std::vector<Slice> ts_vec(ts_strs.size());
for (size_t i = 0; i < ts_vec.size(); ++i) {
ts_vec[i] = ts_strs[i];
}
const auto checker3 = [&cf_to_ucmps](uint32_t cf, size_t& ts_sz) {
auto cf_iter = cf_to_ucmps.find(cf);
if (cf_iter == cf_to_ucmps.end()) {
return Status::Corruption("Invalid cf");
}
const Comparator* const ucmp = cf_iter->second;
assert(ucmp);
if (ucmp->timestamp_size() != ts_sz) {
return Status::InvalidArgument("Timestamp size mismatch");
}
return Status::OK();
};
ASSERT_OK(batch.AssignTimestamps(ts_vec, checker3));
ASSERT_OK(CheckTimestampsInWriteBatch(
batch, std::string(timestamp_size, '\xee'), cf_to_ucmps));
}
TEST_F(WriteBatchTest, CommitWithTimestamp) {

@ -34,7 +34,8 @@ class BatchedOpsStressTest : public StressTest {
std::string values[10] = {"9", "8", "7", "6", "5", "4", "3", "2", "1", "0"};
Slice value_slices[10];
WriteBatch batch(0 /* reserved_bytes */, 0 /* max_bytes */,
FLAGS_batch_protection_bytes_per_key);
FLAGS_batch_protection_bytes_per_key,
FLAGS_user_timestamp_size);
Status s;
auto cfh = column_families_[rand_column_families[0]];
std::string key_str = Key(rand_keys[0]);
@ -70,7 +71,8 @@ class BatchedOpsStressTest : public StressTest {
std::string keys[10] = {"9", "7", "5", "3", "1", "8", "6", "4", "2", "0"};
WriteBatch batch(0 /* reserved_bytes */, 0 /* max_bytes */,
FLAGS_batch_protection_bytes_per_key);
FLAGS_batch_protection_bytes_per_key,
FLAGS_user_timestamp_size);
Status s;
auto cfh = column_families_[rand_column_families[0]];
std::string key_str = Key(rand_keys[0]);

@ -513,9 +513,10 @@ void StressTest::PreloadDbAndReopenAsReadOnly(int64_t number_of_keys,
if (FLAGS_user_timestamp_size > 0) {
ts_str = NowNanosStr();
ts = ts_str;
write_opts.timestamp = &ts;
s = db_->Put(write_opts, cfh, key, ts, v);
} else {
s = db_->Put(write_opts, cfh, key, v);
}
s = db_->Put(write_opts, cfh, key, v);
} else {
#ifndef ROCKSDB_LITE
Transaction* txn;
@ -878,7 +879,6 @@ void StressTest::OperateDb(ThreadState* thread) {
read_opts.timestamp = &read_ts;
write_ts_str = NowNanosStr();
write_ts = write_ts_str;
write_opts.timestamp = &write_ts;
}
int prob_op = thread->rand.Uniform(100);

@ -500,9 +500,12 @@ class NonBatchedOpsStressTest : public StressTest {
if (FLAGS_user_timestamp_size > 0) {
write_ts_str = NowNanosStr();
write_ts = write_ts_str;
write_opts.timestamp = &write_ts;
}
}
if (write_ts.size() == 0 && FLAGS_user_timestamp_size) {
write_ts_str = NowNanosStr();
write_ts = write_ts_str;
}
std::string key_str = Key(rand_key);
Slice key = key_str;
@ -540,7 +543,11 @@ class NonBatchedOpsStressTest : public StressTest {
}
} else {
if (!FLAGS_use_txn) {
s = db_->Put(write_opts, cfh, key, v);
if (FLAGS_user_timestamp_size == 0) {
s = db_->Put(write_opts, cfh, key, v);
} else {
s = db_->Put(write_opts, cfh, key, write_ts, v);
}
} else {
#ifndef ROCKSDB_LITE
Transaction* txn;
@ -599,9 +606,12 @@ class NonBatchedOpsStressTest : public StressTest {
if (FLAGS_user_timestamp_size > 0) {
write_ts_str = NowNanosStr();
write_ts = write_ts_str;
write_opts.timestamp = &write_ts;
}
}
if (write_ts.size() == 0 && FLAGS_user_timestamp_size) {
write_ts_str = NowNanosStr();
write_ts = write_ts_str;
}
std::string key_str = Key(rand_key);
Slice key = key_str;
@ -613,7 +623,11 @@ class NonBatchedOpsStressTest : public StressTest {
if (shared->AllowsOverwrite(rand_key)) {
shared->Delete(rand_column_family, rand_key, true /* pending */);
if (!FLAGS_use_txn) {
s = db_->Delete(write_opts, cfh, key);
if (FLAGS_user_timestamp_size == 0) {
s = db_->Delete(write_opts, cfh, key);
} else {
s = db_->Delete(write_opts, cfh, key, write_ts);
}
} else {
#ifndef ROCKSDB_LITE
Transaction* txn;
@ -646,7 +660,11 @@ class NonBatchedOpsStressTest : public StressTest {
} else {
shared->SingleDelete(rand_column_family, rand_key, true /* pending */);
if (!FLAGS_use_txn) {
s = db_->SingleDelete(write_opts, cfh, key);
if (FLAGS_user_timestamp_size == 0) {
s = db_->SingleDelete(write_opts, cfh, key);
} else {
s = db_->SingleDelete(write_opts, cfh, key, write_ts);
}
} else {
#ifndef ROCKSDB_LITE
Transaction* txn;

@ -356,10 +356,17 @@ class DB {
virtual Status Put(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) = 0;
virtual Status Put(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) = 0;
virtual Status Put(const WriteOptions& options, const Slice& key,
const Slice& value) {
return Put(options, DefaultColumnFamily(), key, value);
}
virtual Status Put(const WriteOptions& options, const Slice& key,
const Slice& ts, const Slice& value) {
return Put(options, DefaultColumnFamily(), key, ts, value);
}
// Remove the database entry (if any) for "key". Returns OK on
// success, and a non-OK status on error. It is not an error if "key"
@ -368,9 +375,16 @@ class DB {
virtual Status Delete(const WriteOptions& options,
ColumnFamilyHandle* column_family,
const Slice& key) = 0;
virtual Status Delete(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) = 0;
virtual Status Delete(const WriteOptions& options, const Slice& key) {
return Delete(options, DefaultColumnFamily(), key);
}
virtual Status Delete(const WriteOptions& options, const Slice& key,
const Slice& ts) {
return Delete(options, DefaultColumnFamily(), key, ts);
}
// Remove the database entry for "key". Requires that the key exists
// and was not overwritten. Returns OK on success, and a non-OK status
@ -391,9 +405,16 @@ class DB {
virtual Status SingleDelete(const WriteOptions& options,
ColumnFamilyHandle* column_family,
const Slice& key) = 0;
virtual Status SingleDelete(const WriteOptions& options,
ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) = 0;
virtual Status SingleDelete(const WriteOptions& options, const Slice& key) {
return SingleDelete(options, DefaultColumnFamily(), key);
}
virtual Status SingleDelete(const WriteOptions& options, const Slice& key,
const Slice& ts) {
return SingleDelete(options, DefaultColumnFamily(), key, ts);
}
// Removes the database entries in the range ["begin_key", "end_key"), i.e.,
// including "begin_key" and excluding "end_key". Returns OK on success, and
@ -416,6 +437,13 @@ class DB {
virtual Status DeleteRange(const WriteOptions& options,
ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key);
virtual Status DeleteRange(const WriteOptions& /*options*/,
ColumnFamilyHandle* /*column_family*/,
const Slice& /*begin_key*/,
const Slice& /*end_key*/, const Slice& /*ts*/) {
return Status::NotSupported(
"DeleteRange does not support user-defined timestamp yet");
}
// Merge the database entry for "key" with "value". Returns OK on success,
// and a non-OK status on error. The semantics of this operation is
@ -428,6 +456,13 @@ class DB {
const Slice& value) {
return Merge(options, DefaultColumnFamily(), key, value);
}
virtual Status Merge(const WriteOptions& /*options*/,
ColumnFamilyHandle* /*column_family*/,
const Slice& /*key*/, const Slice& /*ts*/,
const Slice& /*value*/) {
return Status::NotSupported(
"Merge does not support user-defined timestamp yet");
}
// Apply the specified updates to the database.
// If `updates` contains no update, WAL will still be synced if

@ -1661,25 +1661,13 @@ struct WriteOptions {
// Default: false
bool memtable_insert_hint_per_batch;
// Timestamp of write operation, e.g. Put. All timestamps of the same
// database must share the same length and format. The user is also
// responsible for providing a customized compare function via Comparator to
// order <key, timestamp> tuples. If the user wants to enable timestamp, then
// all write operations must be associated with timestamp because RocksDB, as
// a single-node storage engine currently has no knowledge of global time,
// thus has to rely on the application.
// The user-specified timestamp feature is still under active development,
// and the API is subject to change.
const Slice* timestamp;
WriteOptions()
: sync(false),
disableWAL(false),
ignore_missing_column_families(false),
no_slowdown(false),
low_pri(false),
memtable_insert_hint_per_batch(false),
timestamp(nullptr) {}
memtable_insert_hint_per_batch(false) {}
};
// Options that control flush operations

@ -80,6 +80,10 @@ class StackableDB : public DB {
const Slice& val) override {
return db_->Put(options, column_family, key, val);
}
Status Put(const WriteOptions& options, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts, const Slice& val) override {
return db_->Put(options, column_family, key, ts, val);
}
using DB::Get;
virtual Status Get(const ReadOptions& options,
@ -166,6 +170,10 @@ class StackableDB : public DB {
const Slice& key) override {
return db_->Delete(wopts, column_family, key);
}
Status Delete(const WriteOptions& wopts, ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) override {
return db_->Delete(wopts, column_family, key, ts);
}
using DB::SingleDelete;
virtual Status SingleDelete(const WriteOptions& wopts,
@ -173,6 +181,18 @@ class StackableDB : public DB {
const Slice& key) override {
return db_->SingleDelete(wopts, column_family, key);
}
Status SingleDelete(const WriteOptions& wopts,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override {
return db_->SingleDelete(wopts, column_family, key, ts);
}
using DB::DeleteRange;
Status DeleteRange(const WriteOptions& wopts,
ColumnFamilyHandle* column_family, const Slice& start_key,
const Slice& end_key) override {
return db_->DeleteRange(wopts, column_family, start_key, end_key);
}
using DB::Merge;
virtual Status Merge(const WriteOptions& options,

@ -371,6 +371,7 @@ class TransactionDB : public StackableDB {
// used and `skip_concurrency_control` must be set. When using either
// WRITE_PREPARED or WRITE_UNPREPARED , `skip_duplicate_key_check` must
// additionally be set.
using StackableDB::DeleteRange;
virtual Status DeleteRange(const WriteOptions&, ColumnFamilyHandle*,
const Slice&, const Slice&) override {
return Status::NotSupported();

@ -110,20 +110,32 @@ class WriteBatchWithIndex : public WriteBatchBase {
Status Put(const Slice& key, const Slice& value) override;
Status Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) override;
using WriteBatchBase::Merge;
Status Merge(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) override;
Status Merge(const Slice& key, const Slice& value) override;
Status Merge(ColumnFamilyHandle* /*column_family*/, const Slice& /*key*/,
const Slice& /*ts*/, const Slice& /*value*/) override {
return Status::NotSupported(
"Merge does not support user-defined timestamp");
}
using WriteBatchBase::Delete;
Status Delete(ColumnFamilyHandle* column_family, const Slice& key) override;
Status Delete(const Slice& key) override;
Status Delete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
using WriteBatchBase::SingleDelete;
Status SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key) override;
Status SingleDelete(const Slice& key) override;
Status SingleDelete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
using WriteBatchBase::DeleteRange;
Status DeleteRange(ColumnFamilyHandle* /* column_family */,
@ -137,6 +149,12 @@ class WriteBatchWithIndex : public WriteBatchBase {
return Status::NotSupported(
"DeleteRange unsupported in WriteBatchWithIndex");
}
Status DeleteRange(ColumnFamilyHandle* /*column_family*/,
const Slice& /*begin_key*/, const Slice& /*end_key*/,
const Slice& /*ts*/) override {
return Status::NotSupported(
"DeleteRange unsupported in WriteBatchWithIndex");
}
using WriteBatchBase::PutLogData;
Status PutLogData(const Slice& blob) override;

@ -68,7 +68,7 @@ class WriteBatch : public WriteBatchBase {
// protection information for each key entry. Currently supported values are
// zero (disabled) and eight.
explicit WriteBatch(size_t reserved_bytes, size_t max_bytes,
size_t protection_bytes_per_key);
size_t protection_bytes_per_key, size_t default_cf_ts_sz);
~WriteBatch() override;
using WriteBatchBase::Put;
@ -82,6 +82,8 @@ class WriteBatch : public WriteBatchBase {
Status Put(const Slice& key, const Slice& value) override {
return Put(nullptr, key, value);
}
Status Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) override;
// Variant of Put() that gathers output like writev(2). The key and value
// that will be written to the database are concatenations of arrays of
@ -104,6 +106,8 @@ class WriteBatch : public WriteBatchBase {
// up the memory buffer pointed to by `key`.
Status Delete(ColumnFamilyHandle* column_family, const Slice& key) override;
Status Delete(const Slice& key) override { return Delete(nullptr, key); }
Status Delete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
// variant that takes SliceParts
// These two variants of Delete(..., const SliceParts& key) can be used when
@ -121,6 +125,8 @@ class WriteBatch : public WriteBatchBase {
Status SingleDelete(const Slice& key) override {
return SingleDelete(nullptr, key);
}
Status SingleDelete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) override;
// variant that takes SliceParts
Status SingleDelete(ColumnFamilyHandle* column_family,
@ -136,6 +142,12 @@ class WriteBatch : public WriteBatchBase {
Status DeleteRange(const Slice& begin_key, const Slice& end_key) override {
return DeleteRange(nullptr, begin_key, end_key);
}
Status DeleteRange(ColumnFamilyHandle* /*column_family*/,
const Slice& /*begin_key*/, const Slice& /*end_key*/,
const Slice& /*ts*/) override {
return Status::NotSupported(
"DeleteRange does not support user-defined timestamp");
}
// variant that takes SliceParts
Status DeleteRange(ColumnFamilyHandle* column_family,
@ -154,6 +166,11 @@ class WriteBatch : public WriteBatchBase {
Status Merge(const Slice& key, const Slice& value) override {
return Merge(nullptr, key, value);
}
Status Merge(ColumnFamilyHandle* /*column_family*/, const Slice& /*key*/,
const Slice& /*ts*/, const Slice& /*value*/) override {
return Status::NotSupported(
"Merge does not support user-defined timestamp");
}
// variant that takes SliceParts
Status Merge(ColumnFamilyHandle* column_family, const SliceParts& key,
@ -343,55 +360,25 @@ class WriteBatch : public WriteBatchBase {
bool HasRollback() const;
// Experimental.
// Assign timestamp to write batch.
//
// Update timestamps of existing entries in the write batch if
// applicable. If a key is intended for a column family that disables
// timestamp, then this API won't set the timestamp for this key.
// This requires that all keys, if enable timestamp, (possibly from multiple
// column families) in the write batch have timestamps of the same format.
//
// checker: callable object to check the timestamp sizes of column families.
// ts_sz_func: callable object to obtain the timestamp sizes of column
// families. If ts_sz_func() accesses data structures, then the caller of this
// API must guarantee thread-safety. Like other parts of RocksDB, this API is
// not exception-safe. Therefore, ts_sz_func() must not throw.
//
// in: cf, the column family id.
// in/out: ts_sz. Input as the expected timestamp size of the column
// family, output as the actual timestamp size of the column family.
// ret: OK if assignment succeeds.
// Status checker(uint32_t cf, size_t& ts_sz);
//
// User can call checker(uint32_t cf, size_t& ts_sz) which does the
// following:
// 1. find out the timestamp size of the column family whose id equals `cf`.
// 2. if cf's timestamp size is 0, then set ts_sz to 0 and return OK.
// 3. otherwise, compare ts_sz with cf's timestamp size and return
// Status::InvalidArgument() if different.
Status AssignTimestamp(
const Slice& ts,
std::function<Status(uint32_t, size_t&)> checker =
[](uint32_t /*cf*/, size_t& /*ts_sz*/) { return Status::OK(); });
// Experimental.
// Assign timestamps to write batch.
// This API allows the write batch to include keys from multiple column
// families whose timestamps' formats can differ. For example, some column
// families can enable timestamp, while others disable the feature.
// If key does not have timestamp, then put an empty Slice in ts_list as
// a placeholder.
//
// checker: callable object specified by caller to check the timestamp sizes
// of column families.
//
// in: cf, the column family id.
// in/out: ts_sz. Input as the expected timestamp size of the column
// family, output as the actual timestamp size of the column family.
// ret: OK if assignment succeeds.
// Status checker(uint32_t cf, size_t& ts_sz);
//
// User can call checker(uint32_t cf, size_t& ts_sz) which does the
// following:
// 1. find out the timestamp size of the column family whose id equals `cf`.
// 2. compare ts_sz with cf's timestamp size and return
// Status::InvalidArgument() if different.
Status AssignTimestamps(
const std::vector<Slice>& ts_list,
std::function<Status(uint32_t, size_t&)> checker =
[](uint32_t /*cf*/, size_t& /*ts_sz*/) { return Status::OK(); });
// ret: timestamp size of the given column family. Return
// std::numeric_limits<size_t>::max() indicating "dont know or column
// family info not found", this will cause UpdateTimestamps() to fail.
// size_t ts_sz_func(uint32_t cf);
Status UpdateTimestamps(const Slice& ts,
std::function<size_t(uint32_t /*cf*/)> ts_sz_func);
using WriteBatchBase::GetWriteBatch;
WriteBatch* GetWriteBatch() override { return this; }
@ -446,6 +433,17 @@ class WriteBatch : public WriteBatchBase {
std::unique_ptr<ProtectionInfo> prot_info_;
size_t default_cf_ts_sz_ = 0;
// False if all keys are from column families that disable user-defined
// timestamp OR UpdateTimestamps() has been called at least once.
// This flag will be set to true if any of the above Put(), Delete(),
// SingleDelete(), etc. APIs are called at least once.
// Calling Put(ts), Delete(ts), SingleDelete(ts), etc. will not set this flag
// to true because the assumption is that these APIs have already set the
// timestamps to desired values.
bool needs_in_place_update_ts_ = false;
protected:
std::string rep_; // See comment in write_batch.cc for the format of rep_
};

@ -31,6 +31,8 @@ class WriteBatchBase {
virtual Status Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) = 0;
virtual Status Put(const Slice& key, const Slice& value) = 0;
virtual Status Put(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) = 0;
// Variant of Put() that gathers output like writev(2). The key and value
// that will be written to the database are concatenations of arrays of
@ -44,6 +46,8 @@ class WriteBatchBase {
virtual Status Merge(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) = 0;
virtual Status Merge(const Slice& key, const Slice& value) = 0;
virtual Status Merge(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts, const Slice& value) = 0;
// variant that takes SliceParts
virtual Status Merge(ColumnFamilyHandle* column_family, const SliceParts& key,
@ -54,6 +58,8 @@ class WriteBatchBase {
virtual Status Delete(ColumnFamilyHandle* column_family,
const Slice& key) = 0;
virtual Status Delete(const Slice& key) = 0;
virtual Status Delete(ColumnFamilyHandle* column_family, const Slice& key,
const Slice& ts) = 0;
// variant that takes SliceParts
virtual Status Delete(ColumnFamilyHandle* column_family,
@ -65,6 +71,8 @@ class WriteBatchBase {
virtual Status SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key) = 0;
virtual Status SingleDelete(const Slice& key) = 0;
virtual Status SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key, const Slice& ts) = 0;
// variant that takes SliceParts
virtual Status SingleDelete(ColumnFamilyHandle* column_family,
@ -76,6 +84,9 @@ class WriteBatchBase {
virtual Status DeleteRange(ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key) = 0;
virtual Status DeleteRange(const Slice& begin_key, const Slice& end_key) = 0;
virtual Status DeleteRange(ColumnFamilyHandle* column_family,
const Slice& begin_key, const Slice& end_key,
const Slice& ts) = 0;
// variant that takes SliceParts
virtual Status DeleteRange(ColumnFamilyHandle* column_family,

@ -4794,7 +4794,7 @@ class Benchmark {
RandomGenerator gen;
WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0,
user_timestamp_size_);
/*protection_bytes_per_key=*/0, user_timestamp_size_);
Status s;
int64_t bytes = 0;
@ -5122,7 +5122,8 @@ class Benchmark {
}
if (user_timestamp_size_ > 0) {
Slice user_ts = mock_app_clock_->Allocate(ts_guard.get());
s = batch.AssignTimestamp(user_ts);
s = batch.UpdateTimestamps(
user_ts, [this](uint32_t) { return user_timestamp_size_; });
if (!s.ok()) {
fprintf(stderr, "assign timestamp to write batch: %s\n",
s.ToString().c_str());
@ -6512,7 +6513,7 @@ class Benchmark {
void DoDelete(ThreadState* thread, bool seq) {
WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0,
user_timestamp_size_);
/*protection_bytes_per_key=*/0, user_timestamp_size_);
Duration duration(seq ? 0 : FLAGS_duration, deletes_);
int64_t i = 0;
std::unique_ptr<const char[]> key_guard;
@ -6534,7 +6535,8 @@ class Benchmark {
Status s;
if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get());
s = batch.AssignTimestamp(ts);
s = batch.UpdateTimestamps(
ts, [this](uint32_t) { return user_timestamp_size_; });
if (!s.ok()) {
fprintf(stderr, "assign timestamp: %s\n", s.ToString().c_str());
ErrorExit();
@ -6628,17 +6630,17 @@ class Benchmark {
Slice ts;
if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts;
}
if (write_merge == kWrite) {
s = db->Put(write_options_, key, val);
if (user_timestamp_size_ == 0) {
s = db->Put(write_options_, key, val);
} else {
s = db->Put(write_options_, key, ts, val);
}
} else {
s = db->Merge(write_options_, key, val);
}
// Restore write_options_
if (user_timestamp_size_ > 0) {
write_options_.timestamp = nullptr;
}
written++;
if (!s.ok()) {
@ -6711,7 +6713,7 @@ class Benchmark {
std::string keys[3];
WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0,
user_timestamp_size_);
/*protection_bytes_per_key=*/0, user_timestamp_size_);
Status s;
for (int i = 0; i < 3; i++) {
keys[i] = key.ToString() + suffixes[i];
@ -6722,7 +6724,8 @@ class Benchmark {
if (user_timestamp_size_ > 0) {
ts_guard.reset(new char[user_timestamp_size_]);
Slice ts = mock_app_clock_->Allocate(ts_guard.get());
s = batch.AssignTimestamp(ts);
s = batch.UpdateTimestamps(
ts, [this](uint32_t) { return user_timestamp_size_; });
if (!s.ok()) {
fprintf(stderr, "assign timestamp to batch: %s\n",
s.ToString().c_str());
@ -6742,7 +6745,8 @@ class Benchmark {
std::string suffixes[3] = {"1", "2", "0"};
std::string keys[3];
WriteBatch batch(0, 0, user_timestamp_size_);
WriteBatch batch(0, 0, /*protection_bytes_per_key=*/0,
user_timestamp_size_);
Status s;
for (int i = 0; i < 3; i++) {
keys[i] = key.ToString() + suffixes[i];
@ -6753,7 +6757,8 @@ class Benchmark {
if (user_timestamp_size_ > 0) {
ts_guard.reset(new char[user_timestamp_size_]);
Slice ts = mock_app_clock_->Allocate(ts_guard.get());
s = batch.AssignTimestamp(ts);
s = batch.UpdateTimestamps(
ts, [this](uint32_t) { return user_timestamp_size_; });
if (!s.ok()) {
fprintf(stderr, "assign timestamp to batch: %s\n",
s.ToString().c_str());
@ -6940,12 +6945,13 @@ class Benchmark {
} else if (put_weight > 0) {
// then do all the corresponding number of puts
// for all the gets we have done earlier
Slice ts;
Status s;
if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts;
Slice ts = mock_app_clock_->Allocate(ts_guard.get());
s = db->Put(write_options_, key, ts, gen.Generate());
} else {
s = db->Put(write_options_, key, gen.Generate());
}
Status s = db->Put(write_options_, key, gen.Generate());
if (!s.ok()) {
fprintf(stderr, "put error: %s\n", s.ToString().c_str());
ErrorExit();
@ -7006,11 +7012,13 @@ class Benchmark {
}
Slice val = gen.Generate();
Status s;
if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts;
s = db->Put(write_options_, key, ts, val);
} else {
s = db->Put(write_options_, key, val);
}
Status s = db->Put(write_options_, key, val);
if (!s.ok()) {
fprintf(stderr, "put error: %s\n", s.ToString().c_str());
exit(1);
@ -7073,12 +7081,13 @@ class Benchmark {
xor_operator.XOR(nullptr, value, &new_value);
}
Status s;
if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts;
s = db->Put(write_options_, key, ts, Slice(new_value));
} else {
s = db->Put(write_options_, key, Slice(new_value));
}
Status s = db->Put(write_options_, key, Slice(new_value));
if (!s.ok()) {
fprintf(stderr, "put error: %s\n", s.ToString().c_str());
ErrorExit();
@ -7139,13 +7148,14 @@ class Benchmark {
}
value.append(operand.data(), operand.size());
Status s;
if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts;
s = db->Put(write_options_, key, ts, value);
} else {
// Write back to the database
s = db->Put(write_options_, key, value);
}
// Write back to the database
Status s = db->Put(write_options_, key, value);
if (!s.ok()) {
fprintf(stderr, "put error: %s\n", s.ToString().c_str());
ErrorExit();
@ -7521,12 +7531,12 @@ class Benchmark {
DB* db = SelectDB(thread);
for (int64_t i = 0; i < FLAGS_numdistinct; i++) {
GenerateKeyFromInt(i * max_counter, FLAGS_num, &key);
Slice ts;
if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts;
Slice ts = mock_app_clock_->Allocate(ts_guard.get());
s = db->Put(write_options_, key, ts, gen.Generate());
} else {
s = db->Put(write_options_, key, gen.Generate());
}
s = db->Put(write_options_, key, gen.Generate());
if (!s.ok()) {
fprintf(stderr, "Operation failed: %s\n", s.ToString().c_str());
exit(1);
@ -7545,22 +7555,24 @@ class Benchmark {
static_cast<int64_t>(0));
GenerateKeyFromInt(key_id * max_counter + counters[key_id], FLAGS_num,
&key);
Slice ts;
if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts;
Slice ts = mock_app_clock_->Allocate(ts_guard.get());
s = FLAGS_use_single_deletes ? db->SingleDelete(write_options_, key, ts)
: db->Delete(write_options_, key, ts);
} else {
s = FLAGS_use_single_deletes ? db->SingleDelete(write_options_, key)
: db->Delete(write_options_, key);
}
s = FLAGS_use_single_deletes ? db->SingleDelete(write_options_, key)
: db->Delete(write_options_, key);
if (s.ok()) {
counters[key_id] = (counters[key_id] + 1) % max_counter;
GenerateKeyFromInt(key_id * max_counter + counters[key_id], FLAGS_num,
&key);
if (user_timestamp_size_ > 0) {
ts = mock_app_clock_->Allocate(ts_guard.get());
write_options_.timestamp = &ts;
Slice ts = mock_app_clock_->Allocate(ts_guard.get());
s = db->Put(write_options_, key, ts, Slice());
} else {
s = db->Put(write_options_, key, Slice());
}
s = db->Put(write_options_, key, Slice());
}
if (!s.ok()) {

@ -200,6 +200,7 @@ class BlobDB : public StackableDB {
virtual Status Write(const WriteOptions& opts,
WriteBatch* updates) override = 0;
using ROCKSDB_NAMESPACE::StackableDB::NewIterator;
virtual Iterator* NewIterator(const ReadOptions& options) override = 0;
virtual Iterator* NewIterator(const ReadOptions& options,

@ -128,6 +128,7 @@ class BlobDBImpl : public BlobDB {
const std::vector<Slice>& keys,
std::vector<std::string>* values) override;
using BlobDB::Write;
virtual Status Write(const WriteOptions& opts, WriteBatch* updates) override;
virtual Status Close() override;

@ -47,6 +47,7 @@ class OptimisticTransactionDBImpl : public OptimisticTransactionDB {
Transaction* old_txn) override;
// Transactional `DeleteRange()` is not yet supported.
using StackableDB::DeleteRange;
virtual Status DeleteRange(const WriteOptions&, ColumnFamilyHandle*,
const Slice&, const Slice&) override {
return Status::NotSupported();

@ -246,6 +246,7 @@ Status WritePreparedTxnDB::Get(const ReadOptions& options,
backed_by_snapshot))) {
return res;
} else {
res.PermitUncheckedError();
WPRecordTick(TXN_GET_TRY_AGAIN);
return Status::TryAgain();
}

@ -322,6 +322,16 @@ Status WriteBatchWithIndex::Put(const Slice& key, const Slice& value) {
return s;
}
Status WriteBatchWithIndex::Put(ColumnFamilyHandle* column_family,
const Slice& /*key*/, const Slice& /*ts*/,
const Slice& /*value*/) {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be nullptr");
}
// TODO: support WBWI::Put() with timestamp.
return Status::NotSupported();
}
Status WriteBatchWithIndex::Delete(ColumnFamilyHandle* column_family,
const Slice& key) {
rep->SetLastEntryOffset();
@ -341,6 +351,15 @@ Status WriteBatchWithIndex::Delete(const Slice& key) {
return s;
}
Status WriteBatchWithIndex::Delete(ColumnFamilyHandle* column_family,
const Slice& /*key*/, const Slice& /*ts*/) {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be nullptr");
}
// TODO: support WBWI::Delete() with timestamp.
return Status::NotSupported();
}
Status WriteBatchWithIndex::SingleDelete(ColumnFamilyHandle* column_family,
const Slice& key) {
rep->SetLastEntryOffset();
@ -360,6 +379,16 @@ Status WriteBatchWithIndex::SingleDelete(const Slice& key) {
return s;
}
Status WriteBatchWithIndex::SingleDelete(ColumnFamilyHandle* column_family,
const Slice& /*key*/,
const Slice& /*ts*/) {
if (!column_family) {
return Status::InvalidArgument("column family handle cannot be nullptr");
}
// TODO: support WBWI::SingleDelete() with timestamp.
return Status::NotSupported();
}
Status WriteBatchWithIndex::Merge(ColumnFamilyHandle* column_family,
const Slice& key, const Slice& value) {
rep->SetLastEntryOffset();

Loading…
Cancel
Save