Improve Cuckoo Table Reader performance: inline the hash function and make the number of buckets a power of two.

Summary:
Use an inlined hash function instead of a function pointer, and make the number of buckets a power of two so that the modulo in the bucket lookup becomes a bitwise AND. Together these changes give almost a 50% improvement in read performance.
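
To make the two optimizations concrete, here is a minimal standalone sketch. It is illustrative only: Hash64, BucketBefore, and BucketAfter are made-up names for this note (the real change is the inlined CuckooHash() in table/cuckoo_table_factory.h in the diff below), and any 64-bit string hash would do in place of the toy FNV-1a.

  #include <cstdint>
  #include <string>

  // Toy 64-bit string hash (FNV-1a with the seed mixed in); a stand-in for
  // the MurmurHash call that the patch inlines.
  inline uint64_t Hash64(const std::string& key, uint32_t seed) {
    uint64_t h = 14695981039346656037ull ^ seed;
    for (unsigned char c : key) {
      h = (h ^ c) * 1099511628211ull;
    }
    return h;
  }

  // Before: the bucket count is arbitrary, so mapping a hash to a bucket
  // needs an integer modulo, and the hash is reached via a function pointer.
  uint64_t BucketBefore(uint64_t (*hash)(const std::string&, uint32_t),
                        const std::string& key, uint32_t seed,
                        uint64_t num_buckets) {
    return hash(key, seed) % num_buckets;  // indirect call + div
  }

  // After: the bucket count is a power of two, so h % n == h & (n - 1) for
  // unsigned h, and the inlined hash removes the indirect call.
  inline uint64_t BucketAfter(const std::string& key, uint32_t seed,
                              uint64_t num_buckets_minus_one) {
    return Hash64(key, seed) & num_buckets_minus_one;  // inlined call + and
  }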

Results:
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.231us (4.3 Mqps) with batch size of 0
Time taken per op is 0.229us (4.4 Mqps) with batch size of 0
Time taken per op is 0.185us (5.4 Mqps) with batch size of 0
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.108us (9.3 Mqps) with batch size of 10
Time taken per op is 0.100us (10.0 Mqps) with batch size of 10
Time taken per op is 0.103us (9.7 Mqps) with batch size of 10
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.101us (9.9 Mqps) with batch size of 25
Time taken per op is 0.098us (10.2 Mqps) with batch size of 25
Time taken per op is 0.097us (10.3 Mqps) with batch size of 25
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.100us (10.0 Mqps) with batch size of 50
Time taken per op is 0.097us (10.3 Mqps) with batch size of 50
Time taken per op is 0.097us (10.3 Mqps) with batch size of 50
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.102us (9.8 Mqps) with batch size of 100
Time taken per op is 0.098us (10.2 Mqps) with batch size of 100
Time taken per op is 0.115us (8.7 Mqps) with batch size of 100

With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.201us (5.0 Mqps) with batch size of 0
Time taken per op is 0.155us (6.5 Mqps) with batch size of 0
Time taken per op is 0.152us (6.6 Mqps) with batch size of 0
With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.089us (11.3 Mqps) with batch size of 10
Time taken per op is 0.084us (11.9 Mqps) with batch size of 10
Time taken per op is 0.086us (11.6 Mqps) with batch size of 10
With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.087us (11.5 Mqps) with batch size of 25
Time taken per op is 0.085us (11.7 Mqps) with batch size of 25
Time taken per op is 0.093us (10.8 Mqps) with batch size of 25
With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.094us (10.6 Mqps) with batch size of 50
Time taken per op is 0.094us (10.7 Mqps) with batch size of 50
Time taken per op is 0.093us (10.8 Mqps) with batch size of 50
With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.092us (10.9 Mqps) with batch size of 100
Time taken per op is 0.089us (11.2 Mqps) with batch size of 100
Time taken per op is 0.088us (11.3 Mqps) with batch size of 100

With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.154us (6.5 Mqps) with batch size of 0
Time taken per op is 0.168us (6.0 Mqps) with batch size of 0
Time taken per op is 0.190us (5.3 Mqps) with batch size of 0
With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.081us (12.4 Mqps) with batch size of 10
Time taken per op is 0.077us (13.0 Mqps) with batch size of 10
Time taken per op is 0.083us (12.1 Mqps) with batch size of 10
With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.077us (13.0 Mqps) with batch size of 25
Time taken per op is 0.073us (13.7 Mqps) with batch size of 25
Time taken per op is 0.073us (13.7 Mqps) with batch size of 25
With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.076us (13.1 Mqps) with batch size of 50
Time taken per op is 0.072us (13.8 Mqps) with batch size of 50
Time taken per op is 0.072us (13.8 Mqps) with batch size of 50
With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.077us (13.0 Mqps) with batch size of 100
Time taken per op is 0.074us (13.6 Mqps) with batch size of 100
Time taken per op is 0.073us (13.6 Mqps) with batch size of 100

With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.190us (5.3 Mqps) with batch size of 0
Time taken per op is 0.186us (5.4 Mqps) with batch size of 0
Time taken per op is 0.184us (5.4 Mqps) with batch size of 0
With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.079us (12.7 Mqps) with batch size of 10
Time taken per op is 0.070us (14.2 Mqps) with batch size of 10
Time taken per op is 0.072us (14.0 Mqps) with batch size of 10
With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.080us (12.5 Mqps) with batch size of 25
Time taken per op is 0.072us (14.0 Mqps) with batch size of 25
Time taken per op is 0.071us (14.1 Mqps) with batch size of 25
With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.082us (12.1 Mqps) with batch size of 50
Time taken per op is 0.071us (14.1 Mqps) with batch size of 50
Time taken per op is 0.073us (13.6 Mqps) with batch size of 50
With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.080us (12.5 Mqps) with batch size of 100
Time taken per op is 0.077us (13.0 Mqps) with batch size of 100
Time taken per op is 0.078us (12.8 Mqps) with batch size of 100
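
(For reference, these utilization figures follow directly from the power-of-two
table size: every file above ends up with 2^27 = 134,217,728 buckets; the test
comment in the diff notes "They all create 128 M buckets". So:
  120,000,000 / 2^27 = 0.8941 -> 89.41%
  100,000,000 / 2^27 = 0.7451 -> 74.51%
   80,000,000 / 2^27 = 0.5960 -> 59.60%
   70,000,000 / 2^27 = 0.5215 -> 52.15%)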

Test Plan:
make check all
make valgrind_check
make asan_check

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22539
Branch: main
Author: Radheshyam Balasundaram
Parent: 0f9c43ea36
Commit: d20b8cfaa1

Files changed (8):
  table/cuckoo_table_builder.cc       64 lines changed
  table/cuckoo_table_builder.h        10 lines changed
  table/cuckoo_table_builder_test.cc  26 lines changed
  table/cuckoo_table_factory.cc       30 lines changed
  table/cuckoo_table_factory.h        16 lines changed
  table/cuckoo_table_reader.cc        24 lines changed
  table/cuckoo_table_reader.h          2 lines changed
  table/cuckoo_table_reader_test.cc   66 lines changed

--- a/table/cuckoo_table_builder.cc
+++ b/table/cuckoo_table_builder.cc
@@ -16,6 +16,7 @@
 #include "rocksdb/env.h"
 #include "rocksdb/table.h"
 #include "table/block_builder.h"
+#include "table/cuckoo_table_factory.h"
 #include "table/format.h"
 #include "table/meta_blocks.h"
 #include "util/autovector.h"
@@ -39,16 +40,17 @@ const std::string CuckooTablePropertyNames::kCuckooBlockSize =
 extern const uint64_t kCuckooTableMagicNumber = 0x926789d0c5f17873ull;
 
 CuckooTableBuilder::CuckooTableBuilder(
-    WritableFile* file, double hash_table_ratio,
+    WritableFile* file, double max_hash_table_ratio,
     uint32_t max_num_hash_table, uint32_t max_search_depth,
     const Comparator* user_comparator, uint32_t cuckoo_block_size,
     uint64_t (*get_slice_hash)(const Slice&, uint32_t, uint64_t))
     : num_hash_func_(2),
       file_(file),
-      hash_table_ratio_(hash_table_ratio),
+      max_hash_table_ratio_(max_hash_table_ratio),
       max_num_hash_func_(max_num_hash_table),
       max_search_depth_(max_search_depth),
       cuckoo_block_size_(std::max(1U, cuckoo_block_size)),
+      hash_table_size_(2),
       is_last_level_file_(false),
       has_seen_first_key_(false),
       ucomp_(user_comparator),
@@ -89,7 +91,6 @@ void CuckooTableBuilder::Add(const Slice& key, const Slice& value) {
   } else {
     kvs_.emplace_back(std::make_pair(key.ToString(), value.ToString()));
   }
-  properties_.num_entries++;
 
   // In order to fill the empty buckets in the hash table, we identify a
   // key which is not used so far (unused_user_key). We determine this by
@@ -101,11 +102,14 @@ void CuckooTableBuilder::Add(const Slice& key, const Slice& value) {
   } else if (ikey.user_key.compare(largest_user_key_) > 0) {
     largest_user_key_.assign(ikey.user_key.data(), ikey.user_key.size());
   }
+  if (hash_table_size_ < kvs_.size() / max_hash_table_ratio_) {
+    hash_table_size_ *= 2;
+  }
 }
 
 Status CuckooTableBuilder::MakeHashTable(std::vector<CuckooBucket>* buckets) {
-  uint64_t hash_table_size = kvs_.size() / hash_table_ratio_;
-  buckets->resize(hash_table_size + cuckoo_block_size_ - 1);
+  uint64_t hash_table_size_minus_one = hash_table_size_ - 1;
+  buckets->resize(hash_table_size_minus_one + cuckoo_block_size_);
   uint64_t make_space_for_key_call_id = 0;
   for (uint32_t vector_idx = 0; vector_idx < kvs_.size(); vector_idx++) {
     uint64_t bucket_id;
@@ -115,7 +119,8 @@ Status CuckooTableBuilder::MakeHashTable(std::vector<CuckooBucket>* buckets) {
         ExtractUserKey(kvs_[vector_idx].first);
     for (uint32_t hash_cnt = 0; hash_cnt < num_hash_func_ && !bucket_found;
         ++hash_cnt) {
-      uint64_t hash_val = get_slice_hash_(user_key, hash_cnt, hash_table_size);
+      uint64_t hash_val = CuckooHash(user_key, hash_cnt,
+          hash_table_size_minus_one, get_slice_hash_);
       // If there is a collision, check next cuckoo_block_size_ locations for
       // empty locations. While checking, if we reach end of the hash table,
       // stop searching and proceed for next hash function.
@@ -137,15 +142,15 @@ Status CuckooTableBuilder::MakeHashTable(std::vector<CuckooBucket>* buckets) {
       }
     }
     while (!bucket_found && !MakeSpaceForKey(hash_vals,
-          hash_table_size, ++make_space_for_key_call_id, buckets, &bucket_id)) {
+          ++make_space_for_key_call_id, buckets, &bucket_id)) {
       // Rehash by increashing number of hash tables.
       if (num_hash_func_ >= max_num_hash_func_) {
         return Status::NotSupported("Too many collisions. Unable to hash.");
       }
       // We don't really need to rehash the entire table because old hashes are
       // still valid and we only increased the number of hash functions.
-      uint64_t hash_val = get_slice_hash_(user_key,
-          num_hash_func_, hash_table_size);
+      uint64_t hash_val = CuckooHash(user_key, num_hash_func_,
+          hash_table_size_minus_one, get_slice_hash_);
       ++num_hash_func_;
       for (uint32_t block_idx = 0; block_idx < cuckoo_block_size_;
           ++block_idx, ++hash_val) {
@@ -167,13 +172,14 @@ Status CuckooTableBuilder::Finish() {
   assert(!closed_);
   closed_ = true;
   std::vector<CuckooBucket> buckets;
-  Status s = MakeHashTable(&buckets);
-  if (!s.ok()) {
-    return s;
-  }
-  // Determine unused_user_key to fill empty buckets.
+  Status s;
   std::string unused_bucket;
   if (!kvs_.empty()) {
+    s = MakeHashTable(&buckets);
+    if (!s.ok()) {
+      return s;
+    }
+    // Determine unused_user_key to fill empty buckets.
     std::string unused_user_key = smallest_user_key_;
     int curr_pos = unused_user_key.size() - 1;
     while (curr_pos >= 0) {
@@ -205,6 +211,7 @@ Status CuckooTableBuilder::Finish() {
       AppendInternalKey(&unused_bucket, ikey);
     }
   }
+  properties_.num_entries = kvs_.size();
   properties_.fixed_key_len = unused_bucket.size();
   uint32_t value_length = kvs_.empty() ? 0 : kvs_[0].second.size();
   uint32_t bucket_size = value_length + properties_.fixed_key_len;
@@ -298,7 +305,7 @@ void CuckooTableBuilder::Abandon() {
 }
 
 uint64_t CuckooTableBuilder::NumEntries() const {
-  return properties_.num_entries;
+  return kvs_.size();
 }
 
 uint64_t CuckooTableBuilder::FileSize() const {
@@ -307,11 +314,17 @@ uint64_t CuckooTableBuilder::FileSize() const {
   } else if (properties_.num_entries == 0) {
     return 0;
   }
-  // This is not the actual size of the file as we need to account for
-  // hash table ratio. This returns the size of filled buckets in the table
-  // scaled up by a factor of 1/hash_table_ratio.
-  return ((kvs_[0].first.size() + kvs_[0].second.size()) *
-      properties_.num_entries) / hash_table_ratio_;
+  // Account for buckets being a power of two.
+  // As elements are added, file size remains constant for a while and doubles
+  // its size. Since compaction algorithm stops adding elements only after it
+  // exceeds the file limit, we account for the extra element being added here.
+  uint64_t expected_hash_table_size = hash_table_size_;
+  if (expected_hash_table_size < (kvs_.size() + 1) / max_hash_table_ratio_) {
+    expected_hash_table_size *= 2;
+  }
+  return (kvs_[0].first.size() + kvs_[0].second.size()) *
+      expected_hash_table_size;
 }
 
 // This method is invoked when there is no place to insert the target key.
@@ -326,7 +339,6 @@ uint64_t CuckooTableBuilder::FileSize() const {
 // If tree depth exceedes max depth, we return false indicating failure.
 bool CuckooTableBuilder::MakeSpaceForKey(
     const autovector<uint64_t>& hash_vals,
-    const uint64_t hash_table_size,
     const uint64_t make_space_for_key_call_id,
     std::vector<CuckooBucket>* buckets,
     uint64_t* bucket_id) {
@@ -354,6 +366,7 @@ bool CuckooTableBuilder::MakeSpaceForKey(
         make_space_for_key_call_id;
     tree.push_back(CuckooNode(bucket_id, 0, 0));
   }
+  uint64_t hash_table_size_minus_one = hash_table_size_ - 1;
   bool null_found = false;
   uint32_t curr_pos = 0;
   while (!null_found && curr_pos < tree.size()) {
@@ -365,10 +378,11 @@ bool CuckooTableBuilder::MakeSpaceForKey(
     CuckooBucket& curr_bucket = (*buckets)[curr_node.bucket_id];
     for (uint32_t hash_cnt = 0;
         hash_cnt < num_hash_func_ && !null_found; ++hash_cnt) {
-      uint64_t child_bucket_id = get_slice_hash_(
-          is_last_level_file_ ? kvs_[curr_bucket.vector_idx].first
-          : ExtractUserKey(Slice(kvs_[curr_bucket.vector_idx].first)),
-          hash_cnt, hash_table_size);
+      uint64_t child_bucket_id = CuckooHash(
+          (is_last_level_file_ ? kvs_[curr_bucket.vector_idx].first :
+           ExtractUserKey(Slice(kvs_[curr_bucket.vector_idx].first))),
+          hash_cnt, hash_table_size_minus_one, get_slice_hash_);
+      // Iterate inside Cuckoo Block.
       for (uint32_t block_idx = 0; block_idx < cuckoo_block_size_;
           ++block_idx, ++child_bucket_id) {
         if ((*buckets)[child_bucket_id].make_space_for_key_call_id ==

--- a/table/cuckoo_table_builder.h
+++ b/table/cuckoo_table_builder.h
@@ -21,9 +21,9 @@ namespace rocksdb {
 class CuckooTableBuilder: public TableBuilder {
  public:
   CuckooTableBuilder(
-      WritableFile* file, double hash_table_ratio, uint32_t max_num_hash_table,
-      uint32_t max_search_depth, const Comparator* user_comparator,
-      uint32_t cuckoo_block_size,
+      WritableFile* file, double max_hash_table_ratio,
+      uint32_t max_num_hash_func, uint32_t max_search_depth,
+      const Comparator* user_comparator, uint32_t cuckoo_block_size,
       uint64_t (*get_slice_hash)(const Slice&, uint32_t, uint64_t));
 
   // REQUIRES: Either Finish() or Abandon() has been called.
@@ -69,7 +69,6 @@ class CuckooTableBuilder: public TableBuilder {
   bool MakeSpaceForKey(
       const autovector<uint64_t>& hash_vals,
-      const uint64_t hash_table_size,
       const uint64_t call_id,
       std::vector<CuckooBucket>* buckets,
       uint64_t* bucket_id);
@@ -77,10 +76,11 @@ class CuckooTableBuilder: public TableBuilder {
   uint32_t num_hash_func_;
   WritableFile* file_;
-  const double hash_table_ratio_;
+  const double max_hash_table_ratio_;
   const uint32_t max_num_hash_func_;
   const uint32_t max_search_depth_;
   const uint32_t cuckoo_block_size_;
+  uint64_t hash_table_size_;
   bool is_last_level_file_;
   Status status_;
   std::vector<std::pair<std::string, std::string>> kvs_;

--- a/table/cuckoo_table_builder_test.cc
+++ b/table/cuckoo_table_builder_test.cc
@@ -114,6 +114,14 @@ class CuckooBuilderTest {
     return ikey.GetKey().ToString();
   }
 
+  uint64_t NextPowOf2(uint64_t num) {
+    uint64_t n = 2;
+    while (n <= num) {
+      n *= 2;
+    }
+    return n;
+  }
+
   Env* env_;
   EnvOptions env_options_;
   std::string fname;
@@ -122,7 +130,7 @@ class CuckooBuilderTest {
 TEST(CuckooBuilderTest, SuccessWithEmptyFile) {
   unique_ptr<WritableFile> writable_file;
-  fname = test::TmpDir() + "/NoCollisionFullKey";
+  fname = test::TmpDir() + "/EmptyFile";
   ASSERT_OK(env_->NewWritableFile(fname, &writable_file, env_options_));
   CuckooTableBuilder builder(writable_file.get(), kHashTableRatio,
       4, 100, BytewiseComparator(), 1, GetSliceHash);
@@ -162,7 +170,7 @@ TEST(CuckooBuilderTest, WriteSuccessNoCollisionFullKey) {
   ASSERT_OK(builder.Finish());
   ASSERT_OK(writable_file->Close());
 
-  uint32_t expected_table_size = keys.size() / kHashTableRatio;
+  uint32_t expected_table_size = NextPowOf2(keys.size() / kHashTableRatio);
   std::string expected_unused_bucket = GetInternalKey("key00", true);
   expected_unused_bucket += std::string(values[0].size(), 'a');
   CheckFileContents(keys, values, expected_locations,
@@ -199,7 +207,7 @@ TEST(CuckooBuilderTest, WriteSuccessWithCollisionFullKey) {
   ASSERT_OK(builder.Finish());
   ASSERT_OK(writable_file->Close());
 
-  uint32_t expected_table_size = keys.size() / kHashTableRatio;
+  uint32_t expected_table_size = NextPowOf2(keys.size() / kHashTableRatio);
   std::string expected_unused_bucket = GetInternalKey("key00", true);
   expected_unused_bucket += std::string(values[0].size(), 'a');
   CheckFileContents(keys, values, expected_locations,
@@ -237,7 +245,7 @@ TEST(CuckooBuilderTest, WriteSuccessWithCollisionAndCuckooBlock) {
   ASSERT_OK(builder.Finish());
   ASSERT_OK(writable_file->Close());
 
-  uint32_t expected_table_size = keys.size() / kHashTableRatio;
+  uint32_t expected_table_size = NextPowOf2(keys.size() / kHashTableRatio);
   std::string expected_unused_bucket = GetInternalKey("key00", true);
   expected_unused_bucket += std::string(values[0].size(), 'a');
   CheckFileContents(keys, values, expected_locations,
@@ -279,7 +287,7 @@ TEST(CuckooBuilderTest, WithCollisionPathFullKey) {
   ASSERT_OK(builder.Finish());
   ASSERT_OK(writable_file->Close());
 
-  uint32_t expected_table_size = keys.size() / kHashTableRatio;
+  uint32_t expected_table_size = NextPowOf2(keys.size() / kHashTableRatio);
   std::string expected_unused_bucket = GetInternalKey("key00", true);
   expected_unused_bucket += std::string(values[0].size(), 'a');
   CheckFileContents(keys, values, expected_locations,
@@ -318,7 +326,7 @@ TEST(CuckooBuilderTest, WithCollisionPathFullKeyAndCuckooBlock) {
   ASSERT_OK(builder.Finish());
   ASSERT_OK(writable_file->Close());
 
-  uint32_t expected_table_size = keys.size() / kHashTableRatio;
+  uint32_t expected_table_size = NextPowOf2(keys.size() / kHashTableRatio);
   std::string expected_unused_bucket = GetInternalKey("key00", true);
   expected_unused_bucket += std::string(values[0].size(), 'a');
   CheckFileContents(keys, values, expected_locations,
@@ -351,7 +359,7 @@ TEST(CuckooBuilderTest, WriteSuccessNoCollisionUserKey) {
   ASSERT_OK(builder.Finish());
   ASSERT_OK(writable_file->Close());
 
-  uint32_t expected_table_size = user_keys.size() / kHashTableRatio;
+  uint32_t expected_table_size = NextPowOf2(user_keys.size() / kHashTableRatio);
   std::string expected_unused_bucket = "key00";
   expected_unused_bucket += std::string(values[0].size(), 'a');
   CheckFileContents(user_keys, values, expected_locations,
@@ -384,7 +392,7 @@ TEST(CuckooBuilderTest, WriteSuccessWithCollisionUserKey) {
   ASSERT_OK(builder.Finish());
   ASSERT_OK(writable_file->Close());
 
-  uint32_t expected_table_size = user_keys.size() / kHashTableRatio;
+  uint32_t expected_table_size = NextPowOf2(user_keys.size() / kHashTableRatio);
   std::string expected_unused_bucket = "key00";
   expected_unused_bucket += std::string(values[0].size(), 'a');
   CheckFileContents(user_keys, values, expected_locations,
@@ -419,7 +427,7 @@ TEST(CuckooBuilderTest, WithCollisionPathUserKey) {
   ASSERT_OK(builder.Finish());
   ASSERT_OK(writable_file->Close());
 
-  uint32_t expected_table_size = user_keys.size() / kHashTableRatio;
+  uint32_t expected_table_size = NextPowOf2(user_keys.size() / kHashTableRatio);
   std::string expected_unused_bucket = "key00";
   expected_unused_bucket += std::string(values[0].size(), 'a');
   CheckFileContents(user_keys, values, expected_locations,

--- a/table/cuckoo_table_factory.cc
+++ b/table/cuckoo_table_factory.cc
@@ -9,34 +9,14 @@
 #include "db/dbformat.h"
 #include "table/cuckoo_table_builder.h"
 #include "table/cuckoo_table_reader.h"
-#include "util/murmurhash.h"
 
 namespace rocksdb {
-extern const uint32_t kMaxNumHashTable = 64;
-
-extern uint64_t GetSliceMurmurHash(const Slice& s, uint32_t index,
-    uint64_t max_num_buckets) {
-  static constexpr uint32_t seeds[kMaxNumHashTable] = {
-    816922183, 506425713, 949485004, 22513986, 421427259, 500437285,
-    888981693, 847587269, 511007211, 722295391, 934013645, 566947683,
-    193618736, 428277388, 770956674, 819994962, 755946528, 40807421,
-    263144466, 241420041, 444294464, 731606396, 304158902, 563235655,
-    968740453, 336996831, 462831574, 407970157, 985877240, 637708754,
-    736932700, 205026023, 755371467, 729648411, 807744117, 46482135,
-    847092855, 620960699, 102476362, 314094354, 625838942, 550889395,
-    639071379, 834567510, 397667304, 151945969, 443634243, 196618243,
-    421986347, 407218337, 964502417, 327741231, 493359459, 452453139,
-    692216398, 108161624, 816246924, 234779764, 618949448, 496133787,
-    156374056, 316589799, 982915425, 553105889 };
-  return MurmurHash(s.data(), s.size(), seeds[index]) % max_num_buckets;
-}
 
 Status CuckooTableFactory::NewTableReader(const Options& options,
     const EnvOptions& soptions, const InternalKeyComparator& icomp,
     std::unique_ptr<RandomAccessFile>&& file, uint64_t file_size,
     std::unique_ptr<TableReader>* table) const {
   std::unique_ptr<CuckooTableReader> new_reader(new CuckooTableReader(options,
-      std::move(file), file_size, icomp.user_comparator(), GetSliceMurmurHash));
+      std::move(file), file_size, icomp.user_comparator(), nullptr));
   Status s = new_reader->status();
   if (s.ok()) {
     *table = std::move(new_reader);
@@ -47,9 +27,8 @@ Status CuckooTableFactory::NewTableReader(const Options& options,
 TableBuilder* CuckooTableFactory::NewTableBuilder(
     const Options& options, const InternalKeyComparator& internal_comparator,
     WritableFile* file, CompressionType compression_type) const {
-  return new CuckooTableBuilder(file, hash_table_ratio_, kMaxNumHashTable,
-      max_search_depth_, internal_comparator.user_comparator(),
-      cuckoo_block_size_, GetSliceMurmurHash);
+  return new CuckooTableBuilder(file, hash_table_ratio_, 64, max_search_depth_,
+      internal_comparator.user_comparator(), cuckoo_block_size_, nullptr);
 }
 
 std::string CuckooTableFactory::GetPrintableTableOptions() const {
@@ -64,6 +43,9 @@ std::string CuckooTableFactory::GetPrintableTableOptions() const {
   snprintf(buffer, kBufferSize, "  max_search_depth: %u\n",
       max_search_depth_);
   ret.append(buffer);
+  snprintf(buffer, kBufferSize, "  cuckoo_block_size: %u\n",
+      cuckoo_block_size_);
+  ret.append(buffer);
   return ret;
 }

--- a/table/cuckoo_table_factory.h
+++ b/table/cuckoo_table_factory.h
@@ -8,11 +8,23 @@
 #include <string>
 
 #include "rocksdb/table.h"
+#include "util/murmurhash.h"
 
 namespace rocksdb {
 
-extern uint64_t GetSliceMurmurHash(const Slice& s, uint32_t index,
-    uint64_t max_num_buckets);
+const uint32_t kCuckooMurmurSeedMultiplier = 816922183;
+static inline uint64_t CuckooHash(
+    const Slice& user_key, uint32_t hash_cnt, uint64_t table_size_minus_one,
+    uint64_t (*get_slice_hash)(const Slice&, uint32_t, uint64_t)) {
+#ifndef NDEBUG
+  // This part is used only in unit tests.
+  if (get_slice_hash != nullptr) {
+    return get_slice_hash(user_key, hash_cnt, table_size_minus_one + 1);
+  }
+#endif
+  return MurmurHash(user_key.data(), user_key.size(),
+      kCuckooMurmurSeedMultiplier * hash_cnt) & table_size_minus_one;
+}
 
 // Cuckoo Table is designed for applications that require fast point lookups
 // but not fast range scans.
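
Design note on the hunk above: callers always go through CuckooHash(), and the
function-pointer path survives only in debug builds (#ifndef NDEBUG), so release
binaries pay neither the indirect call nor the extra branch, while unit tests can
still inject a deterministic hash. A hypothetical hook of the right shape
(illustrative only; the actual tests define their own GetSliceHash helpers):

  #include <cstdint>
  #include "rocksdb/slice.h"

  // Deterministic stand-in a test might pass as get_slice_hash so that bucket
  // placement is predictable. Hypothetical, not part of the patch.
  uint64_t TestSliceHash(const rocksdb::Slice& s, uint32_t index,
                         uint64_t max_num_buckets) {
    return (s.size() + index) % max_num_buckets;
  }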

--- a/table/cuckoo_table_reader.cc
+++ b/table/cuckoo_table_reader.cc
@@ -17,12 +17,13 @@
 #include <vector>
 
 #include "rocksdb/iterator.h"
 #include "table/meta_blocks.h"
+#include "table/cuckoo_table_factory.h"
 #include "util/arena.h"
 #include "util/coding.h"
 
 namespace rocksdb {
 namespace {
-static const uint64_t CACHE_LINE_MASK = ~(CACHE_LINE_SIZE - 1);
+static const uint64_t CACHE_LINE_MASK = ~((uint64_t)CACHE_LINE_SIZE - 1);
 }
 
 extern const uint64_t kCuckooTableMagicNumber;
@@ -76,8 +77,8 @@ CuckooTableReader::CuckooTableReader(
     status_ = Status::InvalidArgument("Hash table size not found");
     return;
   }
-  hash_table_size_ = *reinterpret_cast<const uint64_t*>(
-      hash_table_size->second.data());
+  table_size_minus_one_ = *reinterpret_cast<const uint64_t*>(
+      hash_table_size->second.data()) - 1;
   auto is_last_level = user_props.find(CuckooTablePropertyNames::kIsLastLevel);
   if (is_last_level == user_props.end()) {
     status_ = Status::InvalidArgument("Is last level not found");
@@ -104,11 +105,11 @@ Status CuckooTableReader::Get(
   assert(key.size() == key_length_ + (is_last_level_ ? 8 : 0));
   Slice user_key = ExtractUserKey(key);
   for (uint32_t hash_cnt = 0; hash_cnt < num_hash_func_; ++hash_cnt) {
-    uint64_t hash_val = get_slice_hash_(user_key, hash_cnt, hash_table_size_);
-    assert(hash_val < hash_table_size_);
+    uint64_t offset = bucket_length_ * CuckooHash(
+        user_key, hash_cnt, table_size_minus_one_, get_slice_hash_);
+    const char* bucket = &file_data_.data()[offset];
     for (uint32_t block_idx = 0; block_idx < cuckoo_block_size_;
-        ++block_idx, ++hash_val) {
-      const char* bucket = &file_data_.data()[hash_val * bucket_length_];
+        ++block_idx, bucket += bucket_length_) {
       if (ucomp_->Compare(Slice(unused_key_.data(), user_key.size()),
             Slice(bucket, user_key.size())) == 0) {
         return Status::OK();
@@ -137,8 +138,9 @@ Status CuckooTableReader::Get(
 
 void CuckooTableReader::Prepare(const Slice& key) {
   // Prefetch the first Cuckoo Block.
-  uint64_t addr = reinterpret_cast<uint64_t>(file_data_.data()) + bucket_length_
-    * get_slice_hash_(ExtractUserKey(key), 0, hash_table_size_);
+  Slice user_key = ExtractUserKey(key);
+  uint64_t addr = reinterpret_cast<uint64_t>(file_data_.data()) +
+    bucket_length_ * CuckooHash(user_key, 0, table_size_minus_one_, nullptr);
   uint64_t end_addr = addr + cuckoo_block_bytes_minus_one_;
   for (addr &= CACHE_LINE_MASK; addr < end_addr; addr += CACHE_LINE_SIZE) {
     PREFETCH(reinterpret_cast<const char*>(addr), 0, 3);
@@ -205,8 +207,8 @@ CuckooTableIterator::CuckooTableIterator(CuckooTableReader* reader)
 
 void CuckooTableIterator::LoadKeysFromReader() {
   key_to_bucket_id_.reserve(reader_->GetTableProperties()->num_entries);
-  uint64_t num_buckets = reader_->hash_table_size_ +
-      reader_->cuckoo_block_size_ - 1;
+  uint64_t num_buckets = reader_->table_size_minus_one_ +
+      reader_->cuckoo_block_size_;
   for (uint32_t bucket_id = 0; bucket_id < num_buckets; bucket_id++) {
     Slice read_key;
     status_ = reader_->file_->Read(bucket_id * reader_->bucket_length_,

--- a/table/cuckoo_table_reader.h
+++ b/table/cuckoo_table_reader.h
@@ -72,7 +72,7 @@ class CuckooTableReader: public TableReader {
   uint32_t bucket_length_;
   uint32_t cuckoo_block_size_;
   uint32_t cuckoo_block_bytes_minus_one_;
-  uint64_t hash_table_size_;
+  uint64_t table_size_minus_one_;
   const Comparator* ucomp_;
   uint64_t (*get_slice_hash_)(const Slice& s, uint32_t index,
       uint64_t max_num_buckets);

--- a/table/cuckoo_table_reader_test.cc
+++ b/table/cuckoo_table_reader_test.cc
@@ -38,9 +38,6 @@ DEFINE_bool(write, false,
 
 namespace rocksdb {
 
-extern const uint64_t kCuckooTableMagicNumber;
-extern const uint64_t kMaxNumHashTable;
-
 namespace {
 const uint32_t kNumHashFunc = 10;
 // Methods, variables related to Hash functions.
@@ -397,13 +394,12 @@ void GetKeys(uint64_t num, std::vector<std::string>* keys) {
   }
 }
 
-std::string GetFileName(uint64_t num, double hash_ratio) {
+std::string GetFileName(uint64_t num) {
   if (FLAGS_file_dir.empty()) {
     FLAGS_file_dir = test::TmpDir();
   }
   return FLAGS_file_dir + "/cuckoo_read_benchmark" +
-    std::to_string(num/1000000) + "Mratio" +
-    std::to_string(static_cast<int>(100*hash_ratio));
+    std::to_string(num/1000000) + "Mkeys";
 }
 
 // Create last level file as we are interested in measuring performance of
@@ -414,13 +410,13 @@ void WriteFile(const std::vector<std::string>& keys,
   options.allow_mmap_reads = true;
   Env* env = options.env;
   EnvOptions env_options = EnvOptions(options);
-  std::string fname = GetFileName(num, hash_ratio);
+  std::string fname = GetFileName(num);
 
   std::unique_ptr<WritableFile> writable_file;
   ASSERT_OK(env->NewWritableFile(fname, &writable_file, env_options));
   CuckooTableBuilder builder(
       writable_file.get(), hash_ratio,
-      kMaxNumHashTable, 1000, test::Uint64Comparator(), 5, GetSliceMurmurHash);
+      64, 1000, test::Uint64Comparator(), 5, nullptr);
   ASSERT_OK(builder.status());
   for (uint64_t key_idx = 0; key_idx < num; ++key_idx) {
     // Value is just a part of key.
@@ -439,27 +435,25 @@ void WriteFile(const std::vector<std::string>& keys,
   CuckooTableReader reader(
       options, std::move(read_file), file_size,
-      test::Uint64Comparator(), GetSliceMurmurHash);
+      test::Uint64Comparator(), nullptr);
   ASSERT_OK(reader.status());
   ReadOptions r_options;
-  for (const auto& key : keys) {
+  for (uint64_t i = 0; i < num; ++i) {
     int cnt = 0;
-    ASSERT_OK(reader.Get(r_options, Slice(key), &cnt, CheckValue, nullptr));
+    ASSERT_OK(reader.Get(r_options, Slice(keys[i]), &cnt, CheckValue, nullptr));
     if (cnt != 1) {
-      fprintf(stderr, "%" PRIu64 " not found.\n",
-          *reinterpret_cast<const uint64_t*>(key.data()));
+      fprintf(stderr, "%" PRIu64 " not found.\n", i);
       ASSERT_EQ(1, cnt);
     }
   }
 }
 
-void ReadKeys(const std::vector<std::string>& keys, uint64_t num,
-    double hash_ratio, uint32_t batch_size) {
+void ReadKeys(uint64_t num, uint32_t batch_size) {
   Options options;
   options.allow_mmap_reads = true;
   Env* env = options.env;
   EnvOptions env_options = EnvOptions(options);
-  std::string fname = GetFileName(num, hash_ratio);
+  std::string fname = GetFileName(num);
 
   uint64_t file_size;
   env->GetFileSize(fname, &file_size);
@@ -468,29 +462,33 @@ void ReadKeys(const std::vector<std::string>& keys, uint64_t num,
   CuckooTableReader reader(
       options, std::move(read_file), file_size, test::Uint64Comparator(),
-      GetSliceMurmurHash);
+      nullptr);
   ASSERT_OK(reader.status());
   const UserCollectedProperties user_props =
       reader.GetTableProperties()->user_collected_properties;
   const uint32_t num_hash_fun = *reinterpret_cast<const uint32_t*>(
       user_props.at(CuckooTablePropertyNames::kNumHashFunc).data());
-  fprintf(stderr, "With %" PRIu64 " items and hash table ratio %f, number of"
-      " hash functions used: %u.\n", num, hash_ratio, num_hash_fun);
+  const uint64_t table_size = *reinterpret_cast<const uint64_t*>(
+      user_props.at(CuckooTablePropertyNames::kHashTableSize).data());
+  fprintf(stderr, "With %" PRIu64 " items, utilization is %.2f%%, number of"
+      " hash functions: %u.\n", num, num * 100.0 / (table_size), num_hash_fun);
   ReadOptions r_options;
 
   uint64_t start_time = env->NowMicros();
   if (batch_size > 0) {
     for (uint64_t i = 0; i < num; i += batch_size) {
       for (uint64_t j = i; j < i+batch_size && j < num; ++j) {
-        reader.Prepare(Slice(keys[j]));
+        reader.Prepare(Slice(reinterpret_cast<char*>(&j), 16));
       }
       for (uint64_t j = i; j < i+batch_size && j < num; ++j) {
-        reader.Get(r_options, Slice(keys[j]), nullptr, DoNothing, nullptr);
+        reader.Get(r_options, Slice(reinterpret_cast<char*>(&j), 16),
+            nullptr, DoNothing, nullptr);
       }
     }
   } else {
     for (uint64_t i = 0; i < num; i++) {
-      reader.Get(r_options, Slice(keys[i]), nullptr, DoNothing, nullptr);
+      reader.Get(r_options, Slice(reinterpret_cast<char*>(&i), 16), nullptr,
+          DoNothing, nullptr);
     }
   }
   float time_per_op = (env->NowMicros() - start_time) * 1.0 / num;
@@ -501,26 +499,30 @@ void ReadKeys(const std::vector<std::string>& keys, uint64_t num,
 }  // namespace.
 
 TEST(CuckooReaderTest, TestReadPerformance) {
-  uint64_t num = 1000*1000*100;
   if (!FLAGS_enable_perf) {
     return;
   }
+  double hash_ratio = 0.95;
+  // These numbers are chosen to have a hash utilizaiton % close to
+  // 0.9, 0.75, 0.6 and 0.5 respectively.
+  // They all create 128 M buckets.
+  std::vector<uint64_t> nums = {120*1000*1000, 100*1000*1000, 80*1000*1000,
+    70*1000*1000};
 #ifndef NDEBUG
   fprintf(stdout,
       "WARNING: Not compiled with DNDEBUG. Performance tests may be slow.\n");
 #endif
   std::vector<std::string> keys;
-  GetKeys(num, &keys);
-  for (double hash_ratio : std::vector<double>({0.5, 0.6, 0.75, 0.9})) {
-    if (FLAGS_write || !Env::Default()->FileExists(
-          GetFileName(num, hash_ratio))) {
+  GetKeys(*std::max_element(nums.begin(), nums.end()), &keys);
+  for (uint64_t num : nums) {
+    if (FLAGS_write || !Env::Default()->FileExists(GetFileName(num))) {
       WriteFile(keys, num, hash_ratio);
     }
-    ReadKeys(keys, num, hash_ratio, 0);
-    ReadKeys(keys, num, hash_ratio, 10);
-    ReadKeys(keys, num, hash_ratio, 25);
-    ReadKeys(keys, num, hash_ratio, 50);
-    ReadKeys(keys, num, hash_ratio, 100);
+    ReadKeys(num, 0);
+    ReadKeys(num, 10);
+    ReadKeys(num, 25);
+    ReadKeys(num, 50);
+    ReadKeys(num, 100);
     fprintf(stderr, "\n");
   }
 }
