|
|
|
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
|
|
|
|
// This source code is licensed under both the GPLv2 (found in the
|
|
|
|
// COPYING file in the root directory) and Apache 2.0 License
|
|
|
|
// (found in the LICENSE.Apache file in the root directory).
|
|
|
|
//
|
|
|
|
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
|
|
|
|
// Use of this source code is governed by a BSD-style license that can be
|
|
|
|
// found in the LICENSE file. See the AUTHORS file for names of contributors.
|
|
|
|
//
|
Add new persistent 64-bit hash (#5984)
Summary:
For upcoming new SST filter implementations, we will use a new
64-bit hash function (XXH3 preview, slightly modified). This change
updates hash.{h,cc} for that change, adds unit tests, and out-of-lines
the implementations to keep hash.h as clean/small as possible.
In developing the unit tests, I discovered that the XXH3 preview always
returns zero for the empty string. Zero is problematic for some
algorithms (including an upcoming SST filter implementation) if it
occurs more often than at the "natural" rate, so it should not be
returned from trivial values using trivial seeds. I modified our fork
of XXH3 to return a modest hash of the seed for the empty string.
With hash function details out-of-lines in hash.h, it makes sense to
enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p
are inlined. To fix array-bounds warnings on some inline calls, I
injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.)
Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook
checks are unhappy about #include on xxhash.cc file. I would
fix that by rename to xxhash_cc.h, but to best preserve history I want
to do that in a separate commit (PR) from the uintptr casts.
Also updated filter_bench for this change, improving the performance
predictability of dry run hashing and adding support for 64-bit hash
(for upcoming new SST filter implementations, minor dead code in the
tool for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984
Differential Revision: D18246567
Pulled By: pdillinger
fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8
5 years ago
|
|
|
// Common hash functions with convenient interfaces. If hashing a
|
|
|
|
// statically-sized input in a performance-critical context, consider
|
|
|
|
// calling a specific hash implementation directly, such as
|
|
|
|
// XXH3p_64bits from xxhash.h.
|
|
|
|
//
|
|
|
|
// Since this is a very common header, implementation details are kept
|
|
|
|
// out-of-line. Out-of-lining also aids in tracking the time spent in
|
|
|
|
// hashing functions. Inlining is of limited benefit for runtime-sized
|
|
|
|
// hash inputs.
|
|
|
|
|
|
|
|
#pragma once
|
|
|
|
|
|
|
|
#include <cstddef>
|
|
|
|
#include <cstdint>
|
|
|
|
|
|
|
|
#include "rocksdb/slice.h"
|
|
|
|
#include "util/fastrange.h"
|
|
|
|
|
|
|
|
namespace ROCKSDB_NAMESPACE {
|
|
|
|
|
Add new persistent 64-bit hash (#5984)
Summary:
For upcoming new SST filter implementations, we will use a new
64-bit hash function (XXH3 preview, slightly modified). This change
updates hash.{h,cc} for that change, adds unit tests, and out-of-lines
the implementations to keep hash.h as clean/small as possible.
In developing the unit tests, I discovered that the XXH3 preview always
returns zero for the empty string. Zero is problematic for some
algorithms (including an upcoming SST filter implementation) if it
occurs more often than at the "natural" rate, so it should not be
returned from trivial values using trivial seeds. I modified our fork
of XXH3 to return a modest hash of the seed for the empty string.
With hash function details out-of-lines in hash.h, it makes sense to
enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p
are inlined. To fix array-bounds warnings on some inline calls, I
injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.)
Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook
checks are unhappy about #include on xxhash.cc file. I would
fix that by rename to xxhash_cc.h, but to best preserve history I want
to do that in a separate commit (PR) from the uintptr casts.
Also updated filter_bench for this change, improving the performance
predictability of dry run hashing and adding support for 64-bit hash
(for upcoming new SST filter implementations, minor dead code in the
tool for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984
Differential Revision: D18246567
Pulled By: pdillinger
fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8
5 years ago
|
|
|
// Stable/persistent 64-bit hash. Higher quality and generally faster than
|
|
|
|
// Hash(), especially for inputs > 24 bytes.
|
|
|
|
extern uint64_t Hash64(const char* data, size_t n, uint64_t seed);
|
|
|
|
|
|
|
|
// Specific optimization without seed (same as seed = 0)
|
|
|
|
extern uint64_t Hash64(const char* data, size_t n);
|
|
|
|
|
|
|
|
// Non-persistent hash. Must only used for in-memory data structure.
|
|
|
|
// The hash results are thus applicable to change. (Thus, it rarely makes
|
|
|
|
// sense to specify a seed for this function.)
|
Add new persistent 64-bit hash (#5984)
Summary:
For upcoming new SST filter implementations, we will use a new
64-bit hash function (XXH3 preview, slightly modified). This change
updates hash.{h,cc} for that change, adds unit tests, and out-of-lines
the implementations to keep hash.h as clean/small as possible.
In developing the unit tests, I discovered that the XXH3 preview always
returns zero for the empty string. Zero is problematic for some
algorithms (including an upcoming SST filter implementation) if it
occurs more often than at the "natural" rate, so it should not be
returned from trivial values using trivial seeds. I modified our fork
of XXH3 to return a modest hash of the seed for the empty string.
With hash function details out-of-lines in hash.h, it makes sense to
enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p
are inlined. To fix array-bounds warnings on some inline calls, I
injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.)
Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook
checks are unhappy about #include on xxhash.cc file. I would
fix that by rename to xxhash_cc.h, but to best preserve history I want
to do that in a separate commit (PR) from the uintptr casts.
Also updated filter_bench for this change, improving the performance
predictability of dry run hashing and adding support for 64-bit hash
(for upcoming new SST filter implementations, minor dead code in the
tool for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984
Differential Revision: D18246567
Pulled By: pdillinger
fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8
5 years ago
|
|
|
inline uint64_t NPHash64(const char* data, size_t n, uint32_t seed) {
|
|
|
|
// Currently same as Hash64
|
|
|
|
return Hash64(data, n, seed);
|
|
|
|
}
|
|
|
|
|
|
|
|
// Specific optimization without seed (same as seed = 0)
|
|
|
|
inline uint64_t NPHash64(const char* data, size_t n) {
|
|
|
|
// Currently same as Hash64
|
|
|
|
return Hash64(data, n);
|
|
|
|
}
|
|
|
|
|
Add new persistent 64-bit hash (#5984)
Summary:
For upcoming new SST filter implementations, we will use a new
64-bit hash function (XXH3 preview, slightly modified). This change
updates hash.{h,cc} for that change, adds unit tests, and out-of-lines
the implementations to keep hash.h as clean/small as possible.
In developing the unit tests, I discovered that the XXH3 preview always
returns zero for the empty string. Zero is problematic for some
algorithms (including an upcoming SST filter implementation) if it
occurs more often than at the "natural" rate, so it should not be
returned from trivial values using trivial seeds. I modified our fork
of XXH3 to return a modest hash of the seed for the empty string.
With hash function details out-of-lines in hash.h, it makes sense to
enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p
are inlined. To fix array-bounds warnings on some inline calls, I
injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.)
Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook
checks are unhappy about #include on xxhash.cc file. I would
fix that by rename to xxhash_cc.h, but to best preserve history I want
to do that in a separate commit (PR) from the uintptr casts.
Also updated filter_bench for this change, improving the performance
predictability of dry run hashing and adding support for 64-bit hash
(for upcoming new SST filter implementations, minor dead code in the
tool for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984
Differential Revision: D18246567
Pulled By: pdillinger
fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8
5 years ago
|
|
|
// Stable/persistent 32-bit hash. Moderate quality and high speed on
|
|
|
|
// small inputs.
|
|
|
|
// TODO: consider rename to Hash32
|
|
|
|
extern uint32_t Hash(const char* data, size_t n, uint32_t seed);
|
|
|
|
|
Add new persistent 64-bit hash (#5984)
Summary:
For upcoming new SST filter implementations, we will use a new
64-bit hash function (XXH3 preview, slightly modified). This change
updates hash.{h,cc} for that change, adds unit tests, and out-of-lines
the implementations to keep hash.h as clean/small as possible.
In developing the unit tests, I discovered that the XXH3 preview always
returns zero for the empty string. Zero is problematic for some
algorithms (including an upcoming SST filter implementation) if it
occurs more often than at the "natural" rate, so it should not be
returned from trivial values using trivial seeds. I modified our fork
of XXH3 to return a modest hash of the seed for the empty string.
With hash function details out-of-lines in hash.h, it makes sense to
enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p
are inlined. To fix array-bounds warnings on some inline calls, I
injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.)
Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook
checks are unhappy about #include on xxhash.cc file. I would
fix that by rename to xxhash_cc.h, but to best preserve history I want
to do that in a separate commit (PR) from the uintptr casts.
Also updated filter_bench for this change, improving the performance
predictability of dry run hashing and adding support for 64-bit hash
(for upcoming new SST filter implementations, minor dead code in the
tool for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984
Differential Revision: D18246567
Pulled By: pdillinger
fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8
5 years ago
|
|
|
// TODO: consider rename to LegacyBloomHash32
|
|
|
|
inline uint32_t BloomHash(const Slice& key) {
|
|
|
|
return Hash(key.data(), key.size(), 0xbc9f1d34);
|
|
|
|
}
|
|
|
|
|
Add new persistent 64-bit hash (#5984)
Summary:
For upcoming new SST filter implementations, we will use a new
64-bit hash function (XXH3 preview, slightly modified). This change
updates hash.{h,cc} for that change, adds unit tests, and out-of-lines
the implementations to keep hash.h as clean/small as possible.
In developing the unit tests, I discovered that the XXH3 preview always
returns zero for the empty string. Zero is problematic for some
algorithms (including an upcoming SST filter implementation) if it
occurs more often than at the "natural" rate, so it should not be
returned from trivial values using trivial seeds. I modified our fork
of XXH3 to return a modest hash of the seed for the empty string.
With hash function details out-of-lines in hash.h, it makes sense to
enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p
are inlined. To fix array-bounds warnings on some inline calls, I
injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.)
Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook
checks are unhappy about #include on xxhash.cc file. I would
fix that by rename to xxhash_cc.h, but to best preserve history I want
to do that in a separate commit (PR) from the uintptr casts.
Also updated filter_bench for this change, improving the performance
predictability of dry run hashing and adding support for 64-bit hash
(for upcoming new SST filter implementations, minor dead code in the
tool for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984
Differential Revision: D18246567
Pulled By: pdillinger
fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8
5 years ago
|
|
|
inline uint64_t GetSliceHash64(const Slice& key) {
|
|
|
|
return Hash64(key.data(), key.size());
|
|
|
|
}
|
|
|
|
|
|
|
|
inline uint64_t GetSliceNPHash64(const Slice& s) {
|
|
|
|
return NPHash64(s.data(), s.size());
|
|
|
|
}
|
|
|
|
|
|
|
|
inline size_t GetSliceRangedNPHash(const Slice& s, size_t range) {
|
|
|
|
return FastRange64(NPHash64(s.data(), s.size()), range);
|
|
|
|
}
|
|
|
|
|
Add new persistent 64-bit hash (#5984)
Summary:
For upcoming new SST filter implementations, we will use a new
64-bit hash function (XXH3 preview, slightly modified). This change
updates hash.{h,cc} for that change, adds unit tests, and out-of-lines
the implementations to keep hash.h as clean/small as possible.
In developing the unit tests, I discovered that the XXH3 preview always
returns zero for the empty string. Zero is problematic for some
algorithms (including an upcoming SST filter implementation) if it
occurs more often than at the "natural" rate, so it should not be
returned from trivial values using trivial seeds. I modified our fork
of XXH3 to return a modest hash of the seed for the empty string.
With hash function details out-of-lines in hash.h, it makes sense to
enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p
are inlined. To fix array-bounds warnings on some inline calls, I
injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.)
Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook
checks are unhappy about #include on xxhash.cc file. I would
fix that by rename to xxhash_cc.h, but to best preserve history I want
to do that in a separate commit (PR) from the uintptr casts.
Also updated filter_bench for this change, improving the performance
predictability of dry run hashing and adding support for 64-bit hash
(for upcoming new SST filter implementations, minor dead code in the
tool for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984
Differential Revision: D18246567
Pulled By: pdillinger
fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8
5 years ago
|
|
|
// TODO: consider rename to GetSliceHash32
|
|
|
|
inline uint32_t GetSliceHash(const Slice& s) {
|
|
|
|
return Hash(s.data(), s.size(), 397);
|
|
|
|
}
|
|
|
|
|
Add new persistent 64-bit hash (#5984)
Summary:
For upcoming new SST filter implementations, we will use a new
64-bit hash function (XXH3 preview, slightly modified). This change
updates hash.{h,cc} for that change, adds unit tests, and out-of-lines
the implementations to keep hash.h as clean/small as possible.
In developing the unit tests, I discovered that the XXH3 preview always
returns zero for the empty string. Zero is problematic for some
algorithms (including an upcoming SST filter implementation) if it
occurs more often than at the "natural" rate, so it should not be
returned from trivial values using trivial seeds. I modified our fork
of XXH3 to return a modest hash of the seed for the empty string.
With hash function details out-of-lines in hash.h, it makes sense to
enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p
are inlined. To fix array-bounds warnings on some inline calls, I
injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.)
Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook
checks are unhappy about #include on xxhash.cc file. I would
fix that by rename to xxhash_cc.h, but to best preserve history I want
to do that in a separate commit (PR) from the uintptr casts.
Also updated filter_bench for this change, improving the performance
predictability of dry run hashing and adding support for 64-bit hash
(for upcoming new SST filter implementations, minor dead code in the
tool for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984
Differential Revision: D18246567
Pulled By: pdillinger
fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8
5 years ago
|
|
|
// Useful for splitting up a 64-bit hash
|
|
|
|
inline uint32_t Upper32of64(uint64_t v) {
|
|
|
|
return static_cast<uint32_t>(v >> 32);
|
|
|
|
}
|
|
|
|
inline uint32_t Lower32of64(uint64_t v) { return static_cast<uint32_t>(v); }
|
|
|
|
|
|
|
|
// std::hash compatible interface.
|
Add new persistent 64-bit hash (#5984)
Summary:
For upcoming new SST filter implementations, we will use a new
64-bit hash function (XXH3 preview, slightly modified). This change
updates hash.{h,cc} for that change, adds unit tests, and out-of-lines
the implementations to keep hash.h as clean/small as possible.
In developing the unit tests, I discovered that the XXH3 preview always
returns zero for the empty string. Zero is problematic for some
algorithms (including an upcoming SST filter implementation) if it
occurs more often than at the "natural" rate, so it should not be
returned from trivial values using trivial seeds. I modified our fork
of XXH3 to return a modest hash of the seed for the empty string.
With hash function details out-of-lines in hash.h, it makes sense to
enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p
are inlined. To fix array-bounds warnings on some inline calls, I
injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.)
Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook
checks are unhappy about #include on xxhash.cc file. I would
fix that by rename to xxhash_cc.h, but to best preserve history I want
to do that in a separate commit (PR) from the uintptr casts.
Also updated filter_bench for this change, improving the performance
predictability of dry run hashing and adding support for 64-bit hash
(for upcoming new SST filter implementations, minor dead code in the
tool for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984
Differential Revision: D18246567
Pulled By: pdillinger
fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8
5 years ago
|
|
|
// TODO: consider rename to SliceHasher32
|
|
|
|
struct SliceHasher {
|
|
|
|
uint32_t operator()(const Slice& s) const { return GetSliceHash(s); }
|
|
|
|
};
|
|
|
|
|
|
|
|
} // namespace ROCKSDB_NAMESPACE
|