Prefetch cache lines for filter lookup (#4068)

Summary:
Since the filter data is unaligned, even though we ensure all probes are within a span of `cache_line_size` bytes, those bytes can span two cache lines. In that case I doubt hardware prefetching does a great job considering we don't necessarily access those two cache lines in order. This guess seems correct since adding explicit prefetch instructions reduced filter lookup overhead by 19.4%.
Closes https://github.com/facebook/rocksdb/pull/4068

Differential Revision: D8674189

Pulled By: ajkr

fbshipit-source-id: 747427d9a17900151c17820488e3f7efe06b1871
main
Andrew Kryczka 6 years ago committed by Facebook Github Bot
parent 52d4c9b7f6
commit 25403c2265
  1. 2
      util/bloom.cc

@ -228,6 +228,8 @@ bool FullFilterBitsReader::HashMayMatch(const uint32_t& hash,
uint32_t h = hash;
const uint32_t delta = (h >> 17) | (h << 15); // Rotate right 17 bits
uint32_t b = (h % num_lines) * (cache_line_size * 8);
PREFETCH(&data[b / 8], 0 /* rw */, 1 /* locality */);
PREFETCH(&data[b / 8 + cache_line_size - 1], 0 /* rw */, 1 /* locality */);
for (uint32_t i = 0; i < num_probes; ++i) {
// Since CACHE_LINE_SIZE is defined as 2^n, this line will be optimized

Loading…
Cancel
Save