Prefetch cache lines for filter lookup (#4068)

Summary:
Since the filter data is unaligned, even though we ensure all probes are within a span of `cache_line_size` bytes, those bytes can span two cache lines. In that case I doubt hardware prefetching does a great job considering we don't necessarily access those two cache lines in order. This guess seems correct since adding explicit prefetch instructions reduced filter lookup overhead by 19.4%.
Closes https://github.com/facebook/rocksdb/pull/4068

Differential Revision: D8674189

Pulled By: ajkr

fbshipit-source-id: 747427d9a17900151c17820488e3f7efe06b1871
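
The straddling case described in the summary can be illustrated with a small standalone sketch. It is not RocksDB code: `LinesTouched` and `PrefetchSpan` are hypothetical helpers introduced here, and the sketch assumes a GCC- or Clang-style `__builtin_prefetch` in place of the `PREFETCH` macro used in the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of RocksDB): number of cache lines touched by
// the byte range [addr, addr + len), for len >= 1 and a nonzero line_size.
inline std::size_t LinesTouched(std::uintptr_t addr, std::size_t len,
                                std::size_t line_size) {
  const std::uintptr_t first_line = addr / line_size;
  const std::uintptr_t last_line = (addr + len - 1) / line_size;
  return static_cast<std::size_t>(last_line - first_line + 1);
}

// Hypothetical helper: issue read prefetches for the first and last byte of a
// span, so both cache lines are requested when the span straddles a boundary.
// When the span fits in one line, the second prefetch simply hits the same line.
inline void PrefetchSpan(const char* p, std::size_t len) {
#if defined(__GNUC__) || defined(__clang__)
  __builtin_prefetch(p, 0 /* rw */, 1 /* locality */);
  __builtin_prefetch(p + len - 1, 0 /* rw */, 1 /* locality */);
#else
  // Assumption: no portable prefetch intrinsic is available, so do nothing.
  (void)p;
  (void)len;
#endif
}
```

With a 64-byte line, a 64-byte span starting at offset 0 touches one line, while the same span starting at offset 10 touches two. That second case is the one the two prefetches in the patch are meant to cover.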
7 years ago · 25403c2265
parent 52d4c9b7f6
commit 25403c2265
1 changed file with 2 additions and 0 deletions
--- a/util/bloom.cc
+++ b/util/bloom.cc
@@ -228,6 +228,8 @@ bool FullFilterBitsReader::HashMayMatch(const uint32_t& hash,
  uint32_t h = hash;
  const uint32_t delta = (h >> 17) | (h << 15);  // Rotate right 17 bits
  uint32_t b = (h % num_lines) * (cache_line_size * 8);
+  PREFETCH(&data[b / 8], 0 /* rw */, 1 /* locality */);
+  PREFETCH(&data[b / 8 + cache_line_size - 1], 0 /* rw */, 1 /* locality */);

  for (uint32_t i = 0; i < num_probes; ++i) {
    // Since CACHE_LINE_SIZE is defined as 2^n, this line will be optimized