Summary:
Adds support for prefetching data in Ribbon queries,
which especially optimizes batched Ribbon queries for MultiGet
(~222ns/key to ~97ns/key) but also single key queries on cold memory
(~333ns to ~226ns) because many queries span more than one cache line.
This required some refactoring of the query algorithm, and there
does not appear to be a noticeable regression in "hot memory" query
times (perhaps from 48ns to 50ns).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7889
Test Plan:
existing unit tests, plus performance validation with
filter_bench:
Each data point is the best of two runs. I saturated the machine
CPUs with other filter_bench runs in the background.
Before:
$ ./filter_bench -impl=3 -m_keys_total_max=200 -average_keys_per_filter=100000 -m_queries=50
WARNING: Assertions are enabled; benchmarks unnecessarily slow
Building...
Build avg ns/key: 125.86
Number of filters: 1993
Total size (MB): 168.166
Reported total allocated memory (MB): 183.211
Reported internal fragmentation: 8.94626%
Bits/key stored: 7.05341
Prelim FP rate %: 0.951827
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 48.0111
Batched, prepared net ns/op: 222.384
Batched, unprepared net ns/op: 343.908
Skewed 50% in 1% net ns/op: 252.916
Skewed 80% in 20% net ns/op: 320.579
Random filter net ns/op: 332.957
After:
$ ./filter_bench -impl=3 -m_keys_total_max=200 -average_keys_per_filter=100000 -m_queries=50
WARNING: Assertions are enabled; benchmarks unnecessarily slow
Building...
Build avg ns/key: 128.117
Number of filters: 1993
Total size (MB): 168.166
Reported total allocated memory (MB): 183.211
Reported internal fragmentation: 8.94626%
Bits/key stored: 7.05341
Prelim FP rate %: 0.951827
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 49.8812
Batched, prepared net ns/op: 97.1514
Batched, unprepared net ns/op: 222.025
Skewed 50% in 1% net ns/op: 197.48
Skewed 80% in 20% net ns/op: 212.457
Random filter net ns/op: 226.464
Bloom comparison, for reference:
$ ./filter_bench -impl=2 -m_keys_total_max=200 -average_keys_per_filter=100000 -m_queries=50
WARNING: Assertions are enabled; benchmarks unnecessarily slow
Building...
Build avg ns/key: 35.3042
Number of filters: 1993
Total size (MB): 238.488
Reported total allocated memory (MB): 262.875
Reported internal fragmentation: 10.2255%
Bits/key stored: 10.0029
Prelim FP rate %: 0.965327
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 9.09931
Batched, prepared net ns/op: 34.21
Batched, unprepared net ns/op: 88.8564
Skewed 50% in 1% net ns/op: 139.75
Skewed 80% in 20% net ns/op: 181.264
Random filter net ns/op: 173.88
Reviewed By: jay-zhuang
Differential Revision: D26378710
Pulled By: pdillinger
fbshipit-source-id: 058428967c55ed763698284cd3b4bbe3351b6e69