From a0c63083d36b55d58afe635c44e5820324af05b4 Mon Sep 17 00:00:00 2001
From: Andrew Kryczka
Date: Tue, 19 Jul 2022 21:39:34 -0700
Subject: [PATCH] Fix explanation of XOR usage in KV checksum blog post (#10392)

Summary:
Thanks pdillinger for reminding us that we are protected from swapping corruptions due to independent seeds (and for suggesting that approach in the first place).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10392

Reviewed By: cbi42

Differential Revision: D37981819

Pulled By: ajkr

fbshipit-source-id: 3ed32982ae1dbc88eb92569010f9f2e8d190c962
---
 docs/_posts/2022-07-18-per-key-value-checksum.markdown | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/_posts/2022-07-18-per-key-value-checksum.markdown b/docs/_posts/2022-07-18-per-key-value-checksum.markdown
index deaaa2f56..6b9ad801c 100644
--- a/docs/_posts/2022-07-18-per-key-value-checksum.markdown
+++ b/docs/_posts/2022-07-18-per-key-value-checksum.markdown
@@ -37,7 +37,9 @@ Key-value pairs have multiple representations in RocksDB: in [WriteBatch](https:
 
 Besides user key and value, RocksDB includes internal metadata in the per key-value checksum calculation. Depending on the representation, internal metadata consists of some combination of sequence number, operation type, and column family ID. Note that since timestamp (when enabled) is part of the user key it is protected as well.
 
-The protection info consists of the XOR’d result of the xxh3 hash for all the protected components. Using XOR introduces a risk that swapping corruptions (e.g., key becomes the value and the value becomes the key) are undetectable. However, we think this is a reasonable tradeoff for the advantage it provides: we can efficiently transform protection info for different representations.
+The protection info consists of the XOR’d result of the xxh3 hash for all the protected components. This allows us to efficiently transform protection info for different representations. See below for an example converting WriteBatch protection info to memtable protection info.
+
+A risk of using XOR is the possibility of swapping corruptions (e.g., key becomes the value and the value becomes the key). To mitigate this risk, we use an independent seed for hashing each type of component.
 
 The following two figures illustrate how protection info in WriteBatch and memtable are calculated from a key-value’s components.
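
Reviewer note: the reasoning in the updated paragraph can be checked with a small sketch. This is not RocksDB code. It assumes, for illustration only, that WriteBatch protection info covers key, value, operation type, and column family ID, and that memtable protection info covers key, value, operation type, and sequence number. The seed constants and function names are hypothetical, and salted BLAKE2b from the Python standard library stands in for seeded xxh3.

```python
import hashlib

# Hypothetical per-component seeds; the post does not state the real values.
SEED_KEY, SEED_VALUE, SEED_OP_TYPE, SEED_SEQNO, SEED_CF_ID = range(1, 6)


def seeded_hash(data: bytes, seed: int) -> int:
    """64-bit seeded hash. Salted BLAKE2b is a stand-in for seeded xxh3."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def kvo(key: bytes, value: bytes, op_type: int) -> int:
    """XOR of the hashes shared by both representations (assumed layout)."""
    return (seeded_hash(key, SEED_KEY)
            ^ seeded_hash(value, SEED_VALUE)
            ^ seeded_hash(bytes([op_type]), SEED_OP_TYPE))


def write_batch_protection(key: bytes, value: bytes, op_type: int, cf_id: int) -> int:
    return kvo(key, value, op_type) ^ seeded_hash(cf_id.to_bytes(4, "little"), SEED_CF_ID)


def memtable_protection(key: bytes, value: bytes, op_type: int, seqno: int) -> int:
    return kvo(key, value, op_type) ^ seeded_hash(seqno.to_bytes(8, "little"), SEED_SEQNO)


def write_batch_to_memtable(prot: int, cf_id: int, seqno: int) -> int:
    """XOR out the column family ID hash, XOR in the sequence number hash.

    Key and value are not rehashed, which is the efficiency XOR buys.
    """
    return (prot
            ^ seeded_hash(cf_id.to_bytes(4, "little"), SEED_CF_ID)
            ^ seeded_hash(seqno.to_bytes(8, "little"), SEED_SEQNO))


def unseeded_kv(key: bytes, value: bytes) -> int:
    """Same seed for both components: swapping key and value goes unnoticed."""
    return seeded_hash(key, 0) ^ seeded_hash(value, 0)
```

Because XOR is its own inverse, converting between representations only needs the hashes of the components that differ. Because key and value are hashed with different seeds, `kvo(b"a", b"b", 1)` and `kvo(b"b", b"a", 1)` differ except in the case of a 64-bit hash collision, whereas `unseeded_kv` returns the same result for both orderings.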