You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
rocksdb/cache/compressed_secondary_cache.cc

247 lines
9.0 KiB

Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
#include "cache/compressed_secondary_cache.h"
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
#include <algorithm>
#include <cstdint>
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
#include <memory>
#include "memory/memory_allocator.h"
#include "util/compression.h"
#include "util/string_util.h"
namespace ROCKSDB_NAMESPACE {
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
CompressedSecondaryCache::CompressedSecondaryCache(
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
size_t capacity, int num_shard_bits, bool strict_capacity_limit,
double high_pri_pool_ratio,
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
std::shared_ptr<MemoryAllocator> memory_allocator, bool use_adaptive_mutex,
CacheMetadataChargePolicy metadata_charge_policy,
CompressionType compression_type, uint32_t compress_format_version)
: cache_options_(capacity, num_shard_bits, strict_capacity_limit,
high_pri_pool_ratio, memory_allocator, use_adaptive_mutex,
metadata_charge_policy, compression_type,
compress_format_version) {
cache_ = NewLRUCache(capacity, num_shard_bits, strict_capacity_limit,
high_pri_pool_ratio, memory_allocator,
use_adaptive_mutex, metadata_charge_policy);
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
}
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
CompressedSecondaryCache::~CompressedSecondaryCache() { cache_.reset(); }
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
std::unique_ptr<SecondaryCacheResultHandle> CompressedSecondaryCache::Lookup(
const Slice& key, const Cache::CreateCallback& create_cb, bool /*wait*/,
bool& is_in_sec_cache) {
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
std::unique_ptr<SecondaryCacheResultHandle> handle;
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
is_in_sec_cache = false;
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
Cache::Handle* lru_handle = cache_->Lookup(key);
if (lru_handle == nullptr) {
return handle;
}
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
CacheValueChunk* handle_value =
reinterpret_cast<CacheValueChunk*>(cache_->Value(lru_handle));
size_t handle_value_charge{0};
CacheAllocationPtr merged_value =
MergeChunksIntoValue(handle_value, handle_value_charge);
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
Status s;
void* value{nullptr};
size_t charge{0};
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
if (cache_options_.compression_type == kNoCompression) {
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
s = create_cb(merged_value.get(), handle_value_charge, &value, &charge);
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
} else {
UncompressionContext uncompression_context(cache_options_.compression_type);
UncompressionInfo uncompression_info(uncompression_context,
UncompressionDict::GetEmptyDict(),
cache_options_.compression_type);
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
size_t uncompressed_size{0};
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
CacheAllocationPtr uncompressed;
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
uncompressed = UncompressData(uncompression_info, (char*)merged_value.get(),
handle_value_charge, &uncompressed_size,
cache_options_.compress_format_version,
cache_options_.memory_allocator.get());
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
if (!uncompressed) {
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
cache_->Release(lru_handle, /* erase_if_last_ref */ true);
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
return handle;
}
s = create_cb(uncompressed.get(), uncompressed_size, &value, &charge);
}
if (!s.ok()) {
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
cache_->Release(lru_handle, /* erase_if_last_ref */ true);
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
return handle;
}
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
cache_->Release(lru_handle, /* erase_if_last_ref */ true);
handle.reset(new CompressedSecondaryCacheResultHandle(value, charge));
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
return handle;
}
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
Status CompressedSecondaryCache::Insert(const Slice& key, void* value,
const Cache::CacheItemHelper* helper) {
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
size_t size = (*helper->size_cb)(value);
CacheAllocationPtr ptr =
AllocateBlock(size, cache_options_.memory_allocator.get());
Status s = (*helper->saveto_cb)(value, 0, size, ptr.get());
if (!s.ok()) {
return s;
}
Slice val(ptr.get(), size);
std::string compressed_val;
if (cache_options_.compression_type != kNoCompression) {
CompressionOptions compression_opts;
CompressionContext compression_context(cache_options_.compression_type);
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
uint64_t sample_for_compression{0};
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
CompressionInfo compression_info(
compression_opts, compression_context, CompressionDict::GetEmptyDict(),
cache_options_.compression_type, sample_for_compression);
bool success =
CompressData(val, compression_info,
cache_options_.compress_format_version, &compressed_val);
if (!success) {
return Status::Corruption("Error compressing value.");
}
val = Slice(compressed_val);
}
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
size_t charge{0};
CacheValueChunk* value_chunks_head =
SplitValueIntoChunks(val, cache_options_.compression_type, charge);
return cache_->Insert(key, value_chunks_head, charge, DeletionCallback);
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
}
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
void CompressedSecondaryCache::Erase(const Slice& key) { cache_->Erase(key); }
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
std::string CompressedSecondaryCache::GetPrintableOptions() const {
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
std::string ret;
ret.reserve(20000);
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
const int kBufferSize{200};
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
char buffer[kBufferSize];
ret.append(cache_->GetPrintableOptions());
snprintf(buffer, kBufferSize, " compression_type : %s\n",
CompressionTypeToString(cache_options_.compression_type).c_str());
ret.append(buffer);
snprintf(buffer, kBufferSize, " compress_format_version : %d\n",
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
cache_options_.compress_format_version);
ret.append(buffer);
return ret;
}
Split cache to minimize internal fragmentation (#10287) Summary: ### **Summary:** To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`. Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue. _Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._ To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows. **T1** Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 100 bytes each (50 bytes after compression) Entries: 90000000 RawSize: 9956.4 MB (estimated) FileSize: 5664.8 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T1_3 | 4000 | 4000 | | T1_4 | 2000 | 3000 | Populate the DB: ./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1 Run read tests with differnt cache setting: T1_3: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 & T1_4: MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 & For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8728 | 8076 | 5518 | 5043 | | available (MB) | 8753 | 8092 | 5536 | 5051 | | external fragmentation rate | 0.003 | 0.002 | 0.003 | 0.0016 | | resident (MB) | 8956 | 8365 | 5655 | 5235 | **T2** Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 256 bytes each (128 bytes after compression) Entries: 40000000 RawSize: 10986.3 MB (estimated) FileSize: 6103.5 MB (estimated) | Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) | | - | - | - | | T2_3 | 4000 | 4000 | | T2_4 | 2000 | 3000 | Create DB (10GB): ./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Overwrite it to a stable state: ./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2 Run read tests with differnt cache setting: T2_3: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800 & T2_4: MALLOC_CONF="prof:true,prof_stats:true" ./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800 & For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats. | Test Name | T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag | | - | - | - | - | - | | allocated (MB) | 8425 | 8093 | 5426 | 5149 | | available (MB) | 8489 | 8138 | 5435 | 5158 | | external fragmentation rate | 0.008 | 0.0055 | 0.0017 | 0.0017 | | resident (MB) | 8676 | 8392 | 5541 | 5321 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287 Test Plan: Unit tests. Reviewed By: anand1976 Differential Revision: D37743362 Pulled By: gitbw95 fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2 years ago
CompressedSecondaryCache::CacheValueChunk*
CompressedSecondaryCache::SplitValueIntoChunks(
const Slice& value, const CompressionType compression_type,
size_t& charge) {
assert(!value.empty());
const char* src_ptr = value.data();
size_t src_size{value.size()};
CacheValueChunk dummy_head = CacheValueChunk();
CacheValueChunk* current_chunk = &dummy_head;
CacheAllocationPtr ptr;
// Do not split when value size is large or there is no compression.
size_t predicted_chunk_size{0};
size_t actual_chunk_size{0};
size_t tmp_size{0};
while (src_size > 0) {
predicted_chunk_size = sizeof(CacheValueChunk) - 1 + src_size;
auto upper =
std::upper_bound(malloc_bin_sizes_.begin(), malloc_bin_sizes_.end(),
predicted_chunk_size);
// Do not split when value size is too small, too large, close to a bin
// size, or there is no compression.
if (upper == malloc_bin_sizes_.begin() ||
upper == malloc_bin_sizes_.end() ||
*upper - predicted_chunk_size < malloc_bin_sizes_.front() ||
compression_type == kNoCompression) {
tmp_size = predicted_chunk_size;
} else {
tmp_size = *(--upper);
}
ptr = AllocateBlock(tmp_size, cache_options_.memory_allocator.get());
current_chunk->next = reinterpret_cast<CacheValueChunk*>(ptr.release());
current_chunk = current_chunk->next;
actual_chunk_size = tmp_size - sizeof(CacheValueChunk) + 1;
memcpy(current_chunk->data, src_ptr, actual_chunk_size);
current_chunk->size = actual_chunk_size;
src_ptr += actual_chunk_size;
src_size -= actual_chunk_size;
charge += tmp_size;
}
current_chunk->next = nullptr;
return dummy_head.next;
}
CacheAllocationPtr CompressedSecondaryCache::MergeChunksIntoValue(
const void* chunks_head, size_t& charge) {
const CacheValueChunk* head =
reinterpret_cast<const CacheValueChunk*>(chunks_head);
const CacheValueChunk* current_chunk = head;
charge = 0;
while (current_chunk != nullptr) {
charge += current_chunk->size;
current_chunk = current_chunk->next;
}
CacheAllocationPtr ptr =
AllocateBlock(charge, cache_options_.memory_allocator.get());
current_chunk = head;
size_t pos{0};
while (current_chunk != nullptr) {
memcpy(ptr.get() + pos, current_chunk->data, current_chunk->size);
pos += current_chunk->size;
current_chunk = current_chunk->next;
}
return ptr;
}
void CompressedSecondaryCache::DeletionCallback(const Slice& /*key*/,
void* obj) {
CacheValueChunk* chunks_head = reinterpret_cast<CacheValueChunk*>(obj);
while (chunks_head != nullptr) {
CacheValueChunk* tmp_chunk = chunks_head;
chunks_head = chunks_head->next;
tmp_chunk->Free();
}
obj = nullptr;
}
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
std::shared_ptr<SecondaryCache> NewCompressedSecondaryCache(
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
size_t capacity, int num_shard_bits, bool strict_capacity_limit,
double high_pri_pool_ratio,
std::shared_ptr<MemoryAllocator> memory_allocator, bool use_adaptive_mutex,
CacheMetadataChargePolicy metadata_charge_policy,
CompressionType compression_type, uint32_t compress_format_version) {
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
return std::make_shared<CompressedSecondaryCache>(
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
capacity, num_shard_bits, strict_capacity_limit, high_pri_pool_ratio,
memory_allocator, use_adaptive_mutex, metadata_charge_policy,
compression_type, compress_format_version);
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
}
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
std::shared_ptr<SecondaryCache> NewCompressedSecondaryCache(
const CompressedSecondaryCacheOptions& opts) {
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
// The secondary_cache is disabled for this LRUCache instance.
assert(opts.secondary_cache == nullptr);
Prevent double caching in the compressed secondary cache (#9747) Summary: ### **Summary:** When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached. **Changes include:** 1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions. 2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache(). 3. Rename LRUSecondaryCache to CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747 Test Plan: **Test Scripts:** 1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB. ./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1 2. overwrite it to a stable state: ./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1 4. Run read tests with diffeernt cache setting: T1: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 --statistics -db=/db_bench_1 T2: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T3: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 T4: ./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1 **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 96.2% | |520 MB | 400 MB | 98.3% | |20 MB | 500 MB | 98.8% | **Before this PR** | Cache Size | Compressed Secondary Cache Size | Cache Hit Rate | |------------|-------------------------------------|----------------| |520 MB | 0 MB | 85.5% | |320 MB | 400 MB | 99.9% | |520 MB | 400 MB | 99.9% | |20 MB | 500 MB | 99.2% | Reviewed By: anand1976 Differential Revision: D35117499 Pulled By: gitbw95 fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2 years ago
return NewCompressedSecondaryCache(
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
opts.capacity, opts.num_shard_bits, opts.strict_capacity_limit,
opts.high_pri_pool_ratio, opts.memory_allocator, opts.use_adaptive_mutex,
opts.metadata_charge_policy, opts.compression_type,
opts.compress_format_version);
Add a secondary cache implementation based on LRUCache 1 (#9518) Summary: **Summary:** RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache . **Follow Up:** 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
3 years ago
}
} // namespace ROCKSDB_NAMESPACE