Summary:
Previously the only way to increment a handle's refcount was to invoke Lookup(), which (1) did hash table lookup to get cache handle, (2) incremented that handle's refcount. For a future DeleteRange optimization, I added a function, Ref(), for when the caller already has a cache handle and only needs to do (2).
Closes https://github.com/facebook/rocksdb/pull/1761
Differential Revision: D4397114
Pulled By: ajkr
fbshipit-source-id: 9addbe5
Summary:
Improve cache options logging to info log.
Also print the value of
cache_index_and_filter_blocks_with_high_priority.
Closes https://github.com/facebook/rocksdb/pull/1709
Differential Revision: D4358776
Pulled By: yiwu-arbug
fbshipit-source-id: 8f030a0
Summary:
We used to allow insert into full block cache as long as `strict_capacity_limit=false`. This diff further restrict insert to full cache if caller don't intent to hold handle to the cache entry after insert.
Hope this diff fix the assertion failure with db_stress: https://our.intern.facebook.com/intern/sandcastle/log/?instance_id=211853102&step_id=2475070014
db_stress: util/lru_cache.cc:278: virtual void rocksdb::LRUCacheShard::Release(rocksdb::Cache::Handle*): Assertion `lru_.next == &lru_' failed.
The assertion at lru_cache.cc:278 can fail when an entry is inserted into full cache and stay in LRU list.
Test Plan:
make all check
Reviewers: IslamAbdelRahman, lightmark, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D62325
Summary:
Add option to block based table to insert index/filter blocks to block cache with priority. Combined with LRUCache with high_pri_pool_ratio, we can reserved space for index/filter blocks, make them less likely to be evicted.
Depends on D61977.
Test Plan: See unit test.
Reviewers: lightmark, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, march, leveldb
Differential Revision: https://reviews.facebook.net/D62241
Summary:
Add mid-point insertion functionality to LRU cache. Caller of `Cache::Insert()` can set an additional parameter to make a cache entry have higher priority. The LRU cache will reserve at most `capacity * high_pri_pool_pct` bytes for high-pri cache entries. If `high_pri_pool_pct` is zero, the cache degenerates to normal LRU cache.
Context: If we are to put index and filter blocks into RocksDB block cache, index/filter block can be swap out too early. We want to add an option to RocksDB to reserve some capacity in block cache just for index/filter blocks, to mitigate the issue.
In later diffs I'll update block based table reader to use the interface to cache index/filter blocks at high priority, and expose the option to `DBOptions` and make it dynamic changeable.
Test Plan: unit test.
Reviewers: IslamAbdelRahman, sdong, lightmark
Reviewed By: lightmark
Subscribers: andrewkr, dhruba, march, leveldb
Differential Revision: https://reviews.facebook.net/D61977
Summary: ... so that I can include the header and create LRUCache specific tests for D61977
Test Plan:
make check
Reviewers: lightmark, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D62145
Summary: preparation for detecting Cache type. If SimCache, we then may trigger some command like "setSimCapacity()" with setOptions()
Test Plan: make all check
Reviewers: yiwu, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D61953
Summary: Refactor cache.cc so that I can plugin clock cache (D55581). Mainly move `ShardedCache` to separate file, move `LRUHandle` back to cache.cc and rename it lru_cache.cc.
Test Plan:
make check -j64
Reviewers: lightmark, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D59655
Summary: Cache shard bit 4 is sometimes too small and 6 is a more common value picked by users. Make that default. It shouldn't hurt much to change options.max_file_opening_threads default to be 16, which will reduce the worst case DB open time.
Test Plan: Run all existing tests.
Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, igor, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D55047
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
Test Plan:
'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
I didn't run the Java tests, I don't have Java set up on my devserver.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56133
Summary:
Cache to have an option to fail Cache::Insert() when full. Update call sites to check status and handle error.
I totally have no idea what's correct behavior of all the call sites when they encounter error. Please let me know if you see something wrong or more unit test is needed.
Test Plan: make check -j32, see tests pass.
Reviewers: anthony, yhchiang, andrewkr, IslamAbdelRahman, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D54705
Summary: Add 2 new counters BLOCK_CACHE_BYTES_WRITE, BLOCK_CACHE_BYTES_READ to keep track of how many bytes were written to the cache and how many bytes that we read from cache
Test Plan: make check
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48195
Summary:
Add the funcion Cache.GetPinnedUsage() to return the memory size of entries
that are in use by the system (that is, all the entries not in the LRU list).
Test Plan:
Run ./cache_test and examine PinnedUsageTest.
Reviewers: tnovak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D40305
Summary: Added keyword override for SetCapacity()
Test Plan: Fixes build
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37647
Summary:
When new capacity is larger than existing capacity, simply update the capacity to the new valie
When new capacity is less than existing capacity, but more than the usage, simply update the capacity to new value
When new capacity is less than the existing capacity and existing usage both, try to purge entries in LRU if feasible to make usage < capacity
Test Plan: Created unit tests in cache_test.cc
Reviewers: sdong, rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37527
Summary: It is no longer used by the implementation, so we should also remove it from the public API.
Test Plan: make check
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34971
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.
Prerequisites: bear and clang 3.5 build with extra tools
```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```
Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```
Reviewers: igor, kradhakrishnan, rven, meyering, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34077
Summary:
Currently, blocks which have more than one reference (ie referenced by something other than cache itself) are evicted from cache. This doesn't make much sense:
- blocks are still in RAM, so the RAM usage reported by the cache is incorrect
- if the same block is needed by another iterator, it will be loaded and decompressed again
This diff changes the reference counting scheme a bit. Previously, if the cache contained the block, this was accounted for in its refcount. After this change, the refcount is only used to track external references. There is a boolean flag which indicates whether or not the block is contained in the cache.
This diff also changes how LRU list is used. Previously, both hashtable and the LRU list contained all blocks. After this change, the LRU list contains blocks with the refcount==0, ie those which can be evicted from the cache.
Note that this change still allows for cache to grow beyond its capacity. This happens when all blocks are pinned (ie refcount>0). This is consistent with the current behavior. The cache's insert function never fails. I spent lots of time trying to make table_reader and other places work with the insert which might failed. It turned out to be pretty hard. It might really destabilize some customers, so finally, I decided against doing this.
table_cache_remove_scan_count_limit option will be unneeded after this change, but I will remove it in the following diff, if this one gets approved
Test Plan: Ran tests, made sure they pass
Reviewers: sdong, ljin
Differential Revision: https://reviews.facebook.net/D25503
Summary:
We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
Test Plan: compiles
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: yhchiang
Subscribers: bobbaldwin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28689
Summary:
...and fix all the errors :)
Jim suggested turning on -Wshadow because it helped him fix number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
Test Plan: compiles
Reviewers: yhchiang, rven, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27711
Summary: Added a method that executes a callback on every cache entry.
Test Plan: added a unit test
Reviewers: haobo
Reviewed By: haobo
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D18441
Summary: On a shutdown, freeing memory takes a long time. If we're shutting down, we don't really care about memory leaks. I added a call to Cache that will avoid freeing all objects in cache.
Test Plan:
I created a script to test the speedup and demonstrate how to use the call: https://phabricator.fb.com/P3864368
Clean shutdown took 7.2 seconds, while fast and dirty one took 6.3 seconds. Unfortunately, the speedup is not that big, but should be bigger with bigger block_cache. I have set up the capacity to 80GB, but the script filled up only ~7GB.
Reviewers: dhruba, haobo, MarkCallaghan, xjin
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15069
Summary: this diff only replace the cases when we need to frequently create vector with small amount of entries. This diff doesn't aim to improve performance of a specific area, but more like a small scale test for the autovector and see how it works in real life.
Test Plan:
make check
I also ran the performance tests, however there is no performance gain/loss. All performance numbers are pretty much the same before/after the change.
Reviewers: dhruba, haobo, sdong, igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14985
Summary:
By disassemble the function, we found that the atomic variables do invoke the `lock` that locks the memory bus.
As a tradeoff, we protect the GetUsage by mutex and leave usage_ as plain size_t.
Test Plan: passed `cache_test`
Reviewers: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14667
Summary: This diff will help us to figure out the memory usage for the cache part.
Test Plan: added a new memory usage test for cache
Reviewers: haobo, sdong, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14559
Summary:
I have ran a get benchmark where all the data is in the cache and observed that most of the time is spent on waiting for lock in LRUCache.
This is an effort to optimize LRUCache.
Test Plan:
The data was loaded with fillseq. Then, I ran a benchmark:
/db_bench --db=/tmp/rocksdb_stat_bench --num=1000000 --benchmarks=readrandom --statistics=1 --use_existing_db=1 --threads=16 --disable_seek_compaction=1 --cache_size=20000000000 --cache_numshardbits=8 --table_cache_numshardbits=8
I ran the benchmark three times. Here are the results:
AFTER THE PATCH: 798072, 803998, 811807
BEFORE THE PATCH: 782008, 815593, 763017
Reviewers: dhruba, haobo, kailiu
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14571
Summary: This will allow us to access constant via `DB::GetOptions().table_cache.GetCapacity()` or `DB::GetOptions().block_cache.GetCapacity()` since GetOptions() is also constant method.
Summary: The work to make sure mac os compiles rocksdb is not completed yet. But at least we can start cleaning some warnings captured only by g++ from mac os..
Test Plan: ran make in mac os
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14049
Summary:
With this patch, when LRUCache.Insert() is called and the cache is full, it will first try to free up entries whose reference counter is 1 (would become 0 after remo\
ving from the cache). We do it in two passes, in the first pass, we only try to release those unreferenced entries. If we cannot free enough space after traversing t\
he first remove_scan_cnt_ entries, we start from the beginning again and remove those entries being used.
Test Plan: add two unit tests to cover the codes
Reviewers: dhruba, haobo, emayanke
Reviewed By: emayanke
CC: leveldb, emayanke, xjin
Differential Revision: https://reviews.facebook.net/D13377
Summary: Split Unref into two parts -> cheap and expensive. Try to call expensive Unref outside of critical section to decrease lock contention.
Test Plan: unittests
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb, kailiu
Differential Revision: https://reviews.facebook.net/D13299
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.
Test Plan: compile rocksdb
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13287
Summary: Replace include/leveldb with include/rocksdb.
Test Plan:
make clean; make check
make clean; make release
Differential Revision: https://reviews.facebook.net/D12489
Summary:
1. The stock LRUCache nukes itself whenever the working set (the total number of entries not released by client at a certain time) is bigger than the cache capacity.
See https://our.dev.facebook.com/intern/tasks/?t=2252281
2. There's a bug in shard calculation leading to segmentation fault when only one shard is needed.
Test Plan: make check
Reviewers: dhruba, heyongqiang
Reviewed By: heyongqiang
CC: leveldb, zshao, sheki
Differential Revision: https://reviews.facebook.net/D9927
Summary:
Replace manual memory management with std::unique_ptr in a
number of places; not exhaustive, but this fixes a few leaks with file
handles as well as clarifies semantics of the ownership of file handles
with log classes.
Test Plan: db_stress, make check
Reviewers: dhruba
Reviewed By: dhruba
CC: zshao, leveldb, heyongqiang
Differential Revision: https://reviews.facebook.net/D8043
Summary: `~ShardedLRUCache()` was empty despite `init()` allocating memory on the heap. Fixed the leak by freeing memory allocated by `init()`.
Test Plan:
make check
Ran valgrind on db_test before and after patch and saw leaked memory went down
Reviewers: vamsi, dhruba, emayanke, sheki
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7791
Summary:
The default compilation process now uses "-Wall" to compile.
Fix all compilation error generated by gcc.
Test Plan: make all check
Reviewers: heyongqiang, emayanke, sheki
Reviewed By: heyongqiang
CC: MarkCallaghan
Differential Revision: https://reviews.facebook.net/D6525
Summary: Print the block cache size in the LOG.
Test Plan: run db_bench and look at LOG. This is helpful while I was debugging one use-case.
Reviewers: heyongqiang, MarkCallaghan
Reviewed By: heyongqiang
Differential Revision: https://reviews.facebook.net/D5739
Summary:
The numbers of shards that the block cache is divided into is
configurable. However, if the user specifies that he/she wants
the block cache to be divided into more than 2**20 pieces, then
the system will rey to allocate a huge array of that size) that
could fail.
It is better to limit the sharding of the block cache to an
upper bound. The default sharding is 16 shards (i.e. 2**4)
and the maximum is now 2 million shards (i.e. 2**20).
Also, fixed a bug with the LRUCache where the numShardBits
should be a private member of the LRUCache object rather than
a static variable.
Test Plan:
run db_bench with --cache_numshardbits=64.
Task ID: #
Blame Rev:
Reviewers: heyongqiang
Reviewed By: heyongqiang
Differential Revision: https://reviews.facebook.net/D5013
- Replace raw slice comparison with a call to user comparator.
Added test for custom comparators.
- Fix end of namespace comments.
- Fixed bug in picking inputs for a level-0 compaction.
When finding overlapping files, the covered range may expand
as files are added to the input set. We now correctly expand
the range when this happens instead of continuing to use the
old range. For example, suppose L0 contains files with the
following ranges:
F1: a .. d
F2: c .. g
F3: f .. j
and the initial compaction target is F3. We used to search
for range f..j which yielded {F2,F3}. However we now expand
the range as soon as another file is added. In this case,
when F2 is added, we expand the range to c..j and restart the
search. That picks up file F1 as well.
This change fixes a bug related to deleted keys showing up
incorrectly after a compaction as described in Issue 44.
(Sync with upstream @25072954)
Fix GCC -Wshadow warnings in LevelDB's public header files,
reported by Dustin.
Add in-memory Env implementation (helpers/memenv/*).
This enables users to create LevelDB databases in-memory.
Initialize ShardedLRUCache::last_id_ to zero.
This fixes a Valgrind warning.
(Also delete port/sha1_* which were removed upstream some time ago.)
- Fix for issue 33 (non-null-terminated result from
leveldb_property_value())
- Support for running multiple instances of a benchmark in parallel.
- Reduce lock contention on Get():
(1) Do not hold the lock while searching memtables.
(2) Shard block and table caches 16-ways.
Benchmark for evaluating this change:
$ db_bench --benchmarks=fillseq1,readrandom --threads=$n
(fillseq1 is a small hack to make sure fillseq runs once regardless
of number of threads specified on the command line).
git-svn-id: https://leveldb.googlecode.com/svn/trunk@49 62dab493-f737-651d-591e-8d6aee1b9529