rocksdb

Commit Graph

Author	SHA1	Message	Date
Nathan Bronson	9a9d4759b2	InlineSkipList part 3/3 - new skiplist type that colocates key and node Summary: This diff completes the creation of InlineSkipList<Cmp>, which is like SkipList<const char*, Cmp> but it always allocates the key contiguously with the node. This allows us to remove the pointer from the node to the key. As a result the memory usage of the skip list is reduced (by 1 to sizeof(void*) bytes depending on the padding required to align the key storage), cache locality is improved, and we halve the number of calls to the allocator. For skip lists whose keys are freshly-allocated const char*, InlineSkipList is strictly preferable to SkipList. This diff doesn't replace SkipList, however, because some of the use cases of SkipList in RocksDB are either character sequences that are not allocated at the same time as the skip list node allocation (for example hash_linklist_rep) or have different key types (for example write_batch_with_index). Taking advantage of inline allocation for those cases is left to future work. The perf win is biggest for small values. For single-threaded CPU-bound (32M fillrandom operations with no WAL log) with 16 byte keys and 0 byte values, the db_bench perf goes from ~310k ops/sec to ~410k ops/sec. For large values the improvement is less pronounced, but seems to be between 5% and 10% on the same configuration. Test Plan: make check Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D51123	9 years ago
Nathan Bronson	5201729545	InlineSkipList - part 2/3 Summary: This diff is 2/3 in a sequence that introduces a skip list optimized for a key that is a freshly-allocated const char*. The change is broken into pieces to make it easier to review. This piece removes the Key template type, introduces the AllocateKey interface, and changes the unit test from using uint64_t as the Key type to using pointers to an 8 byte blob. Test Plan: unit test Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D51285	9 years ago
Nathan Bronson	78812ec6bf	InlineSkipList - part 1/3 Summary: This diff is 1/3 in a sequence that introduces a skip list optimized for a key that is a freshly-allocated const char*. The diff is broken into pieces to make it easier to review. This piece only introduces the new type by copying the existing SkipList, with mechanical naming changes and reformatting. Test Plan: new unit test Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D51279	9 years ago
Nathan Bronson	b81b430987	Switch to thread-local random for skiplist Summary: Using a TLS random instance for skiplist makes it smaller (useful for hash_skiplist_rep) and prepares skiplist for concurrent adds. This diff also modifies the branching factor math to avoid an unnecessary division. This diff has the effect of changing the sequence of skip list node height choices made by tests, so it has the potential to cause unit test failures for tests that implicitly rely on the exact structure of the skip list. Tests that try to exactly trigger a compaction are likely suspects for this problem (these tests have always been brittle to changes in the skiplist details). I've minimized this risk by reseeding the main thread's Random at the beginning of each test, increasing the universal compaction size_ratio limit from 101% to 105% for some tests, and verifying that the tests pass many times. Test Plan: for i in `seq 0 9`; do make check; done Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D50439	9 years ago
Nathan Bronson	1ae27113c7	reduce comparisons by skiplist Summary: Key comparison is the single largest CPU user for CPU-bound workloads. This diff reduces the number of comparisons in two ways. The first is that it moves predecessor array gathering from FindGreaterOrEqual to FindLessThan, so that FindGreaterOrEqual can return immediately if compare_ returns 0. As part of this change I moved the sequential insertion optimization into Insert, to remove the undocumented (and smelly) requirement that prev must be equal to prev_ if it is non-null. The second optimization is that all of the search functions skip calling compare_ when moving to a lower level that has the same Next pointer. With a branching factor of 4 we would expect this to happen 1/4 of the time. On a single-threaded CPU-bound workload (-benchmarks=fillrandom -threads=1 -batch_size=1 -memtablerep=skip_list -value_size=0 --num=1600000 -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000) on my dev server this is good for a 7% perf win. Test Plan: unit tests Reviewers: rven, ljin, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43233	9 years ago
sdong	40f562e747	Allow GetApproximateSize() to include mem table size if it is skip list memtable Summary: Add an option in GetApproximateSize() so that the result will include estimated sizes in mem tables. To implement it, implement an estimated count from the beginning to a key in skip list. The approach is to count to find the entry, how many Next() is issued from each level, and sum them with a weight that is <branching factor> ^ <level>. Test Plan: Add a test case Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40119	10 years ago
Jonah Cohen	a14b7873ee	Enforce write buffer memory limit across column families Summary: Introduces a new class for managing write buffer memory across column families. We supplement ColumnFamilyOptions::write_buffer_size with ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer instance that enforces memory limits before flushing out to disk. Test Plan: Added SharedWriteBuffer unit test to db_test.cc Reviewers: sdong, rven, ljin, igor Reviewed By: igor Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim Differential Revision: https://reviews.facebook.net/D22581	10 years ago
Igor Canadi	7c303f0e78	Include atomic	10 years ago
Igor Canadi	48842ab316	Deprecate AtomicPointer Summary: RocksDB already depends on C++11, so we might as well use all the goodness that C++11 provides. This means that we don't need AtomicPointer anymore. The fewer things in port/, the easier it will be to port to other platforms. Test Plan: make check + careful visual review verifying that NoBarrier got memory_order_relaxed, while Acquire/Release methods got memory_order_acquire and memory_order_release Reviewers: rven, yhchiang, ljin, sdong Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D27543	10 years ago
Lei Jin	8d007b4aaf	Consolidate SliceTransform object ownership Summary: (1) Fix SanitizeOptions() to also check HashLinkList. The current dynamic case just happens to work because the 2 classes have the same layout. (2) Do not delete SliceTransform object in HashSkipListFactory and HashLinkListFactory destructor. Reason: SanitizeOptions() enforces prefix_extractor and SliceTransform to be the same object when HashFactory is used. This makes the behavior strange: when HashFactory is used, prefix_extractor will be released by RocksDB. If other memtable factory is used, prefix_extractor should be released by user. Test Plan: db_bench && make asan_check Reviewers: haobo, igor, sdong Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16587	11 years ago
Siying Dong	33042669f6	Reduce malloc of iterators in Get() code paths Summary: This patch optimized Get() code paths by avoiding malloc of iterators. Iterator creation is moved to mem table rep implementations, where a callback is called when any key is found. This is the same practice as what we do in (SST) table readers. db_bench result for readrandom following a writeseq, with no compression, single thread and tmpfs, we see throughput improved to 144958 from 139027, about 3%. Test Plan: make all check Reviewers: dhruba, haobo, igor Reviewed By: haobo CC: leveldb, yhchiang Differential Revision: https://reviews.facebook.net/D14685	11 years ago
kailiu	4e0298f23c	Clean up arena API Summary: Easy thing goes first. This patch moves arena to internal dir; based on which, the coming patch will deal with memtable_rep. Test Plan: make check Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15615	11 years ago
Siying Dong	8477255da3	Moving Some includes from options.h to forward declaration Summary: By removing some includes from options.h and relying on forward declaration, we can more easily reason the dependencies. Test Plan: make all check Reviewers: kailiu, haobo, igor, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15411	11 years ago
Haobo Xu	4e6463ea44	[RocksDB][Performance Branch] Make height and branching factor configurable for skiplist implementation Summary: As title. Especially, HashSkipListRepFactory will be able to specify a relatively small height, to reduce the memory overhead of one skiplist per bucket. Test Plan: make check and test it on leaf4 Reviewers: dhruba, sdong, kailiu CC: reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14307	11 years ago
Igor Canadi	8b3379dc0a	Implementing DynamicIterator for TransformRepNoLock Summary: What @haobo did with TransformRep, now in TransformRepNoLock. Similar implementation, except that I made DynamicIterator a subclass of Iterator which makes me have less iterator initializations. Test Plan: ./prefix_test. Seeing huge savings vs. TransformRep again! Reviewers: dhruba, haobo, sdong, kailiu Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D13953	11 years ago
Dhruba Borthakur	9cd221094c	Add appropriate LICENSE and Copyright message. Summary: Add appropriate LICENSE and Copyright message. Test Plan: make check Reviewers: CC: Task ID: # Blame Rev:	11 years ago
Dhruba Borthakur	4463b11cad	Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Summary: Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Test Plan: make check Reviewers: emayanke, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13311	11 years ago
Dhruba Borthakur	a143ef9b38	Change namespace from leveldb to rocksdb Summary: Change namespace from leveldb to rocksdb. This allows a single application to link in open-source leveldb code as well as rocksdb code into the same process. Test Plan: compile rocksdb Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13287	11 years ago
Haobo Xu	08740b15a4	[RocksDB] Fix skiplist sequential insertion optimization Summary: The original optimization missed updating links other than the lowest level. Test Plan: make check; perf_context_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb, adsharma Differential Revision: https://reviews.facebook.net/D13119	11 years ago
Xing Jin	0f0a24e298	Make arena block size configurable Summary: Add an option for arena block size, default value 4096 bytes. Arena will allocate blocks with such size. I am not sure about passing parameter to skiplist in the new virtualized framework, though I talked to Jim a bit. So add Jim as reviewer. Test Plan: new unit test, I am running db_test. For passing parameter from configured option to Arena, I tried tests like: TEST(DBTest, Arena_Option) { std::string dbname = test::TmpDir() + "/db_arena_option_test"; DestroyDB(dbname, Options()); DB* db = nullptr; Options opts; opts.create_if_missing = true; opts.arena_block_size = 1000000; // tested 99, 999999 Status s = DB::Open(opts, dbname, &db); db->Put(WriteOptions(), "a", "123"); } and printed some debug info. The results look good. Any suggestion for such a unit-test? Reviewers: haobo, dhruba, emayanke, jpaton Reviewed By: dhruba CC: leveldb, zshao Differential Revision: https://reviews.facebook.net/D11799	11 years ago
Abhishek Kona	c41f1e995c	Codemod NULL to nullptr Summary: scripted NULL to nullptr in * include/leveldb/ * db/ * table/ * util/ Test Plan: make all check Reviewers: dhruba, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D9003	12 years ago
Dhruba Borthakur	1ca0584345	This is the mega-patch multi-threaded compaction published in https://reviews.facebook.net/D5997. Summary: This patch allows compaction to occur in multiple background threads concurrently. If a manual compaction is issued, the system falls back to a single-compaction-thread model. This is done to ensure correctness and simplicity of code. When the manual compaction is finished, the system resumes its concurrent-compaction mode automatically. The updates to the manifest are done via group-commit approach. Test Plan: run db_bench	12 years ago
Arun Sharma	90b2924fb2	skiplist: optimize for sequential insert pattern Summary: skiplist doesn't cache the location of the last insert and becomes CPU bound when the input data has sequential keys. Notes on thread safety: ::Insert() already requires external synchronization. So this change is not making it any worse. Test Plan: skiplist_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D3129	13 years ago
Sanjay Ghemawat	bc1ee4d25e	build shared libraries; updated version to 1.3; add Status accessors	13 years ago
Hans Wennborg	36a5f8ed7f	A number of fixes: - Replace raw slice comparison with a call to user comparator. Added test for custom comparators. - Fix end of namespace comments. - Fixed bug in picking inputs for a level-0 compaction. When finding overlapping files, the covered range may expand as files are added to the input set. We now correctly expand the range when this happens instead of continuing to use the old range. For example, suppose L0 contains files with the following ranges: F1: a .. d F2: c .. g F3: f .. j and the initial compaction target is F3. We used to search for range f..j which yielded {F2,F3}. However we now expand the range as soon as another file is added. In this case, when F2 is added, we expand the range to c..j and restart the search. That picks up file F1 as well. This change fixes a bug related to deleted keys showing up incorrectly after a compaction as described in Issue 44. (Sync with upstream @25072954)	13 years ago
dgrogan@chromium.org	69c6d38342	reverting disastrous MOE commit, returning to r21 git-svn-id: https://leveldb.googlecode.com/svn/trunk@23 62dab493-f737-651d-591e-8d6aee1b9529	14 years ago
dgrogan@chromium.org	b743906eea	Revision created by MOE tool push_codebase.	14 years ago
dgrogan@chromium.org	b409afe968	chmod a-x git-svn-id: https://leveldb.googlecode.com/svn/trunk@21 62dab493-f737-651d-591e-8d6aee1b9529	14 years ago
dgrogan@chromium.org	f779e7a5d8	@20602303 . Default file permission is now 755. git-svn-id: https://leveldb.googlecode.com/svn/trunk@20 62dab493-f737-651d-591e-8d6aee1b9529	14 years ago
jorlow@chromium.org	f67e15e50f	Initial checkin. git-svn-id: https://leveldb.googlecode.com/svn/trunk@2 62dab493-f737-651d-591e-8d6aee1b9529	14 years ago

3 Commits (0ad68518bb2e04707c06b6c87535b21ae44b6ced)