rocksdb

Commit Graph

Author	SHA1	Message	Date
Igor Canadi	511b03a5b5	LogAndApply to take ColumnFamilyData Summary: This removes the default implementation of LogAndApply that applied the changed to the default column family by default. It is mostly simple reformatting. Test Plan: make check Reviewers: dhruba, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15465	11 years ago
Igor Canadi	eb055609e4	[column families] Move memtable and immutable memtable list to column family data Summary: All memtables and immutable memtables are moved from DBImpl to ColumnFamilyData. For now, they are all referenced from default column family in DBImpl. It shouldn't be hard to get them from custom column family. Test Plan: make check Reviewers: dhruba, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15459	11 years ago
Igor Canadi	832158e7f7	Fsync directory after we create a new file Summary: @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder. Should I also add FsyncDir() to new files that get created by a compaction? Test Plan: Confirmed that FsyncDir is returning Status::OK() Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D14751	11 years ago
Igor Canadi	e55b3c040c	Fixing ref-counting memtables	11 years ago
Igor Canadi	983fafa56c	Fix memory leak	11 years ago
Igor Canadi	c583157d49	MemTableListVersion Summary: MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful. This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team. I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet()) Test Plan: `make check` works. I will also do `make valgrind_check` before commit Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15255	11 years ago
Siying Dong	a8029fdc75	Introduce MergeContext to Lazily Initialize merge operand list Summary: In get operations, merge_operands is only used in few cases. Lazily initialize it can reduce average latency in some cases Test Plan: make all check Reviewers: haobo, kailiu, dhruba Reviewed By: haobo CC: igor, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D14415 Conflicts: db/db_impl.cc db/memtable.cc	11 years ago
Dhruba Borthakur	98968ba937	Free obsolete memtables outside the dbmutex had a memory leak. Summary: The commit at `27bbef1180` had a memory leak that was detected by valgrind. The memtable that has a refcount decrement in MemTableList::InstallMemtableFlushResults was not freed. Test Plan: valgrind ./db_test --leak-check=full Reviewers: igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14391	11 years ago
Dhruba Borthakur	27bbef1180	Free obsolete memtables outside the dbmutex. Summary: Large memory allocations and frees are costly and best done outside the db-mutex. The memtables are already allocated outside the db-mutex but they were being freed while holding the db-mutex. This patch frees obsolete memtables outside the db-mutex. Test Plan: make check db_stress Unit tests pass, I am in the process of running stress tests. Reviewers: haobo, igor, emayanke Reviewed By: haobo CC: reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14319	11 years ago
Kai Liu	c17607a251	Fix the log number bug when updating MANIFEST file Summary: Crash may occur during the flushes of more than two mem tables. As the info log suggested, even when both were successfully flushed, the recovery process still pick up one of the memtable's log for recovery. This diff fix the problem by setting the correct "log number" in MANIFEST. Test Plan: make test; deployed to leaf4 and make sure it doesn't result in crashes of this type. Reviewers: haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13659	11 years ago
Dhruba Borthakur	9cd221094c	Add appropriate LICENSE and Copyright message. Summary: Add appropriate LICENSE and Copyright message. Test Plan: make check Reviewers: CC: Task ID: # Blame Rev:	11 years ago
Dhruba Borthakur	4463b11cad	Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Summary: Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Test Plan: make check Reviewers: emayanke, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13311	11 years ago
Dhruba Borthakur	a143ef9b38	Change namespace from leveldb to rocksdb Summary: Change namespace from leveldb to rocksdb. This allows a single application to link in open-source leveldb code as well as rocksdb code into the same process. Test Plan: compile rocksdb Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13287	11 years ago
Dhruba Borthakur	32c965d417	Flush was hanging because the configured options specified that more than 1 memtable need to be merged. Summary: There is an config option called Options.min_write_buffer_number_to_merge that specifies the minimum number of write buffers to merge in memory before flushing to a file in L0. But in the the case when the db is being closed, we should not be using this config, instead we should flush whatever write buffers were available at that time. Test Plan: Unit test attached. Reviewers: haobo, emayanke Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12717	11 years ago
Dhruba Borthakur	1186192ed1	Replace include/leveldb with include/rocksdb. Summary: Replace include/leveldb with include/rocksdb. Test Plan: make clean; make check make clean; make release Differential Revision: https://reviews.facebook.net/D12489	11 years ago
Deon Nicholas	c2d7826ced	[RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. Summary: Here are the major changes to the Merge Interface. It has been expanded to handle cases where the MergeOperator is not associative. It does so by stacking up merge operations while scanning through the key history (i.e.: during Get() or Compaction), until a valid Put/Delete/end-of-history is encountered; it then applies all of the merge operations in the correct sequence starting with the base/sentinel value. I have also introduced an "AssociativeMerge" function which allows the user to take advantage of associative merge operations (such as in the case of counters). The implementation will always attempt to merge the operations/operands themselves together when they are encountered, and will resort to the "stacking" method if and only if the "associative-merge" fails. This implementation is conjectured to allow MergeOperator to handle the general case, while still providing the user with the ability to take advantage of certain efficiencies in their own merge-operator / data-structure. NOTE: This is a preliminary diff. This must still go through a lot of review, revision, and testing. Feedback welcome! Test Plan: -This is a preliminary diff. I have only just begun testing/debugging it. -I will be testing this with the existing MergeOperator use-cases and unit-tests (counters, string-append, and redis-lists) -I will be "desk-checking" and walking through the code with the help gdb. -I will find a way of stress-testing the new interface / implementation using db_bench, db_test, merge_test, and/or db_stress. -I will ensure that my tests cover all cases: Get-Memtable, Get-Immutable-Memtable, Get-from-Disk, Iterator-Range-Scan, Flush-Memtable-to-L0, Compaction-L0-L1, Compaction-Ln-L(n+1), Put/Delete found, Put/Delete not-found, end-of-history, end-of-file, etc. -A lot of feedback from the reviewers. Reviewers: haobo, dhruba, zshao, emayanke Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11499	11 years ago
Mayank Agarwal	59d0b02f8b	Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache Summary: Removed KeyMayExistImpl because KeyMayExist demanded Get like semantics now. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test Test Plan: make all check;db_stress for 1 hour Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11853	11 years ago
Xing Jin	0f0a24e298	Make arena block size configurable Summary: Add an option for arena block size, default value 4096 bytes. Arena will allocate blocks with such size. I am not sure about passing parameter to skiplist in the new virtualized framework, though I talked to Jim a bit. So add Jim as reviewer. Test Plan: new unit test, I am running db_test. For passing paramter from configured option to Arena, I tried tests like: TEST(DBTest, Arena_Option) { std::string dbname = test::TmpDir() + "/db_arena_option_test"; DestroyDB(dbname, Options()); DB* db = nullptr; Options opts; opts.create_if_missing = true; opts.arena_block_size = 1000000; // tested 99, 999999 Status s = DB::Open(opts, dbname, &db); db->Put(WriteOptions(), "a", "123"); } and printed some debug info. The results look good. Any suggestion for such a unit-test? Reviewers: haobo, dhruba, emayanke, jpaton Reviewed By: dhruba CC: leveldb, zshao Differential Revision: https://reviews.facebook.net/D11799	11 years ago
Mayank Agarwal	2a986919d6	Make rocksdb-deletes faster using bloom filter Summary: Wrote a new function in db_impl.c-CheckKeyMayExist that calls Get but with a new parameter turned on which makes Get return false only if bloom filters can guarantee that key is not in database. Delete calls this function and if the option- deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped saving: 1. Put of delete type 2. Space in the db,and 3. Compaction time Test Plan: make all check; will run db_stress and db_bench and enhance unit-test once the basic design gets approved Reviewers: dhruba, haobo, vamsi Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11607	12 years ago
Dhruba Borthakur	6acbe0fc45	Compact multiple memtables before flushing to storage. Summary: Merge multiple multiple memtables in memory before writing it out to a file in L0. There is a new config parameter min_write_buffer_number_to_merge that specifies the number of write buffers that should be merged together to a single file in storage. The system will not flush wrte buffers to storage unless at least these many buffers have accumulated in memory. The default value of this new parameter is 1, which means that a write buffer will be immediately flushed to disk as soon it is ready. Test Plan: make check Differential Revision: https://reviews.facebook.net/D11241	12 years ago
Haobo Xu	05e8854085	[Rocksdb] Support Merge operation in rocksdb Summary: This diff introduces a new Merge operation into rocksdb. The purpose of this review is mostly getting feedback from the team (everyone please) on the design. Please focus on the four files under include/leveldb/, as they spell the client visible interface change. include/leveldb/db.h include/leveldb/merge_operator.h include/leveldb/options.h include/leveldb/write_batch.h Please go over local/my_test.cc carefully, as it is a concerete use case. Please also review the impelmentation files to see if the straw man implementation makes sense. Note that, the diff does pass all make check and truly supports forward iterator over db and a version of Get that's based on iterator. Future work: - Integration with compaction - A raw Get implementation I am working on a wiki that explains the design and implementation choices, but coding comes just naturally and I think it might be a good idea to share the code earlier. The code is heavily commented. Test Plan: run all local tests Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: leveldb, zshao, sheki, emayanke, MarkCallaghan Differential Revision: https://reviews.facebook.net/D9651	12 years ago
Abhishek Kona	c41f1e995c	Codemod NULL to nullptr Summary: scripted NULL to nullptr in * include/leveldb/ * db/ * table/ * util/ Test Plan: make all check Reviewers: dhruba, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D9003	12 years ago
Kosie van der Merwe	8cd86a7be5	Fixing and adding some comments Summary: `MemTableList::Add()` neglected to mention that it took ownership of the reference held by its caller. The comment in `MemTable::Get()` was wrong in describing the format of the key. Test Plan: None Reviewers: dhruba, sheki, emayanke, vamsi Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7755	12 years ago
Abhishek Kona	d29f181923	Fix all the lint errors. Summary: Scripted and removed all trailing spaces and converted all tabs to spaces. Also fixed other lint errors. All lint errors from this point of time should be taken seriously. Test Plan: make all check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7059	12 years ago
Dhruba Borthakur	1ca0584345	This is the mega-patch multi-threaded compaction published in https://reviews.facebook.net/D5997. Summary: This patch allows compaction to occur in multiple background threads concurrently. If a manual compaction is issued, the system falls back to a single-compaction-thread model. This is done to ensure correctess and simplicity of code. When the manual compaction is finished, the system resumes its concurrent-compaction mode automatically. The updates to the manifest are done via group-commit approach. Test Plan: run db_bench	12 years ago

25 Commits (73f62255c1bc2b7d35c6da6decc40d41a949a63e)