rocksdb

Commit Graph

Author	SHA1	Message	Date
Lei Jin	5b3b6549d6	use super_version in NewIterator() and MultiGet() function Summary: Use super_version insider NewIterator to avoid Ref() each component separately under mutex The new added bench shows NewIterator QPS increases from 515K to 719K No meaningful improvement for multiget I guess due to its relatively small cost comparing to 90 keys fetch in the test. Test Plan: unit test and db_bench Reviewers: igor, sdong Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D15609	12 years ago
Igor Canadi	29bacb2eb6	VersionSet cleanup Summary: Removed icmp_ from VersionSet (since it's per-column-family, not per-DB-instance) Unfriended VersionSet and ColumnFamilyData (yay!) Removed VersionSet::NumberLevels() Cleaned up DBImpl Test Plan: make check Reviewers: dhruba, haobo, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15819	12 years ago
Siying Dong	d169b67680	[Performance Branch] PlainTable to encode rows with seqID 0, value type using 1 internal byte. Summary: In PlainTable, use one single byte to represent 8 bytes of internal bytes, if seqID = 0 and it is value type (which should be common for bottom most files). It is to save 7 bytes for uncompressed cases. Test Plan: make all check Reviewers: haobo, dhruba, kailiu Reviewed By: haobo CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15489	12 years ago
Igor Canadi	5c6ef56152	Fix printf format	12 years ago
kailiu	4f6cb17bdb	First phase API clean up Summary: Addressed all the issues in https://reviews.facebook.net/D15447. Now most table-related modules are hidden from user land. Test Plan: make check Reviewers: sdong, haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15525	12 years ago
Igor Canadi	27a8856c23	Compacting column families Summary: This diff enables non-default column families to get compacted both automatically and also by calling CompactRange() Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15813	12 years ago
Igor Canadi	5661ed8b80	Fix reduce_levels_test	12 years ago
Dhruba Borthakur	30a700657d	Fix corruption_test failure caused by auto-enablement of checksum verification. Summary: Patch https://reviews.facebook.net/D15591 enabled checksum verification by default. This caused the unit test to fail. Test Plan: ./corruption_test Reviewers: igor, kailiu Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15795	12 years ago
Igor Canadi	f7489123e2	Move compaction picker and internal key comparator to ColumnFamilyData Summary: Compaction picker and internal key comparator are different for each column family (not global), so they should live in ColumnFamilyData Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15801	12 years ago
Igor Canadi	dbbffbd772	Mark the log_number file number used Summary: VersionSet::next_file_number_ is always assumed to be strictly greater than VersionSet::log_number_. In our new recovery code, we artificially set log_number_ to be (log_number + 1), so that once we flush, we don't recover from the same log file again (this is important because of merge operator non-idempotence) When we set VersionSet::log_number_ to (log_number + 1), we also have to mark that file number used, such that next_file_number_ is increased to a legal level. Otherwise, VersionSet might assert. This has not be a problem so far because here's what happens: 1. assume next_file_number is 5, we're recovering log_number 10 2. in DBImpl::Recover() we call MarkFileNumberUsed with 10. This will set VersionSet::next_file_number_ to 11. 3. If there are some updates, we will call WriteTable0ForRecovery(), which will use file number 11 as a new table file and advance VersionSet::next_file_number_ to 12. 4. When we LogAndApply() with log_number 11, assertion is true: assert(11 <= 12); However, this was a lucky occurrence. Even though this diff doesn't cause a bug, I think the issue is important to fix. Test Plan: In column families I have different recovery logic and this code path asserted. When adding MarkFileNumberUsed(log_number + 1) assert is gone. Reviewers: dhruba, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15783	12 years ago
Igor Canadi	3615f534d1	Enable flushing memtables from arbitrary column families Summary: Removed default_cfd_ from all flush code paths. This means we can now flush memtables from arbitrary column families! Test Plan: Added a new unit test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15789	12 years ago
Siying Dong	56bea9f80d	When using Universal Compaction, Zero out seqID in the last file too Summary: I didn't figure out the reason why the feature of zeroing out earlier sequence ID is disabled in universal compaction. I do see bottommost_level is set correctly. It should simply work if we remove the constraint of universal compaction. Test Plan: make all check Reviewers: haobo, dhruba, kailiu, igor Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15423	12 years ago
kailiu	4e0298f23c	Clean up arena API Summary: Easy thing goes first. This patch moves arena to internal dir; based on which, the coming patch will deal with memtable_rep. Test Plan: make check Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15615	12 years ago
Dhruba Borthakur	abd70ecc2b	The default settings enable checksum verification on every read. Summary: The default settings enable checksum verification on every read. Test Plan: make check Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15591	12 years ago
Igor Canadi	9ca638a86d	Enable iterating column families with a concurrent writer Summary: Sometimes we iterate through column families, and unlock the mutex in the body of the iteration. While mutex is unlocked, some column family might be created or dropped. We need to be able to continue iterating through column families even though our current column family got dropped. This diff implements circular linked lists that connect all column families. It then uses the link list to enable iterating through linked lists. Even if the column family is dropped, its next_ pointer still can be used to advance to another alive column family. Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15603	12 years ago
Igor Canadi	6973bb1722	MakeRoomForWrite() support for column families Summary: Making room for write will be the hardest part of the column family implementation. For now, I just iterate through all column families and run MakeRoomForWrite() for every one. Test Plan: make check does not complain Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15597	12 years ago
Igor Canadi	3c0dcf0e25	InternalStatistics Summary: In DBImpl we keep track of some statistics internally and expose them via GetProperty(). This diff encapsulates all the internal statistics into a class InternalStatisics. Most of it is copy/paste. Apart from cleaning up db_impl.cc, this diff is also necessary for Column families, since every column family should have its own CompactionStats, MakeRoomForWrite-stall stats, etc. It's much easier to keep track of it in every column family if it's nicely encapsulated in its own class. Test Plan: make check Reviewers: dhruba, kailiu, haobo, sdong, emayanke Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15273	12 years ago
Lei Jin	d118707f8d	set bg_error_ when background flush goes wrong Summary: as title Test Plan: unit test Reviewers: haobo, igor, sdong, kailiu, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15435	12 years ago
Igor Canadi	514e42c7cc	Fix some lint warnings	12 years ago
Igor Canadi	fa99d53e55	Change ColumnFamilyData from struct to class Summary: ColumnFamilyData grew a lot, there's much more data that it holds now. It makes more sense to encapsulate it better by making it a class. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15579	12 years ago
Igor Canadi	15999e728e	Fix column family test (create directory)	12 years ago
Igor Canadi	4662969bf5	PurgeObsoleteFiles in DropColumnFamily Summary: When we drop the column family, we want to delete all the files from that column family. Test Plan: make check Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15561	12 years ago
Igor Canadi	f24a3ee52d	Read from and write to different column families Summary: This one is big. It adds ability to write to and read from different column families (see the unit test). It also supports recovery of different column families from log, which was the hardest part to reason about. We need to make sure to never delete the log file which has unflushed data from any column family. To support that, I added another concept, which is versions_->MinLogNumber() Test Plan: Added a unit test in column_family_test Reviewers: dhruba, haobo, sdong, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15537	12 years ago
Igor Canadi	e57f0cc1a1	add include <atomic> to version_set.h	12 years ago
Igor Canadi	5d2c62822e	Only get the manifest file size if there is no error Summary: I came across this while working on column families. CorruptionTest::RecoverWriteError threw a SIGSEG because the descriptor_log_->file() was nullptr. I'm not sure why it doesn't happen in master, but better safe than sorry. @kailiu, can we get this in release, too? Test Plan: make check Reviewers: kailiu, dhruba, haobo Reviewed By: haobo CC: leveldb, kailiu Differential Revision: https://reviews.facebook.net/D15513	12 years ago
Igor Canadi	4bf25357ae	[column families] Removing VersionSet::current() Summary: Instead of VersionSet::current(), DBImpl uses default_cfd_->current directly. Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15483	12 years ago
Mark Callaghan	90f29ccbef	Update monitoring to include average time per compaction and stall Summary: The new columns are msComp and msStall that provide average time per compaction and stall for that level in milliseconds. Level Files Size(MB) Score Time(sec) Read(MB) Write(MB) Rn(MB) Rnp1(MB) Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s) Rn Rnp1 Wnp1 NewW Count msComp msStall Ln-stall Stall-cnt ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0 8 15 1.5 2 0 30 0 0 30 0.0 0.0 15.5 0 0 0 0 16 112 0.2 1.3 7568 1 8 16 1.6 1 26 26 15 11 16 3.5 17.6 18.1 8 6 13 7 3 362 0.0 0.0 0 2 1 2 0.0 0 0 2 0 0 2 0.0 0.0 18.4 0 0 0 0 1 50 0.0 0.0 0 Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15345	12 years ago
Schalk-Willem Kruger	3d33da75ef	Fix UnmarkEOF for partial blocks Summary: Blocks in the transaction log are a fixed size, but the last block in the transaction log file is usually a partial block. When a new record is added after the reader hit the end of the file, a new physical record will be appended to the last block. ReadPhysicalRecord can only read full blocks and assumes that the file position indicator is aligned to the start of a block. If the reader is forced to read further by simply clearing the EOF flag, ReadPhysicalRecord will read a full block starting from somewhere in the middle of a real block, causing it to lose alignment and to have a partial physical record at the end of the read buffer. This will result in length mismatches and checksum failures. When the log file is tailed for replication this will cause the log iterator to become invalid, necessitating the creation of a new iterator which will have to read the log file from scratch. This diff fixes this issue by reading the remaining portion of the last block we read from. This is done when the reader is forced to read further (UnmarkEOF is called). Test Plan: - Added unit tests - Stress test (with replication). Check dbdir/LOG file for corruptions. - Test on test tier Reviewers: emayanke, haobo, dhruba Reviewed By: haobo CC: vamsi, sheki, dhruba, kailiu, igor Differential Revision: https://reviews.facebook.net/D15249	12 years ago
Igor Canadi	511b03a5b5	LogAndApply to take ColumnFamilyData Summary: This removes the default implementation of LogAndApply that applied the changed to the default column family by default. It is mostly simple reformatting. Test Plan: make check Reviewers: dhruba, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15465	12 years ago
Igor Canadi	eb055609e4	[column families] Move memtable and immutable memtable list to column family data Summary: All memtables and immutable memtables are moved from DBImpl to ColumnFamilyData. For now, they are all referenced from default column family in DBImpl. It shouldn't be hard to get them from custom column family. Test Plan: make check Reviewers: dhruba, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15459	12 years ago
Igor Canadi	832158e7f7	Fsync directory after we create a new file Summary: @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder. Should I also add FsyncDir() to new files that get created by a compaction? Test Plan: Confirmed that FsyncDir is returning Status::OK() Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D14751	12 years ago
Siying Dong	b20486f294	[Performance Branch] HashLinkList to avoid to convert length prefixed string back to internal keys Summary: Converting from length prefixed buffer back to internal key costs some CPU but it is not necessary. In this patch, internal keys are pass though the functions so that we don't need to convert back to it. Test Plan: make all check Reviewers: haobo, kailiu Reviewed By: kailiu CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15393	12 years ago
Igor Canadi	6c2ca1d3e6	Move NeedsCompaction() from VersionSet to Version Summary: There is no reason to have functions NeedCompaction(), MaxCompactionScore() and MaxCompactionScoreLevel() in VersionSet, since they don't access any data in VersionSet. Test Plan: make check Reviewers: kailiu, haobo, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15333	12 years ago
Igor Canadi	e55b3c040c	Fixing ref-counting memtables	12 years ago
Mike Lin	af7838de36	address code review comments on `5e3aeb5f8e` - reduce string copying in Compaction::Summary - simplify file number checking in UniversalCompactionStopStyleSimilarSize unit test	12 years ago
Igor Canadi	983fafa56c	Fix memory leak	12 years ago
Igor Canadi	68a91a2e6a	missing include	12 years ago
Igor Canadi	04afa32134	Fix reduce levels ReduceNumberOfLevels had segmentation fault in WriteSnapshot() since we didn't change the number of levels in VersionSet (we consider them immutable from now on). This fixes the problem.	12 years ago
Siying Dong	8477255da3	Moving Some includes from options.h to forward declaration Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies. Test Plan: make all check Reviewers: kailiu, haobo, igor, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15411	12 years ago
Igor Canadi	f653fdcf5a	Fixing iterator cleanup for Tailing iterator Immutable tailing iterator doesn't set CleanupState::mem, so we don't have to unref it.	12 years ago
Igor Canadi	677fee27c6	Make VersionSet::ReduceNumberOfLevels() static Summary: A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from LDB cmd tool. LDB tool is only using it statically, i.e. it never calls it with running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break assumption that lot of our code is relying upon. LDB tool can still use the method. Also, I removed the method from a separate file since it breaks filename completition. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file. Test Plan: reduce_levels_test Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15303	12 years ago
Igor Canadi	c583157d49	MemTableListVersion Summary: MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful. This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team. I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet()) Test Plan: `make check` works. I will also do `make valgrind_check` before commit Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15255	12 years ago
Igor Canadi	e832e72b31	Revert "Moving to glibc-fb" This reverts commit `d24961b65e`. For some reason, glibc2.17-fb breaks gflags. Reverting for now	12 years ago
kailiu	66dc033af3	Temporarily disable caching index/filter blocks Summary: Mixing index/filter blocks with data blocks resulted in some known issues. To make sure in next release our users won't be affected, we added a new option in BlockBasedTableFactory::TableOption to conceal this functionality for now. This patch also introduced a BlockBasedTableReader::OpenOptions, which avoids the "infinite" growth of parameters in BlockBasedTableReader::Open(). Test Plan: make check Reviewers: haobo, sdong, igor, dhruba Reviewed By: igor CC: leveldb, tnovak Differential Revision: https://reviews.facebook.net/D15327	12 years ago
Igor Canadi	d24961b65e	Moving to glibc-fb Summary: It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group. This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works. Test Plan: Compiled Ran ./db_bench ./db_stress ./db_repl_stress Reviewers: kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15405	12 years ago
Siying Dong	4605e20c58	If User setting of compaction multipliers overflow, use default value 1 instead Summary: Currently, compaction multipliers can overflow and cause unexpected behaviors. In this patch, we detect those overflows and use multiplier 1 for them. Test Plan: make all check Reviewers: dhruba, haobo, igor, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15321	12 years ago
Igor Canadi	09489d395f	Fix a bug in DBImpl::CreateColumnFamily	12 years ago
Lei Jin	aba2acb5ec	CompactRange() to return status Summary: as title Test Plan: make all check What else tests shall I cover? Reviewers: igor, haobo CC: Differential Revision: https://reviews.facebook.net/D15339	12 years ago
Tomislav Novak	81c9cc9b3b	Tailing iterator Summary: This diff implements a special type of iterator that doesn't create a snapshot (can be used to read newly inserted data) and is optimized for doing sequential reads. TailingIterator uses current superversion number to determine whether to invalidate its internal iterators. If the version hasn't changed, it can often avoid doing expensive seeks over immutable structures (sst files and immutable memtables). Test Plan: * new unit tests * running LD with this patch Reviewers: igor, dhruba, haobo, sdong, kailiu Reviewed By: sdong CC: leveldb, lovro, march Differential Revision: https://reviews.facebook.net/D15285	12 years ago
Igor Canadi	7c5e583a27	ColumnFamilySet Summary: I created a separate class ColumnFamilySet to keep track of column families. Before we did this in VersionSet and I believe this approach is cleaner. Let me know if you have any comments. I will commit tomorrow. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15357	12 years ago

... 39 40 41 42 43 ...

2738 Commits (2d42cf5ea94d7503c9f35a388abc63c5b5f31af9)