rocksdb

Commit Graph

Author	SHA1	Message	Date
Danny Guo	d9ca83df28	[rocksdb] make init prefix more robust Summary: Currently if client uses kNULLString as the prefix, it will confuse compaction filter v2. This diff added a bool to indicate if the prefix has been intialized. I also added a unit test to cover this case and make sure the new code path is hit. Test Plan: db_test Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17151	11 years ago
Danny Guo	b47812fba6	[rocksdb] new CompactionFilterV2 API Summary: This diff adds a new CompactionFilterV2 API that roll up the decisions of kv pairs during compactions. These kv pairs must share the same key prefix. They are buffered inside the db. typedef std::vector<Slice> SliceVector; virtual std::vector<bool> Filter(int level, const SliceVector& keys, const SliceVector& existing_values, std::vector<std::string>* new_values, std::vector<bool>* values_changed ) const = 0; Application can override the Filter() function to operate on the buffered kv pairs. More details in the inline documentation. Test Plan: make check. Added unit tests to make sure Keep, Delete, Change all works. Reviewers: haobo CCs: leveldb Differential Revision: https://reviews.facebook.net/D15087	11 years ago
Yueh-Hsuan Chiang	cda4006e87	Enhance partial merge to support multiple arguments Summary: * PartialMerge api now takes a list of operands instead of two operands. * Add min_pertial_merge_operands to Options, indicating the minimum number of operands to trigger partial merge. * This diff is based on Schalk's previous diff (D14601), but it also includes necessary changes such as updating the pure C api for partial merge. Test Plan: * make check all * develop tests for cases where partial merge takes more than two operands. TODOs (from Schalk): * Add test with min_partial_merge_operands > 2. * Perform benchmarks to measure the performance improvements (can probably use results of task #2837810.) * Add description of problem to doc/index.html. * Change wiki pages to reflect the interface changes. Reviewers: haobo, igor, vamsi Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16815	11 years ago
Igor Canadi	e67241f0b9	Sanity check on Open Summary: Everytime a client opens a DB, we do a sanity check that: * checks the existance of all the necessary files * verifies that file sizes are correct Some of the code was stolen from https://reviews.facebook.net/D16935 Test Plan: added a unit test Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17097	11 years ago
Igor Canadi	8ea3cb621e	If paranoid_checks -- Mark DB read-only on any IOError Summary: Whenever we get an IOError from GetImpl() or NewIterator(), we should immediatelly mark the DB read-only. The same check already exists in Write() and Compaction(). This should help with clients that are somehow missing a file. Test Plan: make check Reviewers: dhruba, haobo, sdong, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17061	11 years ago
sdong	71e6a34271	Add a DB property to indicate number of background errors encountered Summary: Add a property to calculate number of background errors encountered to help users build their monitoring Test Plan: Add a unit test. make all check Reviewers: haobo, igor, dhruba Reviewed By: igor CC: ljin, nkg-, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16959	11 years ago
Kai Liu	1ec72b37b1	Several easy-to-add properties related to compaction and flushes Summary: To partly address the request @nkg- raised, add three easy-to-add properties to compactions and flushes. Test Plan: run unit tests and add a new unit test to cover new properties. Reviewers: haobo, dhruba Reviewed By: dhruba CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D13677	11 years ago
Lei Jin	63cef90078	disable the log_number check in Recover() Summary: There is a chance that an old MANIFEST is corrupted in 2.7 but just not noticed. This check would fail them. Change it to log instead of returning a Corruption status. Test Plan: make Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16923	11 years ago
Igor Canadi	f26cb0f093	Optimize fallocation Summary: Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads. This diff provides an option for user to not use KEEP_SIZE flag, thus optimizing his sync performance by up to 2x-3x. At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported. Test Plan: make check Reviewers: dhruba, sdong, haobo, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16761	11 years ago
Igor Canadi	ae25742af9	Fix race condition in manifest roll Summary: When the manifest is getting rolled the following happens: 1) manifest_file_number_ is assigned to a new manifest number (even though the old one is still current) 2) mutex is unlocked 3) SetCurrentFile() creates temporary file manifest_file_number_.dbtmp 4) SetCurrentFile() renames manifest_file_number_.dbtmp to CURRENT 5) mutex is locked If FindObsoleteFiles happens between (3) and (4) it will: 1) Delete manifest_file_number_.dbtmp (because it's not in pending_outputs_) 2) Delete old manifest (because the manifest_file_number_ already points to a new one) I introduce the concept of prev_manifest_file_number_ that will avoid the race condition. However, we should discuss the future of MANIFEST file rolling. We found some race conditions with it last week and who knows how many more are there. Nobody is using it in production because we don't trust the implementation. Should we even support it? Test Plan: make check Reviewers: ljin, dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16929	11 years ago
Igor Canadi	db234133a9	[CF] WriteBatch to take in ColumnFamilyHandle Summary: Client doesn't need to know anything about ColumnFamily ID. By making WriteBatch take ColumnFamilyHandle as a parameter, we can eliminate method GetID() from ColumnFamilyHandle Test Plan: column_family_test Reviewers: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16887	11 years ago
sdong	5aa81f04fa	Fix extra compaction tasks scheduled after D16767 in some cases Summary: With D16767, there is a case compaction tasks are scheduled infinitely: (1) no flush thread is configured and more than 1 compaction threads (2) a flush is going on by one compaction hread (3) the state of SST files is in the state that versions_->current()->NeedsCompaction() will generate a false positive (return true actually there is no work to be done) In that case, a infinite loop will be formed. This patch would fix it. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: igor CC: dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16863	11 years ago
Kai Liu	11da8bc5df	A heuristic way to check if a memtable is full Summary: This is is based on https://reviews.facebook.net/D15027. It's not finished but I would like to give a prototype to avoid arena over-allocation while making better use of the already allocated memory blocks. Instead of check approximate memtable size, we will take a deeper look at the arena, which incorporate essential idea that @sdong suggests: flush when arena has allocated its last and the last is "almost full" Test Plan: N/A Reviewers: haobo, sdong Reviewed By: sdong CC: leveldb, sdong Differential Revision: https://reviews.facebook.net/D15051	11 years ago
Igor Canadi	25c8a1a20f	More bug fixed introduced by code cleanup	11 years ago
Igor Canadi	b5d6ad69fc	Bug fixes introduced by code cleanup	11 years ago
Igor Canadi	fb2346fc1f	[CF] Code cleanup part 1 Summary: I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup. Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch. This diff also fixes race condition when dropping column family. Test Plan: Ran db_stress with variety of parameters Reviewers: dhruba, haobo Differential Revision: https://reviews.facebook.net/D16833	11 years ago
Igor Canadi	45ad75db80	Correct version of D16821	11 years ago
sdong	bd45633b71	Fix data race against logging data structure because of LogBuffer Summary: @igor pointed out that there is a potential data race because of the way we use the newly introduced LogBuffer. After "bg_compaction_scheduled_--" or "bg_flush_scheduled_--", they can both become 0. As soon as the lock is released after that, DBImpl's deconstructor can go ahead and deconstruct all the states inside DB, including the info_log object hold in a shared pointer of the options object it keeps. At that point it is not safe anymore to continue using the info logger to write the delayed logs. With the patch, lock is released temporarily for log buffer to be flushed before "bg_compaction_scheduled_--" or "bg_flush_scheduled_--". In order to make sure we don't miss any pending flush or compaction, a new flag bg_schedule_needed_ is added, which is set to be true if there is a pending flush or compaction but not scheduled because of the max thread limit. If the flag is set to be true, the scheduling function will be called before compaction or flush thread finishes. Thanks @igor for this finding! Test Plan: make all check Reviewers: haobo, igor Reviewed By: haobo CC: dhruba, ljin, yhchiang, igor, leveldb Differential Revision: https://reviews.facebook.net/D16767	11 years ago
Igor Canadi	457c78eb89	[CF] db_stress for column families Summary: I had this diff for a while to test column families implementation. Last night, I ran it sucessfully for 10 hours with the command: time ./db_stress --threads=30 --ops_per_thread=200000000 --max_key=5000 --column_families=20 --clear_column_family_one_in=3000000 --verify_before_write=1 --reopen=50 --max_background_compactions=10 --max_background_flushes=10 --db=/tmp/db_stress It is ready to be committed :) Test Plan: Ran it for 10 hours Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16797	11 years ago
sdong	6c66bc08d9	Temp Fix of LogBuffer flushing Summary: To temp fix the log buffer flushing. Flush the buffer inside the lock. Clean the trunk before we find an eventual fix. Test Plan: make all check Reviewers: haobo, igor Reviewed By: igor CC: ljin, leveldb, yhchiang Differential Revision: https://reviews.facebook.net/D16791	11 years ago
Igor Canadi	cb9802168f	Add a comment after SignalAll() Summary: Having code after SignalAll has already caused 2 bugs. Let's make sure this doesn't happen again. Test Plan: no test Reviewers: sdong, dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16785	11 years ago
Igor Canadi	d5de22dc09	Call PurgeObsoleteFiles() only when HaveSomethingToDelete() Summary: as title Test Plan: fixed the build failure http://ci-builds.fb.com/job/rocksdb_build/987/console Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16743	11 years ago
Haobo Xu	a91aed615a	[RocksDB] Minor cleanup of PurgeObsoleteFiles Summary: as title. also made info log output of file deletion a bit more descriptive. Test Plan: make check; db_bench and look at LOG output Reviewers: igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16731	11 years ago
Lei Jin	8d007b4aaf	Consolidate SliceTransform object ownership Summary: (1) Fix SanitizeOptions() to also check HashLinkList. The current dynamic case just happens to work because the 2 classes have the same layout. (2) Do not delete SliceTransform object in HashSkipListFactory and HashLinkListFactory destructor. Reason: SanitizeOptions() enforces prefix_extractor and SliceTransform to be the same object when HashFactory is used. This makes the behavior strange: when HashFactory is used, prefix_extractor will be released by RocksDB. If other memtable factory is used, prefix_extractor should be released by user. Test Plan: db_bench && make asan_check Reviewers: haobo, igor, sdong Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16587	11 years ago
Haobo Xu	9e0e6aa7f6	[RocksDB] make sure KSVObsolete does not get accessed as a valid pointer. Summary: KSVObsolete is no longer nullptr and needs to be checked explicitly. Also did some minor code cleanup and added a stat counter to track superversion cleanups incurred in the foreground. Test Plan: make check Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D16701	11 years ago
Haobo Xu	66da467983	[RocksDB] LogBuffer Cleanup Summary: Moved LogBuffer class to an internal header. Removed some unneccesary indirection. Enabled log buffer for BackgroundCallFlush. Forced log buffer flush right after Unlock to improve time ordering of info log. Test Plan: make check; db_bench compare LOG output Reviewers: sdong Reviewed By: sdong CC: leveldb, igor Differential Revision: https://reviews.facebook.net/D16707	11 years ago
Igor Canadi	d4f2c610d3	Ignore dropped column families -- don't flush or compact them	11 years ago
Igor Canadi	9f15092ebd	[CF] NewIterators Summary: Adding the last missing function -- NewIterators(). Pretty simple implementation Test Plan: added a unit test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16689	11 years ago
Lei Jin	e5fa4944fc	use CAS when returning SuperVersion to ThreadLocal Summary: Add a check at the end of GetImpl to release SuperVersion if it becomes obsolete. Also do Scrape() inside InstallSuperVersion so it happens more frequent. Test Plan: make all check running asan_check now Reviewers: igor, haobo, sdong, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16641	11 years ago
Igor Canadi	eec8695206	Delete local sv when destroying DB from stress test Summary: Not deleting local SV caused some an crash test issue: http://ci-builds.fb.com/job/rocksdb_asan_crash_test/83/console Test Plan: ran unit tests Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D16635	11 years ago
sdong	ecb1ffa2a8	Buffer info logs when picking compactions and write them out after releasing the mutex Summary: Now while the background thread is picking compactions, it writes out multiple info_logs, especially for universal compaction, which introduces a chance of waiting log writing in mutex, which is bad. To remove this risk, write all those info logs to a buffer and flush it after releasing the mutex. Test Plan: make all check check the log lines while running some tests that trigger compactions. Reviewers: haobo, igor, dhruba Reviewed By: dhruba CC: i.am.jin.lei, dhruba, yhchiang, leveldb, nkg- Differential Revision: https://reviews.facebook.net/D16515	11 years ago
Igor Canadi	a329dd1b25	Fix TEST_Destroy_DBImpl() to work with column families	11 years ago
sdong	e8ecca9e86	CleanupIteratorState() only to initialize DeletionState when super version cleanup needed Summary: Two changes: 1. DeletionState is only constructed when cleaning up is needed 2. Fix the bug of deletion state construction bug. A change was made in a previous patch: https://reviews.facebook.net/rROCKSDB774ed89c2405ee058086b099cbc8b29e243739cc#71a34e2e However, it somehow got lost when merging Test Plan: make all check Reviewers: kailiu, haobo, igor Reviewed By: igor CC: igor, dhruba, i.am.jin.lei, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16233	11 years ago
Igor Canadi	e21d5b8bbc	[CF] Flush all memtables on column family drop Summary: When column family is dropped, we want to delete all WALs that refer to it. To do that, we need to make them obsolete by flushing all the memtables Test Plan: column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16557	11 years ago
Igor Canadi	335b207974	[CF] Delete SuperVersion in a special function Summary: Added a function DeleteSuperVersion that can be called in DBImpl destructor before PurgingObsoleteFiles. That way, PurgeObsoleteFiles will be able to delete all files held by alive super versions. Test Plan: column_family_test with valgrind Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16545	11 years ago
Igor Canadi	f9b2f0ad79	[CF] Fix CF bugs in WriteBatch Summary: This diff fixes two bugs: * Increase sequence number even if WriteBatch fails. This is important because WriteBatches in WAL logs have implictly increasing sequence number, even if one update in a write batch fails. This caused some writes to get lost in my CF stress testing * Tolerate 'invalid column family' errors on recovery. When a column family is dropped, processing WAL logs can have some WriteBatches that still refer to the dropped column family. In recovery environment, we want to ignore those errors. In client's Write() code path, however, we want to return the failure to the client if he's trying to add data to invalid column family. Test Plan: db_stress's verification works now Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16533	11 years ago
Igor Canadi	8ea21a778b	[CF] Rething LogAndApply for column families Summary: I though I might get away with as little changes to LogAndApply() as possible. It turns out this is not the case. This diff introduces different behavior of LogAndApply() for three cases: 1. column family add 2. column family drop 3. no-column family manipulation (1) and (2) don't support group commit yet. There were a lot of problems with old version od LogAndApply, detected by db_stress. The biggest was non-atomicity of manifest writes and metadata changes (i.e. if column family add is in manifest, it also has to be in in-memory data structure). Test Plan: db_stress Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16491	11 years ago
Igor Canadi	58ca641d53	Make Log::Reader more robust Summary: This diff does two things: (1) Log::Reader does not report a corruption when the last record in a log or manifest file is truncated (meaning that log writer died in the middle of the write). Inherited the code from LevelDB: https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8# (2) Turn off mmap writes for all writes to log and manifest files (2) is necessary because if we use mmap writes, the last record is not truncated, but is actually filled with zeros, making checksum fail. It is hard to recover from checksum failing. Test Plan: Added unit tests from LevelDB Actually recovered a "corrupted" MANIFEST file. Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16119	11 years ago
Yueh-Hsuan Chiang	a77527f2af	Add ReadOptions to TransactionLogIterator. Summary: Add an optional input parameter ReadOptions to DB::GetUpdateSince(), which allows the verification of checksums to be disabled by setting ReadOptions::verify_checksums to false. Test Plan: Tests are done off-line and will not be included in the regular unit test. Reviewers: igor Reviewed By: igor CC: leveldb, xjin, dhruba Differential Revision: https://reviews.facebook.net/D16305	11 years ago
Igor Canadi	f6a257b6a1	Set dropped column family before persisting in the manifest	11 years ago
Igor Canadi	510f84b686	[CF] CreateColumnFamily fix Summary: This fixes few bugs with CreateColumnFamily * We first have to LogAndApply and then call VersionSet::CreateColumnFamily. Otherwise, WriteSnapshot might be invoked, writing out column family add inside of LogAndApply, even though it's not really committed * Fix LogAndApplyHelper() to not apply log number to column_family_data, which is in case of column family add, just a dummy (default) column family * Create SuperVerion when creating column family Test Plan: column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16443	11 years ago
Igor Canadi	206b38f31c	SetLogNumber in CreateColumnFamily	11 years ago
Igor Canadi	b41a3bc4da	[CF] Change flow of CreateColumnFamily Summary: Previously, we first wrote to the manifest and then created internal data structure. Now, we first create internal data structure. That way, we can write out internal comparator to the manifest Test Plan: column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16425	11 years ago
Lei Jin	ad0c3747cb	cache SuperVersion in thread local storage to avoid mutex lock Summary: as title Test Plan: asan_check will post results later Reviewers: haobo, igor, dhruba, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16257	11 years ago
Igor Canadi	4c42201204	[CF] Test fixes and speedup	11 years ago
Igor Canadi	343c32be7b	[CF] DifferentMergeOperators and DifferentCompactionStyles tests Summary: Two new column family tests: * DifferentMergeOperators -- three column families, one without merge operator, one with add operator and one with append operator. verify that operations work as expected. * DifferentCompactionStyles -- three column families, two with level compactions and one with universal compaction. trigger the compactions and verify they work as expected. Test Plan: nope Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16377	11 years ago
Igor Canadi	6e7cae7711	[CF] More tests Summary: New unit tests for column families Test Plan: this is a test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16359	11 years ago
Igor Canadi	9bce2b2a84	[CF] Fix lint errors in CF code Summary: Big CF diff uncovered some lint errors. This diff fixes some of them. Not much to see here Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16347	11 years ago
Igor Canadi	8b7ab9951c	[CF] Handle failure in WriteBatch::Handler Summary: * Add ColumnFamilyHandle::GetID() function. Client needs to know column family's ID to be able to construct WriteBatch * Handle WriteBatch::Handler failure gracefully. Since WriteBatch is not a very smart function (it takes raw CF id), client can add data to WriteBatch for column family that doesn't exist. In that case, we need to gracefully return failure status from DB::Write(). To do that, I added a return Status to WriteBatch functions PutCF, DeleteCF and MergeCF. Test Plan: Added test to column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16323	11 years ago
Igor Canadi	5ad7ee03ea	[CF] Log deletion in column families Summary: * Added unit test that verifies that obsolete files are deleted. * Advance log number for empty column family when cutting log file. * MinLogNumber() bug fix! (caught by the new unit test) Test Plan: unit test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16311	11 years ago

... 10 11 12 13 14 ...

1017 Commits (5f65dc877806f8d60d64144ddee9023b178c2d50)