rocksdb

Commit Graph

Author	SHA1	Message	Date
Igor Canadi	69aa6ecb26	Finalize first version in column family	12 years ago
Igor Canadi	758fa8c359	Don't Finalize in CompactionPicker Summary: Finalize re-sorts (read: mutates) the files_ in Version* and it is called by CompactionPicker during normal runtime. At the same time, this same Version* lives in the SuperVersion* and is accessed without the mutex in GetImpl() code path. Mutating the files_ in one thread and reading the same files_ in another thread is a bad idea. It caused this issue: http://ci-builds.fb.com/job/rocksdb_crashtest/285/console Long-term, we need to be more careful with method contracts and clearly document what state can be mutated when. Now that we are much faster because we don't lock in GetImpl(), we keep running into data races that were not a problem before when we were slower. db_stress has been very helpful in detecting those. Short-term, I removed Finalize() from CompactionPicker. Note: I believe this is an issue in current 2.7 version running in production. Test Plan: make check Will also run db_stress to see if issue is gone Reviewers: sdong, ljin, dhruba, haobo Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16983	12 years ago
Lei Jin	63cef90078	disable the log_number check in Recover() Summary: There is a chance that an old MANIFEST is corrupted in 2.7 but just not noticed. This check would fail them. Change it to log instead of returning a Corruption status. Test Plan: make Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16923	12 years ago
Igor Canadi	bcea9c1296	Finalize version in dumpmanifest	12 years ago
Igor Canadi	f26cb0f093	Optimize fallocation Summary: Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads. This diff provides an option for user to not use KEEP_SIZE flag, thus optimizing his sync performance by up to 2x-3x. At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported. Test Plan: make check Reviewers: dhruba, sdong, haobo, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16761	12 years ago
Igor Canadi	ae25742af9	Fix race condition in manifest roll Summary: When the manifest is getting rolled the following happens: 1) manifest_file_number_ is assigned to a new manifest number (even though the old one is still current) 2) mutex is unlocked 3) SetCurrentFile() creates temporary file manifest_file_number_.dbtmp 4) SetCurrentFile() renames manifest_file_number_.dbtmp to CURRENT 5) mutex is locked If FindObsoleteFiles happens between (3) and (4) it will: 1) Delete manifest_file_number_.dbtmp (because it's not in pending_outputs_) 2) Delete old manifest (because the manifest_file_number_ already points to a new one) I introduce the concept of prev_manifest_file_number_ that will avoid the race condition. However, we should discuss the future of MANIFEST file rolling. We found some race conditions with it last week and who knows how many more are there. Nobody is using it in production because we don't trust the implementation. Should we even support it? Test Plan: make check Reviewers: ljin, dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16929	12 years ago
Igor Canadi	d63ae5cb59	Adjust memtable sizes in unit test	12 years ago
Yueh-Hsuan Chiang	a5fafd4f46	Correct the logic of MemTable::ShouldFlushNow(). Summary: Memtable will now be forced to flush if one of the following conditions is met: 1. Already allocated more than write_buffer_size + 60% arena block size. (the overflowing condition) 2. Unable to safely allocate one more arena block without hitting the overflowing condition AND the unused allocated memory < 25% arena block size. Test Plan: make all check Reviewers: sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16893	12 years ago
sdong	c61c9830d4	Fix a bug that Prev() can hang. Summary: Prev() can currently hang when a key has more than max_skipped internal appearances and all of them are newer than the sequence ID to seek. Add unit tests to confirm the bug and fix it. Test Plan: make all check Reviewers: igor, haobo Reviewed By: igor CC: ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16899	12 years ago
Lei Jin	0cf6c8f7ce	fix: use the correct edit when comparing log_number Summary: In the last fix, I forgot to point to the writer when comparing edit, which is apparently not correct. Test Plan: still running make whitebox_crash_test Reviewers: igor, haobo, igor2 Reviewed By: igor2 CC: leveldb Differential Revision: https://reviews.facebook.net/D16911	12 years ago
Lei Jin	453ec52ca1	journal log_number correctly in MANIFEST Summary: Here is how it can cause a problem: There is one memtable flush and one compaction. Both call LogAndApply(). If both edits are applied in the same batch, with the flush edit first and the compaction edit following, LogAndApplyHelper() will assign the compaction edit the current VersionSet's log number (which should be smaller than the log number from the flush edit). This causes the log_numbers in MANIFEST to not be monotonically increasing, which violates the assumption Recover() makes. What is more, after committing to the MANIFEST file, log_number_ in VersionSet is updated to the log_number from the last edit, which is the compaction one. It ends up not updating the log_number. Test Plan: make whitebox_crash_test got another assertion about iter->valid(), not sure if that is related to this. Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16875	12 years ago
Caio SBA	b9c78d2db6	Make it compile on Debian/GCC 4.7	12 years ago
Igor Canadi	a782bb989e	Fix log_number in LogAndApply	12 years ago
Igor Canadi	928ee23567	Change WriteBatch interface	12 years ago
Igor Canadi	2bad3cb0db	Missing includes	12 years ago
Igor Canadi	db234133a9	[CF] WriteBatch to take in ColumnFamilyHandle Summary: Client doesn't need to know anything about ColumnFamily ID. By making WriteBatch take ColumnFamilyHandle as a parameter, we can eliminate method GetID() from ColumnFamilyHandle Test Plan: column_family_test Reviewers: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16887	12 years ago
Igor Canadi	3c75cc15a9	Fix HashSkipList and HashLinkedList SIGSEGV Summary: Original Summary: Yesterday, @ljin and I were debugging various db_stress issues. We suspected one of them happens when we concurrently call NewIterator without prefix_seek on HashSkipList. This test demonstrates it. Update: Arena is not thread-safe!! When creating a new full iterator, we have to create a new arena, otherwise we're doomed. Test Plan: SIGSEGV and assertion-throwing test now works! Reviewers: ljin, haobo, sdong Reviewed By: sdong CC: leveldb, ljin Differential Revision: https://reviews.facebook.net/D16857	12 years ago
Igor Canadi	6c72079d77	Fix warning on Mac OS	12 years ago
Igor Canadi	f0e1e3ebf1	CF cleanup part 2	12 years ago
Igor Canadi	f071a20f6e	Need more data in memtable to flush due to 11da8b	12 years ago
sdong	5aa81f04fa	Fix extra compaction tasks scheduled after D16767 in some cases Summary: With D16767, there is a case where compaction tasks are scheduled infinitely: (1) no flush thread is configured and there is more than 1 compaction thread (2) a flush is being run by one compaction thread (3) the SST files are in a state where versions_->current()->NeedsCompaction() generates a false positive (returns true when there is actually no work to be done) In that case, an infinite loop will be formed. This patch would fix it. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: igor CC: dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16863	12 years ago
Kai Liu	11da8bc5df	A heuristic way to check if a memtable is full Summary: This is based on https://reviews.facebook.net/D15027. It's not finished but I would like to give a prototype to avoid arena over-allocation while making better use of the already allocated memory blocks. Instead of checking the approximate memtable size, we will take a deeper look at the arena, which incorporates the essential idea that @sdong suggests: flush when arena has allocated its last and the last is "almost full" Test Plan: N/A Reviewers: haobo, sdong Reviewed By: sdong CC: leveldb, sdong Differential Revision: https://reviews.facebook.net/D15051	12 years ago
Igor Canadi	25c8a1a20f	More bug fixed introduced by code cleanup	12 years ago
Igor Canadi	b5d6ad69fc	Bug fixes introduced by code cleanup	12 years ago
Igor Canadi	fb2346fc1f	[CF] Code cleanup part 1 Summary: I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup. Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch. This diff also fixes race condition when dropping column family. Test Plan: Ran db_stress with variety of parameters Reviewers: dhruba, haobo Differential Revision: https://reviews.facebook.net/D16833	12 years ago
Igor Canadi	45ad75db80	Correct version of D16821	12 years ago
Igor Canadi	2b95dc1542	Revert "Fix bad merge of D16791 and D16767" This reverts commit `839c8ecfcd`.	12 years ago
sdong	839c8ecfcd	Fix bad merge of D16791 and D16767 Summary: A bad Auto-Merge caused log buffer is flushed twice. Remove the unintended one. Test Plan: Should already be tested (the code looks the same as when I ran unit tests). Reviewers: haobo, igor Reviewed By: haobo CC: ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16821	12 years ago
sdong	bd45633b71	Fix data race against logging data structure because of LogBuffer Summary: @igor pointed out that there is a potential data race because of the way we use the newly introduced LogBuffer. After "bg_compaction_scheduled_--" or "bg_flush_scheduled_--", they can both become 0. As soon as the lock is released after that, DBImpl's destructor can go ahead and destroy all the state inside DB, including the info_log object held in a shared pointer of the options object it keeps. At that point it is not safe anymore to continue using the info logger to write the delayed logs. With the patch, the lock is released temporarily for the log buffer to be flushed before "bg_compaction_scheduled_--" or "bg_flush_scheduled_--". In order to make sure we don't miss any pending flush or compaction, a new flag bg_schedule_needed_ is added, which is set to true if there is a pending flush or compaction that was not scheduled because of the max thread limit. If the flag is set to true, the scheduling function will be called before the compaction or flush thread finishes. Thanks @igor for this finding! Test Plan: make all check Reviewers: haobo, igor Reviewed By: haobo CC: dhruba, ljin, yhchiang, igor, leveldb Differential Revision: https://reviews.facebook.net/D16767	12 years ago
Igor Canadi	d833f15738	Fix bug in VersionEdit::DebugString()	12 years ago
Igor Canadi	37472bb279	Add MaxColumnFamily to VersionEdit::DebugString()	12 years ago
Igor Canadi	457c78eb89	[CF] db_stress for column families Summary: I had this diff for a while to test column families implementation. Last night, I ran it successfully for 10 hours with the command: time ./db_stress --threads=30 --ops_per_thread=200000000 --max_key=5000 --column_families=20 --clear_column_family_one_in=3000000 --verify_before_write=1 --reopen=50 --max_background_compactions=10 --max_background_flushes=10 --db=/tmp/db_stress It is ready to be committed :) Test Plan: Ran it for 10 hours Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16797	12 years ago
sdong	6c66bc08d9	Temp Fix of LogBuffer flushing Summary: To temp fix the log buffer flushing. Flush the buffer inside the lock. Clean the trunk before we find an eventual fix. Test Plan: make all check Reviewers: haobo, igor Reviewed By: igor CC: ljin, leveldb, yhchiang Differential Revision: https://reviews.facebook.net/D16791	12 years ago
Igor Canadi	cb9802168f	Add a comment after SignalAll() Summary: Having code after SignalAll has already caused 2 bugs. Let's make sure this doesn't happen again. Test Plan: no test Reviewers: sdong, dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16785	12 years ago
Igor Canadi	dad8603fc4	[CF] Fix column family dropping Summary: Column family should be dropped after the change has been committed Test Plan: db stress Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16779	12 years ago
Igor Canadi	d5de22dc09	Call PurgeObsoleteFiles() only when HaveSomethingToDelete() Summary: as title Test Plan: fixed the build failure http://ci-builds.fb.com/job/rocksdb_build/987/console Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16743	12 years ago
sdong	fac58c0504	DBTest: remove perf_context's time > 0 check Summary: DBTest checks perf_context.seek_internal_seek_time > 0 and perf_context.find_next_user_entry_time > 0, which is not reliable. Remove them. Test Plan: ./db_test Reviewers: igor, haobo, ljin Reviewed By: igor CC: dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16737	12 years ago
Haobo Xu	a91aed615a	[RocksDB] Minor cleanup of PurgeObsoleteFiles Summary: as title. also made info log output of file deletion a bit more descriptive. Test Plan: make check; db_bench and look at LOG output Reviewers: igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16731	12 years ago
Lei Jin	8d007b4aaf	Consolidate SliceTransform object ownership Summary: (1) Fix SanitizeOptions() to also check HashLinkList. The current dynamic case just happens to work because the 2 classes have the same layout. (2) Do not delete SliceTransform object in HashSkipListFactory and HashLinkListFactory destructor. Reason: SanitizeOptions() enforces prefix_extractor and SliceTransform to be the same object when HashFactory is used. This makes the behavior strange: when HashFactory is used, prefix_extractor will be released by RocksDB. If other memtable factory is used, prefix_extractor should be released by user. Test Plan: db_bench && make asan_check Reviewers: haobo, igor, sdong Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16587	12 years ago
Haobo Xu	9e0e6aa7f6	[RocksDB] make sure KSVObsolete does not get accessed as a valid pointer. Summary: KSVObsolete is no longer nullptr and needs to be checked explicitly. Also did some minor code cleanup and added a stat counter to track superversion cleanups incurred in the foreground. Test Plan: make check Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D16701	12 years ago
Haobo Xu	66da467983	[RocksDB] LogBuffer Cleanup Summary: Moved LogBuffer class to an internal header. Removed some unnecessary indirection. Enabled log buffer for BackgroundCallFlush. Forced log buffer flush right after Unlock to improve time ordering of info log. Test Plan: make check; db_bench compare LOG output Reviewers: sdong Reviewed By: sdong CC: leveldb, igor Differential Revision: https://reviews.facebook.net/D16707	12 years ago
Igor Canadi	04d2c26e17	Add option verify_checksums_in_compaction Summary: If verify_checksums_in_compaction is true, compaction will verify checksums. This is default. If it's false, compaction doesn't verify checksums. This is useful for in-memory workloads. Test Plan: corruption_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16695	12 years ago
Igor Canadi	d4f2c610d3	Ignore dropped column families -- don't flush or compact them	12 years ago
Igor Canadi	9f15092ebd	[CF] NewIterators Summary: Adding the last missing function -- NewIterators(). Pretty simple implementation Test Plan: added a unit test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16689	12 years ago
Lei Jin	e5fa4944fc	use CAS when returning SuperVersion to ThreadLocal Summary: Add a check at the end of GetImpl to release SuperVersion if it becomes obsolete. Also do Scrape() inside InstallSuperVersion so it happens more frequent. Test Plan: make all check running asan_check now Reviewers: igor, haobo, sdong, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16641	12 years ago
Igor Canadi	eec8695206	Delete local sv when destroying DB from stress test Summary: Not deleting local SV caused a crash test issue: http://ci-builds.fb.com/job/rocksdb_asan_crash_test/83/console Test Plan: ran unit tests Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D16635	12 years ago
sdong	ecb1ffa2a8	Buffer info logs when picking compactions and write them out after releasing the mutex Summary: Now while the background thread is picking compactions, it writes out multiple info_logs, especially for universal compaction, which introduces a chance of waiting log writing in mutex, which is bad. To remove this risk, write all those info logs to a buffer and flush it after releasing the mutex. Test Plan: make all check check the log lines while running some tests that trigger compactions. Reviewers: haobo, igor, dhruba Reviewed By: dhruba CC: i.am.jin.lei, dhruba, yhchiang, leveldb, nkg- Differential Revision: https://reviews.facebook.net/D16515	12 years ago
Igor Canadi	e2dd148a8b	Fix compile fail introduced by merge	12 years ago
Igor Canadi	a329dd1b25	Fix TEST_Destroy_DBImpl() to work with column families	12 years ago
Igor Canadi	9625acbf70	[CF] Dont reuse dropped column family IDs Summary: Column family IDs should be unique, even if column family is dropped. To achieve this, we save max column family in manifest. Note that the diff is still not ready. I'm only using differential to move the patch to my Mac machine. Test Plan: added a test to column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16581	12 years ago

810 Commits (69aa6ecb269f1fc7dfa3d97e3cf5e9fb4f8556de)
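
The two flush conditions listed in commit a5fafd4f46 (MemTable::ShouldFlushNow()) can be sketched roughly as below. This is only an illustration of the rule as the commit message words it, not the code from that commit: the ArenaState struct, its field names, and the function name ShouldFlushNowSketch are hypothetical, and the real implementation reads these values from the memtable's arena.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the arena's state. The field names are
// illustrative assumptions, not RocksDB's actual members.
struct ArenaState {
  std::size_t allocated_bytes;    // total bytes the arena has reserved so far
  std::size_t unused_bytes;       // reserved bytes not yet handed out
  std::size_t arena_block_size;   // size of a single arena block
  std::size_t write_buffer_size;  // configured memtable size budget
};

// Returns true when the memtable should be flushed, following the two
// conditions described in the commit message.
bool ShouldFlushNowSketch(const ArenaState& s) {
  assert(s.arena_block_size > 0);

  // "Overflowing condition" threshold: write_buffer_size + 60% of one block.
  const double limit = s.write_buffer_size + 0.6 * s.arena_block_size;

  // Condition 1: the arena has already allocated past the threshold.
  if (s.allocated_bytes > limit) {
    return true;
  }

  // Condition 2: one more block would cross the threshold AND the memory
  // already allocated but unused is below 25% of a block.
  const bool cannot_add_block =
      s.allocated_bytes + s.arena_block_size > limit;
  const bool nearly_full = s.unused_bytes < 0.25 * s.arena_block_size;
  return cannot_add_block && nearly_full;
}
```

With a 1000-byte write buffer and 100-byte blocks, the threshold is 1060 bytes: an arena holding 1000 bytes with 10 unused would flush, while the same arena with 50 unused bytes would keep accepting writes.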