rocksdb

Commit Graph

Author	SHA1	Message	Date
Yi Wu	81747f1be6	Refactor MutableCFOptions Summary: * Change constructor of MutableCFOptions to depends only on ColumnFamilyOptions. * Move `max_subcompactions`, `compaction_options_fifo` and `compaction_pri` to ImmutableCFOptions to make it clear that they are immutable. Test Plan: existing unit tests. Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D63945	8 years ago
sdong	32149059f9	Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes. Test Plan: Add two new unit tests. Run all existing tests, including jtest. Reviewers: yhchiang, igor, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59829	8 years ago
Islam AbdelRahman	ccecf3f4fb	UniversalCompaction should ignore sorted runs being compacted (when compacting for file num) Summary: If we have total number of sorted runs greater than level0_file_num_compaction_trigger, Universal compaction will always issue a compaction even if the number of sorted runs that are not being compacted is less than level0_file_num_compaction_trigger. This diff changes this behaviour to relay on the `number of sorted runs not being compacted` instead of `total number of sorted runs` Test Plan: New unit test Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D61533	8 years ago
sdong	0ce258f9b3	Compaction picker to expand output level files for keys cross files' boundary too. Summary: We may wrongly drop delete operation if we pick a file with the entry to be delete, the put entry of the same user key is in the next file in the level, and the next file is not picked. We expand compaction inputs for output level too. Test Plan: Add unit tests that reproduct the bug of dropping delete entry. Change compaction_picker_test to assert the new behavior. Reviewers: IslamAbdelRahman, igor Reviewed By: igor Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D61173	8 years ago
sdong	c4cef07f1b	Update DBTestUniversalCompaction.UniversalCompactionSingleSortedRun to use max_size_amplification_percent = 0 Summary: With max_size_amplification_percent = 0 to make sure that DBTestUniversalCompaction.UniversalCompactionSingleSortedRun tests the configuration to compact to one single sorted run. Test Plan: Run all existing tests Reviewers: yhchiang, andrewkr, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60021	8 years ago
Ashish Shenoy	99765ed855	Clean up the ComputeCompactionScore() API Summary: Make CompactionOptionsFIFO a part of mutable_cf_options Test Plan: UT Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, lgalanis, dhruba Differential Revision: https://reviews.facebook.net/D58653	9 years ago
Aaron Gao	43afd72bee	[rocksdb] make more options dynamic Summary: make more ColumnFamilyOptions dynamic: - compression - soft_pending_compaction_bytes_limit - hard_pending_compaction_bytes_limit - min_partial_merge_operands - report_bg_io_stats - paranoid_file_checks Test Plan: Add sanity check in `db_test.cc` for all above options except for soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit. All passed. Reviewers: andrewkr, sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57519	9 years ago
Islam AbdelRahman	4b31723433	Add bottommost_compression option Summary: Add a new option that can be used to set a specific compression algorithm for bottommost level. This option will only affect levels larger than base level. I have also updated CompactionJobInfo to include the compression algorithm used in compaction Test Plan: added new unittest existing unittests Reviewers: andrewkr, yhchiang, sdong Reviewed By: sdong Subscribers: lightmark, andrewkr, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D57669	9 years ago
sdong	ef204df7ef	Compaction always needs to be removed from level0_compactions_in_progress_ for universal compaction Summary: We always put compaction to level0_compactions_in_progress_ for universal compaction, so we should also remove it. The bug causes assert failure when running manual compaction. Test Plan: TEST_TMPDIR=/dev/shm/ ./db_bench --benchmarks=fillrandom,compact --subcompactions=16 --compaction_style=1 always fails on my host. After the fix, it doesn't fail any more. Reviewers: IslamAbdelRahman, andrewkr, kradhakrishnan, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D55017	9 years ago
sdong	b5b1db167a	Recompute compaction score after scheduling manual compaction Summary: After we made manual compaction runnable concurrently with automaticallly compaction, we need to run ComputeCompactionScore() to prepare a coming compaction picking call before the compaction finishes. Test Plan: Run existing tests. Reviewers: yhchiang, IslamAbdelRahman, andrewkr, kradhakrishnan, anthony, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D54891	9 years ago
Baraa Hamodi	21e95811d1	Updated all copyright headers to the new format.	9 years ago
sdong	235b162be1	Not scheduling more L1->L2 compaction if L0->L1 is pending with higher priority Summary: When L0->L1 is pending, there may be one L1->L2 compaction going on which prevents the L0->L1 compaction from happening. If L1 needs more data to be moved to L2, then we may continue scheduling more L1->L2 compactions. The end result may be that L0->L1 compaction will not happen until L1 size drops to below target size. We can reduce the stalling because of number of L0 files by stopping schedling new L1->L2 compaction when L0's score is higher than L1. Test Plan: Run all existing tests. Reviewers: yhchiang, MarkCallaghan, rven, anthony, IslamAbdelRahman, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D52401	9 years ago
Islam AbdelRahman	d005c66faf	Report compaction reason in CompactionListener Summary: Add CompactionReason to CompactionJobInfo This will allow users to understand why compaction started which will help options tuning Test Plan: added new tests make check -j64 Reviewers: yhchiang, anthony, kradhakrishnan, sdong, rven Reviewed By: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D51975	9 years ago
Zhipeng Jia	131f7ddf63	fix typo: sr to picking_sr	9 years ago
Gunnar Kudrjavets	97265f5f14	Fix minor bugs in delete operator, snprintf, and size_t usage Summary: List of changes: 1) Fix the snprintf() usage in cases where wrong variable was used to determine the output buffer size. 2) Remove unnecessary checks before calling delete operator. 3) Increase code correctness by using size_t type when getting vector's size. 4) Unify the coding style by removing namespace::std usage at the top of the file to confirm to the majority usage. 5) Fix various lint errors pointed out by 'arc lint'. Test Plan: Code review and build: git diff make clean make -j 32 commit-prereq arc lint Reviewers: kradhakrishnan, sdong, rven, anthony, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D51849	9 years ago
Venkatesh Radhakrishnan	030215bf01	Running manual compactions in parallel with other automatic or manual compactions in restricted cases Summary: This diff provides a framework for doing manual compactions in parallel with other compactions. We now have a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to BackgroundCompactions, so that RunManualCompactions can be reentrant. Parallelism is controlled by the two routines ConflictingManualCompaction to allow/disallow new parallel/manual compactions based on already existing ManualCompactions. In this diff, by default manual compactions still have to run exclusive of other compactions. However, by setting the compaction option, exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. However, we are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs. I will be adding more tests later. Test Plan: Rocksdb regression + new tests + valgrind Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong Reviewed By: sdong Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D47973	9 years ago
sdong	dac5b248b1	UniversalCompactionPicker::PickCompaction(): avoid to form compactions if there is no file Summary: Currently RocksDB may break in lines like this: for (size_t i = sorted_runs.size() - 1; i >= first_index_after; i--) { if options.level0_file_num_compaction_trigger=0. Fix it by not executing the logic of picking compactions if there is no file (sorted_runs.size() = 0). Also internally set options.level0_file_num_compaction_trigger=1 if users give a 0. 0 is a value makes no sense in RocksDB. Test Plan: Run all tests. Will add a unit test too. Reviewers: yhchiang, IslamAbdelRahman, anthony, kradhakrishnan, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D50727	9 years ago
Venkatesh Radhakrishnan	2ae4d7d708	Make sure that CompactFiles does not run two parallel Level 0 compactions Summary: Since level 0 files can overlap, two level 0 compactions cannot run in parallel. Compact files needs to check this before running a compaction. Test Plan: CompactFilesTest.L0ConflictsFiles Reviewers: igor, IslamAbdelRahman, anthony, sdong, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D50079	9 years ago
sdong	f1b9f804e9	Add a mode to always pick the oldest file to compact for each level Summary: Add options.compaction_pri, which specifies the policy about which file to compact first. kCompactionPriByLargestSeq will compact oldest files first. Verified the behavior in db_bench but did not write unit tests yet. Also need to make it settable through option string and dynamically changeable. Test Plan: Will write unit tests Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45951	9 years ago
Mayank Pundir	a5e312a7a4	Improving condition for bottommost level during compaction Summary: The diff modifies the condition checked to determine the bottommost level during compaction. Previously, absence of files in higher levels alone was used as the condition. Now, the function additionally evaluates if the higher levels have files which have non-overlapping key ranges, then the level can be safely considered as the bottommost level. Test Plan: Unit test cases added and passing. However, unit tests of universal compaction are failing as a result of the changes made in this diff. Need to understand why that is happening. Reviewers: igor Subscribers: dhruba, sdong, lgalanis, meyering Differential Revision: https://reviews.facebook.net/D46473	9 years ago
Andres Notzli	09d982f9e0	Fix compact_files_example Summary: See task #7983654. The example was triggering an assert in compaction job because the compaction was not marked as manual. With this patch, CompactionPicker::FormCompaction() marks compactions as manual. This patch also fixes a couple of typos, adds optimistic_transaction_example to .gitignore and librocksdb as a dependency for examples. Adding librocksdb as a dependency makes sure that the examples are built with the latest changes in librocksdb. Test Plan: make clean && cd examples && make all && ./compact_files_example Reviewers: rven, sdong, anthony, igor, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45117	9 years ago
Venkatesh Radhakrishnan	20b244fcca	Fix CompactFiles by adding all necessary files Summary: The compact files API had a bug where some overlapping files are not added. These are files which overlap with files which were added to the compaction input files, but not to the original set of input files. This happens only when there are more than two levels involved in the compaction. An example will illustrate this better. Level 2 has 1 input file 1.sst which spans [20,30]. Level 3 has added file 2.sst which spans [10,25] Level 4 has file 3.sst which spans [35,40] and input file 4.sst which spans [46,50]. The existing code would not add 3.sst to the set of input_files because it only becomes an overlapping file in level 4 and it wasn't one in level 3. When installing the results of the compaction, 3.sst would overlap with output file from the compact files and result in the assertion in version_set.cc:1130 // Must not overlap assert(level <= 0 \|\| level_files->empty() \|\| internal_comparator_->Compare( (level_files)[level_files->size() - 1]->largest, f->smallest) < 0); This change now adds overlapping files from the current level to the set of input files also so that we don't hit the assertion above. Test Plan: d=/tmp/j; rm -rf $d; seq 1000 \| parallel --gnu --eta 'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_compaction_test --gtest_filter=CompactilesOnLevel* --gtest_also_run_disabled_tests >& '$d'/log-{}' Reviewers: igor, yhchiang, sdong Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43437	9 years ago
Poornima Chozhiyath Raman	1bdfcef7bf	Fix when output level is 0 of universal compaction with trivial move Summary: Fix for universal compaction with trivial move, when the ouput level is 0. The tests where failing. Fixed by allowing normal compaction when output level is 0. Test Plan: modified test cases run successfully. Reviewers: sdong, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: anthony, kradhakrishnan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42933	9 years ago
krad	c613960690	Build fix. Summary: gcc-4.9-glibc-2.20 complains about uninitialized variable. db/compaction_picker.cc: In member function 'bool rocksdb::CompactionPicker::IsInputNonOverlapping(rocksdb::Compaction*)': db/compaction_picker.cc:1174:17: error: 'prev.rocksdb::{anonymous}::InputFileInfo::f' may be used uninitialized in this function [-Werror=maybe-uninitialized] InputFileInfo prev, curr, next; Test Plan: pmake on local environment Reviewers: sdong igor CC: leveldb@ Task ID: # Blame Rev:	9 years ago
Islam AbdelRahman	e290f5d3ce	Block reduce_levels_test in ROCKSDB_LITE Summary: Block reduce_levels_test in ROCKSDB_LITE as LDBCommand is not supported Test Plan: make reduce_levels_test -j64 OPT=-DROCKSDB_LITE make reduce_levels_test -j64 make check -j64 Reviewers: sdong, igor, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D41967	9 years ago
sdong	f9728640f3	"make format" against last 10 commits Summary: This helps Windows port to format their changes, as discussed. Might have formatted some other codes too becasue last 10 commits include more. Test Plan: Build it. Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41961	9 years ago
Poornima Chozhiyath Raman	411c8e3d19	Build fail fix Summary: Build fail fix. Type cast issues. Test Plan: compiled Reviewers: sdong, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D41349	9 years ago
Poornima Chozhiyath Raman	c0b23dd5b0	Enabling trivial move in universal compaction Summary: This change enables trivial move if all the input files are non onverlapping while doing Universal Compaction. Test Plan: ./compaction_picker_test and db_test ran successfully with the new testcases. Reviewers: sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40875	9 years ago
Dmitri Smirnov	18285c1e2f	Windows Port from Microsoft Summary: Make RocksDb build and run on Windows to be functionally complete and performant. All existing test cases run with no regressions. Performance numbers are in the pull-request. Test plan: make all of the existing unit tests pass, obtain perf numbers. Co-authored-by: Praveen Rao praveensinghrao@outlook.com Co-authored-by: Sherlock Huang baihan.huang@gmail.com Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com	9 years ago
Igor Canadi	a84df655f3	Don't let two L0->L1 compactions run in parallel Summary: With experimental feature SuggestCompactRange() we don't restrict running two L0->L1 compactions in parallel. This diff fixes this. Test Plan: added a unit test to reproduce the failure. fixed the unit test Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39981	10 years ago
sdong	6fa7085121	CompactRange skips levels 1 to base_level -1 for dynamic level base size Summary: CompactRange() now is much more expensive for dynamic level base size as it goes through all the levels. Skip those not used levels between level 0 an base level. Test Plan: Run all unit tests Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37125	10 years ago
Igor Canadi	7413306d94	Take a chance on a random file when choosing compaction Summary: When trying to compact entire database with SuggestCompactRange(), we'll first try the left-most files. This is pretty bad, because: 1) the left part of LSM tree will be overly compacted, but right part will not be touched 2) First compaction will pick up the left-most file. Second compaction will try to pick up next left-most, but this will not be possible, because there's a big chance that second's file range on N+1 level is already being compacted. I observe both of those problems when running Mongo+RocksDB and trying to compact the DB to clean up tombstones. I'm unable to clean them up :( This diff adds a bit of randomness into choosing a file. First, it chooses a file at random and tries to compact that one. This should solve both problems specified here. Test Plan: make check Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38379	10 years ago
Igor Canadi	b5881762bc	Reset parent_index and base_index when picking files marked for compaction Summary: This caused a crash of our MongoDB + RocksDB instance. PickCompactionBySize() sets its own parent_index. We never reset this parent_index when picking PickFilesMarkedForCompactionExperimental(). So we might end up doing SetupOtherInputs() with parent_index that was set by PickCompactionBySize, although we're using compaction calculated using PickFilesMarkedForCompactionExperimental. Test Plan: Added a unit test that fails with assertion on master. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38337	10 years ago
Igor Canadi	d978139063	SuggestCompactRange() is manual compaction Summary: When reporting compaction that was started because of SuggestCompactRange() we should treat it as manual compaction. Test Plan: none Reviewers: yhchiang, rven Reviewed By: rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38139	10 years ago
Liangjun Feng	9aa011fa36	Optimize GetRange Function Summary: Optimize GetRange Function by checking the level of the files Test Plan: pass make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37977	10 years ago
clark.kang	6ede020dc4	fix typos	10 years ago
sdong	d01bbb53ae	Fix CompactRange for universal compaction with num_levels > 1 Summary: CompactRange for universal compaction with num_levels > 1 seems to have a bug. The unit test also has a bug so it doesn't capture the problem. Fix it. Revert the compact range to the logic equivalent to num_levels=1. Always compact all files together. It should also fix DBTest.IncreaseUniversalCompactionNumLevels. The issue was that options.write_buffer_size = 100 << 10 and options.write_buffer_size = 100 << 10 are not used in later test scenarios. So write_buffer_size of 4MB was used. The compaction trigger condition is not anymore obvious as expected. Test Plan: Run the new test and all test suites Reviewers: yhchiang, rven, kradhakrishnan, anthony, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37551	10 years ago
Igor Canadi	aa14670b27	Add an assertion in CompactionPicker Summary: Reading CompactionPicker I noticed this dangerous substraction of two unsigned integers. We should assert to mark this as safe. Test Plan: make check Reviewers: anthony, rven, yhchiang, sdong Reviewed By: sdong Subscribers: kradhakrishnan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37041	10 years ago
Igor Canadi	6059bdf86a	Add experimental API MarkForCompaction() Summary: Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys). All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks an issue of blocking writes because we generate too much Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart. MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction. Test Plan: added a new unit test Reviewers: yhchiang, rven, MarkCallaghan, sdong Reviewed By: sdong Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37083	10 years ago
Igor Canadi	00c2afcd38	Fix bug in ExpandWhileOverlapping() Summary: If ExpandWhileOverlapping() we don't clear inputs. That's a bug introduced by my recent patch https://reviews.facebook.net/D36687. However, we have no tests covering ExpandWhileOverlapping(). I created a task t6771252 to add ExpandWhileOverlapping() tests. Test Plan: make check Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37077	10 years ago
sdong	fcb206b667	SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward Summary: Allow users to give a callback function with parameter using sync point, so more complicated verification can be done in tests. Use it in DBTest.DynamicLevelCompressionPerLevel2 so that failures will be more easy to debug. Test Plan: Run all tests. Run DBTest.DynamicLevelCompressionPerLevel2 with valgrind check. Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36999	10 years ago
Igor Canadi	590fadc407	Fix compile warning on CLANG Summary: oops Test Plan: compiles now Reviewers: sdong, yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36867	10 years ago
Igor Canadi	47b8743984	Make Compaction class easier to use Summary: The goal of this diff is to make Compaction class easier to use. This should also make new compaction algorithms easier to write (like CompactFiles from @yhchiang and dynamic leveled and multi-leveled universal from @sdong). Here are couple of things demonstrating that Compaction class is hard to use: 1. we have two constructors of Compaction class 2. there's this thing called grandparents_, but it appears to only be setup for leveled compaction and not compactfiles 3. it's easy to introduce a subtle and dangerous bug like this: D36225 4. SetupBottomMostLevel() is hard to understand and it shouldn't be. See this comment: `afbafeaeae/db/compaction.cc (L236-L241)`. It also made it harder for @yhchiang to write CompactFiles, as evidenced by this: `afbafeaeae/db/compaction_picker.cc (L204-L210)` The problem is that we create Compaction object, which holds a lot of state, and then pass it around to some functions. After those functions are done mutating, then we call couple of functions on Compaction object, like SetupBottommostLevel() and MarkFilesBeingCompacted(). It is very hard to see what's happening with all that Compaction's state while it's travelling across different functions. If you're writing a new PickCompaction() function you need to try really hard to understand what are all the functions you need to run on Compaction object and what state you need to setup. My proposed solution is to make important parts of Compaction immutable after construction. PickCompaction() should calculate compaction inputs and then pass them onto Compaction object once they are finalized. That makes it easy to create a new compaction -- just provide all the parameters to the constructor and you're done. No need to call confusing functions after you created your object. This diff doesn't fully achieve that goal, but it comes pretty close. Here are some of the changes: * have one Compaction constructor instead of two. * inputs_ is constant after construction * MarkFilesBeingCompacted() is now private to Compaction class and automatically called on construction/destruction. * SetupBottommostLevel() is gone. Compaction figures it out on its own based on the input. * CompactionPicker's functions are not passing around Compaction object anymore. They are only passing around the state that they need. Test Plan: make check make asan_check make valgrind_check Reviewers: rven, anthony, sdong, yhchiang Reviewed By: yhchiang Subscribers: sdong, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36687	10 years ago
Venkatesh Radhakrishnan	afbafeaeae	Disallow trivial move if compression level is different Summary: Check compression level of start_level with output_compression before allowing trivial move Test Plan: New DBTest CompressLevelCompactionThirdPath added Reviewers: igor, yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36213	10 years ago
sdong	b23bbaa82a	Universal Compactions with Small Files Summary: With this change, we use L1 and up to store compaction outputs in universal compaction. The compaction pick logic stays the same. Outputs are stored in the largest "level" as possible. If options.num_levels=1, it behaves all the same as now. Test Plan: 1) convert most of existing unit tests for universal comapaction to include the option of one level and multiple levels. 2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels 3) add unit test to migrate from multiple level setting back to one level setting 4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results. Reviewers: rven, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: meyering, leveldb, MarkCallaghan, dhruba Differential Revision: https://reviews.facebook.net/D34539	10 years ago
Igor Canadi	52e0f3353f	Clean up compactions_in_progress_ Summary: Suprisingly, the only way we use this vector is to keep track of level0 compactions. Thus, I simplified it. Test Plan: make check Reviewers: rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35313	10 years ago
sdong	e9de8b65a6	Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true Summary: Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc. Test Plan: run all tests Reviewers: rven, yhchiang, kradhakrishnan, igor Reviewed By: igor Subscribers: yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34431	10 years ago
Igor Canadi	db03739340	options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically. Summary: When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases. In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier. Test Plan: New unit tests and pass tests suites including valgrind. Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo Reviewed By: ikabiljo Subscribers: yoshinorim, ikabiljo, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31437	10 years ago
Igor Canadi	e39f4f6cf9	Fix data race #3 Summary: Added requirement that ComputeCompactionScore() be executed in mutex, since it's accessing being_compacted bool, which can be mutated by other threads. Also added more comments about thread safety of FileMetaData, since it was a bit confusing. However, it seems that FileMetaData doesn't have data races (except being_compacted) Test Plan: Ran 100 ConvertCompactionStyle tests with thread sanitizer. On master -- some failures. With this patch -- none. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32283	10 years ago
Igor Canadi	fdb6be4e24	Rewritten system for scheduling background work Summary: When scaling to higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which did a for loop over all column families while holding a mutex. This patch addresses the issue. The approach is similar to our earlier efforts: instead of a pull-model, where we do something for every column family, we can do a push-based model -- when we detect that column family is ready to be flushed/compacted, we add it to the flush_queue_/compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction. Here are the performance results: Command: ./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000 --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333 Before the patch: fillrandom : 26.950 micros/op 37105 ops/sec; 4.1 MB/s After the patch: fillrandom : 17.404 micros/op 57456 ops/sec; 6.4 MB/s Next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. This is coming in the next patch, but when I removed that code, here's what I got: fillrandom : 7.590 micros/op 131758 ops/sec; 14.6 MB/s Test Plan: make check two stress tests: Big number of compactions and flushes: ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000 max_background_flushes=0, to verify that this case also works correctly ./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000 Reviewers: ljin, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30123	10 years ago

1 2 3

121 Commits (3c21c64c7891be68846400ba7d8b1f088b663d04)