rocksdb

Commit Graph

Author	SHA1	Message	Date
sdong	9130873a13	Add options.new_table_reader_for_compaction_inputs Summary: Currently compaction inputs share the same file descriptor and table reader as other foreground threads. It makes fadvise works less predictable. Add options.new_table_reader_for_compaction_inputs to enforce to create a new file descriptor and new table reader for it. Test Plan: Add the option. Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang Reviewed By: igor Subscribers: igor, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43311	10 years ago
sdong	07d2d34160	Add a counter about estimated pending compaction bytes Summary: Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property. In the future, we can use threshold of this counter to replace soft rate limit and hard rate limit. A single threshold of estimated compaction debt in bytes will be easier for users to reason about when should slow down and stopping than more abstract soft and hard rate limits. Test Plan: Add unit tests Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44205	10 years ago
Yueh-Hsuan Chiang	a203b913c1	Fixed a rare deadlock in DBTest.ThreadStatusFlush Summary: Currently, ThreadStatusFlush uses two sync-points to ensure there's a flush currently running when calling GetThreadList(). However, one of the sync-point is inside db-mutex, which could cause deadlock in case there's a DB::Get() call. This patch fix this issue by moving the sync-point to a better place where the flush job does not hold the mutex. Test Plan: db_test Reviewers: igor, sdong, anthony, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45045	10 years ago
Dmitri Smirnov	5bf8907622	More indent adjustment.	10 years ago
Dmitri Smirnov	e2a9f43d64	Adjust indent	10 years ago
Dmitri Smirnov	1cac89c9b1	Address windows build issues Intro SubCompactionState move functionality =delete copy functionality #ifdef SyncPoint in tests for Windows Release builds	10 years ago
Dmitri Smirnov	f25f06ddd2	Address windows build issues Intro SubCompactionState move functionality =delete copy functionality #ifdef SyncPoint in tests for Windows Release builds	10 years ago
Islam AbdelRahman	027ca5b2cd	Total SST files size DB Property Summary: Add a new DB property that calculate the total size of files used by all RocksDB Versions Test Plan: Unittests for the new property Reviewers: igor, yhchiang, anthony, rven, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44799	10 years ago
Andres Noetzli	b604d2562f	Removing unused variables to fix build Summary: Removing two unused variables that prevented compilation. Test Plan: make all Reviewers: rven, sdong, yhchiang, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D44991	10 years ago
Venkatesh Radhakrishnan	1b114eed4d	Free file iterators for files which are above the iterate upper bound to Improve memory utilization Summary: This diff improves the memory utilization for tailing iterators RocksDB, by freeing file iterators which are over the upper bound. It is an updating on Siying's original diff for improving the memory usage for tailing iterators. The changes for the seek and next path are now complete and a test has been added to exercise these paths while deleting file iterators which are above the upper bound. Test Plan: db_tailing_iter_test.TailingIteratorTrimSeekToNext Reviewers: march, tnovak, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43833	10 years ago
Islam AbdelRahman	3fd70b05b8	Rate limit deletes issued by DestroyDB Summary: Update DestroyDB so that all SST files in the first path id go through DeleteScheduler instead of being deleted immediately Test Plan: added a unittest Reviewers: igor, yhchiang, anthony, kradhakrishnan, rven, sdong Reviewed By: sdong Subscribers: jeanxu2012, dhruba Differential Revision: https://reviews.facebook.net/D44955	10 years ago
Yueh-Hsuan Chiang	df79eafcb3	Introduce GetIntProperty("rocksdb.size-all-mem-tables") Summary: Currently, GetIntProperty("rocksdb.cur-size-all-mem-tables") only returns the memory usage by those memtables which have not yet been flushed. This patch introduces GetIntProperty("rocksdb.size-all-mem-tables"), which includes the memory usage by all the memtables, includes those have been flushed but pinned by iterators. Test Plan: Added a test in db_test Reviewers: igor, anthony, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D44229	10 years ago
sdong	888fbdc889	Remove the contstaint that iterator upper bound needs to be within a prefix Summary: There is a check to fail the iterator if prefix extractor is specified but upper bound is out of the prefix for the seek key. Relax this constraint to allow users to set upper bound to the next prefix of the current one. Test Plan: make commit-prereq Reviewers: igor, anthony, kradhakrishnan, yhchiang, rven Reviewed By: rven Subscribers: tnovak, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44949	10 years ago
Ari Ekmekji	137c376675	Removing variables used only in assertions to prevent build error Summary: A couple variables were declared but only used in assertions which causes issues when building in fbcode. Test Plan: make dbg and make release Reviewers: yhchiang, sdong, igor, anthony, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44937	10 years ago
Ari Ekmekji	b47cc58516	Bounding Number of Subcompactions Summary: In D43239 (https://reviews.facebook.net/D43239) the number of subcompactions is set based on the number of L1 files with unique starting keys. In certain cases when this number is very large this causes issues, particularly with the overlap between files since very small output files can be generated. This diff bounds the number of subcompactions to the user option DBOption.num_subcompactions. Test Plan: ./db_test ./db_compaction_test Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44883	10 years ago
Venkatesh Radhakrishnan	e58e1b18e7	Make tailing iterator show new entries in memtable. Summary: Reseek mutable_iter if it is invalid in Next and immutable_iter is invalid. Test Plan: DBTestTailingIterator.TailingIteratorSeekToNext Reviewers: tnovak, march, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D44865	10 years ago
Ari Ekmekji	601b1aaca0	Fixing Failed Assertion in Subcompaction State Diff Summary: In D43239 (https://reviews.facebook.net/D43239) there is an assertion to make sure a subcompaction's output is never empty at the end of execution. This assertion however breaks the build because some tests lead to exactly that scenario. So instead I have altered the logic to handle this case instead of just failing the assertion. The reason that it is possible for a subcompaction's output to be empty is that during a sequential execution of subcompactions, if a user aborts the compaction job then some of the later subcompactions to be executed may have yet to process any keys and therefore have yet to generate output files. This becomes very rare once the subcompactions are executed in parallel, but for now they are still sequential so the case is possible when there is an early termination, as in some of the tests. Test Plan: ./db_test ./db_compaction_test Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44877	10 years ago
Ari Ekmekji	f0da6977a3	[Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State Summary: In prepration for running multiple threads at the same time during a compaction job, this patch assigns each subcompaction its own state (instead of sharing the one global CompactionState). Each subcompaction then uses this state to update its statistics, keep track of its snapshots, etc. during the course of execution. Then at the end of all the executions the statistics are aggregated across the subcompactions so that the final result is the same as if only one larger compaction had run. Test Plan: ./db_test ./db_compaction_test ./compaction_job_test Reviewers: sdong, anthony, igor, noetzli, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43239	10 years ago
Andres Notzli	f32a572099	Simplify querying of merge results Summary: While working on supporting mixing merge operators with single deletes ( https://reviews.facebook.net/D43179 ), I realized that returning and dealing with merge results can be made simpler. Submitting this as a separate diff because it is not directly related to single deletes. Before, callers of merge helper had to retrieve the merge result in one of two ways depending on whether the merge was successful or not (success = result of merge was single kTypeValue). For successful merges, the caller could query the resulting key/value pair and for unsuccessful merges, the result could be retrieved in the form of two deques of keys and values. However, with single deletes, a successful merge does not return a single key/value pair (if merge operands are merged with a single delete, we have to generate a value and keep the original single delete around to make sure that we are not accidentially producing a key overwrite). In addition, the two existing call sites of the merge helper were taking the same actions independently from whether the merge was successful or not, so this patch simplifies that. Test Plan: make clean all check Reviewers: rven, sdong, yhchiang, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43353	10 years ago
sdong	72613657f0	Measure file read latency histogram per level Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled. Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44193	10 years ago
Nathan Bronson	b7198c3afe	reduce db mutex contention for write batch groups Summary: This diff allows a Writer to join the next write batch group without acquiring any locks. Waiting is performed via a per-Writer mutex, so all of the non-leader writers never need to acquire the db mutex. It is now possible to join a write batch group after the leader has been chosen but before the batch has been constructed. This diff doesn't increase parallelism, but reduces synchronization overheads. For some CPU-bound workloads (no WAL, RAM-sized working set) this can substantially reduce contention on the db mutex in a multi-threaded environment. With T=8 N=500000 in a CPU-bound scenario (see the test plan) this is good for a 33% perf win. Not all scenarios see such a win, but none show a loss. This code is slightly faster even for the single-threaded case (about 2% for the CPU-bound scenario below). Test Plan: 1. unit tests 2. COMPILE_WITH_TSAN=1 make check 3. stress high-contention scenarios with db_bench -benchmarks=fillrandom -threads=$T -batch_size=1 -memtablerep=skip_list -value_size=0 --num=$N -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000 Reviewers: sdong, igor, rven, ljin, yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43887	10 years ago
sdong	603b6da8b8	Add options.compaction_measure_io_stats to print write I/O stats in compactions Summary: Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs: 2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]} Add two more counters in iostats_context. Also add a parameter of db_bench. Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44115	10 years ago
sdong	4637207120	Add test case to repro the mispositional iterator in a low-chance data race case Summary: Iterator has a bug: if a child iterator reaches its end, and user issues a Prev(), and just before SeekToLast() of the child iterator is called, some extra rows is added in the end, the position of iterator can be misplaced. Test Plan: Run the tests with or without valgrind Reviewers: rven, yhchiang, IslamAbdelRahman, anthony Reviewed By: anthony Subscribers: tnovak, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43671	10 years ago
agiardullo	0db807ec28	Transaction error statuses Summary: Based on feedback from spetrunia, we should better differentiate error statuses for transaction failures. https://github.com/MySQLOnRocksDB/mysql-5.6/issues/86#issuecomment-124605954 Test Plan: unit tests Reviewers: rven, kradhakrishnan, spetrunia, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43323	10 years ago
agiardullo	c2f2cb0214	Pessimistic Transactions Summary: Initial implementation of Pessimistic Transactions. This diff contains the api changes discussed in D38913. This diff is pretty large, so let me know if people would prefer to meet up to discuss it. MyRocks folks: please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues. Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint(). After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex. We can then decide which route is preferable. Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing. Test Plan: Unit tests, db_bench parallel testing. Reviewers: igor, rven, sdong, yhchiang, yoshinorim Reviewed By: sdong Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40869	10 years ago
Islam AbdelRahman	c2868cbc52	Use manual_compaction for compaction_job_test Summary: Under certain conditions (disable compression) the compactions that are created in compaction_job_test will pass the trivial_move conditions This will cause problems since we assert that we dont run a compaction if it's a trivial move https://github.com/facebook/rocksdb/blob/master/db/compaction_job.cc#L144-L147 for example when we disable compression, compactions become a valid trivial move and the assert fails https://ci-builds.fb.com/view/rocksdb/job/rocksdb_no_compression/180/console Test Plan: compaction_job_test Reviewers: sdong, yhchiang, noetzli, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43983	10 years ago
Islam AbdelRahman	cee1e8a080	Parallelize LoadTableHandlers Summary: Add a new option that all LoadTableHandlers to use multiple threads to load files on DB Open and Recover Test Plan: make check -j64 COMPILE_WITH_TSAN=1 make check -j64 DISABLE_JEMALLOC=1 make all valgrind_check -j64 (still running) Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43755	10 years ago
Andres Notzli	4249f159d5	Removing duplicate code in db_bench/db_stress, fixing typos Summary: While working on single delete support for db_bench, I realized that db_bench/db_stress contain a bunch of duplicate code related to copmression and found some typos. This patch removes duplicate code, typos and a redundant #ifndef in internal_stats.cc. Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress Reviewers: yhchiang, sdong, rven, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43965	10 years ago
Nathan Bronson	1ae27113c7	reduce comparisons by skiplist Summary: Key comparison is the single largest CPU user for CPU-bound workloads. This diff reduces the number of comparisons in two ways. The first is that it moves predecessor array gathering from FindGreaterOrEqual to FindLessThan, so that FindGreaterOrEqual can return immediately if compare_ returns 0. As part of this change I moved the sequential insertion optimization into Insert, to remove the undocumented (and smelly) requirement that prev must be equal to prev_ if it is non-null. The second optimization is that all of the search functions skip calling compare_ when moving to a lower level that has the same Next pointer. With a branching factor of 4 we would expect this to happen 1/4 of the time. On a single-threaded CPU-bound workload (-benchmarks=fillrandom -threads=1 -batch_size=1 -memtablerep=skip_list -value_size=0 --num=1600000 -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000) on my dev server this is good for a 7% perf win. Test Plan: unit tests Reviewers: rven, ljin, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43233	10 years ago
Islam AbdelRahman	a9dcc0a638	Fix clang build Summary: https://ci-builds.fb.com/view/rocksdb/job/rocksdb_clang_build/893/console Fixing clang build Test Plan: make clean USE_CLANG=1 make all -j64 Reviewers: sdong, noetzli, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43959	10 years ago
Andres Notzli	68f934355a	Better CompactionJob testing Summary: Changed compaction_job_test to support better/more thorough tests and added two tests. Also changed MockFileContents to order using InternalKeyComparator. Test Plan: make compaction_job_test && ./compaction_job_test; make all && make check Reviewers: sdong, rven, igor, yhchiang, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42837	10 years ago
agiardullo	16ea1c7d1c	simple ManagedSnapshot wrapper Summary: Implemented this simple wrapper for something else I was working on. Seemed like it makes sense to expose it instead of burying it in some random code. Test Plan: added test Reviewers: rven, kradhakrishnan, sdong, yhchiang Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43293	10 years ago
sdong	6a4aaadcd7	Avoid type unique_ptr in LogWriterNumber::writer for Windows build break Summary: Visual Studio complains about deque<LogWriterNumber> because LogWriterNumber is non-copyable for its unique_ptr member writer. Move away from it, and do explit free. It is less safe but I can't think of a better way to unblock it. Test Plan: valgrind check test Reviewers: anthony, IslamAbdelRahman, kolmike, rven, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43647	10 years ago
Andres Noetzli	d7314ba759	Fixing endless loop if seeking to end of key with seq num 0 Summary: When seeking to the last occurrence of a key with sequence number 0, db_iter ends up in an endless loop because it seeks to type kValueTypeForSeek which is larger than kTypeDeletion/kTypeValue. Added test case that triggers the behavior. Test Plan: make clean all check Reviewers: igor, rven, anthony, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43653	10 years ago
Islam AbdelRahman	29b028b0ed	Make DeleteScheduler tests more reliable Summary: Update DeleteScheduler tests so that they verify the used penalties for waiting instead of measuring the time spent which is not reliable Test Plan: make -j64 delete_scheduler_test && ./delete_scheduler_test COMPILE_WITH_TSAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test COMPILE_WITH_ASAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths" COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths" COMPILE_WITH_ASAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths" Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43635	10 years ago
Poornima Chozhiyath Raman	7d364d0d94	Fix build failure Summary: fix the build failure Test Plan: make all Reviewers: sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43623	10 years ago
Poornima Chozhiyath Raman	960d936e83	Add function 'GetInfoLogList()' Summary: The list of info log files of a db can be obtained using the new function. Test Plan: New test in db_test.cc passed. Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: IslamAbdelRahman, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41715	10 years ago
sdong	7ccd1c80a7	Add two unit tests for SyncWAL() Summary: Add two unit tests for SyncWAL(). One makes sure SyncWAL() doesn't block writes in the other thread. Another one makes sure SyncWAL() doesn't wait ongoing writes to finish before being executed. Create a new test file db_wal_test and move two WAL related tests from db_test to here. Test Plan: Run the new tests Reviewers: IslamAbdelRahman, rven, kradhakrishnan, kolmike, tnovak, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43605	10 years ago
sdong	3ae386eafe	Add statistic histogram "rocksdb.sst.read.micros" Summary: Measure read latency histogram and put in statistics. Compaction inputs are excluded from it when possible (unfortunately usually no possible as we usually take table reader from table cache. Test Plan: Run db_bench and it shows the stats, like: rocksdb.sst.read.micros statistics Percentiles :=> 50 : 1.238522 95 : 2.529740 99 : 3.912180 Reviewers: kradhakrishnan, rven, anthony, IslamAbdelRahman, MarkCallaghan, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43275	10 years ago
Islam AbdelRahman	9aec75fbb9	Enable DBTest.FlushSchedule under TSAN Summary: This patch will fix the false positive of DBTest.FlushSchedule under TSAN, we dont need to disable this test Test Plan: COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.FlushSchedule" Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43599	10 years ago
sdong	8e01bd1144	Fix misplaced position for reversing iterator direction while current key is a merge Summary: While doing forward iterating, if current key is merge, internal iterator position is placed to the next key. If Prev() is called now, needs to do extra Prev() to recover the location. This is second attempt of fixing after reverting `ec70fea4c4`. This time shrink the fix to only merge key is the current key and avoid the reseeking logic for max_iterating skipping Test Plan: enable the two disabled tests and make sure they pass Reviewers: rven, IslamAbdelRahman, kradhakrishnan, tnovak, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43557	10 years ago
Andres Notzli	c465071029	Removing duplicate code Summary: While working on https://reviews.facebook.net/D43179 , I found duplicate code in the tests. This patch removes it. Test Plan: make clean all check Reviewers: igor, sdong, rven, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43263	10 years ago
Mike Kolupaev	e06cf1a098	[wal changes 3/3] method in DB to sync WAL without blocking writers Summary: Subj. We really need this feature. Previous diff D40899 has most of the changes to make this possible, this diff just adds the method. Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind. Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong Reviewed By: sdong Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba Differential Revision: https://reviews.facebook.net/D40905	10 years ago
Ari Ekmekji	5dc3e6881a	Update Tests To Enable Subcompactions Summary: Updated DBTest DBCompactionTest and CompactionJobStatsTest to run compaction-related tests once with subcompactions enabled and once disabled using the TEST_P test type in the Google Test suite. Test Plan: ./db_test ./db_compaction-test ./compaction_job_stats_test Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43443	10 years ago
Islam AbdelRahman	c45a57b41e	Support delete rate limiting Summary: Introduce DeleteScheduler that allow enforcing a rate limit on file deletion Instead of deleting files immediately, files are moved to trash directory and deleted in a background thread that apply sleep penalty between deletes if needed. I have updated PurgeObsoleteFiles and PurgeObsoleteWALFiles to use the delete_scheduler instead of env_->DeleteFile Test Plan: added delete_scheduler_test existing unit tests Reviewers: kradhakrishnan, anthony, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43221	10 years ago
Yueh-Hsuan Chiang	241bb2aef3	Make DBCompactionTest.SkipStatsUpdateTest more stable. Summary: Make DBCompactionTest.SkipStatsUpdateTest more stable by removing flaky but unnecessary assertion on the size of db as simply checking the random file open count is suffice. Test Plan: db_compaction_test Reviewers: igor, anthony, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43533	10 years ago
Yueh-Hsuan Chiang	14d0bfa429	Add DBOptions::skip_sats_update_on_db_open Summary: UpdateAccumulatedStats() is used to optimize compaction decision esp. when the number of deletion entries are high, but this function can slowdown DBOpen esp. in disk environment. This patch adds DBOptions::skip_sats_update_on_db_open, which skips UpdateAccumulatedStats() in DB::Open() time when it's set to true. Test Plan: Add DBCompactionTest.SkipStatsUpdateTest Reviewers: igor, anthony, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: tnovak, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42843	10 years ago
Venkatesh Radhakrishnan	20b244fcca	Fix CompactFiles by adding all necessary files Summary: The compact files API had a bug where some overlapping files are not added. These are files which overlap with files which were added to the compaction input files, but not to the original set of input files. This happens only when there are more than two levels involved in the compaction. An example will illustrate this better. Level 2 has 1 input file 1.sst which spans [20,30]. Level 3 has added file 2.sst which spans [10,25] Level 4 has file 3.sst which spans [35,40] and input file 4.sst which spans [46,50]. The existing code would not add 3.sst to the set of input_files because it only becomes an overlapping file in level 4 and it wasn't one in level 3. When installing the results of the compaction, 3.sst would overlap with output file from the compact files and result in the assertion in version_set.cc:1130 // Must not overlap assert(level <= 0 \|\| level_files->empty() \|\| internal_comparator_->Compare( (level_files)[level_files->size() - 1]->largest, f->smallest) < 0); This change now adds overlapping files from the current level to the set of input files also so that we don't hit the assertion above. Test Plan: d=/tmp/j; rm -rf $d; seq 1000 \| parallel --gnu --eta 'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_compaction_test --gtest_filter=CompactilesOnLevel* --gtest_also_run_disabled_tests >& '$d'/log-{}' Reviewers: igor, yhchiang, sdong Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43437	10 years ago
Venkatesh Radhakrishnan	87df6295dd	Make SuggestCompactRangeNoTwoLevel0Compactions deterministic Summary: Made SuggestCompactRangeNoTwoLevel0Compactions by forcing a flush after generating a file and waiting for compaction at the end. Test Plan: Run SuggestCompactRangeNoTwoLevel0Compactions Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43449	10 years ago
Ari Ekmekji	40c64434d4	Parallelize L0-L1 Compaction: Restructure Compaction Job Summary: As of now compactions involving files from Level 0 and Level 1 are single threaded because the files in L0, although sorted, are not range partitioned like the other levels. This means that during L0-L1 compaction each file from L1 needs to be merged with potentially all the files from L0. This attempt to parallelize the L0-L1 compaction assigns a thread and a corresponding iterator to each L1 file that then considers only the key range found in that L1 file and only the L0 files that have those keys (and only the specific portion of those L0 files in which those keys are found). In this way the overlap is minimized and potentially eliminated between different iterators focusing on the same files. The first step is to restructure the compaction logic to break L0-L1 compactions into multiple, smaller, sequential compactions. Eventually each of these smaller jobs will be run simultaneously. Areas to pay extra attention to are # Correct aggregation of compaction job statistics across multiple threads # Proper opening/closing of output files (make sure each thread's is unique) # Keys that span multiple L1 files # Skewed distributions of keys within L0 files Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test Reviewers: igor, noetzli, anthony, sdong, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42699	10 years ago

1 2 3 4 5 ...

1860 Commits (9130873a13f29a591fdf482c6847325b699b15ed)