rocksdb

Commit Graph

Author	SHA1	Message	Date
Aaron Gao	e72ea485ed	add InDomain regression test Summary: regression tests to make sure seek keys not in domain would not fail assertion Test Plan: ``` [gzh@dev6163.prn2 ~/local/rocksdb] ./prefix_test --gtest_filter=SamePrefixTest.* /tmp/rocksdbtest-112628/prefix_test Note: Google Test filter = SamePrefixTest.* [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from SamePrefixTest [ RUN ] SamePrefixTest.InDomainTest [ OK ] SamePrefixTest.InDomainTest (211 ms) [----------] 1 test from SamePrefixTest (211 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (211 ms total) ``` Reviewers: andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D61161	9 years ago
sdong	e5b5f12b81	Change options memtable_prefix_bloom_huge_page_tlb_size => memtable_huge_page_size and cover huge page to memtable too Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too. Test Plan: Run all existing tests. Reviewers: IslamAbdelRahman, yhchiang, andrewkr Reviewed By: andrewkr Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60513	9 years ago
sdong	0ce258f9b3	Compaction picker to expand output level files for keys cross files' boundary too. Summary: We may wrongly drop delete operation if we pick a file with the entry to be delete, the put entry of the same user key is in the next file in the level, and the next file is not picked. We expand compaction inputs for output level too. Test Plan: Add unit tests that reproduct the bug of dropping delete entry. Change compaction_picker_test to assert the new behavior. Reviewers: IslamAbdelRahman, igor Reviewed By: igor Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D61173	9 years ago
Wanning Jiang	e12270dfee	fix previous typo Summary: old typos with FILTER/INDEX_CACHE Test Plan: still pass this unit test Reviewers: andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D61185	9 years ago
Islam AbdelRahman	16e225f70d	Fix MergeContext::copied_operands_ strings moving Summary: MergeContext::copied_operands contain strings that MergeContext::operand_list_ Slices point to It's possible that when MergeContext::copied_operands grow, these strings are moved and there place in memory is changed, this will cause MergeContext::operand_list_ to point to invalid memory. fix this problem by using unique_ptr<string> instead of string Test Plan: run tests under mac/clang Reviewers: sdong, yiwu Reviewed By: yiwu Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D61023	9 years ago
Yi Wu	ae0ad719de	Fix flaky DBSSTTEST::DeleteObsoleteFilesPendingOutputs Summary: The test is flaky on Travis in osx environment. The background flush the test wanting to block can run behind the L2 manual compaction, making the test actually blocking the L2 compaction and won't able to proceed. Test Plan: Test run on travis Reviewers: kradhakrishnan, sdong, andrewkr, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D61101	9 years ago
Yi Wu	c6654588bd	Disable two dynamic options tests under lite build Summary: RocksDB lite don't support dynamic options. Disable the two test from lite build, and assert `SetOptions` should return `status::OK`. Test Plan: Run the db_options test under lite build and normal build. Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D61119	9 years ago
sdong	2a6d0cde72	Ignore stale logs while restarting DBs Summary: Stale log files can be deleted out of order. This can happen for various reasons. One of the reason is that no data is ever inserted to a column family and we have an optimization to update its log number, but not all the old log files are cleaned up (the case shown in the unit tests added). It can also happen when we simply delete multiple log files out of order. This causes data corruption because we simply increase seqID after processing the next row and we may end up with writing data with smaller seqID than what is already flushed to memtables. In DB recovery, for the oldest files we are replaying, if there it contains no data for any column family, we ignore the sequence IDs in the file. Test Plan: Add two unit tests that fail without the fix. Reviewers: IslamAbdelRahman, igor, yiwu Reviewed By: yiwu Subscribers: hermanlee4, yoshinorim, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60891	9 years ago
sdong	d5a51d4de3	Need to make sure log file synced before flushing memtable of one column family Summary: Multiput atomiciy is broken across multiple column families if we don't sync WAL before flushing one column family. The WAL file may contain a write batch containing writes to a key to the CF to be flushed and a key to other CF. If we don't sync WAL before flushing, if machine crashes after flushing, the write batch will only be partial recovered. Data to other CFs are lost. Test Plan: Add a new unit test which will fail without the diff. Reviewers: yhchiang, IslamAbdelRahman, igor, yiwu Reviewed By: yiwu Subscribers: yiwu, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60915	9 years ago
Yi Wu	89f319c2df	Fix unit test which breaks lite build Summary: Comment out assertion of number of table files from lite build. Test Plan: OPT=-DROCKSDB_LITE make check Reviewers: lightmark Reviewed By: lightmark Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60999	9 years ago
Yi Wu	32604e6601	Fix flush not being commit while writing manifest Summary: Fix flush not being commit while writing manifest, which is a recent bug introduced by D60075. The issue: # Options.max_background_flushes > 1 # Background thread A pick up a flush job, flush, then commit to manifest. (Note that mutex is released before writing manifest.) # Background thread B pick up another flush job, flush. When it gets to `MemTableList::InstallMemtableFlushResults`, it notices another thread is commiting, so it quit. # After the first commit, thread A doesn't double check if there are more flush result need to commit, leaving the second flush uncommitted. Test Plan: run the test. Also verify the new test hit deadlock without the fix. Reviewers: sdong, igor, lightmark Reviewed By: lightmark Subscribers: andrewkr, omegaga, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60969	9 years ago
John Alexander	9ab38c45ad	Remove %z Format Specifier and Fix Windows Build of sim_cache.cc (#1224 ) * Replace %zu format specifier with Windows-compatible macro 'ROCKSDB_PRIszt' * Added "port/port.h" include to sim_cache.cc for call to snprintf(). * Applied cleaner fix to windows build, reverting part of `7bedd94`	9 years ago
omegaga	e70020e4f6	Only cache level 0 indexes and filter when opening table reader Summary: In T8216281 we decided to disable prefetching the index and filter during opening table handlers during startup (max_open_files = -1). Test Plan: Rely on `IndexAndFilterBlocksOfNewTableAddedToCache` to guarantee L0 indexes and filters are still cached and change `PinL0IndexAndFilterBlocksTest` to make sure other levels are not cached (maybe add one more test to test we don't cache other levels?) Reviewers: sdong, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59913	9 years ago
Islam AbdelRahman	68a8e6b8fa	Introduce FullMergeV2 (eliminate memcpy from merge operators) Summary: This diff update the code to pin the merge operator operands while the merge operation is done, so that we can eliminate the memcpy cost, to do that we need a new public API for FullMerge that replace the std::deque<std::string> with std::vector<Slice> This diff is stacked on top of D56493 and D56511 In this diff we - Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput which will make it easier to add new arguments in the future - Replace std::deque<std::string> with std::vector<Slice> to pass operands - Replace MergeContext std::deque with std::vector (based on a simple benchmark I ran https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187) - Allow FullMergeV2 output to be an existing operand ``` [Everything in Memtable \| 10K operands \| 10 KB each \| 1 operand per key] DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000 [FullMergeV2] readseq : 0.607 micros/op 1648235 ops/sec; 16121.2 MB/s readseq : 0.478 micros/op 2091546 ops/sec; 20457.2 MB/s readseq : 0.252 micros/op 3972081 ops/sec; 38850.5 MB/s readseq : 0.237 micros/op 4218328 ops/sec; 41259.0 MB/s readseq : 0.247 micros/op 4043927 ops/sec; 39553.2 MB/s [master] readseq : 3.935 micros/op 254140 ops/sec; 2485.7 MB/s readseq : 3.722 micros/op 268657 ops/sec; 2627.7 MB/s readseq : 3.149 micros/op 317605 ops/sec; 3106.5 MB/s readseq : 3.125 micros/op 320024 ops/sec; 3130.1 MB/s readseq : 4.075 micros/op 245374 ops/sec; 2400.0 MB/s ``` ``` [Everything in Memtable \| 10K operands \| 10 KB each \| 10 operand per key] DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000 [FullMergeV2] readseq : 3.472 micros/op 288018 ops/sec; 2817.1 MB/s readseq : 2.304 micros/op 434027 ops/sec; 4245.2 MB/s readseq : 1.163 micros/op 859845 ops/sec; 8410.0 MB/s readseq : 1.192 micros/op 838926 ops/sec; 8205.4 MB/s readseq : 1.250 micros/op 800000 ops/sec; 7824.7 MB/s [master] readseq : 24.025 micros/op 41623 ops/sec; 407.1 MB/s readseq : 18.489 micros/op 54086 ops/sec; 529.0 MB/s readseq : 18.693 micros/op 53495 ops/sec; 523.2 MB/s readseq : 23.621 micros/op 42335 ops/sec; 414.1 MB/s readseq : 18.775 micros/op 53262 ops/sec; 521.0 MB/s ``` ``` [Everything in Block cache \| 10K operands \| 10 KB each \| 1 operand per key] [FullMergeV2] $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions readseq : 14.741 micros/op 67837 ops/sec; 663.5 MB/s readseq : 1.029 micros/op 971446 ops/sec; 9501.6 MB/s readseq : 0.974 micros/op 1026229 ops/sec; 10037.4 MB/s readseq : 0.965 micros/op 1036080 ops/sec; 10133.8 MB/s readseq : 0.943 micros/op 1060657 ops/sec; 10374.2 MB/s [master] readseq : 16.735 micros/op 59755 ops/sec; 584.5 MB/s readseq : 3.029 micros/op 330151 ops/sec; 3229.2 MB/s readseq : 3.136 micros/op 318883 ops/sec; 3119.0 MB/s readseq : 3.065 micros/op 326245 ops/sec; 3191.0 MB/s readseq : 3.014 micros/op 331813 ops/sec; 3245.4 MB/s ``` ``` [Everything in Block cache \| 10K operands \| 10 KB each \| 10 operand per key] DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions [FullMergeV2] readseq : 24.325 micros/op 41109 ops/sec; 402.1 MB/s readseq : 1.470 micros/op 680272 ops/sec; 6653.7 MB/s readseq : 1.231 micros/op 812347 ops/sec; 7945.5 MB/s readseq : 1.091 micros/op 916590 ops/sec; 8965.1 MB/s readseq : 1.109 micros/op 901713 ops/sec; 8819.6 MB/s [master] readseq : 27.257 micros/op 36687 ops/sec; 358.8 MB/s readseq : 4.443 micros/op 225073 ops/sec; 2201.4 MB/s readseq : 5.830 micros/op 171526 ops/sec; 1677.7 MB/s readseq : 4.173 micros/op 239635 ops/sec; 2343.8 MB/s readseq : 4.150 micros/op 240963 ops/sec; 2356.8 MB/s ``` Test Plan: COMPILE_WITH_ASAN=1 make check -j64 Reviewers: yhchiang, andrewkr, sdong Reviewed By: sdong Subscribers: lovro, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D57075	9 years ago
sdong	e70ba4e40e	MemTable::PostProcess() can skip updating num_deletes if the delta is 0 Summary: In many use cases there is no deletes. No need to pay the overhead of atomically updating num_deletes. Test Plan: Run existing test. Reviewers: ngbronson, yiwu, andrewkr, igor Reviewed By: andrewkr Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60555	9 years ago
sdong	2a282e5f54	DBTablePropertiesTest.GetPropertiesOfTablesInRange: Fix Flaky Summary: Summary There is a possibility that there is no L0 file after writing the data. Generate an L0 file to make it work. Test Plan: Run the test many times. Reviewers: andrewkr, yiwu Reviewed By: yiwu Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60825	9 years ago
John Alexander	9430333f84	New Statistics to track Compression/Decompression (#1197 ) * Added new statistics and refactored to allow ioptions to be passed around as required to access environment and statistics pointers (and, as a convenient side effect, info_log pointer). * Prevent incrementing compression counter when compression is turned off in options. * Prevent incrementing compression counter when compression is turned off in options. * Added two more supported compression types to test code in db_test.cc * Prevent incrementing compression counter when compression is turned off in options. * Added new StatsLevel that excludes compression timing. * Fixed casting error in coding.h * Fixed CompressionStatsTest for new StatsLevel. * Removed unused variable that was breaking the Linux build	9 years ago
sdong	21c55bdb6e	DBTest.DynamicLevelCompressionPerLevel: Tune Threshold Summary: Each SST's file size increases after we add more table properties. Threshold in DBTest.DynamicLevelCompressionPerLevel need to adjust accordingly to avoid occasional failures. Test Plan: Run the test Reviewers: andrewkr, yiwu Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60819	9 years ago
sdong	6797e6ffac	Avoid updating memtable allocated bytes if write_buffer_size is not set Summary: If options.write_buffer_size is not set, nor options.write_buffer_manager, no need to update the bytes allocated counter in MemTableAllocator, which is expensive in parallel memtable insert case. Remove it can improve parallel memtable insert throughput by 10% with write batch size 128. Test Plan: Run benchmarks TEST_TMPDIR=/dev/shm/ ./db_bench --benchmarks=fillrandom -disable_auto_compactions -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -num=10000000 --writes=1000000 -max_background_flushes=16 -max_write_buffer_number=16 --threads=32 --batch_size=128 -allow_concurrent_memtable_write -enable_write_thread_adaptive_yield The throughput grows 10% with the benchmark. Reviewers: andrewkr, yiwu, IslamAbdelRahman, igor, ngbronson Reviewed By: ngbronson Subscribers: ngbronson, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60465	9 years ago
Aaron Gao	dda6c72ac8	Add DestroyColumnFamilyHandle(ColumnFamilyHandle) to db.h Summary: add DestroyColumnFamilyHandle(ColumnFamilyHandle) to close column family instead of deleting cfh* User should call this to close a cf and then we can detect the deletion in this function. Test Plan: make all check -j64 Reviewers: andrewkr, yiwu, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60765	9 years ago
Andrew Kryczka	56222f57df	Avoid FileMetaData copy Summary: as titled Test Plan: unit tests Reviewers: sdong, lightmark Reviewed By: lightmark Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60597	9 years ago
Yi Wu	6ea41f8527	Fix deadlock when trying update options when write stalls Summary: When write stalls because of auto compaction is disabled, or stop write trigger is reached, user may change these two options to unblock writes. Unfortunately we had issue where the write thread will block the attempt to persist the options, thus creating a deadlock. This diff fix the issue and add two test cases to detect such deadlock. Test Plan: Run unit tests. Also, revert db_impl.cc to master (but don't revert `DBImpl::BackgroundCompaction:Finish` sync point) and run db_options_test. Both tests should hit deadlock. Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60627	9 years ago
Jay Edgar	efd013d6d8	Miscellaneous performance improvements Summary: I was investigating performance issues in the SstFileWriter and found all of the following: - The SstFileWriter::Add() function created a local InternalKey every time it was called generating a allocation and free each time. Changed to have an InternalKey member variable that can be reset with the new InternalKey::Set() function. - In SstFileWriter::Add() the smallest_key and largest_key values were assigned the result of a ToString() call, but it is simpler to just assign them directly from the user's key. - The Slice class had no move constructor so each time one was returned from a function a new one had to be allocated, the old data copied to the new, and the old one was freed. I added the move constructor which also required a copy constructor and assignment operator. - The BlockBuilder::CurrentSizeEstimate() function calculates the current estimate size, but was being called 2 or 3 times for each key added. I changed the class to maintain a running estimate (equal to the original calculation) so that the function can return an already calculated value. - The code in BlockBuilder::Add() that calculated the shared bytes between the last key and the new key duplicated what Slice::difference_offset does, so I replaced it with the standard function. - BlockBuilder::Add() had code to copy just the changed portion into the last key value (and asserted that it now matched the new key). It is more efficient just to copy the whole new key over. - Moved this same code up into the 'if (use_delta_encoding_)' since the last key value is only needed when delta encoding is on. - FlushBlockBySizePolicy::BlockAlmostFull calculated a standard deviation value each time it was called, but this information would only change if block_size of block_size_deviation changed, so I created a member variable to hold the value to avoid the calculation each time. - Each PutVarint??() function has a buffer and calls std::string::append(). Two or three calls in a row could share a buffer and a single call to std::string::append(). Some of these will be helpful outside of the SstFileWriter. I'm not 100% the addition of the move constructor is appropriate as I wonder why this wasn't done before - maybe because of compiler compatibility? I tried it on gcc 4.8 and 4.9. Test Plan: The changes should not affect the results so the existing tests should all still work and no new tests were added. The value of the changes was seen by manually testing the SstFileWriter class through MyRocks and adding timing code to identify problem areas. Reviewers: sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59607	9 years ago
Aaron Gao	816ae098ea	fix test failure Summary: fix Rocksdb Unit Test USER_FAILURE Test Plan: make all check -j64 Reviewers: sdong, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60603	9 years ago
Aaron Gao	8e6b38d895	update DB::AddFile to ingest list of sst files Summary: DB::AddFile(std::string file_path) API that allow them to ingest an SST file created using SstFileWriter We want to update this interface to be able to accept a list of files that will be ingested, DB::AddFile(std::vector<std::string> file_path_list). Test Plan: Add test case `AddExternalSstFileList` in `DBSSTTest`. To make sure: 1. files key ranges are not overlapping with each other 2. each file key range dont overlap with the DB key range 3. make sure no snapshots are held Reviewers: andrewkr, sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D58587	9 years ago
Yi Wu	296545a2c7	Fix clang analyzer errors Summary: Fixing erros reported by clang static analyzer. * Removing some unused variables. * Adding assertions to fix false positives reported by clang analyzer. * Adding `__clang_analyzer__` macro to suppress false positive warnings. Test Plan: USE_CLANG=1 OPT=-g make analyze -j64 Reviewers: andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60549	9 years ago
sdong	907f24d0e1	Concurrent memtable inserter to update counters and flush state after all inserts Summary: In concurrent memtable insert case, updating counters in MemTable::Add() can count for 5% CPU usage. By batch all the counters and update in the end of the write batch, the CPU overheads are overhead in the use cases where more than one key is updated in one write batch. Test Plan: Write throughput increases 12% with this benchmark setting: TEST_TMPDIR=/dev/shm/ ./db_bench --benchmarks=fillrandom -disable_auto_compactions -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -num=10000000 --writes=1000000 -max_background_flushes=16 -max_write_buffer_number=16 --threads=64 --batch_size=128 -allow_concurrent_memtable_write -enable_write_thread_adaptive_yield Reviewers: andrewkr, IslamAbdelRahman, ngbronson, igor Reviewed By: ngbronson Subscribers: ngbronson, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60495	9 years ago
Andrew Kryczka	e1b3ee8a79	Cleanup auto-roll logger flush-while-rolling test Summary: Use @omegaga's awesome feature to avoid use of callbacks for ensuring SyncPoints happen in a particular thread. Depends on D60375. Test Plan: $ ./auto_roll_logger_test Reviewers: omegaga, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, omegaga, leveldb Differential Revision: https://reviews.facebook.net/D60471	9 years ago
omegaga	cd4178a015	Add a new feature to enforce a sync point only active on a thread Summary: Add markers to sync points. A marked sync point will only be active when it is on the same thread as the marker sync point. Test Plan: Write a unit test to validate. Reviewers: sdong, IslamAbdelRahman, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60375	9 years ago
Gunnar Kudrjavets	b954847fca	Fix release build for MyRocks by using debug-only code only in debug builds Summary: MyRocks release integration build breaks because we treat warnings caused by unused variables as errors. Variable `edit` is only used in debug builds. Therefore we need to guard it using `#ifndef NDEBUG` check. Test Plan: - `[p]arc diff --preview` for the default validation. - Verify that release build fails before this fix and passes after applying it. Reviewers: andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60423	9 years ago
sdong	a00bf1b3cf	Add More Logging to track total_log_size Summary: We saw instances where total_log_size is off the real value, but I'm not able to reproduce it. Add more logging to help debugging when it happens again. Test Plan: Run the unit test and see the logging. Reviewers: andrewkr, yhchiang, igor, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60081	9 years ago
sdong	32df9733d1	Add options.write_buffer_manager: control total memtable size across DB instances Summary: Add option write_buffer_manager to help users control total memory spent on memtables across multiple DB instances. Test Plan: Add a new unit test. Reviewers: yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: adela, benj, sumeet, muthu, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59925	9 years ago
Aaron Gao	5aaef91d4a	group multiple batch of flush into one manifest file (one call to LogAndApply) Summary: Currently, if several flush outputs are committed together, we issue each manifest write per batch (1 batch = 1 flush = 1 sst file = 1+ continuous memtables). Each manifest write requires one fsync and one fsync to parent directory. In some cases, it becomes the bottleneck of write. We should batch them and write in one manifest write when possible. Test Plan: ` ./db_bench -benchmarks="fillseq" -max_write_buffer_number=16 -max_background_flushes=16 -disable_auto_compactions=true -min_write_buffer_number_to_merge=1 -write_buffer_size=65536 -level0_stop_writes_trigger=10000 -level0_slowdown_writes_trigger=10000` Before ``` Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 4.9 Date: Fri Jul 1 15:38:17 2016 CPU: 32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz CPUCache: 20480 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Write rate: 0 bytes/second Compression: Snappy Memtablerep: skip_list Perf Level: 1 WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags DB path: [/tmp/rocksdbtest-112628/dbbench] fillseq : 166.277 micros/op 6014 ops/sec; 0.7 MB/s ``` After ``` Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 4.9 Date: Fri Jul 1 15:35:05 2016 CPU: 32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz CPUCache: 20480 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Write rate: 0 bytes/second Compression: Snappy Memtablerep: skip_list Perf Level: 1 WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags DB path: [/tmp/rocksdbtest-112628/dbbench] fillseq : 52.328 micros/op 19110 ops/sec; 2.1 MB/s ``` Reviewers: andrewkr, IslamAbdelRahman, yhchiang, sdong Reviewed By: sdong Subscribers: igor, andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60075	9 years ago
omegaga	a45ee83181	Fix a bug that accesses invalid address in iterator cleanup function Summary: Reported in T11889874. When registering the cleanup function we should copy the option so that we can still access it if ReadOptions is deleted. Test Plan: Add a unit test to reproduce this bug. Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60087	9 years ago
Gunnar Kudrjavets	bdb1d19a69	Fix UBSan build break caused by variable not initialized Summary: UBSan is unhappy because `cfd` is not initialized. This breaks UBSan build which in turn breaks MyRocks continuous integration with RocksDB which in turns makes me unhappy :-) Fix this. Test Plan: - `[p]arc diff --preview` + Sandcastle. - Verify that `COMPILE_WITH_UBSAN=1 OPT=-g make J=1 ubsan_check` gets past the break. Reviewers: andrewkr, hermanlee4, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60117	9 years ago
sdong	c4cef07f1b	Update DBTestUniversalCompaction.UniversalCompactionSingleSortedRun to use max_size_amplification_percent = 0 Summary: With max_size_amplification_percent = 0 to make sure that DBTestUniversalCompaction.UniversalCompactionSingleSortedRun tests the configuration to compact to one single sorted run. Test Plan: Run all existing tests Reviewers: yhchiang, andrewkr, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60021	9 years ago
charsyam	4f2b0946d1	fix simple typos (#1183 )	9 years ago
Andrew Kryczka	3b7ed677de	ColumnFamilyOptions API [CF + RepairDB part 3/3] Summary: Overload RepairDB to take vector-of-ColumnFamilyDescriptor, which tells us CF name + options. Also takes a ColumnFamilyOptions for unspecified column families encountered during the repair. One potentially confusing thing is that we store options in the constructor and don't invoke AddColumnFamily() until discovering the CF in ScanTable. This is because we don't know the CF ID until we find a table belonging to that CF. Depends on D59781. Test Plan: $ ./repair_test Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D59853	9 years ago
Andrew Kryczka	56ac686292	Detect column family from properties [CF + RepairDB part 2/3] Summary: This diff uses the CF ID and CF name properties in the SST file to associate recovered data with the proper column family. Depends on D59775. - In ScanTable(), create column families in VersionSet each time a new one is discovered (via reading SST file properties) - In ConvertLogToTable(), dump an SST file for every column family with data in the WAL - In AddTables(), make a VersionEdit per-column family that adds all of that CF's tables Test Plan: $ ./repair_test Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D59781	9 years ago
Andrew Kryczka	343507afb1	Refactor to use VersionSet [CF + RepairDB part 1/3] Summary: To support column families, it is easiest to use VersionSet to manage our column families (if we don't have Versions then ColumnFamilyData always behaves as a dummy column family). This diff only refactors the existing repair logic to use VersionSet; the next two parts will add support for multiple column families. Test Plan: $ ./repair_test Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D59775	9 years ago
omegaga	c4e19b77e8	Add a read option to enable background purge when cleaning up iterators Summary: Add a read option `background_purge_on_iterator_cleanup` to avoid deleting files in foreground when destroying iterators. Instead, a job is scheduled in high priority queue and would be executed in a separate background thread. Test Plan: Add a variant of PurgeObsoleteFileTest. Turn on background purge option in the new test, and use sleeping task to ensure files are deleted in background. Reviewers: IslamAbdelRahman, sdong Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59499	9 years ago
Islam AbdelRahman	fa813f7478	Update DB::AddFile() to ingest the file to the lowest possible level Summary: DB::AddFile() right now always add the ingested file to L0 update the logic to add the file to the lowest possible level Test Plan: unit tests Reviewers: jkedgar, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D59637	9 years ago
sdong	7b79238b65	Deprectate filter_deletes Summary: filter_deltes is not a frequently used feature. Remove it. Test Plan: Run all test suites. Reviewers: igor, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59427	9 years ago
Islam AbdelRahman	30a24f2d3d	Add InternalStats and logging for AddFile() Summary: We dont report the bytes that we ingested from AddFile which make the write amplification numbers incorrect Update InternalStats and add logging for AddFile() Test Plan: Make sure the code compile and existing tests pass Reviewers: lightmark, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59763	9 years ago
sdong	249e796dfc	Fix Flaky DBCompactionTest.SkipStatsUpdateTest Summary: DBCompactionTest.SkipStatsUpdateTest sometimes fails. I don't see any verification related to the deletes issued. Remove them to avoid the uncertainty. Test Plan: Run the test. Reviewers: IslamAbdelRahman, andrewkr, yhchiang Reviewed By: yhchiang Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59613	9 years ago
Islam AbdelRahman	f5177c761f	Remove wasteful instrumentation in FullMerge (stacked on D59577) Summary: [ This diff is stacked on top of D59577 ] We keep calling timer.ElapsedNanos() on every call to MergeOperator::FullMerge even when statistics are disabled, this is wasteful. I run the readseq benchmark on a DB containing 100K merge operands for 100K keys (1 operand per key) with 1GB block cache I see slight performance improvment Original results ``` $ ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=100000 --num=100000 --db="/dev/shm/100K_merge_compacted/" --cache_size=1073741824 --use_existing_db --disable_auto_compactions ------------------------------------------------ DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.498 micros/op 2006597 ops/sec; 222.0 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.295 micros/op 3393627 ops/sec; 375.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.285 micros/op 3511155 ops/sec; 388.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.286 micros/op 3500470 ops/sec; 387.2 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.283 micros/op 3530751 ops/sec; 390.6 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.289 micros/op 3464811 ops/sec; 383.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.277 micros/op 3612814 ops/sec; 399.7 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.283 micros/op 3539640 ops/sec; 391.6 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.285 micros/op 3503766 ops/sec; 387.6 MB/s ``` After patch ``` $ ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=100000 --num=100000 --db="/dev/shm/100K_merge_compacted/" --cache_size=1073741824 --use_existing_db --disable_auto_compactions ------------------------------------------------ DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.476 micros/op 2100119 ops/sec; 232.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.278 micros/op 3600887 ops/sec; 398.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.275 micros/op 3636698 ops/sec; 402.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.271 micros/op 3691661 ops/sec; 408.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.273 micros/op 3661534 ops/sec; 405.1 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.276 micros/op 3627106 ops/sec; 401.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.272 micros/op 3682635 ops/sec; 407.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.266 micros/op 3758331 ops/sec; 415.8 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.266 micros/op 3761907 ops/sec; 416.2 MB/s ``` Test Plan: make check -j64 Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59583	9 years ago
Islam AbdelRahman	7c919deccc	Reuse TimedFullMerge instead of FullMerge + instrumentation Summary: We have alot of code duplication whenever we call FullMerge we keep duplicating the instrumentation and statistics code This is a simple diff to refactor the code to use TimedFullMerge instead of FullMerge Test Plan: COMPILE_WITH_ASAN=1 make check -j64 Reviewers: andrewkr, yhchiang, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59577	9 years ago
Yi Wu	bc8af90e8c	add option to not flush memtable on open() Summary: Add option to not flush memtable on open() In case the option is enabled, don't delete existing log files by not updating log numbers to MANIFEST. Will still flush if we need to (e.g. memtable full in the middle). In that case we also flush final memtable. If wal_recovery_mode = kPointInTimeRecovery, do not halt immediately after encounter corruption. Instead, check if seq id of next log file is last_log_sequence + 1. In that case we continue recovery. Test Plan: See unit test. Reviewers: dhruba, horuff, sdong Reviewed By: sdong Subscribers: benj, yhchiang, andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57813	9 years ago
sdong	6faddd7c55	Merge db/slice.cc into util/slice.cc Summary: It confuses some compilers to have slice.cc under multiple directories. Merge them. Test Plan: Run existing tests Reviewers: andrewkr, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59409	9 years ago
sdong	5009b5326b	BlockBasedTable::FullFilterKeyMayMatch() Should skip prefix bloom if full key bloom exists Summary: Currently, if users define both of full key bloom and prefix bloom in SST files. During Get(), if full key bloom shows the key may exist, we still go ahead and check prefix bloom. This is wasteful. If bloom filter for full keys exists, we should always ignore prefix bloom in Get(). Test Plan: Run existing tests Reviewers: yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D57825	9 years ago

1 2 3 4 5 ...

2388 Commits (afad5bd1c5f37782a96542a8663243e272c3edcb)