rocksdb

Commit Graph

Author	SHA1	Message	Date
sdong	94918ae84b	db_bench: explicitly clear buffer in compress benchmark Summary: It is reported that in compress benchmark in db_bench, zlib will cause an OOM. The suggested fix was to clear the buffer. Test Plan: Build and run compress benchmark. Reviewers: IslamAbdelRahman, yhchiang, rven, andrewkr, kradhakrishnan, anthony Reviewed By: anthony Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D52857	9 years ago
Mark Callaghan	4041903ecd	Enhance db_bench write rate limit Summary: 1) changes tools/{benchmark,run_flash_bench}.sh to optionally use the write rate limit 2) removes code for --writes_per_second and switches the 'background' write rate limit to use --benchmark_write_rate_limit Replaces https://reviews.facebook.net/D49113 Task ID: #9555881 Test Plan: tools/run_flash_bench.sh Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D52485	9 years ago
sdong	d8677a8d2c	Upgrade internal CLANG version for FB-internal gcc 4.8.1 Summary: After removing two move operations, we can make CLANG 3.7 build pass under GCC 4.8.1. Test Plan: USE_CLANG=1 ROCKSDB_FBCODE_BUILD_WITH_481=1 make all -j32 Reviewers: yhchiang, IslamAbdelRahman, rven, anthony Reviewed By: anthony Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D52365	9 years ago
Nathan Bronson	7d87f02799	support for concurrent adds to memtable Summary: This diff adds support for concurrent adds to the skiplist memtable implementations. Memory allocation is made thread-safe by the addition of a spinlock, with small per-core buffers to avoid contention. Concurrent memtable writes are made via an additional method and don't impose a performance overhead on the non-concurrent case, so parallelism can be selected on a per-batch basis. Write thread synchronization is an increasing bottleneck for higher levels of concurrency, so this diff adds --enable_write_thread_adaptive_yield (default off). This feature causes threads joining a write batch group to spin for a short time (default 100 usec) using sched_yield, rather than going to sleep on a mutex. If the timing of the yield calls indicates that another thread has actually run during the yield then spinning is avoided. This option improves performance for concurrent situations even without parallel adds, although it has the potential to increase CPU usage (and the heuristic adaptation is not yet mature). Parallel writes are not currently compatible with inplace updates, update callbacks, or delete filtering. Enable it with --allow_concurrent_memtable_write (and --enable_write_thread_adaptive_yield). Parallel memtable writes are performance neutral when there is no actual parallelism, and in my experiments (SSD server-class Linux and varying contention and key sizes for fillrandom) they are always a performance win when there is more than one thread. Statistics are updated earlier in the write path, dropping the number of DB mutex acquisitions from 2 to 1 for almost all cases. This diff was motivated and inspired by Yahoo's cLSM work. 
It is more conservative than cLSM: RocksDB's write batch group leader role is preserved (along with all of the existing flush and write throttling logic) and concurrent writers are blocked until all memtable insertions have completed and the sequence number has been advanced, to preserve linearizability. My test config is "db_bench -benchmarks=fillrandom -threads=$T -batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000 --block_size=16384 --allow_concurrent_memtable_write" on a two-socket Xeon E5-2660 @ 2.2Ghz with lots of memory and an SSD hard drive. With 1 thread I get ~440Kops/sec. Peak performance for 1 socket (numactl -N1) is slightly more than 1Mops/sec, at 16 threads. Peak performance across both sockets happens at 30 threads, and is ~900Kops/sec, although with fewer threads there is less performance loss when the system has background work. Test Plan: 1. concurrent stress tests for InlineSkipList and DynamicBloom 2. make clean; make check 3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench 4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench 5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench 6. make clean; OPT=-DROCKSDB_LITE make check 7. verify no perf regressions when disabled Reviewers: igor, sdong Reviewed By: sdong Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba Differential Revision: https://reviews.facebook.net/D50589	9 years ago
sdong	15b8902264	Change default options.delayed_write_rate Summary: We now have a mechanism to further slow down writes. Double the default options.delayed_write_rate to try to keep the default behavior closer to what it used to be. Test Plan: Run all tests. Reviewers: IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: yhchiang, kradhakrishnan, rven, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D52281	9 years ago
Nathan Bronson	a48382399d	Fix use-after free in db_bench Test Plan: valgrind db_bench Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D52101	9 years ago
sdong	c37729a6a6	db_bench: --soft_pending_compaction_bytes_limit should set options.soft_pending_compaction_bytes_limit Summary: Fix a bug that options.soft_pending_compaction_bytes_limit is not actually set with --soft_pending_compaction_bytes_limit Test Plan: Run db_bench with this parameter and make sure the parameter is set correctly. Reviewers: anthony, kradhakrishnan, yhchiang, IslamAbdelRahman, igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D52125	9 years ago
Gunnar Kudrjavets	97265f5f14	Fix minor bugs in delete operator, snprintf, and size_t usage Summary: List of changes: 1) Fix the snprintf() usage in cases where wrong variable was used to determine the output buffer size. 2) Remove unnecessary checks before calling delete operator. 3) Increase code correctness by using size_t type when getting vector's size. 4) Unify the coding style by removing namespace::std usage at the top of the file to conform to the majority usage. 5) Fix various lint errors pointed out by 'arc lint'. Test Plan: Code review and build: git diff make clean make -j 32 commit-prereq arc lint Reviewers: kradhakrishnan, sdong, rven, anthony, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D51849	9 years ago
Dmitri Smirnov	aca403d2b5	Fix another rebase problems.	9 years ago
Dmitri Smirnov	236fe21c92	Enable MS compiler warning c4244. Mostly due to the fact that there are differences in sizes of int,long on 64 bit systems vs GNU.	9 years ago
sdong	56e77f0967	Deprecate options.soft_rate_limit and add options.soft_pending_compaction_bytes_limit Summary: Deprecate options.soft_rate_limit, which is hard to tune, in favor of options.soft_pending_compaction_bytes_limit, which triggers the slowdown if estimated pending compaction bytes exceed the threshold. The hope is to make it more straightforward to tune. Test Plan: Modify DBTest.SoftLimit to cover options.soft_pending_compaction_bytes_limit instead; run all unit tests. Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, igor, anthony Reviewed By: anthony Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D51117	9 years ago
sdong	ac8e56f050	db_bench: in uncompress benchmark, get Snappy size from compressed stream Summary: Now in benchmark "uncompress" in db_bench, we get size from compressed stream for all other compression types except Snappy, where we allocate memory based on parameter. Change it to match to behavior of other compression types. Test Plan: Run ./db_bench --benchmarks=uncompress with snappy and other compression types. Reviewers: yhchiang, kradhakrishnan, anthony, IslamAbdelRahman, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D51681	9 years ago
SherlockNoMad	ebc2d490d1	Split histogram per OperationType in db_bench	9 years ago
Islam AbdelRahman	a163cc2d5a	Lint everything Summary: ``` arc2 lint --everything ``` Run the linter on the whole code repo to fix existing lint issues Test Plan: make check -j64 Reviewers: sdong, rven, anthony, kradhakrishnan, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D50769	9 years ago
Dmitri Smirnov	20f57b1715	Enable Windows warnings C4307 C4309 C4512 C4701 Enable C4307 'operator' : integral constant overflow Longs and ints on Windows are 32-bit hence the overflow Enable C4309 'conversion' : truncation of constant value Enable C4512 'class' : assignment operator could not be generated Enable C4701 Potentially uninitialized local variable 'name' used	9 years ago
SherlockNoMad	ccc8c10c0c	Move skip_table_builder_flush to BlockBasedTableOption	9 years ago
sdong	11c71a365a	db_bench: --compaction_pri default should be rocksdb::Options().compaction_pri Summary: Currently db_bench's --compaction_pri default is set to be rocksdb::Options().compaction_style. Change it to rocksdb::Options().compaction_pri. For now, though, both are 0. Test Plan: Build db_bench Reviewers: anthony, rven, IslamAbdelRahman, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D49773	9 years ago
SherlockNoMad	a6dd0831d5	Add Option to Skip Flushing in TableBuilder	9 years ago
Dmitri Smirnov	1277a48f1b	Fix 80 character limit issue.	9 years ago
Dmitri Smirnov	6fbc4f9f3e	Implement smart buffer management. introduce a new DBOption random_access_max_buffer_size to limit the size of the random access buffer used for unbuffered access. Implement read ahead buffering when enabled. To that effect propagate compaction_readahead_size and the new option to the env options to make it available for the implementation. Add Hint() override so SetupForCompaction() call would call Hint() readahead can now be setup from both Hint() and EnableReadAhead() Add new option random_access_max_buffer_size support db_bench, options_helper to make it string parsable and the unit test.	9 years ago
Islam AbdelRahman	b81b2ec25d	Fix benchmarks under ROCKSDB_LITE Summary: Fix db_bench and memtablerep_bench under ROCKSDB_LITE Test Plan: OPT=-DROCKSDB_LITE make db_bench -j64 OPT=-DROCKSDB_LITE make memtablerep_bench -j64 make db_bench -j64 make memtablerep_bench -j64 Reviewers: yhchiang, anthony, rven, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D48717	9 years ago
Islam AbdelRahman	c29af48d3e	Add max_file_opening_threads to db_bench Summary: Add an option to db_bench for max_file_opening_threads Test Plan: compile and run db_bench Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba, paultuckfield Differential Revision: https://reviews.facebook.net/D47811	9 years ago
sdong	f1b9f804e9	Add a mode to always pick the oldest file to compact for each level Summary: Add options.compaction_pri, which specifies the policy about which file to compact first. kCompactionPriByLargestSeq will compact oldest files first. Verified the behavior in db_bench but did not write unit tests yet. Also need to make it settable through option string and dynamically changeable. Test Plan: Will write unit tests Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45951	9 years ago
Andres Noetzli	014fd55adc	Support for SingleDelete() Summary: This patch fixes #7460559. It introduces SingleDelete as a new database operation. This operation can be used to delete keys that were never overwritten (no put following another put of the same key). If an overwritten key is single deleted the behavior is undefined. Single deletion of a non-existent key has no effect but multiple consecutive single deletions are not allowed (see limitations). In contrast to the conventional Delete() operation, the deletion entry is removed along with the value when the two are lined up in a compaction. Note: The semantics are similar to @igor's prototype that allowed this behavior at the granularity of a column family ( https://reviews.facebook.net/D42093 ). This new patch, however, is more aggressive when it comes to removing tombstones: It removes the SingleDelete together with the value whenever there is no snapshot between them while the older patch only did this when the sequence number of the deletion was older than the earliest snapshot. Most of the complex additions are in the Compaction Iterator, all other changes should be relatively straightforward. The patch also includes basic support for single deletions in db_stress and db_bench. Limitations: - Not compatible with cuckoo hash tables - Single deletions cannot be used in combination with merges and normal deletions on the same key (other keys are not affected by this) - Consecutive single deletions are currently not allowed (an older version of this patch supported this so it could be resurrected if needed) Test Plan: make all check Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor Reviewed By: igor Subscribers: maykov, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43179	9 years ago
sdong	5de807ac16	Add options.hard_pending_compaction_bytes_limit to stop writes if compaction is lagging behind Summary: Add an option to stop writes if compaction falls behind. If estimated pending compaction bytes exceed the threshold specified by options.hard_pending_compaction_bytes_limit, writes will stop until compactions bring it back under the threshold. Test Plan: Add unit test DBTest.HardLimit Reviewers: rven, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45999	9 years ago
Dmitri Smirnov	30e82d5c41	Refactor to support file_reader_writer on Windows. Summary: A change ( https://reviews.facebook.net/differential/diff/224721/ ) attempted to move common functionality out of platform-dependent code into a new facility called file_reader_writer. This includes: - perf counters - Buffering - RateLimiting However, the change did not attempt to refactor Windows code. To mitigate, we introduce new querying interfaces such as UseOSBuffer(), GetRequiredBufferAlignment() and ReaderWriterForward() for pure forwarding where required. WritableFile gets a new method, Truncate(). This is to communicate to the file how much data it has on close. - When space is pre-allocated on Linux it is filled with zeros implicitly; no such thing exists on Windows, so we must truncate the file on close. - When operating in unbuffered mode the last page is filled with zeros but we still want to truncate. Previously, Close() would take care of it, but now buffer management is shifted to the wrappers and the file has no idea about its true size. This means that Close() on the wrapper level must always include Truncate(), and the wrapper __dtor should call Close() and guard against double Close(). Move buffered/unbuffered write logic to the wrapper. Utilize Aligned buffer class. Adjust tests and implement Truncate() where necessary. Come up with reasonable defaults for new virtual interfaces. Forward calls for RandomAccessReadAhead class to avoid double buffering and locking (double locking in unbuffered mode on Windows).	9 years ago
agiardullo	18db1e4695	better db_bench options for transactions Summary: Pessimistic Transaction expiration time checking currently causes a performance regression. Let's disable it in db_bench by default. Also, in order to be able to better tune how much contention we're simulating, added new options to set lock timeout and snapshot. Test Plan: run db_bench randomtransaction Reviewers: sdong, igor, yhchiang, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45831	9 years ago
sdong	7a0dbdf3ac	Add ZSTD (not final format) compression type Summary: Add ZSTD compression type. The same way as adding LZ4. Test Plan: run all tests. Generate files in db_bench. Make sure reads succeed. But the SST files cannot be opened in older versions. Also some other adhoc tests. Reviewers: rven, anthony, IslamAbdelRahman, kradhakrishnan, igor Reviewed By: igor Subscribers: MarkCallaghan, maykov, yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45747	9 years ago
Yueh-Hsuan Chiang	8ef0144e2f	Add argument --show_table_properties to db_bench Summary: Add argument --show_table_properties to db_bench -show_table_properties (If true, then per-level table properties will be printed on every stats-interval when stats_interval is set and stats_per_interval is on.) type: bool default: false Test Plan: ./db_bench --show_table_properties=1 --stats_interval=100000 --stats_per_interval=1 ./db_bench --show_table_properties=1 --stats_interval=100000 --stats_per_interval=1 --num_column_families=2 Sample Output: Compaction Stats [column_family_name_000001] Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt) KeyIn KeyDrop --------------------------------------------------------------------------------------------------------------------------------------------------------------------- L0 3/0 5 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 86.3 0 17 0.021 0 0 0 L1 5/0 9 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0 0 L2 9/0 16 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0 0 Sum 17/0 31 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 86.3 0 17 0.021 0 0 0 Int 0/0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 83.9 0 2 0.022 0 0 0 Flush(GB): cumulative 0.030, interval 0.004 Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard Level[0]: # data blocks=2571; # entries=84813; raw key size=2035512; raw average key size=24.000000; raw value size=8481300; raw average value size=100.000000; data block size=5690119; index block size=82415; filter block size=0; (estimated) table size=5772534; filter policy name=N/A; Level[1]: # data blocks=4285; # entries=141355; raw key size=3392520; raw average key size=24.000000; raw value size=14135500; raw average value size=100.000000; data block size=9487353; index block size=137377; filter block size=0; (estimated) table size=9624730; filter policy name=N/A; Level[2]: 
# data blocks=7713; # entries=254439; raw key size=6106536; raw average key size=24.000000; raw value size=25443900; raw average value size=100.000000; data block size=17077893; index block size=247269; filter block size=0; (estimated) table size=17325162; filter policy name=N/A; Level[3]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A; Level[4]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A; Level[5]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A; Level[6]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A; Reviewers: anthony, IslamAbdelRahman, MarkCallaghan, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45651	9 years ago
Igor Canadi	5f4166c90e	ReadaheadRandomAccessFile -- userspace readahead Summary: ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS. We add ReadaheadRandomAccessFile layer only when file is read during compactions. D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff. Test Plan: make check Reviewers: MarkCallaghan, sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45123	9 years ago
Andres Noetzli	2050832974	Fixing race condition in DBTest.DynamicMemtableOptions Summary: This patch fixes a race condition in DBTest.DynamicMemtableOptions. In rare cases, it was possible that the main thread would fill up both memtables before the flush job acquired its work. Then, the flush job was flushing both memtables together, producing only one L0 file while the test expected two. Now, the test waits for flushes to finish earlier, to make sure that the memtables are flushed in separate flush jobs. Test Plan: Insert "usleep(10000);" after "IOSTATS_SET_THREAD_POOL_ID(Env::Priority::HIGH);" in BGWorkFlush() to make the issue more likely. Then test with: make db_test && time while ./db_test --gtest_filter=*DynamicMemtableOptions; do true; done Reviewers: rven, sdong, yhchiang, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45429	9 years ago
Ari Ekmekji	b6def58f73	Changed 'num_subcompactions' to the more accurate 'max_subcompactions' Summary: Up until this point we had DbOptions.num_subcompactions, but it is semantically more correct to call this max_subcompactions since we will schedule up to DbOptions.max_subcompactions smaller compactions at a time during a compaction job. I also added a --subcompactions option to db_bench Test Plan: make all make check Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D45069	9 years ago
sdong	603b6da8b8	Add options.compaction_measure_io_stats to print write I/O stats in compactions Summary: Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs: 2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]} Add two more counters in iostats_context. Also add a parameter of db_bench. Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44115	9 years ago
agiardullo	c2f2cb0214	Pessimistic Transactions Summary: Initial implementation of Pessimistic Transactions. This diff contains the api changes discussed in D38913. This diff is pretty large, so let me know if people would prefer to meet up to discuss it. MyRocks folks: please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues. Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint(). After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex. We can then decide which route is preferable. Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing. Test Plan: Unit tests, db_bench parallel testing. Reviewers: igor, rven, sdong, yhchiang, yoshinorim Reviewed By: sdong Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40869	9 years ago
Andres Notzli	4249f159d5	Removing duplicate code in db_bench/db_stress, fixing typos Summary: While working on single delete support for db_bench, I realized that db_bench/db_stress contain a bunch of duplicate code related to compression and found some typos. This patch removes duplicate code, typos and a redundant #ifndef in internal_stats.cc. Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress Reviewers: yhchiang, sdong, rven, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43965	9 years ago
sdong	ee80432ff8	db_bench add an option of --universal_allow_trivial_move Summary: Now we allow trivial move in universal compaction. Add a parameter in db_bench Test Plan: Run db_bench with this option on and off and make sure the option is switched correctly. Reviewers: yhchiang, igor, kradhakrishnan, anthony Reviewed By: anthony Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41427	9 years ago
Aaron Feldman	2c8de0ecae	Update --help message in db_bench. Summary: Remove --help entry for readhot. Update read_random_exp_range flag description: The distribution is num * exp(-r), not num * exp(r). Test Plan: Run ./db_bench --help Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D42303	9 years ago
sdong	f9728640f3	"make format" against last 10 commits Summary: This helps the Windows port format their changes, as discussed. Might have formatted some other code too because the last 10 commits include more. Test Plan: Build it. Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41961	9 years ago
Aaron Feldman	1f4d565709	Add db_bench flag to set cache_index_and_filter_blocks Summary: The new flag --cache_index_and_filter_blocks sets BlockBasedTableOptions.cache_index_and_filter_blocks Test Plan: make db_bench. Working on benchmarks with the new flag. Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D41481	9 years ago
Dmitri Smirnov	d2f0912bd3	Merge the latest changes from github/master	9 years ago
Dmitri Smirnov	18285c1e2f	Windows Port from Microsoft Summary: Make RocksDb build and run on Windows to be functionally complete and performant. All existing test cases run with no regressions. Performance numbers are in the pull-request. Test plan: make all of the existing unit tests pass, obtain perf numbers. Co-authored-by: Praveen Rao praveensinghrao@outlook.com Co-authored-by: Sherlock Huang baihan.huang@gmail.com Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com	9 years ago
Giuseppe Ottaviano	782a1590f9	Implement a table-level row cache Summary: Implementation of a table-level row cache. It only caches point queries done through the `DB::Get` interface, queries done through the `Iterator` interface will completely skip the cache. Supports snapshots and merge operations. Test Plan: Ran `make valgrind_check commit-prereq` Reviewers: igor, philipp, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39849	10 years ago
Igor Canadi	2dc3910b5e	Add --benchmark_write_rate_limit option to db_bench Summary: So far, we benchmarked RocksDB by writing as fast as possible. With this change, we're able to limit our write throughput, which should help us better understand how RocksDB performs under varying write workloads. Specifically, I'm currently interested in the shape of the graph that has write throughput on one axis and write rate on another. This should help us with designing our stall system, as we have started to do with D36351. Test Plan: $ ./db_bench --benchmarks=fillrandom --benchmark_write_rate_limit=1000000 fillrandom : 118.523 micros/op 8437 ops/sec; 0.9 MB/s $ ./db_bench --benchmarks=fillrandom --benchmark_write_rate_limit=2000000 fillrandom : 59.136 micros/op 16910 ops/sec; 1.9 MB/s Reviewers: MarkCallaghan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39759	10 years ago
Islam AbdelRahman	12e030a992	Use CompactRangeOptions for CompactRange Summary: This diff updates DB::CompactRange to use CompactRangeOptions instead of multiple parameters. The old CompactRange is still available but deprecated. Test Plan: make all check make rocksdbjava USE_CLANG=1 make all OPT=-DROCKSDB_LITE make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40209	10 years ago
Igor Canadi	d59d90bb1f	db_bench periodically writes QPS to CSV file Summary: This is part of an effort to better understand and optimize RocksDB stalls under high load. I added a feature to db_bench to periodically write QPS to CSV files. That way we can nicely see how our QPS changes in time (especially when DB is stalled) and can do a better job of evaluating our stall system (i.e. we want the QPS to be as constant as possible, as opposed to having bunch of stalls) Cool part of CSV files is that we can easily graph them -- there are a bunch of tools available. Test Plan: Ran ./db_bench --report_interval_seconds=10 --benchmarks=fillrandom --num=10000000 and observed this in report.csv: secs_elapsed,interval_qps 10,2725860 20,1980480 30,1863456 40,1454359 50,1460389 Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40047	10 years ago
sdong	7842920be5	Slow down writes by bytes written Summary: We slow down data into the database to the rate of options.delayed_write_rate (a new option) with this patch. The thread synchronization approach I take is to still synchronize write controller by DB mutex and GetDelay() is inside DB mutex. Try to minimize the frequency of getting time in GetDelay(). I verified it through db_bench and it seems to work. hard_rate_limit is deprecated. options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up. Test Plan: Add new unit tests in db_test Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor Reviewed By: igor Subscribers: ikabiljo, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36351	10 years ago
sdong	e409d3d745	Make "make all" work for CYGWIN Summary: Some test and benchmark codes don't build for CYGWIN. Fix it. Test Plan: Build "make all" with TARGET_OS=Cygwin on cygwin and make sure it passes. Reviewers: rven, yhchiang, anthony, igor, kradhakrishnan Reviewed By: igor, kradhakrishnan Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39711	10 years ago
Igor Canadi	4c181f08bc	Fix compile on darwin Summary: As title Test Plan: make check Reviewers: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39243	10 years ago
agiardullo	dc9d70de65	Optimistic Transactions Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of ensuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported. Test Plan: Added a new test, but still need to look into stress testing. Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: adamretter, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33435	10 years ago
agiardullo	c815351038	Support saving history in memtable_list Summary: For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to somehow keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit. After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decided to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much. This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer to the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit). However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions. Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here. Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37443	10 years ago

319 Commits (d6c838f1e130d8860407bc771fa6d4ac238859ba)