rocksdb

Commit Graph

Author	SHA1	Message	Date
Lei Jin	73d7147096	make rate limiter test more reliable Summary: Randomize keys so that compaction actually happens. Change the config so that compaction happens more aggressively. The test takes longer time, but the results are more stable shown by iostat Test Plan: ran it Reviewers: igor, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19533	11 years ago
Lei Jin	8a9cc7885c	report correct interval amplification Summary: as title Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19515	11 years ago
Lei Jin	534357ca3a	integrate rate limiter into rocksdb Summary: Add option and plugin rate limiter for PosixWritableFile. The rate limiter only applies to flush and compaction. WAL and MANIFEST are excluded from this enforcement. Test Plan: db_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19425	11 years ago
Lei Jin	b278ae8e50	Apply fractional cascading in ForwardIterator::Seek() Summary: Use search hint to reduce FindFile range thus avoid comparison For a small DB with 50M keys, perf_context counter shows it reduces comparison from 2B to 1.3B for a 15-minute run. No perf change was observed for 1 seek thread, but quite good improvement was seen for 32 seek threads, when CPU was busy. will post detail results when ready Test Plan: db_bench and db_test Reviewers: haobo, sdong, dhruba, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18879	11 years ago
Reed Allman	fd3fb4b0bf	C API: update options w/ convenience funcs & fifo compaction	11 years ago
Reed Allman	e9b18b6b89	C API: bugfix column_family_comact_range	11 years ago
Igor Canadi	4adf64e068	Fix compile issue	11 years ago
Igor Canadi	8a03935f8c	Fix valgrind error in c_test Summary: External contribution caused some valgrind errors: `1a34aaaef0` This diff fixes them Test Plan: ran valgrind Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19485	11 years ago
Evan Shaw	13a130cc00	C API: Add test for compaction filter factories Also refactored the compaction filter tests to share some code and ensure that options were getting reset so future test results aren't confused.	11 years ago
Evan Shaw	3f7104d7c5	C API: Allow setting compaction filter factory	11 years ago
Evan Shaw	91bede79cc	C API: Add support for compaction filter factories (v1)	11 years ago
Radheshyam Balasundaram	f0660d5253	Adding NUMA support to db_bench tests Summary: Changes: - Adding numa_aware flag to db_bench.cc - Using numa.h library to bind memory and cpu of threads to a fixed NUMA node Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and if we can achieve a better micros/op. Test Plan: Ran db_bench tests using following command: ./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True] The tests were run in private devserver with 24 cores and the db was prepopulated using filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True. Reviewers: sdong, yhchiang, ljin, igor Reviewed By: ljin, igor Subscribers: igor, leveldb Differential Revision: https://reviews.facebook.net/D19353	11 years ago
Reed Allman	1a34aaaef0	C API: column family support	11 years ago
Evan Shaw	9fc23d0c56	C API: support constructing write batch from serialized representation	11 years ago
Yueh-Hsuan Chiang	7b85c1e900	Improve SimpleWriteTimeoutTest to avoid false alarm. Summary: SimpleWriteTimeoutTest has two parts: 1) insert two large key/values to make memtable full and expect both of them are successful; 2) insert another key / value and expect it to be timed-out. Previously we also set a timeout in the first step, but this might sometimes cause false alarm. This diff makes the first two writes run without timeout setting. Test Plan: export ROCKSDB_TESTS=Time make db_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19461	11 years ago
Yueh-Hsuan Chiang	d33657a4a5	Fixed a warning in release mode. Summary: Removed a variable that is only used in assertion check. Test Plan: make release Reviewers: ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19455	11 years ago
Yueh-Hsuan Chiang	90a6aca48e	Finer report I/O stats about Flush and Compaction. Summary: This diff allows the I/O stats about Flush and Compaction to be reported in a more accurate way. Instead of measuring the size of a file, it measures I/O cost on a per-read/write basis. Test Plan: make all check Reviewers: sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19383	11 years ago
Yueh-Hsuan Chiang	d4d338de33	Add timeout_hint_us to WriteOptions and introduce Status::TimeOut. Summary: This diff adds timeout_hint_us to WriteOptions. If it's non-zero, then 1) writes associated with this options MAY be aborted when it has been waiting for longer than the specified time. If an abortion happens, associated writes will return Status::TimeOut. 2) the stall time of the associated write caused by flush or compaction will be limited by timeout_hint_us. The default value of timeout_hint_us is 0 (i.e., OFF.) The statistics of timeout writes will be recorded in WRITE_TIMEDOUT. Test Plan: export ROCKSDB_TESTS=WriteTimeoutAndDelayTest make db_test ./db_test Reviewers: igor, ljin, haobo, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18837	11 years ago
Igor Canadi	4203431e71	Fix mac os compile error	11 years ago
sdong	2459f7ec4e	Support Multiple DB paths (without having an interface to expose to users) Summary: In this patch, we allow RocksDB to support multiple DB paths internally. No user interface is supported yet so this patch is silent to users. Test Plan: make all check Reviewers: igor, haobo, ljin, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18921	11 years ago
Igor Canadi	f146cab261	Centralize compression decision to compaction picker Summary: Before this diff, we're deciding enable_compression in CompactionPicker and then we're deciding final compression type in DBImpl. This is kind of confusing. After the diff, the final compression type will be decided in CompactionPicker. The reason for this is that I want CompactFiles() to specify output compression type, so that people can mix and match compression styles in their compaction algorithms. This diff makes it much easier to do that. Test Plan: make check Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19137	11 years ago
sdong	1d05006740	Re-commit the correct part (WalDir) of the revision: Commit `6634844dba` by sdong Two small fixes in db_test Summary: Two fixes: (1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other (2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it. Test Plan: ./db_test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: nkg-, igor, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19389	11 years ago
sdong	30b20604db	Revert "Two small fixes in db_test" This reverts commit `6634844dba`.	11 years ago
sdong	9c332aa11a	HashLinkList memtable switches a bucket to a skip list to reduce performance outliers Summary: In this patch, we enhance HashLinkList memtable to reduce performance outliers when a bucket contains too many entries. We switch to skip list for this case to enable binary search. Add threshold_use_skiplist parameter to determine when a bucket needs to switch to skip list. The new data structure is documented in comments in the codes. Test Plan: make all check set threshold_use_skiplist in several tests Reviewers: yhchiang, haobo, ljin Reviewed By: yhchiang, ljin Subscribers: nkg-, xjin, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19299	11 years ago
sdong	6634844dba	Two small fixes in db_test Summary: Two fixes: (1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other (2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it. Test Plan: ./db_test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: nkg-, igor, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19389	11 years ago
Igor Canadi	f5d4df1c02	Fix compile error	11 years ago
Igor Canadi	a2e0d890ed	No need for files_by_size_ in universal compaction Summary: files_by_size_ is sorted by time in case of universal compaction. However, Version::files_ is also sorted by time. So no need for files_by_size_ Test Plan: 1) make check with the change 2) make check with `assert(last_index == c->input_version_->files_[level].size() - 1);` in compaction picker Reviewers: dhruba, haobo, yhchiang, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19125	11 years ago
Feng Zhu	5656367416	use arena to allocate memtable's bloomfilter and hashskiplist's buckets_ Summary: Bloomfilter and hashskiplist's buckets_ allocated by memtable's arena DynamicBloom: pass arena via constructor, allocate space in SetTotalBits HashSkipListRep: allocate space of buckets_ using arena. do not delete it in deconstructor because arena would take care of it. Several test files are changed. Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D19335	11 years ago
sdong	dd337bc0b2	In logging format, use PRIu64 instead of casting Summary: Code cleaning up, since we are already using __STDC_FORMAT_MACROS in printing uint64_t, change other places. Only logging is changed. Test Plan: make all check Reviewers: ljin Reviewed By: ljin Subscribers: dhruba, yhchiang, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19113	11 years ago
Stanislau Hlebik	a3594867ba	Cache some conditions for DBImpl::MakeRoomForWrite Summary: Task 4580155. Some conditions in DBImpl::MakeRoomForWrite can be cached in ColumnFamilyData, because their values can change only during compaction, when a new memtable is added, and/or when the compaction score is recalculated. These conditions are: cfd->imm()->size() == cfd->options()->max_write_buffer_number - 1 cfd->current()->NumLevelFiles(0) >= cfd->options()->level0_stop_writes_trigger cfd->options()->soft_rate_limit > 0.0 && (score = cfd->current()->MaxCompactionScore()) > cfd->options()->soft_rate_limit cfd->options()->hard_rate_limit > 1.0 && (score = cfd->current()->MaxCompactionScore()) > cfd->options()->hard_rate_limit P.S. As it's my first diff, Siying suggested adding everybody as a reviewer for this diff. Sorry if I forgot someone or added someone by mistake. Test Plan: make all check Reviewers: haobo, xjin, dhruba, yhchiang, zagfox, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19311	11 years ago
sdong	19de6a7aad	Remove MemTableRep::GetIterator(const Slice& slice) Summary: It seems to me that when ever function MemTableRep::GetIterator(const Slice& slice) is used, we can use MemTableRep::GetDynamicPrefixIterator() instead. Just delete it to simplify the codes. Test Plan: make all check Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19281	11 years ago
Yueh-Hsuan Chiang	8898a0a0d1	Reorder the member variables of FileMetaData to improve cache locality. Summary: Move stats related member variables of FileMetaData to the bottom to improve cache locality of normal DB operations. Test Plan: make Reviewers: haobo, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19287	11 years ago
Yueh-Hsuan Chiang	e813f5b6d9	Allow compaction to reclaim storage more effectively. Summary: This diff allows compaction to reclaim storage more effectively. In the current design, compactions are mainly triggered based on the file sizes. However, since deletion entries do not have values, files that have many deletion entries are less likely to be compacted. As a result, it may take a while for deletion entries to be compacted. This diff addresses the issue by compensating for the size of deletion entries during the compaction process: the size of each deletion entry in the compaction process is augmented by 2x average value size. The diff applies to both leveled and universal compactions. Test Plan: develop CompactionDeletionTrigger make db_test ./db_test Reviewers: haobo, igor, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19029	11 years ago
Yueh-Hsuan Chiang	faa8d21922	Improve an assertion in RandomGenerator::Generate() in db_bench. Summary: RandomGenerator::Generate() currently has an assertion len < data_.size(). However, it is actually fine to have len == data_.size(). This diff changes the assertion to len <= data_.size(). Test Plan: make db_bench ./db_bench Reviewers: haobo, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19269	11 years ago
Lei Jin	3b0dc76699	db_bench: measure the real latency of write/delete Summary: as title Test Plan: make release Reviewers: haobo, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19227	11 years ago
Lei Jin	a1b5650a75	db_bench: sanity check on compression ratio Summary: as requested by mark Test Plan: make release Reviewers: sdong, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19221	11 years ago
Igor Canadi	d4a8423334	Remove seek compaction Summary: As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases. This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction. There is one test case that relied on didIO information. I'll try to find another way to implement it. Test Plan: make check Reviewers: sdong, haobo, yhchiang, ljin, dhruba Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19161	11 years ago
Igor Canadi	107e08baa7	Use same sorting for all level 0 files Summary: We decided that one of the long term goals is to unify level and universal compaction. As a small first step, I'm unifying level 0 sorting methods. Previously, we used to sort level 0 files in level compaction by file number and in universal compaction by sequence number. But it turns out that in level compaction, sorting by file number is exactly the same as sorting by sequence number. Test Plan: Ran make check with bunch of asserts to verify the sorting order is exactly the same. Also, make check with this patch Reviewers: haobo, yhchiang, ljin, dhruba, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19131	11 years ago
Haobo Xu	7a9dd5f214	[RocksDB] Make block based table hash index more adaptive Summary: Currently, RocksDB returns error if a db written with prefix hash index is later opened without providing a prefix extractor. This is unnecessarily harsh. Without a prefix extractor, we could always fall back to the normal binary index. Test Plan: unit test, also manually verified LOG that fallback did occur. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19191	11 years ago
Yueh-Hsuan Chiang	4f5ccfd179	Fixed a potential write hang Summary: Currently, when something goes wrong in DB::Write() while the write queue contains more than one element, the current design seems to forget to clean up the queue and wake up all the writers, which can make rocksdb hang on writes. Test Plan: make all check Reviewers: sdong, ljin, igor, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19167	11 years ago
Lei Jin	c4e90c79ed	bug fix: iteration over ColumnFamilySet needs to be under mutex Summary: asan_crash_test is failing on segfault Test Plan: running asan_crash_test Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19149	11 years ago
Evan Shaw	5363eb8ad4	Add a test for using compaction filters via the C API	11 years ago
Evan Shaw	d72313a7fa	Add a way to set compaction filter in the C API	11 years ago
Evan Shaw	df2701373d	Support for compaction filters in the C API	11 years ago
sdong	edd47c5104	PlainTable to encode to avoid to rewrite prefix when it is the same as the previous key Summary: Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes. The data format is documented in table/plain_table_factory.h Test Plan: Add unit test coverage in plain_table_db_test Reviewers: yhchiang, igor, dhruba, ljin, haobo Reviewed By: haobo Subscribers: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D18735	11 years ago
Igor Canadi	3525aac9e5	Change order of parameters in adaptive table factory Summary: This is minor, but if we put the writing table factory as the third parameter, when we add a new table format, we'll have a situation: 1) block based factory 2) plain table factory 3) output factory 4) new format factory I think it makes more sense to have output as the first parameter. Also, fixed a NewAdaptiveTableFactory() call in unit test Test Plan: unit test Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19119	11 years ago
sdong	8c265c08f1	HashLinkList to log distribution of number of entries across buckets Summary: Add two parameters of hash linked list to log distribution of number of entries across all buckets, and a sample row when there are too many entries in one single bucket. Test Plan: Turn it on in plain_table_db_test and see the logs. Reviewers: haobo, ljin Reviewed By: ljin Subscribers: leveldb, nkg-, dhruba, yhchiang Differential Revision: https://reviews.facebook.net/D19095	11 years ago
sdong	200e4b4a72	Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format. Test Plan: add a unit test Reviewers: haobo, igor Reviewed By: igor Subscribers: dhruba, ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19017	11 years ago
Yueh-Hsuan Chiang	e6e259b8ab	Include max_write_buffer_number >= 2 to SanitizeOptions. Summary: Include max_write_buffer_number >= 2 to SanitizeOptions. Test Plan: make all check Reviewers: haobo, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19077	11 years ago
sdong	cadc1adffa	Refactor: group metadata needed to open an SST file to a separate copyable struct Summary: We added multiple fields to FileMetaData recently and are planning to add more. This refactoring separates the minimum information for accessing the file. This object is copyable (FileMetaData is not copyable because of the ref counter). I hope this refactoring can enable further improvements: (1) use it to design a more efficient data structure to speed up read queries. (2) in the future, when we add information of storage level, we can easily do the encoding, instead of enlarging this structure, which might expand memory work set for file meta data. The definition is same as current EncodedFileMetaData used in two level iterator, so now the logic in two level iterator is easier to understand. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: ljin Subscribers: leveldb, dhruba, yhchiang Differential Revision: https://reviews.facebook.net/D18933	11 years ago


2775 Commits (5a9b4d74354c1499839728831fecafd9b75b0af5)