rocksdb

Commit Graph

Author	SHA1	Message	Date
zhangjinpeng1987	45c7ce1377	CompactRangeOptions C API Summary: Add C API for CompactRangeOptions. Closes https://github.com/facebook/rocksdb/pull/1596 Differential Revision: D4252339 Pulled By: yiwu-arbug fbshipit-source-id: f768f93	8 years ago
Andrew Kryczka	b821984d31	DeleteRange read path end-to-end tests Summary: Closes https://github.com/facebook/rocksdb/pull/1592 Differential Revision: D4246260 Pulled By: ajkr fbshipit-source-id: ce03fa2	8 years ago
Artemiy Kolesnikov	2f4fc539c6	Compaction::IsTrivialMove relaxing Summary: IsTrivialMove returns true if no input file overlaps with output_level+1 with more than max_compaction_bytes_ bytes. Closes https://github.com/facebook/rocksdb/pull/1619 Differential Revision: D4278338 Pulled By: yiwu-arbug fbshipit-source-id: 994c001	8 years ago
Islam AbdelRahman	ed8fbdb560	Add EventListener::OnExternalFileIngested() event Summary: Add EventListener::OnExternalFileIngested() to allow user to subscribe to external file ingestion events Closes https://github.com/facebook/rocksdb/pull/1623 Differential Revision: D4285844 Pulled By: IslamAbdelRahman fbshipit-source-id: 0b95a88	8 years ago
Mike Kolupaev	beb36d9c1e	Fixed CompactionFilter::Decision::kRemoveAndSkipUntil Summary: Embarassingly enough, the first time I tried to use my new feature in logdevice it crashed with this assertion failure: db/pinned_iterators_manager.h:30: void rocksdb::PinnedIteratorsManager::StartPinning(): Assertion `pinning_enabled == false' failed The issue was that `pinned_iters_mgr_.StartPinning()` was called but `pinned_iters_mgr_.ReleasePinnedData()` wasn't. Closes https://github.com/facebook/rocksdb/pull/1611 Differential Revision: D4265622 Pulled By: al13n321 fbshipit-source-id: 747b10f	8 years ago
Islam AbdelRahman	67f37cf198	Allow user to specify a CF for SST files generated by SstFileWriter Summary: Allow user to explicitly specify that the generated file by SstFileWriter will be ingested in a specific CF. This allow us to persist the CF id in the generated file Closes https://github.com/facebook/rocksdb/pull/1615 Differential Revision: D4270422 Pulled By: IslamAbdelRahman fbshipit-source-id: 7fb954e	8 years ago
Anton Safonov	9053fe2a5c	Made delete_obsolete_files_period_micros option dynamic Summary: Made delete_obsolete_files_period_micros option dynamic. It can be updating using DB::SetDBOptions(). Closes https://github.com/facebook/rocksdb/pull/1595 Differential Revision: D4246569 Pulled By: tonek fbshipit-source-id: d23f560	8 years ago
Islam AbdelRahman	edde954e7b	fix clang build Summary: override is missing for FilterV2 Closes https://github.com/facebook/rocksdb/pull/1606 Differential Revision: D4263832 Pulled By: IslamAbdelRahman fbshipit-source-id: d8b337a	8 years ago
Islam AbdelRahman	e39d080871	Fix travis (compile for clang < 3.9) Summary: Travis fail because it uses clang 3.6 which don't recognize `__attribute__((__no_sanitize__("undefined")))` Closes https://github.com/facebook/rocksdb/pull/1601 Differential Revision: D4257175 Pulled By: IslamAbdelRahman fbshipit-source-id: fb4d1ab	8 years ago
fangchenliaohui	b77007df8b	Bug: paralle_group status updated in WriteThread::CompleteParallelWorker Summary: Multi-write thread may update the status of the parallel_group in WriteThread::CompleteParallelWorker if the status of Writer is not ok! When copy write status to the paralle_group, the write thread just hold the mutex of the the writer processed by itself. it is useless. The thread should held the the leader of the parallel_group instead. Closes https://github.com/facebook/rocksdb/pull/1598 Differential Revision: D4252335 Pulled By: siying fbshipit-source-id: 3864cf7	8 years ago
Mike Kolupaev	247d0979aa	Support for range skips in compaction filter Summary: This adds the ability for compaction filter to say "drop this key-value, and also drop everything up to key x". This will cause the compaction to seek input iterator to x, without reading the data. This can make compaction much faster when large consecutive chunks of data are filtered out. See the changes in include/rocksdb/compaction_filter.h for the new API. Along the way this diff also adds ability for compaction filter changing merge operands, similar to how it can change values; we're not going to use this feature, it just seemed easier and cleaner to implement it than to document that it's not implemented :) The diff is not as big as it may seem, about half of the lines are a test. Closes https://github.com/facebook/rocksdb/pull/1599 Differential Revision: D4252092 Pulled By: al13n321 fbshipit-source-id: 41e1e48	8 years ago
Panagiotis Ktistakis	96fcefbf1d	c api: expose option for dynamic level size target Summary: Closes https://github.com/facebook/rocksdb/pull/1587 Differential Revision: D4245923 Pulled By: yiwu-arbug fbshipit-source-id: 6ee7291	8 years ago
zhangjinpeng1987	00197cff39	Add C API to set base_backgroud_compactions Summary: Add C API to set base_backgroud_compactions Closes https://github.com/facebook/rocksdb/pull/1571 Differential Revision: D4245709 Pulled By: yiwu-arbug fbshipit-source-id: 792c6b8	8 years ago
Andrew Kryczka	5b219eccb5	deleterange end-to-end test improvements for lite/robustness Summary: Closes https://github.com/facebook/rocksdb/pull/1591 Differential Revision: D4246019 Pulled By: ajkr fbshipit-source-id: 0c4aa37	9 years ago
Andrew Kryczka	e333528991	DeleteRange write path end-to-end tests Summary: Closes https://github.com/facebook/rocksdb/pull/1578 Differential Revision: D4241171 Pulled By: ajkr fbshipit-source-id: ce5fd83	9 years ago
Siying Dong	7784980fcd	Fix mis-reporting of compaction read bytes to the base level Summary: In dynamic leveled compaction, when calculating read bytes, output level bytes may be wronglyl calculated as input level inputs. Fix it. Closes https://github.com/facebook/rocksdb/pull/1475 Differential Revision: D4148412 Pulled By: siying fbshipit-source-id: f2f475a	9 years ago
Islam AbdelRahman	3c6b49ed66	Fix implicit conversion between int64_t to int Summary: Make conversion explicit, implicit conversion breaks the build Closes https://github.com/facebook/rocksdb/pull/1589 Differential Revision: D4245158 Pulled By: IslamAbdelRahman fbshipit-source-id: aaec00d	9 years ago
Siying Dong	b3b875657f	Remove unused assignment in db/db_iter.cc Summary: "make analyze" complains the assignment is not useful. Remove it. Closes https://github.com/facebook/rocksdb/pull/1581 Differential Revision: D4241697 Pulled By: siying fbshipit-source-id: 178f67a	9 years ago
Andrew Kryczka	4f6e89b1d0	Fix range deletion covering key in same SST file Summary: AddTombstones() needs to be before t->Get(), oops :'( Closes https://github.com/facebook/rocksdb/pull/1576 Differential Revision: D4241041 Pulled By: ajkr fbshipit-source-id: 781ceea	9 years ago
Islam AbdelRahman	a2bf265a39	Avoid intentional overflow in GetL0ThresholdSpeedupCompaction Summary: `99c052a34f` fixes integer overflow in GetL0ThresholdSpeedupCompaction() by checking if int become -ve. UBSAN will complain about that since this is still an overflow, we can fix the issue by simply using int64_t Closes https://github.com/facebook/rocksdb/pull/1582 Differential Revision: D4241525 Pulled By: IslamAbdelRahman fbshipit-source-id: b3ae21f	9 years ago
Islam AbdelRahman	52fd1ff2c2	disable UBSAN for functions with intentional -ve shift / overflow Summary: disable UBSAN for functions with intentional left shift on -ve number / overflow These functions are rocksdb:: Hash FixedLengthColBufEncoder::Append FaultInjectionTest:: Key Closes https://github.com/facebook/rocksdb/pull/1577 Differential Revision: D4240801 Pulled By: IslamAbdelRahman fbshipit-source-id: 3e1caf6	9 years ago
Islam AbdelRahman	1886c435b9	Fix CompactionJob::Install division by zero Summary: Fix CompactionJob::Install division by zero Closes https://github.com/facebook/rocksdb/pull/1580 Differential Revision: D4240794 Pulled By: IslamAbdelRahman fbshipit-source-id: 7286721	9 years ago
Islam AbdelRahman	13e66a8f51	Fix compaction_job.cc division by zero Summary: Fix division by zero in compaction_job.cc Closes https://github.com/facebook/rocksdb/pull/1575 Differential Revision: D4240818 Pulled By: IslamAbdelRahman fbshipit-source-id: a8bc757	9 years ago
Andrew Kryczka	01eabf7375	Fix double-counted deletion stat Summary: Both the single deletion and the value are included in compaction outputs, so no need to update the stat for the value's deletion yet, otherwise it'd be double-counted. Closes https://github.com/facebook/rocksdb/pull/1574 Differential Revision: D4241181 Pulled By: ajkr fbshipit-source-id: c9aaa15	9 years ago
Andrew Kryczka	7ffb10fc1a	DeleteRange compaction statistics Summary: - "rocksdb.compaction.key.drop.range_del" - number of keys dropped during compaction due to a range tombstone covering them - "rocksdb.compaction.range_del.drop.obsolete" - number of range tombstones dropped due to compaction to bottom level and no snapshot saving them - s/CompactionIteratorStats/CompactionIterationStats/g since this class is no longer specific to CompactionIterator -- it's also updated for range tombstone iteration during compaction - Move the above class into a separate .h file to avoid circular dependency. Closes https://github.com/facebook/rocksdb/pull/1520 Differential Revision: D4187179 Pulled By: ajkr fbshipit-source-id: 10c2103	9 years ago
Mike Kolupaev	236d4c67e9	Less linear search in DBIter::Seek() when keys are overwritten a lot Summary: In one deployment we saw high latencies (presumably from slow iterator operations) and a lot of CPU time reported by perf with this stack: ``` rocksdb::MergingIterator::Next rocksdb::DBIter::FindNextUserEntryInternal rocksdb::DBIter::Seek ``` I think what's happening is: 1. we create a snapshot iterator, 2. we do lots of Put()s for the same key x; this creates lots of entries in memtable, 3. we seek the iterator to a key slightly smaller than x, 4. the seek walks over lots of entries in memtable for key x, skipping them because of high sequence numbers. CC IslamAbdelRahman Closes https://github.com/facebook/rocksdb/pull/1413 Differential Revision: D4083879 Pulled By: IslamAbdelRahman fbshipit-source-id: a83ddae	9 years ago
Siying Dong	cd7c4143d7	Improve Write Stalling System Summary: Current write stalling system has the problem of lacking of positive feedback if the restricted rate is already too low. Users sometimes stack in very low slowdown value. With the diff, we add a positive feedback (increasing the slowdown value) if we recover from slowdown state back to normal. To avoid the positive feedback to keep the slowdown value to be to high, we add issue a negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before. Also increase level0_stop_writes_trigger default from 24 to 32. Since level0_slowdown_writes_trigger default is 20, stop trigger 24 only gives four files as the buffer time to slowdown writes. In order to avoid stop in four files while 20 files have been accumulated, the slowdown value must be very low, which is amost the same as stop. It also doesn't give enough time for the slowdown value to converge. Increase it to 32 will smooth out the system. Closes https://github.com/facebook/rocksdb/pull/1562 Differential Revision: D4218519 Pulled By: siying fbshipit-source-id: 95e4088	9 years ago
Yi Wu	dfb6fe6755	Unified InlineSkipList::Insert algorithm with hinting Summary: This PR is based on nbronson's diff with small modifications to wire it up with existing interface. Comparing to previous version, this approach works better for inserting keys in decreasing order or updating the same key, and impose less restriction to the prefix extractor. ---- Summary from original diff ---- This diff introduces a single InlineSkipList::Insert that unifies the existing sequential insert optimization (prev_), concurrent insertion, and insertion using externally-managed insertion point hints. There's a deep symmetry between insertion hints (cursors) and the concurrent algorithm. In both cases we have partial information from the recent past that is likely but not certain to be accurate. This diff introduces the struct InlineSkipList::Splice, which encodes predecessor and successor information in the same form that was previously only used within a single call to InsertConcurrently. Splice holds information about an insertion point that can be used to levera Closes https://github.com/facebook/rocksdb/pull/1561 Differential Revision: D4217283 Pulled By: yiwu-arbug fbshipit-source-id: 33ee437	9 years ago
Andrew Kryczka	734e4acafb	Eliminate redundant cache lookup with range deletion Summary: When we introduced range deletion block, TableCache::Get() and TableCache::NewIterator() each did two table cache lookups, one for range deletion block iterator and another for getting the table reader to which the Get()/NewIterator() is delegated. This extra cache lookup was very CPU-intensive (about 10% overhead in a read-heavy benchmark). We can avoid it by reusing the Cache::Handle created for range deletion block iterator to get the file reader. Closes https://github.com/facebook/rocksdb/pull/1537 Differential Revision: D4201167 Pulled By: ajkr fbshipit-source-id: d33ffd8	9 years ago
Maysam Yabandeh	182b940e70	Add WriteOptions.no_slowdown Summary: If the WriteOptions.no_slowdown flag is set AND we need to wait or sleep for the write request, then fail immediately with Status::Incomplete(). Closes https://github.com/facebook/rocksdb/pull/1527 Differential Revision: D4191405 Pulled By: maysamyabandeh fbshipit-source-id: 7f3ce3f	9 years ago
Karthikeyan Radhakrishnan	4118e13330	Persistent Cache: Expose stats to user via public API Summary: Exposing persistent cache stats (counters) to the user via public API. Closes https://github.com/facebook/rocksdb/pull/1485 Differential Revision: D4155274 Pulled By: siying fbshipit-source-id: 30a9f50	9 years ago
Andrew Kryczka	fd43ee09da	Range deletion microoptimizations Summary: - Made RangeDelAggregator's InternalKeyComparator member a reference-to-const so we don't need to copy-construct it. Also added InternalKeyComparator to ImmutableCFOptions so we don't need to construct one for each DBIter. - Made MemTable::NewRangeTombstoneIterator and the table readers' NewRangeTombstoneIterator() functions return nullptr instead of NewEmptyInternalIterator to avoid the allocation. Updated callers accordingly. Closes https://github.com/facebook/rocksdb/pull/1548 Differential Revision: D4208169 Pulled By: ajkr fbshipit-source-id: 2fd65cf	9 years ago
Andrew Kryczka	fe349db57b	Remove Arena in RangeDelAggregator Summary: The Arena construction/destruction introduced significant overhead to read-heavy workload just by creating empty vectors for its blocks, so avoid it in RangeDelAggregator. Closes https://github.com/facebook/rocksdb/pull/1547 Differential Revision: D4207781 Pulled By: ajkr fbshipit-source-id: 9d1c130	9 years ago
Andrew Kryczka	3f62215210	Lazily initialize RangeDelAggregator's map and pinning manager Summary: Since a RangeDelAggregator is created for each read request, these heap-allocating member variables were consuming significant CPU (~3% total) which slowed down request throughput. The map and pinning manager are only necessary when range deletions exist, so we can defer their initialization until the first range deletion is encountered. Currently lazy initialization is done for reads only since reads pass us a single snapshot, which is easier to store on the stack for later insertion into the map than the vector passed to us by flush or compaction. Note the Arena member variable is still expensive, I will figure out what to do with it in a subsequent diff. It cannot be lazily initialized because we currently use this arena even to allocate empty iterators, which is necessary even when no range deletions exist. Closes https://github.com/facebook/rocksdb/pull/1539 Differential Revision: D4203488 Pulled By: ajkr fbshipit-source-id: 3b36279	9 years ago
Andrew Kryczka	635a7bd1ad	refactor TableCache Get/NewIterator for single exit points Summary: these functions were too complicated to change with exit points everywhere, so refactored them. btw, please review urgently, this is a prereq to fix the 5.0 perf regression Closes https://github.com/facebook/rocksdb/pull/1534 Differential Revision: D4198972 Pulled By: ajkr fbshipit-source-id: 04ebfb7	9 years ago
Siying Dong	a4eb7387b2	Allow plain table to store index on file with bloom filter disabled Summary: Currently plain table bloom filter is required if storing metadata on file. Remove the constraint. Closes https://github.com/facebook/rocksdb/pull/1525 Differential Revision: D4190977 Pulled By: siying fbshipit-source-id: be60442	9 years ago
Yi Wu	36e4762ce0	Remove Ticker::SEQUENCE_NUMBER Summary: Remove the ticker count because: * Having to reset the ticker count in WriteImpl is ineffiecent; * It doesn't make sense to have it as a ticker count if multiple db instance share a statistics object. Closes https://github.com/facebook/rocksdb/pull/1531 Differential Revision: D4194442 Pulled By: yiwu-arbug fbshipit-source-id: e2110a9	9 years ago
Andrew Kryczka	760ef68a69	fix deleterange asan issue Summary: pinned_iters_mgr_ pins iterators allocated with arena_, so we should order the instance variable declarations such that the pinned iterators have their destructors executed before the arena is destroyed. Closes https://github.com/facebook/rocksdb/pull/1528 Differential Revision: D4191984 Pulled By: ajkr fbshipit-source-id: 1386f20	9 years ago
Siying Dong	972e3ff295	Enable allow_concurrent_memtable_write and enable_write_thread_adaptive_yield by default Summary: Closes https://github.com/facebook/rocksdb/pull/1496 Differential Revision: D4168080 Pulled By: siying fbshipit-source-id: 056ae62	9 years ago
Yi Wu	1543d5d92e	Report memory usage by memtable insert hints map. Summary: It is hard to measure acutal memory usage by std containers. Even providing a custom allocator will miss count some of the usage. Here we only do a wild guess on its memory usage. Closes https://github.com/facebook/rocksdb/pull/1511 Differential Revision: D4179945 Pulled By: yiwu-arbug fbshipit-source-id: 32ab929	9 years ago
Andrew Kryczka	48e8baebc0	Decouple data iterator and range deletion iterator in TableCache Summary: Previously we used TableCache::NewIterator() for multiple purposes (data block iterator and range deletion iterator), and returned non-ok status in the data block iterator. In one case where the caller only used the range deletion block iterator (`9e7cf3469b/db/version_set.cc (L965-L973)`), we didn't check/free the data block iterator containing non-ok status, which caused a valgrind error. So, this diff decouples creation of data block and range deletion block iterators, and updates the callers accordingly. Both functions can return non-ok status in an InternalIterator. Since the non-ok status is returned in an iterator that the callers will definitely use, it should be more usable/less error-prone. Closes https://github.com/facebook/rocksdb/pull/1513 Differential Revision: D4181423 Pulled By: ajkr fbshipit-source-id: 835b8f5	9 years ago
Andrew Kryczka	661e4c9267	DeleteRange unsupported in non-block-based tables Summary: Return an error from DeleteRange() (or Write() if the user is using the low-level WriteBatch API) if an unsupported table type is configured. Closes https://github.com/facebook/rocksdb/pull/1519 Differential Revision: D4185933 Pulled By: ajkr fbshipit-source-id: abcdf84	9 years ago
Andrew Kryczka	489d142808	DeleteRange interface Summary: Expose DeleteRange() interface since we think the implementation is functionally correct now. Closes https://github.com/facebook/rocksdb/pull/1503 Differential Revision: D4171921 Pulled By: ajkr fbshipit-source-id: 5e21c98	9 years ago
Islam AbdelRahman	eba99c28e4	Fix min_write_buffer_number_to_merge = 0 bug Summary: It's possible that we set min_write_buffer_number_to_merge to 0. This should never happen Closes https://github.com/facebook/rocksdb/pull/1515 Differential Revision: D4183356 Pulled By: yiwu-arbug fbshipit-source-id: c9d39d7	9 years ago
Artemiy Kolesnikov	91300d01f6	Dynamic max_total_wal_size option Summary: Closes https://github.com/facebook/rocksdb/pull/1509 Differential Revision: D4176426 Pulled By: yiwu-arbug fbshipit-source-id: b57689d	9 years ago
Andrew Kryczka	ec2f64794b	Consider subcompaction boundaries when updating file boundaries for range deletion Summary: Adjusted AddToBuilder() to take lower_bound and upper_bound, which serve two purposes: (1) only range deletions overlapping with the interval [lower_bound, upper_bound) will be added to the output file, and (2) the output file's boundaries will not be extended before lower_bound or after upper_bound. Our computation of lower_bound/upper_bound consider both subcompaction boundaries and previous/next files within the subcompaction. Test cases are here (level subcompactions: https://gist.github.com/ajkr/63c7eae3e9667c5ebdc0a7efb74ac332, and universal subcompactions: https://gist.github.com/ajkr/5a62af77c4ebe4052a1955c496d51fdb) but can't be included in this diff as they depend on committing the API first. They fail before this change and pass after. Closes https://github.com/facebook/rocksdb/pull/1501 Reviewed By: yhchiang Differential Revision: D4171685 Pulled By: ajkr fbshipit-source-id: ee99db8	9 years ago
Andrew Kryczka	3b192f6186	Handle full final subcompaction output file with range deletions Summary: This conditional should only open a new file that's dedicated to range deletions when it's the sole output of the subcompaction. Previously, we created such a file whenever the table builder was nullptr, which would've also been the case whenever the CompactionIterator's final key coincided with the final output table becoming full. Closes https://github.com/facebook/rocksdb/pull/1507 Differential Revision: D4174613 Pulled By: ajkr fbshipit-source-id: 9ffacea	9 years ago
Andrew Kryczka	6c57952002	Make range deletion inclusive-exclusive Summary: This makes it easier to implement future optimizations like range collapsing. Closes https://github.com/facebook/rocksdb/pull/1504 Differential Revision: D4172214 Pulled By: ajkr fbshipit-source-id: ac4942f	9 years ago
Yi Wu	1ea79a78c9	Optimize sequential insert into memtable - Part 1: Interface Summary: Currently our skip-list have an optimization to speedup sequential inserts from a single stream, by remembering the last insert position. We extend the idea to support sequential inserts from multiple streams, and even tolerate small reordering wihtin each stream. This PR is the interface part adding the following: - Add `memtable_insert_prefix_extractor` to allow specifying prefix for each key. - Add `InsertWithHint()` interface to memtable, to allow underlying implementation to return a hint of insert position, which can be later pass back to optimize inserts. - Memtable will maintain a map from prefix to hints and pass the hint via `InsertWithHint()` if `memtable_insert_prefix_extractor` is non-null. Closes https://github.com/facebook/rocksdb/pull/1419 Differential Revision: D4079367 Pulled By: yiwu-arbug fbshipit-source-id: 3555326	9 years ago
Yi Wu	df5eeb85ca	Optimize sequential insert into memtable - Part 2: Implementation Summary: Implement a insert hint into skip-list to hint insert position. This is to optimize for the write workload where there are multiple stream of sequential writes. For example, there is a stream of keys of a1, a2, a3... but also b1, b2, b2... Each stream are not neccessary strictly sequential, but can get reorder a little bit. User can specify a prefix extractor and the `SkipListRep` can thus maintan a hint for each of the stream for fast insert into memtable. This is the internal implementation part. See #1419 for the interface part. See inline comments for details. Closes https://github.com/facebook/rocksdb/pull/1449 Differential Revision: D4106781 Pulled By: yiwu-arbug fbshipit-source-id: f4d48c4	9 years ago

... 5 6 7 8 9 ...

2865 Commits (8f2598ac9d730bf0a7c08b9cdb6071fd7b73ba3b)