rocksdb

Commit Graph

Author	SHA1	Message	Date
Reid Horuff	5cf176ca15	Fix for 2PC causing WAL to grow too large Summary: Consider the following single column family scenario: prepare in log A commit in log B WAL is too large, flush all CFs to releast log A CFA is on log B so we do not see CFA is depending on log A so no flush is requested To fix this we must also consider the log containing the prepare section when determining what log a CF is dependent on. Closes https://github.com/facebook/rocksdb/pull/1768 Differential Revision: D4403265 Pulled By: reidHoruff fbshipit-source-id: ce800ff	8 years ago
Mike Kolupaev	d18dd2c41f	Abort compactions more reliably when closing DB Summary: DB shutdown aborts running compactions by setting an atomic shutting_down=true that CompactionJob periodically checks. Without this PR it checks it before processing every _output_ value. If compaction filter filters everything out, the compaction is uninterruptible. This PR adds checks for shutting_down on every _input_ value (in CompactionIterator and MergeHelper). There's also some minor code cleanup along the way. Closes https://github.com/facebook/rocksdb/pull/1639 Differential Revision: D4306571 Pulled By: yiwu-arbug fbshipit-source-id: f050890	8 years ago
Maysam Yabandeh	d0ba8ec8f9	Revert "PinnableSlice" Summary: This reverts commit `54d94e9c2c`. The pull request was landed by mistake. Closes https://github.com/facebook/rocksdb/pull/1755 Differential Revision: D4391678 Pulled By: maysamyabandeh fbshipit-source-id: 36d5149	8 years ago
Maysam Yabandeh	54d94e9c2c	PinnableSlice Summary: Currently the point lookup values are copied to a string provided by the user. This incures an extra memcpy cost. This patch allows doing point lookup via a PinnableSlice which pins the source memory location (instead of copying their content) and releases them after the content is consumed by the user. The old API of Get(string) is translated to the new API underneath. Here is the summary for improvements: 1. value 100 byte: 1.8% regular, 1.2% merge values 2. value 1k byte: 11.5% regular, 7.5% merge values 3. value 10k byte: 26% regular, 29.9% merge values The improvement for merge could be more if we extend this approach to pin the merge output and delay the full merge operation until the user actually needs it. We have put that for future work. PS: Sometimes we observe a small decrease in performance when switching from t5452014 to this patch but with the old Get(string) API. The difference is a little and could be noise. More importantly it is safely cancelled Closes https://github.com/facebook/rocksdb/pull/1732 Differential Revision: D4374613 Pulled By: maysamyabandeh fbshipit-source-id: a077f1a	8 years ago
Siying Dong	438f22bc56	Fix bug of Checkpoint loses recent transactions with 2PC Summary: If 2PC is enabled, checkpoint may not copy previous log files that contain uncommitted prepare records. In this diff we keep those files. Closes https://github.com/facebook/rocksdb/pull/1724 Differential Revision: D4368319 Pulled By: siying fbshipit-source-id: cc2c746	8 years ago
Aaron Gao	972f96b3fb	direct io write support Summary: rocksdb direct io support ``` [gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000 Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 5.0 Date: Wed Nov 23 13:17:43 2016 CPU: 40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz CPUCache: 25600 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Write rate: 0 bytes/second Compression: Snappy Memtablerep: skip_list Perf Level: 1 WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags DB path: [/tmp/rocksdbtest-112628/dbbench] fillseq : 4.393 micros/op 227639 ops/sec; 25.2 MB/s [gzh@dev11575.prn2 ~/roc Closes https://github.com/facebook/rocksdb/pull/1564 Differential Revision: D4241093 Pulled By: lightmark fbshipit-source-id: 98c29e3	8 years ago
Islam AbdelRahman	989e644ed8	Remove sst_file_manager option from LITE Summary: Remove sst_file_manager option from LITE Closes https://github.com/facebook/rocksdb/pull/1690 Differential Revision: D4341331 Pulled By: IslamAbdelRahman fbshipit-source-id: 9f9328d	8 years ago
Daniel Black	bfbcec2339	Gcc 7 error expansion to defined Summary: sorry if these gcc-7/clang-4 cleanups are getting tedious. Closes https://github.com/facebook/rocksdb/pull/1658 Differential Revision: D4318792 Pulled By: yiwu-arbug fbshipit-source-id: 8e85891	8 years ago
Islam AbdelRahman	1a146f89c7	break Flush wait for dropped CF Summary: In FlushJob we dont do the Flush if the CF is dropped https://github.com/facebook/rocksdb/blob/master/db/flush_job.cc#L184-L188 but inside WaitForFlushMemTable we keep waiting forever even if the CF is dropped. Closes https://github.com/facebook/rocksdb/pull/1664 Differential Revision: D4321032 Pulled By: IslamAbdelRahman fbshipit-source-id: 6e2b25d	8 years ago
Islam AbdelRahman	2ba59b5a1e	Disallow ingesting files into dropped CFs Summary: This PR update IngestExternalFile to return an error if we try to ingest a file into a dropped CF. Right now if IngestExternalFile want to flush a memtable, and it's ingesting a file into a dropped CF, it will wait forever since flushing is not possible for the dropped CF Closes https://github.com/facebook/rocksdb/pull/1657 Differential Revision: D4318657 Pulled By: IslamAbdelRahman fbshipit-source-id: ed6ea2b	8 years ago
Islam AbdelRahman	ed8fbdb560	Add EventListener::OnExternalFileIngested() event Summary: Add EventListener::OnExternalFileIngested() to allow user to subscribe to external file ingestion events Closes https://github.com/facebook/rocksdb/pull/1623 Differential Revision: D4285844 Pulled By: IslamAbdelRahman fbshipit-source-id: 0b95a88	8 years ago
Anton Safonov	9053fe2a5c	Made delete_obsolete_files_period_micros option dynamic Summary: Made delete_obsolete_files_period_micros option dynamic. It can be updating using DB::SetDBOptions(). Closes https://github.com/facebook/rocksdb/pull/1595 Differential Revision: D4246569 Pulled By: tonek fbshipit-source-id: d23f560	8 years ago
Maysam Yabandeh	182b940e70	Add WriteOptions.no_slowdown Summary: If the WriteOptions.no_slowdown flag is set AND we need to wait or sleep for the write request, then fail immediately with Status::Incomplete(). Closes https://github.com/facebook/rocksdb/pull/1527 Differential Revision: D4191405 Pulled By: maysamyabandeh fbshipit-source-id: 7f3ce3f	8 years ago
Andrew Kryczka	fe349db57b	Remove Arena in RangeDelAggregator Summary: The Arena construction/destruction introduced significant overhead to read-heavy workload just by creating empty vectors for its blocks, so avoid it in RangeDelAggregator. Closes https://github.com/facebook/rocksdb/pull/1547 Differential Revision: D4207781 Pulled By: ajkr fbshipit-source-id: 9d1c130	8 years ago
Andrew Kryczka	3f62215210	Lazily initialize RangeDelAggregator's map and pinning manager Summary: Since a RangeDelAggregator is created for each read request, these heap-allocating member variables were consuming significant CPU (~3% total) which slowed down request throughput. The map and pinning manager are only necessary when range deletions exist, so we can defer their initialization until the first range deletion is encountered. Currently lazy initialization is done for reads only since reads pass us a single snapshot, which is easier to store on the stack for later insertion into the map than the vector passed to us by flush or compaction. Note the Arena member variable is still expensive, I will figure out what to do with it in a subsequent diff. It cannot be lazily initialized because we currently use this arena even to allocate empty iterators, which is necessary even when no range deletions exist. Closes https://github.com/facebook/rocksdb/pull/1539 Differential Revision: D4203488 Pulled By: ajkr fbshipit-source-id: 3b36279	8 years ago
Yi Wu	36e4762ce0	Remove Ticker::SEQUENCE_NUMBER Summary: Remove the ticker count because: * Having to reset the ticker count in WriteImpl is ineffiecent; * It doesn't make sense to have it as a ticker count if multiple db instance share a statistics object. Closes https://github.com/facebook/rocksdb/pull/1531 Differential Revision: D4194442 Pulled By: yiwu-arbug fbshipit-source-id: e2110a9	8 years ago
Andrew Kryczka	489d142808	DeleteRange interface Summary: Expose DeleteRange() interface since we think the implementation is functionally correct now. Closes https://github.com/facebook/rocksdb/pull/1503 Differential Revision: D4171921 Pulled By: ajkr fbshipit-source-id: 5e21c98	8 years ago
Artemiy Kolesnikov	91300d01f6	Dynamic max_total_wal_size option Summary: Closes https://github.com/facebook/rocksdb/pull/1509 Differential Revision: D4176426 Pulled By: yiwu-arbug fbshipit-source-id: b57689d	8 years ago
Lijun Tang	adb665e0bf	Allowed delayed_write_rate option to be dynamically set. Summary: Closes https://github.com/facebook/rocksdb/pull/1488 Differential Revision: D4157784 Pulled By: siying fbshipit-source-id: f150081	8 years ago
Maysam Yabandeh	361010d447	Exporting compaction stats in the form of a map Summary: Currently the compaction stats are printed to stdout. We want to export the compaction stats in a map format so that the upper layer apps (e.g., MySQL) could present the stats in any format required by the them. Closes https://github.com/facebook/rocksdb/pull/1477 Differential Revision: D4149836 Pulled By: maysamyabandeh fbshipit-source-id: b3df19f	8 years ago
Reid Horuff	1ca5f6d132	Fix 2PC Recovery SeqId Miscount Summary: Originally sequence ids were calculated, in recovery, based off of the first seqid found if the first log recovered. The working seqid was then incremented from that value based on every insertion that took place. This was faulty because of the potential for missing log files or inserts that skipped the WAL. The current recovery scheme grabs sequence from current recovering batch and increments using memtableinserter to track how many actual inserts take place. This works for 2PC batches as well scenarios where some logs are missing or inserts that skip the WAL. Closes https://github.com/facebook/rocksdb/pull/1486 Differential Revision: D4156064 Pulled By: reidHoruff fbshipit-source-id: a6da8d9	8 years ago
Andrew Kryczka	c90fef88b1	fix open failure with empty wal Summary: Closes https://github.com/facebook/rocksdb/pull/1490 Differential Revision: D4158821 Pulled By: IslamAbdelRahman fbshipit-source-id: 59b73f4	8 years ago
Reid Horuff	d133b08f68	Use correct sequence number when creating memtable Summary: copied from: `5ebfd2623a` Opening existing RocksDB attempts recovery from log files, which uses wrong sequence number to create the memtable. This is a regression introduced in change `a400336`. This change includes a test demonstrating the problem, without the fix the test fails with "Operation failed. Try again.: Transaction could not check for conflicts for operation at SequenceNumber 1 as the MemTable only contains changes newer than SequenceNumber 2. Increasing the value of the max_write_buffer_number_to_maintain option could reduce the frequency of this error" This change is a joint effort by Peter 'Stig' Edwards thatsafunnyname and me. Closes https://github.com/facebook/rocksdb/pull/1458 Differential Revision: D4143791 Pulled By: reidHoruff fbshipit-source-id: 5a25033	8 years ago
Islam AbdelRahman	9bd191d2f4	Fix deadlock between (WriterThread/Compaction/IngestExternalFile) Summary: A deadlock is possible if this happen (1) Writer thread is stopped because it's waiting for compaction to finish (2) Compaction is waiting for current IngestExternalFile() calls to finish (3) IngestExternalFile() is waiting to be able to acquire the writer thread (4) WriterThread is held by stopped writes that are waiting for compactions to finish This patch fix the issue by not incrementing num_running_ingest_file_ except when we acquire the writer thread. This patch include a unittest to reproduce the described scenario Closes https://github.com/facebook/rocksdb/pull/1480 Differential Revision: D4151646 Pulled By: IslamAbdelRahman fbshipit-source-id: 09b39db	8 years ago
Andrew Kryczka	9e7cf3469b	DeleteRange user iterator support Summary: Note: reviewed in https://reviews.facebook.net/D65115 - DBIter maintains a range tombstone accumulator. We don't cleanup obsolete tombstones yet, so if the user seeks back and forth, the same tombstones would be added to the accumulator multiple times. - DBImpl::NewInternalIterator() (used to make DBIter's underlying iterator) adds memtable/L0 range tombstones, L1+ range tombstones are added on-demand during NewSecondaryIterator() (see D62205) - DBIter uses ShouldDelete() when advancing to check whether keys are covered by range tombstones Closes https://github.com/facebook/rocksdb/pull/1464 Differential Revision: D4131753 Pulled By: ajkr fbshipit-source-id: be86559	8 years ago
Andrew Kryczka	f998c9790f	DeleteRange Get support Summary: During Get()/MultiGet(), build up a RangeDelAggregator with range tombstones as we search through live memtable, immutable memtables, and SST files. This aggregator is then used by memtable.cc's SaveValue() and GetContext::SaveValue() to check whether keys are covered. added tests for Get on memtables/files; end-to-end tests mainly in https://reviews.facebook.net/D64761 Closes https://github.com/facebook/rocksdb/pull/1456 Differential Revision: D4111271 Pulled By: ajkr fbshipit-source-id: 6e388d4	8 years ago
Yi Wu	437942e481	Add avoid_flush_during_shutdown DB option Summary: Add avoid_flush_during_shutdown DB option. Closes https://github.com/facebook/rocksdb/pull/1451 Differential Revision: D4108643 Pulled By: yiwu-arbug fbshipit-source-id: abdaf4d	8 years ago
Andrew Kryczka	40a2e406f8	DeleteRange flush support Summary: Changed BuildTable() (used for flush) to (1) add range tombstones to the aggregator, which is used by CompactionIterator to determine which keys can be removed; and (2) add aggregator's range tombstones to the table that is output for the flush. Closes https://github.com/facebook/rocksdb/pull/1438 Differential Revision: D4100025 Pulled By: ajkr fbshipit-source-id: cb01a70	8 years ago
Siying Dong	da61f348d3	Print compression and Fast CRC support info as Header level Summary: Currently the compression suppport and fast CRC support information is printed as info level. They should be in the same level as options, which is header level. Also add ZSTD to this printing. Closes https://github.com/facebook/rocksdb/pull/1448 Differential Revision: D4106608 Pulled By: yiwu-arbug fbshipit-source-id: cb9a076	8 years ago
Siying Dong	c90c48d3c8	Show More DB Stats in info logs Summary: DB Stats now are truncated if there are too many CFs. Extend the buffer size to allow more to be printed out. Also, separate out malloc to another log line. Closes https://github.com/facebook/rocksdb/pull/1439 Differential Revision: D4100943 Pulled By: yiwu-arbug fbshipit-source-id: 79f7218	8 years ago
Siying Dong	04751d5345	L0 compression should follow options.compression_per_level if not empty Summary: Currently, we don't use options.compression_per_level[0] as the compression style for L0 compression type, unless it is None. This behavior doesn't look like on purpose. This diff will make sure L0 compress using the style of options.compression_per_level[0]. Reviewed and accepted in: https://reviews.facebook.net/D65607 Closes https://github.com/facebook/rocksdb/pull/1435 Differential Revision: D4099368 Pulled By: siying fbshipit-source-id: cfbbdcd	8 years ago
Aaron Gao	9de2f75216	revert Prev() in MergingIterator to use previous code in non-prefix-seek mode Summary: Siying suggested to keep old code for normal mode prev() for safety Test Plan: make check -j64 Reviewers: yiwu, andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65439	8 years ago
Aaron Gao	59a7c0337b	Change ioptions to store user_comparator, fix bug Summary: change ioptions.comparator to user_comparator instread of internal_comparator. Also change Comparator* to InternalKeyComparator* to make its type explicitly. Test Plan: make all check -j64 Reviewers: andrewkr, sdong, yiwu Reviewed By: yiwu Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65121	8 years ago
Islam AbdelRahman	869ae5d786	Support IngestExternalFile (remove AddFile restrictions) Summary: Changes in the diff API changes: - Introduce IngestExternalFile to replace AddFile (I think this make the API more clear) - Introduce IngestExternalFileOptions (This struct will encapsulate the options for ingesting the external file) - Deprecate AddFile() API Logic changes: - If our file overlap with the memtable we will flush the memtable - We will find the first level in the LSM tree that our file key range overlap with the keys in it - We will find the lowest level in the LSM tree above the the level we found in step 2 that our file can fit in and ingest our file in it - We will assign a global sequence number to our new file - Remove AddFile restrictions by using global sequence numbers Other changes: - Refactor all AddFile logic to be encapsulated in ExternalSstFileIngestionJob Test Plan: unit tests (still need to add more) addfile_stress (https://reviews.facebook.net/D65037) Reviewers: yiwu, andrewkr, lightmark, yhchiang, sdong Reviewed By: sdong Subscribers: jkedgar, hcz, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D65061	8 years ago
Andrew Kryczka	f47054015d	Handle WAL deletion when using avoid_flush_during_recovery Summary: Previously the WAL files that were avoided during recovery would never be considered for deletion. That was because alive_log_files_ was only populated when log files are created. This diff further populates alive_log_files_ with existing log files that aren't flushed during recovery, such that FindObsoleteFiles() can find them later. Depends on D64053. Test Plan: new unit test, verifies it fails before this change and passes after Reviewers: sdong, IslamAbdelRahman, yiwu Reviewed By: yiwu Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64059	8 years ago
Yi Wu	e29d3b67c2	Make max_background_compactions and base_background_compactions dynamic changeable Summary: Add DB::SetDBOptions to dynamic change max_background_compactions and base_background_compactions. I'll add more dynamic changeable options soon. Test Plan: unit test. Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64749	8 years ago
Islam AbdelRahman	5691a1d8a4	Fix compaction conflict with running compaction Summary: Issue scenario: (1) We have 3 files in L1 and we issue a compaction that will compact them into 1 file in L2 (2) While compaction (1) is running, we flush a file into L0 and trigger another compaction that decide to move this file to L1 and then move it again to L2 (this file don't overlap with any other files) (3) compaction (1) finishes and install the file it generated in L2, but this file overlap with the file we generated in (2) so we break the LSM consistency Looks like this issue can be triggered by using non-exclusive manual compaction or AddFile() Test Plan: unit tests Reviewers: sdong Reviewed By: sdong Subscribers: hermanlee4, jkedgar, andrewkr, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D64947	8 years ago
Andrew Kryczka	017de666c7	fixup commit Summary: I accidentally left out these changes from my commit of D64053 due to messing up the merge conflict resolution. Test Plan: ./db_wal_test Reviewers: Subscribers: Tasks: Blame Revision: D64053	8 years ago
Andrew Kryczka	1b7af5fb1a	Redo handling of recycled logs in full purge Summary: This reverts commit `9e4aa798c3`, which doesn't handle all cases (see inline comment). I reimplemented the logic as suggested in the initial PR: https://github.com/facebook/rocksdb/pull/1313. This approach has two benefits: - All the parsing/filtering of full_scan_candidate_files is kept together in PurgeObsoleteFiles. - We only need to check whether log file is recycled in one place where we've already determined it's a log file Test Plan: new unit test, verified fails before the original fix, still passes now. Reviewers: IslamAbdelRahman, yiwu, sdong Reviewed By: yiwu, sdong Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64053	8 years ago
Aaron Gao	447f17127c	new Prev() prefix support using SeekForPrev() Summary: 1) The previous solution for Prev() prefix support is not clean. Since I add api SeekForPrev(), now the Prev() can be symmetric to Next(). and we do not need SeekToLast() to be called in Prev() any more. Also, Next() will Seek(prefix_seek_key_) to solve the problem of possible inconsistency between db_iter and merge_iter when there is merge_operator. And prefix_seek_key is only refreshed when change direction to forward. 2) This diff also solves the bug of Iterator::SeekToLast() with iterate_upper_bound_ with prefix extractor. add test cases for the above two cases. There are some tests for the SeekToLast() in Prev(), I will clean them later. Test Plan: make all check Reviewers: IslamAbdelRahman, andrewkr, yiwu, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D63933	8 years ago
Reid Horuff	2c1f95291d	Add facility to write only a portion of WriteBatch to WAL Summary: When constructing a write batch a client may now call MarkWalTerminationPoint() on that batch. No batch operations after this call will be added written to the WAL but will still be inserted into the Memtable. This facility is used to remove one of the three WriteImpl calls in 2PC transactions. This produces a ~1% perf improvement. ``` RocksDB - unoptimized 2pc, sync_binlog=1, disable_2pc=off INFO 2016-08-31 14:30:38,814 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2619 seconds. Requests/second = 28628 RocksDB - optimized 2pc , sync_binlog=1, disable_2pc=off INFO 2016-08-31 16:26:59,442 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2581 seconds. Requests/second = 29054 ``` Test Plan: Two unit tests added. Reviewers: sdong, yiwu, IslamAbdelRahman Reviewed By: yiwu Subscribers: hermanlee4, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64599	8 years ago
Islam AbdelRahman	87dfc1d23e	Fix conflict between AddFile() and CompactRange() Summary: Fix the conflict bug between AddFile() and CompactRange() by - Make sure that no AddFile calls are running when asking CompactionPicker to pick compaction for manual compaction - If AddFile() run after we pick the compaction for the manual compaction it will be aware of it since we will add the manual compaction to running_compactions_ after picking it This will solve these 2 scenarios - If AddFile() is running, we will wait for it to finish before we pick a compaction for the manual compaction - If we already picked a manual compaction and then AddFile() started ... we ensure that it never ingest a file in a level that will overlap with the manual compaction Test Plan: unit tests Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, yoshinorim, jkedgar, dhruba Differential Revision: https://reviews.facebook.net/D64449	8 years ago
yiwu-arbug	eb3894cf42	Recompute compaction score on SetOptions (#1346 ) Summary: We didn't recompute compaction score on SetOptions, and end up not having compaction if no flush happens afterward. The PR fixing it. Test Plan: See unit test. Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64167	8 years ago
Islam AbdelRahman	5c64fb67d2	Fix AddFile() conflict with compaction output [WaitForAddFile()] Summary: Since AddFile unlock/lock the mutex inside LogAndApply() we need to ensure that during this period other compactions cannot run since such compactions are not aware of the file we are ingesting and could create a compaction that overlap wit this file this diff add - WaitForAddFile() call that will ensure that no AddFile() calls are being processed right now - Call `WaitForAddFile()` in 3 locations -- When doing manual Compaction -- When starting automatic Compaction -- When doing CompactFiles() Test Plan: unit test Reviewers: lightmark, yiwu, andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, yoshinorim, jkedgar, dhruba Differential Revision: https://reviews.facebook.net/D64383	8 years ago
Yi Wu	9ed928e7a9	Split DBOptions into ImmutableDBOptions and MutableDBOptions Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is barely a placeholder for now. I'll start to move options to MutableDBOptions in following diffs. Test Plan: make all check Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64065	8 years ago
yiwu-arbug	4bc8c88e6b	Recover same sequence id from WAL (#1350 ) Summary: Revert the behavior where we don't read sequence id from WAL, but increase it as we replay the log. We still keep the behave for 2PC for now but will fix later. This change fixes github issue 1339, where some writes come with WAL disabled and we may recover records with wrong sequence id. Test Plan: Added unit test. Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64275	8 years ago
Aaron Gao	715256338a	forbid merge during recovery Summary: Mitigate regression bug of options.max_successive_merges hit during DB Recovery For https://reviews.facebook.net/D62625 Test Plan: make all check Reviewers: horuff, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D62655	8 years ago
Yi Wu	e4d3f5d9b8	Fix DBImpl::GetWalPreallocateBlockSize Mac build error Summary: Specify type param with std::min to resolve compile error on Mac. Test Plan: https://travis-ci.org/facebook/rocksdb/builds/161223845 Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64143	8 years ago
sdong	d78a4401b5	DBImpl::GetWalPreallocateBlockSize() should return size_t Summary: WritableFile::SetPreallocationBlockSize() requires parameter as size_t, and options used in DBImpl::GetWalPreallocateBlockSize() are all size_t. WritableFile::SetPreallocationBlockSize() should return size_t to avoid build break if size_t is not uint64_t. Test Plan: Run existing tests. Reviewers: andrewkr, IslamAbdelRahman, yiwu Reviewed By: yiwu Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64137	8 years ago
sdong	b666f85445	Consider more factors when determining preallocation size of WAL files Summary: Currently the WAL file preallocation size is 1.1 * write_buffer_size. This, however, will be over-estimated if options.db_write_buffer_size or options.max_total_wal_size is set and is much smaller. Test Plan: Add a unit test. Reviewers: andrewkr, yiwu Reviewed By: yiwu Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D63957	8 years ago

1 2 3 4 5 ...

1046 Commits (f7bb1a0060cfaeefefac9a32734956653ea7e35c)