rocksdb

Commit Graph

Author	SHA1	Message	Date
Maysam Yabandeh	857adf388f	WritePrepared Txn: Refactor conf params Summary: Summary of changes: - Move seq_per_batch out of Options - Rename concurrent_prepare to two_write_queues - Add allocate_seq_only_for_data_ Closes https://github.com/facebook/rocksdb/pull/3136 Differential Revision: D6304458 Pulled By: maysamyabandeh fbshipit-source-id: 08e685bfa82bbc41b5b1c5eb7040a8ca6e05e58c	7 years ago
Zhongyi Xie	1d6700f9e6	Add test kPointInTimeRecoveryCFConsistency Summary: Context/problem: - CFs may be flushed at different times - A WAL can only be deleted after all CFs have flushed beyond end of that WAL. - Point-in-time recovery might stop upon reaching the first corruption. - Some CFs may have already flushed beyond that point, while others haven't. We should fail the Open() instead of proceeding with inconsistent CFs. Closes https://github.com/facebook/rocksdb/pull/2900 Differential Revision: D5863281 Pulled By: miasantreble fbshipit-source-id: 180dbaf83d96c804cff49b3c406312a4ae61313e	7 years ago
Maysam Yabandeh	c57050b770	Use the default copy constructor in Options Summary: Our current implementation of (semi-)copy constructor of DBOptions and ColumnFamilyOptions seems to intend value by value copy, which is what the default copy constructor does anyway. Moreover not using the default constructor has the risk of forgetting to add newly added options. As an example, allow_2pc seems to be forgotten in the copy constructor which was causing one of the unit tests not seeing its effect. Closes https://github.com/facebook/rocksdb/pull/2888 Differential Revision: D5846368 Pulled By: maysamyabandeh fbshipit-source-id: 1ee92a2aeae93886754b7bc039c3411ea2458683	7 years ago
Siying Dong	3c327ac2d0	Change RocksDB License Summary: Closes https://github.com/facebook/rocksdb/pull/2589 Differential Revision: D5431502 Pulled By: siying fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75	7 years ago
Maysam Yabandeh	499ebb3ab5	Optimize for serial commits in 2PC Summary: Throughput: 46k tps in our sysbench settings (filling the details later) The idea is to have the simplest change that gives us a reasonable boost in 2PC throughput. Major design changes: 1. The WAL file internal buffer is not flushed after each write. Instead it is flushed before critical operations (WAL copy via fs) or when FlushWAL is called by MySQL. Flushing the WAL buffer is also protected via mutex_. 2. Use two sequence numbers: last seq, and last seq for write. Last seq is the last visible sequence number for reads. Last seq for write is the next sequence number that should be used to write to WAL/memtable. This allows to have a memtable write be in parallel to WAL writes. 3. BatchGroup is not used for writes. This means that we can have parallel writers which changes a major assumption in the code base. To accommodate for that i) allow only 1 WriteImpl that intends to write to memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes come via group commit phase which is serial anyway, ii) make all the parts in the code base that assumed to be the only writer (via EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are protected via a stat_mutex_. Note: the first commit has the approach figured out but is not clean. Submitting the PR anyway to get the early feedback on the approach. If we are ok with the approach I will go ahead with this updates: 0) Rebase with Yi's pipelining changes 1) Currently batching is disabled by default to make sure that it will be consistent with all unit tests. Will make this optional via a config. 2) A couple of unit tests are disabled. They need to be updated with the serial commit of 2PC taken into account. 3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires releasing mutex_ beforehand (the same way EnterUnbatched does). This needs to be cleaned up. Closes https://github.com/facebook/rocksdb/pull/2345 Differential Revision: D5210732 Pulled By: maysamyabandeh fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4	7 years ago
Siying Dong	d616ebea23	Add GPLv2 as an alternative license. Summary: Closes https://github.com/facebook/rocksdb/pull/2226 Differential Revision: D4967547 Pulled By: siying fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4	8 years ago
Siying Dong	d2dce5611a	Move some files under util/ to separate dirs Summary: Move some files under util/ to new directories env/, monitoring/ options/ and cache/ Closes https://github.com/facebook/rocksdb/pull/2090 Differential Revision: D4833681 Pulled By: siying fbshipit-source-id: 2fd8bef	8 years ago
Siying Dong	d5b607a43f	Make db_wal_test slightly faster Summary: Avoid to run db_wal_test in all the DB test options, and some small changes. Closes https://github.com/facebook/rocksdb/pull/1921 Differential Revision: D4622054 Pulled By: siying fbshipit-source-id: 890fd64	8 years ago
Dmitri Smirnov	0a4cdde50a	Windows thread Summary: introduce new methods into a public threadpool interface, - allow submission of std::functions as they allow greater flexibility. - add Joining methods to the implementation to join scheduled and submitted jobs with an option to cancel jobs that did not start executing. - Remove ugly `#ifdefs` between pthread and std implementation, make it uniform. - introduce pimpl for a drop in replacement of the implementation - Introduce rocksdb::port::Thread typedef which is a replacement for std::thread. On Posix Thread defaults as before std::thread. - Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation. - should be no functionality changes. Closes https://github.com/facebook/rocksdb/pull/1823 Differential Revision: D4492902 Pulled By: siying fbshipit-source-id: c74cb11	8 years ago
Reid Horuff	1ca5f6d132	Fix 2PC Recovery SeqId Miscount Summary: Originally sequence ids were calculated, in recovery, based off of the first seqid found if the first log recovered. The working seqid was then incremented from that value based on every insertion that took place. This was faulty because of the potential for missing log files or inserts that skipped the WAL. The current recovery scheme grabs sequence from current recovering batch and increments using memtableinserter to track how many actual inserts take place. This works for 2PC batches as well scenarios where some logs are missing or inserts that skip the WAL. Closes https://github.com/facebook/rocksdb/pull/1486 Differential Revision: D4156064 Pulled By: reidHoruff fbshipit-source-id: a6da8d9	8 years ago
Andrew Kryczka	f47054015d	Handle WAL deletion when using avoid_flush_during_recovery Summary: Previously the WAL files that were avoided during recovery would never be considered for deletion. That was because alive_log_files_ was only populated when log files are created. This diff further populates alive_log_files_ with existing log files that aren't flushed during recovery, such that FindObsoleteFiles() can find them later. Depends on D64053. Test Plan: new unit test, verifies it fails before this change and passes after Reviewers: sdong, IslamAbdelRahman, yiwu Reviewed By: yiwu Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64059	8 years ago
Andrew Kryczka	1b7af5fb1a	Redo handling of recycled logs in full purge Summary: This reverts commit `9e4aa798c3`, which doesn't handle all cases (see inline comment). I reimplemented the logic as suggested in the initial PR: https://github.com/facebook/rocksdb/pull/1313. This approach has two benefits: - All the parsing/filtering of full_scan_candidate_files is kept together in PurgeObsoleteFiles. - We only need to check whether log file is recycled in one place where we've already determined it's a log file Test Plan: new unit test, verified fails before the original fix, still passes now. Reviewers: IslamAbdelRahman, yiwu, sdong Reviewed By: yiwu, sdong Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64053	8 years ago
Reid Horuff	2c1f95291d	Add facility to write only a portion of WriteBatch to WAL Summary: When constructing a write batch a client may now call MarkWalTerminationPoint() on that batch. No batch operations after this call will be added written to the WAL but will still be inserted into the Memtable. This facility is used to remove one of the three WriteImpl calls in 2PC transactions. This produces a ~1% perf improvement. ``` RocksDB - unoptimized 2pc, sync_binlog=1, disable_2pc=off INFO 2016-08-31 14:30:38,814 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2619 seconds. Requests/second = 28628 RocksDB - optimized 2pc , sync_binlog=1, disable_2pc=off INFO 2016-08-31 16:26:59,442 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2581 seconds. Requests/second = 29054 ``` Test Plan: Two unit tests added. Reviewers: sdong, yiwu, IslamAbdelRahman Reviewed By: yiwu Subscribers: hermanlee4, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64599	8 years ago
Yi Wu	9ed928e7a9	Split DBOptions into ImmutableDBOptions and MutableDBOptions Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is barely a placeholder for now. I'll start to move options to MutableDBOptions in following diffs. Test Plan: make all check Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64065	8 years ago
yiwu-arbug	4bc8c88e6b	Recover same sequence id from WAL (#1350 ) Summary: Revert the behavior where we don't read sequence id from WAL, but increase it as we replay the log. We still keep the behave for 2PC for now but will fix later. This change fixes github issue 1339, where some writes come with WAL disabled and we may recover records with wrong sequence id. Test Plan: Added unit test. Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64275	8 years ago
sdong	b666f85445	Consider more factors when determining preallocation size of WAL files Summary: Currently the WAL file preallocation size is 1.1 * write_buffer_size. This, however, will be over-estimated if options.db_write_buffer_size or options.max_total_wal_size is set and is much smaller. Test Plan: Add a unit test. Reviewers: andrewkr, yiwu Reviewed By: yiwu Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D63957	8 years ago
Yi Wu	40cfa3e021	Fix DBWALTest.RecoveryWithLogDataForSomeCFs with mac Summary: Seems there's no std::array on mac+clang. Use raw array instead. Test Plan: run ./db_wal_test on mac. Reviewers: andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64005	8 years ago
Andrew Kryczka	06b4785fec	Fix recovery for WALs without data for all CFs Summary: if one or more CFs had no data in the WAL, the log number that's used by FindObsoleteFiles() wasn't updated. We need to treat this case the same as if the data for that WAL had been flushed. Test Plan: new unit test Reviewers: IslamAbdelRahman, yiwu, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D63963	8 years ago
Andrew Kryczka	d7242ff4d5	Fix GetSortedWalFiles when log recycling enabled Summary: Previously the sequence number was mistakenly passed in an argument where the log number should go. This caused the reader to assume the old WAL format was used, which is incompatible with the WAL recycling format. Test Plan: new unit test, verified it fails before this change and passes afterwards. Reviewers: yiwu, lightmark, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D63987	8 years ago
sdong	32df9733d1	Add options.write_buffer_manager: control total memtable size across DB instances Summary: Add option write_buffer_manager to help users control total memory spent on memtables across multiple DB instances. Test Plan: Add a new unit test. Reviewers: yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: adela, benj, sumeet, muthu, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59925	8 years ago
charsyam	4f2b0946d1	fix simple typos (#1183 )	8 years ago
Yi Wu	bc8af90e8c	add option to not flush memtable on open() Summary: Add option to not flush memtable on open() In case the option is enabled, don't delete existing log files by not updating log numbers to MANIFEST. Will still flush if we need to (e.g. memtable full in the middle). In that case we also flush final memtable. If wal_recovery_mode = kPointInTimeRecovery, do not halt immediately after encounter corruption. Instead, check if seq id of next log file is last_log_sequence + 1. In that case we continue recovery. Test Plan: See unit test. Reviewers: dhruba, horuff, sdong Reviewed By: sdong Subscribers: benj, yhchiang, andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57813	9 years ago
Dmitri Smirnov	ee221d2de0	Introduce XPRESS compresssion on Windows. (#1081 ) Comparable with Snappy on comp ratio. Implemented using Windows API, does not require external package. Avaiable since Windows 8 and server 2012. Use -DXPRESS=1 with CMake to enable.	9 years ago
Yi Wu	792762c42c	Split db_test.cc Summary: Split db_test.cc into several files. Moving several helper functions into DBTestBase. Test Plan: make check Reviewers: sdong, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, andrewkr, kradhakrishnan, yhchiang, leveldb, sdong Differential Revision: https://reviews.facebook.net/D56715	9 years ago
Baraa Hamodi	21e95811d1	Updated all copyright headers to the new format.	9 years ago
Dmitri Smirnov	3c750b59ae	No need to #ifdef test only code on windows	9 years ago
Venkatesh Radhakrishnan	f1fdf5205b	Clean up dependency: Move db_test_util.* to db directory Summary: As part of tech debt week, we are cleaning up dependencies. This diff moves db_test_util.[h,cc] from util to db directory. Test Plan: make check Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D48543	9 years ago
sdong	7ccd1c80a7	Add two unit tests for SyncWAL() Summary: Add two unit tests for SyncWAL(). One makes sure SyncWAL() doesn't block writes in the other thread. Another one makes sure SyncWAL() doesn't wait ongoing writes to finish before being executed. Create a new test file db_wal_test and move two WAL related tests from db_test to here. Test Plan: Run the new tests Reviewers: IslamAbdelRahman, rven, kradhakrishnan, kolmike, tnovak, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43605	9 years ago

28 Commits (e27f60b1c867b0b60dbff26d7a35777e6bb9f14b)