rocksdb

Commit Graph

Author	SHA1	Message	Date
Phani Shekhar Mantripragada	446b32cfc3	Support for Column family specific paths. Summary: In this change, an option to set different paths for different column families is added. This option is set via cf_paths setting of ColumnFamilyOptions. This option will work in a similar fashion to db_paths setting. Cf_paths is a vector of Dbpath values which contains a pair of the absolute path and target size. Multiple levels in a Column family can go to different paths if cf_paths has more than one path. To maintain backward compatibility, if cf_paths is not specified for a column family, db_paths setting will be used. Note that, if db_paths setting is also not specified, RocksDB already has code to use db_name as the only path. Changes : 1) A new member "cf_paths" is added to ImmutableCfOptions. This is set, based on cf_paths setting of ColumnFamilyOptions and db_paths setting of ImmutableDbOptions. This member is used to identify the path information whenever files are accessed. 2) Validation checks are added for cf_paths setting based on existing checks for db_paths setting. 3) DestroyDB, PurgeObsoleteFiles etc. are edited to support multiple cf_paths. 4) Unit tests are added appropriately. Closes https://github.com/facebook/rocksdb/pull/3102 Differential Revision: D6951697 Pulled By: ajkr fbshipit-source-id: 60d2262862b0a8fd6605b09ccb0da32bb331787d	8 years ago
Maysam Yabandeh	67182678a5	Stats for false positive rate of full filters Summary: Adds two stats to allow us to measure the false positive rate of full filters: - The total count of positives: rocksdb.bloom.filter.full.positive - The total count of true positives: rocksdb.bloom.filter.full.true.positive Note the term "full" in the stat name to indicate that they are meaningful in full filters. block-based filters are to be deprecated soon and supporting them is not worth the additional cost of if-then-else branches. Closes #3680 Tested by: $ ./db_bench -benchmarks=fillrandom -db /dev/shm/rocksdb-tmpdb --num=1000000 -bloom_bits=10 $ ./db_bench -benchmarks="readwhilewriting" -db /dev/shm/rocksdb-tmpdb --statistics -bloom_bits=10 --duration=60 --num=2000000 --use_existing_db 2>&1 > /tmp/full.log $ grep filter.full /tmp/full.log rocksdb.bloom.filter.full.positive COUNT : 3628593 rocksdb.bloom.filter.full.true.positive COUNT : 3536026 which gives the false positive rate of 2.5% Closes https://github.com/facebook/rocksdb/pull/3681 Differential Revision: D7517570 Pulled By: maysamyabandeh fbshipit-source-id: 630ab1a473afdce404916d297035b6318de4c052	8 years ago
Yi Wu	685912d07f	Clock cache should check if deleter is nullptr before calling it Summary: Clock cache should check if deleter is nullptr before calling it. Closes https://github.com/facebook/rocksdb/pull/3677 Differential Revision: D7493602 Pulled By: yiwu-arbug fbshipit-source-id: 4f94b188d2baf2cbc7c0d5da30fea1215a683de4	8 years ago
Dmitri Smirnov	147dfc7bdf	Fix pre_release callback argument list. Summary: Primitive types constness does not affect the signature of the method and has no influence on whether the overriding method would actually have that const bool instead of just bool. In addition, it is rarely useful but does produce compatibility warnings in the VS 2015 compiler. Closes https://github.com/facebook/rocksdb/pull/3663 Differential Revision: D7475739 Pulled By: ajkr fbshipit-source-id: fb275378b5acc397399420ae6abb4b6bfe5bd32f	8 years ago
Yi Wu	36a9f22931	Blob DB: blob_dump to show uncompressed values Summary: Make blob_dump tool able to show uncompressed values if the blob file is compressed. Also show total compressed vs. raw size at the end if --show_summary is provided. Closes https://github.com/facebook/rocksdb/pull/3633 Differential Revision: D7348926 Pulled By: yiwu-arbug fbshipit-source-id: ca709cb4ed5cf6a550ff2987df8033df81516f8e	8 years ago
Zhongyi Xie	c827b2dc2a	fix build for rocksdb lite Summary: currently rocksdb lite build fails due to the following errors: > db/db_sst_test.cc:29:51: error: ‘FlushJobInfo’ does not name a type virtual void OnFlushCompleted(DB* /db/, const FlushJobInfo& info) override { ^ db/db_sst_test.cc:29:16: error: ‘virtual void rocksdb::FlushedFileCollector::OnFlushCompleted(rocksdb::DB, const int&)’ marked ‘override’, but does not override virtual void OnFlushCompleted(DB /db/, const FlushJobInfo& info) override { ^ db/db_sst_test.cc:24:7: error: ‘class rocksdb::FlushedFileCollector’ has virtual functions and accessible non-virtual destructor [-Werror=non-virtual-dtor] class FlushedFileCollector : public EventListener { ^ db/db_sst_test.cc: In member function ‘virtual void rocksdb::FlushedFileCollector::OnFlushCompleted(rocksdb::DB, const int&)’: db/db_sst_test.cc:31:35: error: request for member ‘file_path’ in ‘info’, which is of non-class type ‘const int’ flushed_files_.push_back(info.file_path); ^ cc1plus: all warnings being treated as errors make: ** [db/db_sst_test.o] Error 1 Closes https://github.com/facebook/rocksdb/pull/3676 Differential Revision: D7493006 Pulled By: miasantreble fbshipit-source-id: 77dff0a5b23e27db51be9b9798e3744e6fdec64f	8 years ago
Sagar Vemuri	7d9067991e	Ttl-triggered and snapshot-release-triggered compactions should not be manual compactions Summary: Ttl-triggered and snapshot-release-triggered compactions should not be considered as manual compactions. This is a bug. Closes https://github.com/facebook/rocksdb/pull/3678 Differential Revision: D7498151 Pulled By: sagar0 fbshipit-source-id: a2d5bed05268a4dc93d54ea97a9ae44b366df15d	8 years ago
Dmitri Smirnov	2a62ca1750	Make Optimistic Tx database stackable Summary: This change models Optimistic Tx db after Pessimistic TX db. The motivation for this change is to make the ptr polymorphic so it can be held by the same raw or smart ptr. Currently, due to the inheritance of the Opt Tx db not being rooted in the manner of Pess Tx from a single DB root, it is more difficult to write clean code and have clear ownership of the database in cases when options dictate instantiation of a plain DB, Pess Tx DB or Opt tx db. Closes https://github.com/facebook/rocksdb/pull/3566 Differential Revision: D7184502 Pulled By: yiwu-arbug fbshipit-source-id: 31d06efafd79497bb0c230e971857dba3bd962c3	8 years ago
Andrew Kryczka	b058a33705	Reduce default --nooverwritepercent in black-box crash tests Summary: Previously `python tools/db_crashtest.py blackbox` would do no useful work as the crash interval (two minutes) was shorter than the preparation phase. The preparation phase is slow because of the ridiculously inefficient way it computes which keys should not be overwritten. It was doing this for 60M keys since default values were `FLAGS_nooverwritepercent == 60` and `FLAGS_max_key == 100000000`. Move the "nooverwritepercent" override from whitebox-specific to the general options so it also applies to blackbox test runs. Now preparation phase takes a few seconds. Closes https://github.com/facebook/rocksdb/pull/3671 Differential Revision: D7457732 Pulled By: ajkr fbshipit-source-id: 601f4461a6a7e49e50449dcf15aebc9b8a98d6f0	8 years ago
Adam Retter	12b400e814	Some small improvements to the build_tools Summary: Closes https://github.com/facebook/rocksdb/pull/3664 Differential Revision: D7459433 Pulled By: sagar0 fbshipit-source-id: 3817e5d45fc70e83cb26f9800eaa0f4566c8dc0e	8 years ago
Sagar Vemuri	04c11b867d	Level Compaction with TTL Summary: Level Compaction with TTL. As of today, a file could exist in the LSM tree without going through the compaction process for a really long time if there are no updates to the data in the file's key range. For example, in certain use cases, the keys are not actually "deleted"; instead they are just set to empty values. There might not be any more writes to this "deleted" key range, and if so, such data could remain in the LSM for a really long time resulting in wasted space. Introducing a TTL could solve this problem. Files (and, in turn, data) older than TTL will be scheduled for compaction when there is no other background work. This will make the data go through the regular compaction process and get rid of old unwanted data. This also has the (good) side-effect of all the data in the non-bottommost level being newer than ttl, and all data in the bottommost level older than ttl. It could lead to more writes while reducing space. This functionality can be controlled by the newly introduced column family option -- ttl. TODO for later: - Make ttl mutable - Extend TTL to Universal compaction as well? (TTL is already supported in FIFO) - Maybe deprecate CompactionOptionsFIFO.ttl in favor of this new ttl option. Closes https://github.com/facebook/rocksdb/pull/3591 Differential Revision: D7275442 Pulled By: sagar0 fbshipit-source-id: dcba484717341200d419b0953dafcdf9eb2f0267	8 years ago
Koby Kahane	df14424410	Fix 3-way SSE4.2 crc32c usage in MSVC with CMake Summary: The introduction of the 3-way SSE4.2 optimized crc32c implementation in commit `f54d7f5fea` added the `HAVE_PCLMUL` definition when the compiler supports intrinsics for that instruction, but did not modify CMakeLists.txt to set that definition on MSVC when appropriate. As a result, 3-way SSE4.2 is not used in MSVC builds with CMake although it could be. Since the existing test program in CMakeLists.txt for `HAVE_SSE42` already uses `_mm_clmulepi64_si128` which is a PCLMUL instruction, this PR sets `HAVE_PCLMUL` as well if that program builds successfully, fixing the problem. Closes https://github.com/facebook/rocksdb/pull/3673 Differential Revision: D7473975 Pulled By: miasantreble fbshipit-source-id: bc346b9eb38920e427aa1a253e6dd9811efa269e	8 years ago
Maysam Yabandeh	b225de7e10	WritePrepared Txn: smallest_prepare optimization Summary: This is an optimization to reduce lookup in the CommitCache when querying IsInSnapshot. The optimization takes the smallest uncommitted data at the time that the snapshot was taken and if the sequence number of the read data is lower than that number it assumes the data as committed. To implement this optimization two changes are required: i) The AddPrepared function must be called sequentially to avoid out of order insertion in the PrepareHeap (otherwise the top of the heap does not indicate the smallest prepare in future too), ii) non-2PC transactions also call AddPrepared if they do not commit in one step. Closes https://github.com/facebook/rocksdb/pull/3649 Differential Revision: D7388630 Pulled By: maysamyabandeh fbshipit-source-id: b79506238c17467d590763582960d4d90181c600	8 years ago
Amy Tai	1579626d0d	Enable cancelling manual compactions if they hit the sfm size limit Summary: Manual compactions should be cancelled, just like scheduled compactions are cancelled, if sfm->EnoughRoomForCompaction is not true. Closes https://github.com/facebook/rocksdb/pull/3670 Differential Revision: D7457683 Pulled By: amytai fbshipit-source-id: 669b02fdb707f75db576d03d2c818fb98d1876f5	8 years ago
Zhongyi Xie	44653c7b7a	Revert "Avoid adding tombstones of the same file to RangeDelAggregato… Summary: …r multiple times" This reverts commit `e80709a33a`. lingbin PR https://github.com/facebook/rocksdb/pull/3635 is causing some performance regression for seekrandom workloads I'm reverting the commit for now but feel free to submit new patches 😃 To reproduce the regression, you can run the following db_bench command > ./db_bench --benchmarks=fillrandom,seekrandomwhilewriting --threads=1 --num=1000000 --reads=150000 --key_size=66 --value_size=1262 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=1 --stats_per_interval=1 --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --seed=1522388277 -write_buffer_size=1048576 --level0_file_num_compaction_trigger=10000 --compression_type=none write stats printed by db_bench: Table \| \| \| \| \| \| \| \| \| \| \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- revert commit \| Percentiles: \| P50: \| 80.77 \| P75: \|102.94 \|P99: \| 1786.44 \| P99.9: \| 1892.39 \|P99.99: 2645.10 \| keep commit \| Percentiles: \| P50: \| 221.72 \| P75: \| 686.62 \| P99: \| 1842.57 \| P99.9: \| 1899.70\| P99.99: 2814.29\| Closes https://github.com/facebook/rocksdb/pull/3672 Differential Revision: D7463315 Pulled By: miasantreble fbshipit-source-id: 8e779c87591127f2c3694b91a56d9b459011959d	8 years ago
Adam Retter	8917eee962	Fixed small typos Summary: Closes https://github.com/facebook/rocksdb/pull/3667 Differential Revision: D7470060 Pulled By: miasantreble fbshipit-source-id: 8e8545cda38f0805f35ccdb8841666a2d7a965f5	8 years ago
Fosco Marotto	d12112d05e	Throw NoSpace instead of IOError when out of space. Summary: Replaces #1702 and is updated from feedback. Closes https://github.com/facebook/rocksdb/pull/3531 Differential Revision: D7457395 Pulled By: gfosco fbshipit-source-id: 25a21dd8cfa5a6e42e024208b444d9379d920c82	8 years ago
Fosco Marotto	d9bfb35d31	Update buckifier and TARGETS Summary: Some flags used via make were not applied in the buckifier/targets file, causing some failures to be missed by testing infra ( ie the one fixed by #3434 ) Closes https://github.com/facebook/rocksdb/pull/3452 Differential Revision: D7457419 Pulled By: gfosco fbshipit-source-id: e4aed2915ca3038c1485bbdeebedfc33d5704a49	8 years ago
Fosco Marotto	c3eb762bb0	Update 64-bit shift in compression.h Summary: This was failing the build on windows with zstd, warning treated as an error, 32-bit shift implicitly converted to 64-bit. Closes https://github.com/facebook/rocksdb/pull/3624 Differential Revision: D7307883 Pulled By: gfosco fbshipit-source-id: 68110e9b5b1b59b668dec6cf86b67556402574e7	8 years ago
Maysam Yabandeh	73f21a7b21	Skip deleted WALs during recovery Summary: This patch records the deleted WAL numbers in the manifest to ignore them and any WAL older than them during recovery. This is to avoid scenarios where we have a gap between the WAL files that are fed to the recovery procedure. The gap could happen by, for example, out-of-order WAL deletion. Such a gap could cause problems in 2PC recovery where the prepared and commit entries are placed into two separate WALs and a gap in the WALs could result in not processing the WAL with the commit entry and hence breaking the 2PC recovery logic. Closes https://github.com/facebook/rocksdb/pull/3488 Differential Revision: D6967893 Pulled By: maysamyabandeh fbshipit-source-id: 13119feb155a08ab6d4909f437c7a750480dc8a1	8 years ago
Maysam Yabandeh	89d989ed75	WritePrepared Txn: fix a bug in publishing recoverable state seq Summary: When using two_write_queue, the published seq and the last allocated sequence could be ahead of the LastSequence, even if both write queues are stopped as in WriteRecoverableState. The patch fixes a bug in WriteRecoverableState in which LastSequence was used as a reference but the result was applied to last fetched sequence and last published seq. Closes https://github.com/facebook/rocksdb/pull/3665 Differential Revision: D7446099 Pulled By: maysamyabandeh fbshipit-source-id: 1449bed9aed8e9db6af85946efd347cb8efd3c0b	8 years ago
Adam Retter	3cb591954e	Allow rocksdbjavastatic to also be built as debug build Summary: Closes https://github.com/facebook/rocksdb/pull/3654 Differential Revision: D7417948 Pulled By: sagar0 fbshipit-source-id: 9514df9328181e54a6384764444c0c7ce66e7f5f	8 years ago
Maysam Yabandeh	0377ff9dea	WritePrepared Txn: make recoverable state visible after flush Summary: Currently if the CommitTimeWriteBatch is set to be used only as a state that is required only for recovery, the user cannot see that in DB until it is restarted. This is while the state is already inserted into the DB after the memtable flush. It would be useful for debugging to make this state visible to the user after the flush by committing it. The patch does it by invoking a callback that does the commit on the recoverable state. Closes https://github.com/facebook/rocksdb/pull/3661 Differential Revision: D7424577 Pulled By: maysamyabandeh fbshipit-source-id: 137f9408662f0853938b33fa440f27f04c1bbf5c	8 years ago
Yanqin Jin	1f5def1653	Fix race condition causing double deletion of ssts Summary: Possible interleaved execution of background compaction thread calling `FindObsoleteFiles (no full scan) / PurgeObsoleteFiles` and user thread calling `FindObsoleteFiles (full scan) / PurgeObsoleteFiles` can lead to race condition on which RocksDB attempts to delete a file twice. The second attempt will fail and return `IO error`. This may occur to other files, but this PR targets sst. Also add a unit test to verify that this PR fixes the issue. The newly added unit test `obsolete_files_test` has a test case for this scenario, implemented in `ObsoleteFilesTest#RaceForObsoleteFileDeletion`. `TestSyncPoint`s are used to coordinate the interleaving the `user_thread` and background compaction thread. They execute as follows ``` timeline user_thread background_compaction thread t1 \| FindObsoleteFiles(full_scan=false) t2 \| FindObsoleteFiles(full_scan=true) t3 \| PurgeObsoleteFiles t4 \| PurgeObsoleteFiles V ``` When `user_thread` invokes `FindObsoleteFiles` with full scan, it collects ALL files in RocksDB directory, including the ones that background compaction thread have collected in its job context. Then `user_thread` will see an IO error when trying to delete these files in `PurgeObsoleteFiles` because background compaction thread has already deleted the file in `PurgeObsoleteFiles`. To fix this, we make RocksDB remember which (SST) files have been found by threads after calling `FindObsoleteFiles` (see `DBImpl#files_grabbed_for_purge_`). Therefore, when another thread calls `FindObsoleteFiles` with full scan, it will not collect such files. ajkr could you take a look and comment? Thanks! Closes https://github.com/facebook/rocksdb/pull/3638 Differential Revision: D7384372 Pulled By: riversand963 fbshipit-source-id: 01489516d60012e722ee65a80e1449e589ce26d3	8 years ago
Sagar Vemuri	90c542347a	Update comments about MergeOperator::AllowSingleOperand Summary: Updated comments around AllowSingleOperand. Reason: A couple of users were confused and encountered issues due to not overriding PartialMerge with AllowSingleOperand=true. I'll also look into modifying the default merge operator implementation so that overriding PartialMerge is not mandatory when AllowSingleOp=true. Closes https://github.com/facebook/rocksdb/pull/3659 Differential Revision: D7422691 Pulled By: sagar0 fbshipit-source-id: 3d075a6ced0120f5d65cb7ae5412936f1862f342	8 years ago
Sagar Vemuri	d687670256	Fix a leak in FilterBlockBuilder when adding prefix Summary: Our valgrind continuous test found an interesting leak which got introduced in #3614. We were adding the prefix key before saving the previous prefix start offset, due to which previous prefix offset is always incorrect. Fixed it by saving the previous state before adding the key. Closes https://github.com/facebook/rocksdb/pull/3660 Differential Revision: D7418698 Pulled By: sagar0 fbshipit-source-id: 9933685f943cf2547ed5c553f490035a2fa785cf	8 years ago
Anand Ananthabhotla	f9f4d40f93	Align SST file data blocks to avoid spanning multiple pages Summary: Provide a block_align option in BlockBasedTableOptions to allow alignment of SST file data blocks. This will avoid higher IOPS/throughput load due to < 4KB data blocks spanning 2 4KB pages. When this option is set to true, the block alignment is set to lower of block size and 4KB. Closes https://github.com/facebook/rocksdb/pull/3502 Differential Revision: D7400897 Pulled By: anand1976 fbshipit-source-id: 04cc3bd144e88e3431a4f97604e63ad7a0f06d44	8 years ago
Maysam Yabandeh	0999e9b79a	WritePrepared Txn: Increase commit cache size to 2^23 Summary: Current commit cache size is 2^21. This was due to a typo. With 2^23 commit entries we can have transactions as long as 64s without incurring the cost of having them evicted from the commit cache before their commit. Here is the math: 2^23 / 2 (one out of two seq numbers are for commit) / 2^16 TPS = 2^6 = 64s Closes https://github.com/facebook/rocksdb/pull/3657 Differential Revision: D7411211 Pulled By: maysamyabandeh fbshipit-source-id: e7cacf40579f3acf940643d8a1cfe5dd201caa35	8 years ago
Maysam Yabandeh	35a4469bbf	Fix race condition via concurrent FlushWAL Summary: Currently log_writer->AddRecord in WriteImpl is protected from concurrent calls via FlushWAL only if two_write_queues_ option is set. The patch fixes the problem by i) skip log_writer->AddRecord in FlushWAL if manual_wal_flush is not set, ii) protects log_writer->AddRecord in WriteImpl via log_write_mutex_ if manual_wal_flush_ is set but two_write_queues_ is not. Fixes #3599 Closes https://github.com/facebook/rocksdb/pull/3656 Differential Revision: D7405608 Pulled By: maysamyabandeh fbshipit-source-id: d6cc265051c77ae49c7c6df4f427350baaf46934	8 years ago
Sagar Vemuri	23f9d93f47	Exclude MySQLStyleTransactionTest.TransactionStressTest* from valgrind Summary: I found that each instance of MySQLStyleTransactionTest.TransactionStressTest/x is taking more than 10 hours to complete on our continuous testing environment, causing the whole valgrind run to timeout after a day. So excluding these tests. Closes https://github.com/facebook/rocksdb/pull/3652 Differential Revision: D7400332 Pulled By: sagar0 fbshipit-source-id: 987810574506d01487adf7c2de84d4817ec3d22d	8 years ago
Maysam Yabandeh	3e417a6607	WritePrepared Txn: AddPrepared for all sub-batches Summary: Currently AddPrepared is performed only on the first sub-batch if there are duplicate keys in the write batch. This could cause a problem if the transaction takes too long to commit and the seq number of the first sub-batch moved to old_prepared_ but not the seq of the later ones. The patch fixes this by calling AddPrepared for all sub-batches. Closes https://github.com/facebook/rocksdb/pull/3651 Differential Revision: D7388635 Pulled By: maysamyabandeh fbshipit-source-id: 0ccd80c150d9bc42fe955e49ddb9d7ca353067b4	8 years ago
Dmitri Smirnov	d382ae7de6	Improve perf of random read and insert compare by suggesting inlining to the compiler Summary: Results from 2015 compiler. This improves sequential insert. Random Read results are inconclusive but I hope 2017 will do a better job at inlining. Before: fillseq : 3.638 micros/op 274866 ops/sec; 213.9 MB/s After: fillseq : 3.379 micros/op 295979 ops/sec; 230.3 MB/s Closes https://github.com/facebook/rocksdb/pull/3645 Differential Revision: D7382711 Pulled By: siying fbshipit-source-id: 092a07ffe8a6e598d1226ceff0f11b35e6c5c8e4	8 years ago
Dmitri Smirnov	53d66df0c4	Refactor sync_point to make implementation either customizable or replaceable Summary: Closes https://github.com/facebook/rocksdb/pull/3637 Differential Revision: D7354373 Pulled By: ajkr fbshipit-source-id: 6816c7bbc192ed0fb944942b11c7074bf24eddf1	8 years ago
Sagar Vemuri	a993c0139d	Add 5.11 and 5.12 to tools/check_format_compatible.sh Summary: Closes https://github.com/facebook/rocksdb/pull/3646 Differential Revision: D7384727 Pulled By: sagar0 fbshipit-source-id: f713af7adb2ffea5303bbf0fac8a8a1630af7b38	8 years ago
LingBin	e80709a33a	Avoid adding tombstones of the same file to RangeDelAggregator multiple times Summary: RangeDelAggregator will remember the files whose range tombstones have been added, so the caller can check whether the file has been added before call AddTombstones. Closes https://github.com/facebook/rocksdb/pull/3635 Differential Revision: D7354604 Pulled By: ajkr fbshipit-source-id: 9b9f7ec130556028df417e650711554b46d8d107	8 years ago
Sagar Vemuri	7ffce2805b	Add Java-API-Changes section to History Summary: We have not been updating our HISTORY.md change log with the RocksJava changes. Going forward, let's add Java changes also to HISTORY.md. There is an old java/HISTORY-JAVA.md, but it hasn't been updated in years. It is much easier to remember to update the change log in a single file, HISTORY.md. I added information about shared block cache here, which was introduced in #3623. Closes https://github.com/facebook/rocksdb/pull/3647 Differential Revision: D7384448 Pulled By: sagar0 fbshipit-source-id: 9b6e569f44e6df5cb7ba06413d9975df0b517d20	8 years ago
Radoslaw Zarzynski	09b6bf828a	InlineSkiplist: don't decode keys unnecessarily during comparisons Summary: Summary ======== `InlineSkipList<>::Insert` takes the `key` parameter as a C-string. Then, it performs multiple comparisons with it requiring the `GetLengthPrefixedSlice()` to be spawn in `MemTable::KeyComparator::operator()(const char* prefix_len_key1, const char* prefix_len_key2)` on the same data over and over. The patch tries to optimize that. Rough performance comparison ===== Big keys, no compression. ``` $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256 (...) fillrandom : 4.222 micros/op 236836 ops/sec; 80.4 MB/s ``` ``` $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256 (...) fillrandom : 4.064 micros/op 246059 ops/sec; 83.5 MB/s ``` TODO ====== In ~~a separated~~ this PR: - [x] Go outside the write path. Maybe even eradicate the C-string-taking variant of `KeyIsAfterNode` entirely. - [x] Try to cache the transformations applied by `KeyComparator` & friends in situations where we have many comparisons with the same key. Closes https://github.com/facebook/rocksdb/pull/3516 Differential Revision: D7059300 Pulled By: ajkr fbshipit-source-id: 6f027dbb619a488129f79f79b5f7dbe566fb2dbb	8 years ago
Zhongyi Xie	1cbc96d236	FlushReason improvement Summary: Right now flush reason "SuperVersion Change" covers a few different scenarios which is a bit vague. For example, the following db_bench job should trigger "Write Buffer Full" > $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 $ grep 'flush_reason' /dev/shm/dbbench/LOG ... 2018/03/06-17:30:42.543638 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242543634, "job": 192, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018024, "flush_reason": "SuperVersion Change"} 2018/03/06-17:30:42.569541 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242569536, "job": 193, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"} 2018/03/06-17:30:42.596396 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242596392, "job": 194, "event": "flush_started", "num_memtables": 1, "num_entries": 7008, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "SuperVersion Change"} 2018/03/06-17:30:42.622444 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242622440, "job": 195, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"} With the fix: > 2018/03/19-14:40:02.341451 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602341444, "job": 98, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018008, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.379655 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602379642, "job": 100, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018016, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.418479 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602418474, "job": 101, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.455084 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602455079, "job": 102, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.492293 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602492288, "job": 104, "event": "flush_started", "num_memtables": 1, "num_entries": 7007, "num_deletes": 0, "memory_usage": 1018056, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.528720 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602528715, "job": 105, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.566255 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602566238, "job": 107, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018112, "flush_reason": "Write Buffer Full"} Closes https://github.com/facebook/rocksdb/pull/3627 Differential Revision: D7328772 Pulled By: miasantreble fbshipit-source-id: 67c94065fbdd36930f09930aad0aaa6d2c152bb8	8 years ago
Andrew Kryczka	82137f0ce8	Add unit test for WAL corruption Summary: Closes https://github.com/facebook/rocksdb/pull/3618 Differential Revision: D7301053 Pulled By: ajkr fbshipit-source-id: a9dde90caa548c294d03d6386f78428c8536ca14	8 years ago
Sagar Vemuri	2e3d407778	Fsync after writing global seq number in ExternalSstFileIngestionJob Summary: Fsync after writing global sequence number to the ingestion file in ExternalSstFileIngestionJob. Otherwise the file metadata could be incorrect. Closes https://github.com/facebook/rocksdb/pull/3644 Differential Revision: D7373813 Pulled By: sagar0 fbshipit-source-id: 4da2c9e71a8beb5c08b4ac955f288ee1576358b8	8 years ago
Andrew Kryczka	4d51feab0b	Rename function for handling WAL write error Summary: It was misnamed. It actually updates `bg_error_` if `PreprocessWrite()` or `WriteToWAL()` fail, not related to the user callback. Closes https://github.com/facebook/rocksdb/pull/3485 Differential Revision: D6955787 Pulled By: ajkr fbshipit-source-id: bd7afc3fdb7a52830c021cbfc25fcbc3ab7d5e10	8 years ago
Siying Dong	118058ba69	SstFileManager: add bytes_max_delete_chunk Summary: Add `bytes_max_delete_chunk` in SstFileManager so that we can drop a large file in multiple batches. Closes https://github.com/facebook/rocksdb/pull/3640 Differential Revision: D7358679 Pulled By: siying fbshipit-source-id: ef17f0da2f5723dbece2669485a9b91b3edc0bb7	8 years ago
Andrew Kryczka	88c3e26cc0	log value of CompressionOptions::zstd_max_train_bytes Summary: Closes https://github.com/facebook/rocksdb/pull/3587 Differential Revision: D7206901 Pulled By: ajkr fbshipit-source-id: 5d4b1a2653627b44aa3c22db7d98c9cd5dcdb67a	8 years ago
Andrew Kryczka	620823f88b	parse CompressionOptions::zstd_max_train_bytes in options string Summary: Closes https://github.com/facebook/rocksdb/pull/3588 Differential Revision: D7208087 Pulled By: ajkr fbshipit-source-id: 688f7a7c447cb17bee1b410d1fd891c0bf966617	8 years ago
Fosco Marotto	de6cf95a53	Update history for future 5.13 release Summary: Closes https://github.com/facebook/rocksdb/pull/3631 Differential Revision: D7367519 Pulled By: gfosco fbshipit-source-id: 57826cc1c9ffc9f2b351075567b8ad929809cb74	8 years ago
Maysam Yabandeh	7429b20e39	WritePrepared Txn: fix race condition on publishing seq Summary: This commit fixes a race condition on calling SetLastPublishedSequence. The function must be called only from the 2nd write queue when two_write_queues is enabled. However there was a bug that would also call it from the main write queue if CommitTimeWriteBatch is provided to the commit request and yet use_only_the_last_commit_time_batch_for_recovery optimization is not enabled. To fix that we penalize the commit request in such cases by doing an additional write solely to publish the seq number from the 2nd queue. Closes https://github.com/facebook/rocksdb/pull/3641 Differential Revision: D7361508 Pulled By: maysamyabandeh fbshipit-source-id: bf8f7a27e5cccf5425dccbce25eb0032e8e5a4d7	8 years ago
Rohan Rathi	fa8c050e9f	Fixed buffer overrun in BackupEngineImpl::BackupMeta::StoreToFile Summary: The 10MB buffer in BackupEngineImpl::BackupMeta::StoreToFile can be corrupted with a large number of files. Added a check to determine current buffer length and append data to file if buffer becomes full. Resolves https://github.com/facebook/rocksdb/issues/3228 Closes https://github.com/facebook/rocksdb/pull/3636 Differential Revision: D7354160 Pulled By: ajkr fbshipit-source-id: eec12d38095a0d17551a4aaee52b99d30a555722	8 years ago
Huachao Huang	7a6353bd1c	Ignore empty filter block when data block is empty Summary: Close https://github.com/facebook/rocksdb/issues/3592 Closes https://github.com/facebook/rocksdb/pull/3614 Differential Revision: D7291706 Pulled By: ajkr fbshipit-source-id: 9dd8f40bd7716588e1e3fd6be0c2bc2766861f8c	8 years ago
QingpingWang	70282cf876	fix behavior does not match name for "IsFileDeletionsEnabled" Summary: for PR https://github.com/facebook/rocksdb/pull/3598 I deleted the original repo for some reason. Sorry for the inconvenience. Closes https://github.com/facebook/rocksdb/pull/3612 Differential Revision: D7291671 Pulled By: ajkr fbshipit-source-id: 918490ba86b13fe450d232af436cbe259d847c64	8 years ago
QingpingWang	2ce8f63f81	C API for PerfContext Summary: This pull request exposes the interface of PerfContext as C API Closes https://github.com/facebook/rocksdb/pull/3607 Differential Revision: D7294225 Pulled By: ajkr fbshipit-source-id: eddcfbc13538f379950b2c8b299486695ffb5e2c	8 years ago
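
Two of the commit messages in this log quote small calculations: 67182678a5 derives a false positive rate from the two new bloom filter counters, and 0999e9b79a derives the 64s transaction window from the commit cache size. The sketch below simply redoes that arithmetic in Python. The function names are hypothetical and not part of RocksDB. The first function computes the share of filter positives that were false, which appears to be the ratio the commit message reports; that reading is an assumption, not something the commit states explicitly.

```python
def false_positive_share(positives: int, true_positives: int) -> float:
    """Fraction of full-filter positives that turned out to be false.

    Inputs correspond to the counters quoted in commit 67182678a5:
    rocksdb.bloom.filter.full.positive and
    rocksdb.bloom.filter.full.true.positive.
    (Hypothetical helper, not a RocksDB API.)
    """
    if positives <= 0:
        raise ValueError("positives must be greater than zero")
    return (positives - true_positives) / positives


def commit_cache_window_seconds(cache_entries: int, tps: int) -> float:
    """Longest transaction (in seconds) that fits in the commit cache.

    Follows the math in commit 0999e9b79a: one out of two sequence
    numbers is for a commit, so only half of the entries count.
    (Hypothetical helper, not a RocksDB API.)
    """
    if tps <= 0:
        raise ValueError("tps must be greater than zero")
    return cache_entries / 2 / tps


if __name__ == "__main__":
    # Counter values copied from the db_bench output in the commit message.
    print(f"{false_positive_share(3_628_593, 3_536_026):.2%}")
    # Cache sizes before (2^21) and after (2^23) the change, at 2^16 TPS.
    print(commit_cache_window_seconds(2**21, 2**16))
    print(commit_cache_window_seconds(2**23, 2**16))
```

With the quoted counters the share comes out at about 2.55%, which the commit message rounds to 2.5%. At 2^16 TPS the window grows from 16s with 2^21 entries to 64s with 2^23 entries.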


8760 Commits (ab65278b1f29f9a75f1c184317a6708419dcd27e) All Branches Search
