rocksdb

Commit Graph

Author	SHA1	Message	Date
Yanqin Jin	d42bd041c5	Improve visibility into the reasons for compaction. Summary: Add `compaction_reason` as part of event log for event `compaction started`. Add counters for each `CompactionReason`. Closes https://github.com/facebook/rocksdb/pull/3679 Differential Revision: D7550348 Pulled By: riversand963 fbshipit-source-id: a19cff3a678c785aa5ef41aac78b9a5968fcc34d	8 years ago
Yanqin Jin	d95014b9df	fix some text in comments. Summary: 1. Remove redundant text. 2. Make terminology consistent across all comments and doc of RocksDB. Also do our best to conform to conventions. Specifically, use 'callback' instead of 'call-back' [wikipedia](https://en.wikipedia.org/wiki/Callback_(computer_programming)). Closes https://github.com/facebook/rocksdb/pull/3693 Differential Revision: D7560396 Pulled By: riversand963 fbshipit-source-id: ba8c251c487f4e7d1872a1a8dc680f9e35a6ffb8	8 years ago
Maysam Yabandeh	bde1c1a72a	WritePrepared Txn: add stats Summary: Adding some stats that would be helpful to monitor if the DB has gone to unlikely stats that would hurt the performance. These are mostly when we end up needing to acquire a mutex. Closes https://github.com/facebook/rocksdb/pull/3683 Differential Revision: D7529393 Pulled By: maysamyabandeh fbshipit-source-id: f7d36279a8f39bd84d8ddbf64b5c97f670c5d6d9	8 years ago
Maysam Yabandeh	eb5a295440	WritePrepared Txn: add write_committed option to dump_wal Summary: Currently dump_wal cannot print the prepared records from the WAL that is generated by WRITE_PREPARED write policy since the default reaction of the handler is to return NotSupported if markers of WRITE_PREPARED are encountered. This patch enables the admin to pass --write_committed=false option, which will be accordingly passed to the handler. Note that DBFileDumperCommand and DBDumperCommand are still not updated by this patch but firstly they are not urgent and secondly we need to revise this approach later when we also add WRITE_UNPREPARED markers so I leave it for future work. Tested by running it on a WAL generated by WRITE_PREPARED: $ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log \| grep BEGIN_PREARE \| head -1 1,2,70,0,BEGIN_PREARE $ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log --write_committed=false \| grep BEGIN_PREARE \| head -1 1,2,70,0,BEGIN_PREARE PUT(0) : 0x30303031313330313938 PUT(0) : 0x30303032353732313935 END_PREPARE(0x74786E31313535383434323738303738363938313335312D30) Closes https://github.com/facebook/rocksdb/pull/3682 Differential Revision: D7522090 Pulled By: maysamyabandeh fbshipit-source-id: a0332207261c61e18b2f9dfbe9feecd9a1339aca	8 years ago
Andrew Kryczka	faba3fb53d	protect valid backup files when max_valid_backups_to_open is set Summary: When `max_valid_backups_to_open` is set, the `BackupEngine` doesn't know about the files referenced by existing backups. This PR prevents us from deleting valid files when that option is set, in cases where we are unable to accurately determine refcount. There are warnings logged when we may miss deleting unreferenced files, and a recommendation in the header for users to periodically unset this option and run a full `GarbageCollect`. Closes https://github.com/facebook/rocksdb/pull/3518 Differential Revision: D7008331 Pulled By: ajkr fbshipit-source-id: 87907f964dc9716e229d08636a895d2fc7b72305	8 years ago
Phani Shekhar Mantripragada	446b32cfc3	Support for Column family specific paths. Summary: In this change, an option to set different paths for different column families is added. This option is set via cf_paths setting of ColumnFamilyOptions. This option will work in a similar fashion to db_paths setting. Cf_paths is a vector of Dbpath values which contains a pair of the absolute path and target size. Multiple levels in a Column family can go to different paths if cf_paths has more than one path. To maintain backward compatibility, if cf_paths is not specified for a column family, db_paths setting will be used. Note that, if db_paths setting is also not specified, RocksDB already has code to use db_name as the only path. Changes : 1) A new member "cf_paths" is added to ImmutableCfOptions. This is set, based on cf_paths setting of ColumnFamilyOptions and db_paths setting of ImmutableDbOptions. This member is used to identify the path information whenever files are accessed. 2) Validation checks are added for cf_paths setting based on existing checks for db_paths setting. 3) DestroyDB, PurgeObsoleteFiles etc. are edited to support multiple cf_paths. 4) Unit tests are added appropriately. Closes https://github.com/facebook/rocksdb/pull/3102 Differential Revision: D6951697 Pulled By: ajkr fbshipit-source-id: 60d2262862b0a8fd6605b09ccb0da32bb331787d	8 years ago
Maysam Yabandeh	67182678a5	Stats for false positive rate of full filtesr Summary: Adds two stats to allow us measuring the false positive rate of full filters: - The total count of positives: rocksdb.bloom.filter.full.positive - The total count of true positives: rocksdb.bloom.filter.full.true.positive Not the term "full" in the stat name to indicate that they are meaningful in full filters. block-based filters are to be deprecated soon and supporting it is not worth the the additional cost of if-then-else branches. Closes #3680 Tested by: $ ./db_bench -benchmarks=fillrandom -db /dev/shm/rocksdb-tmpdb --num=1000000 -bloom_bits=10 $ ./db_bench -benchmarks="readwhilewriting" -db /dev/shm/rocksdb-tmpdb --statistics -bloom_bits=10 --duration=60 --num=2000000 --use_existing_db 2>&1 > /tmp/full.log $ grep filter.full /tmp/full.log rocksdb.bloom.filter.full.positive COUNT : 3628593 rocksdb.bloom.filter.full.true.positive COUNT : 3536026 which gives the false positive rate of 2.5% Closes https://github.com/facebook/rocksdb/pull/3681 Differential Revision: D7517570 Pulled By: maysamyabandeh fbshipit-source-id: 630ab1a473afdce404916d297035b6318de4c052	8 years ago
Dmitri Smirnov	2a62ca1750	Make Optimistic Tx database stackable Summary: This change models Optimistic Tx db after Pessimistic TX db. The motivation for this change is to make the ptr polymorphic so it can be held by the same raw or smart ptr. Currently, due to the inheritance of the Opt Tx db not being rooted in the manner of Pess Tx from a single DB root it is more difficult to write clean code and have clear ownership of the database in cases when options dictate instantiate of plan DB, Pess Tx DB or Opt tx db. Closes https://github.com/facebook/rocksdb/pull/3566 Differential Revision: D7184502 Pulled By: yiwu-arbug fbshipit-source-id: 31d06efafd79497bb0c230e971857dba3bd962c3	8 years ago
Sagar Vemuri	04c11b867d	Level Compaction with TTL Summary: Level Compaction with TTL. As of today, a file could exist in the LSM tree without going through the compaction process for a really long time if there are no updates to the data in the file's key range. For example, in certain use cases, the keys are not actually "deleted"; instead they are just set to empty values. There might not be any more writes to this "deleted" key range, and if so, such data could remain in the LSM for a really long time resulting in wasted space. Introducing a TTL could solve this problem. Files (and, in turn, data) older than TTL will be scheduled for compaction when there is no other background work. This will make the data go through the regular compaction process and get rid of old unwanted data. This also has the (good) side-effect of all the data in the non-bottommost level being newer than ttl, and all data in the bottommost level older than ttl. It could lead to more writes while reducing space. This functionality can be controlled by the newly introduced column family option -- ttl. TODO for later: - Make ttl mutable - Extend TTL to Universal compaction as well? (TTL is already supported in FIFO) - Maybe deprecate CompactionOptionsFIFO.ttl in favor of this new ttl option. Closes https://github.com/facebook/rocksdb/pull/3591 Differential Revision: D7275442 Pulled By: sagar0 fbshipit-source-id: dcba484717341200d419b0953dafcdf9eb2f0267	8 years ago
Adam Retter	8917eee962	Fixed small typos Summary: Closes https://github.com/facebook/rocksdb/pull/3667 Differential Revision: D7470060 Pulled By: miasantreble fbshipit-source-id: 8e8545cda38f0805f35ccdb8841666a2d7a965f5	8 years ago
Sagar Vemuri	90c542347a	Update comments about MergeOperator::AllowSingleOperand Summary: Updated comments around AllowSingleOperand. Reason: A couple of users were confused and encountered issues due to no overriding PartialMerge with AllowSingleOperand=true. I'll also look into modifying the default merge operator implementation so that overriding PartialMerge is not mandatory when AllowSingleOp=true. Closes https://github.com/facebook/rocksdb/pull/3659 Differential Revision: D7422691 Pulled By: sagar0 fbshipit-source-id: 3d075a6ced0120f5d65cb7ae5412936f1862f342	8 years ago
Anand Ananthabhotla	f9f4d40f93	Align SST file data blocks to avoid spanning multiple pages Summary: Provide a block_align option in BlockBasedTableOptions to allow alignment of SST file data blocks. This will avoid higher IOPS/throughput load due to < 4KB data blocks spanning 2 4KB pages. When this option is set to true, the block alignment is set to lower of block size and 4KB. Closes https://github.com/facebook/rocksdb/pull/3502 Differential Revision: D7400897 Pulled By: anand1976 fbshipit-source-id: 04cc3bd144e88e3431a4f97604e63ad7a0f06d44	8 years ago
Radoslaw Zarzynski	09b6bf828a	InlineSkiplist: don't decode keys unnecessarily during comparisons Summary: Summary ======== `InlineSkipList<>::Insert` takes the `key` parameter as a C-string. Then, it performs multiple comparisons with it requiring the `GetLengthPrefixedSlice()` to be spawn in `MemTable::KeyComparator::operator()(const char* prefix_len_key1, const char* prefix_len_key2)` on the same data over and over. The patch tries to optimize that. Rough performance comparison ===== Big keys, no compression. ``` $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256 (...) fillrandom : 4.222 micros/op 236836 ops/sec; 80.4 MB/s ``` ``` $ ./db_bench --writes 20000000 --benchmarks="fillrandom" --compression_type none -key_size 256 (...) fillrandom : 4.064 micros/op 246059 ops/sec; 83.5 MB/s ``` TODO ====== In ~~a separated~~ this PR: - [x] Go outside the write path. Maybe even eradicate the C-string-taking variant of `KeyIsAfterNode` entirely. - [x] Try to cache the transformations applied by `KeyComparator` & friends in situations where we havy many comparisons with the same key. Closes https://github.com/facebook/rocksdb/pull/3516 Differential Revision: D7059300 Pulled By: ajkr fbshipit-source-id: 6f027dbb619a488129f79f79b5f7dbe566fb2dbb	8 years ago
Zhongyi Xie	1cbc96d236	FlushReason improvement Summary: Right now flush reason "SuperVersion Change" covers a few different scenarios which is a bit vague. For example, the following db_bench job should trigger "Write Buffer Full" > $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 $ grep 'flush_reason' /dev/shm/dbbench/LOG ... 2018/03/06-17:30:42.543638 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242543634, "job": 192, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018024, "flush_reason": "SuperVersion Change"} 2018/03/06-17:30:42.569541 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242569536, "job": 193, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"} 2018/03/06-17:30:42.596396 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242596392, "job": 194, "event": "flush_started", "num_memtables": 1, "num_entries": 7008, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "SuperVersion Change"} 2018/03/06-17:30:42.622444 7f2773b99700 EVENT_LOG_v1 {"time_micros": 1520386242622440, "job": 195, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "SuperVersion Change"} With the fix: > 2018/03/19-14:40:02.341451 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602341444, "job": 98, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018008, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.379655 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602379642, "job": 100, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018016, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.418479 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602418474, "job": 101, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.455084 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602455079, "job": 102, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018048, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.492293 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602492288, "job": 104, "event": "flush_started", "num_memtables": 1, "num_entries": 7007, "num_deletes": 0, "memory_usage": 1018056, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.528720 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602528715, "job": 105, "event": "flush_started", "num_memtables": 1, "num_entries": 7006, "num_deletes": 0, "memory_usage": 1018104, "flush_reason": "Write Buffer Full"} 2018/03/19-14:40:02.566255 7f11dc257700 EVENT_LOG_v1 {"time_micros": 1521495602566238, "job": 107, "event": "flush_started", "num_memtables": 1, "num_entries": 7009, "num_deletes": 0, "memory_usage": 1018112, "flush_reason": "Write Buffer Full"} Closes https://github.com/facebook/rocksdb/pull/3627 Differential Revision: D7328772 Pulled By: miasantreble fbshipit-source-id: 67c94065fbdd36930f09930aad0aaa6d2c152bb8	8 years ago
Siying Dong	118058ba69	SstFileManager: add bytes_max_delete_chunk Summary: Add `bytes_max_delete_chunk` in SstFileManager so that we can drop a large file in multiple batches. Closes https://github.com/facebook/rocksdb/pull/3640 Differential Revision: D7358679 Pulled By: siying fbshipit-source-id: ef17f0da2f5723dbece2669485a9b91b3edc0bb7	8 years ago
Fosco Marotto	de6cf95a53	Update history for future 5.13 release Summary: Closes https://github.com/facebook/rocksdb/pull/3631 Differential Revision: D7367519 Pulled By: gfosco fbshipit-source-id: 57826cc1c9ffc9f2b351075567b8ad929809cb74	8 years ago
QingpingWang	2ce8f63f81	C API for PerfContext Summary: This pull request exposes the interface of PerfContext as C API Closes https://github.com/facebook/rocksdb/pull/3607 Differential Revision: D7294225 Pulled By: ajkr fbshipit-source-id: eddcfbc13538f379950b2c8b299486695ffb5e2c	8 years ago
Niv Dayan	da82aab126	allowing CompactFiles to return new file names Summary: This is a small API extension to allow the CompactFiles method to return the names of files that were created during the compaction. Closes https://github.com/facebook/rocksdb/pull/3608 Differential Revision: D7275789 Pulled By: siying fbshipit-source-id: 1ec0c3954a0f10cd877efb5f29f9be6c7b59e9ba	8 years ago
Sagar Vemuri	cc118b0e6f	Update version Summary: We missed updating version.h on master when cutting 5.11.fb and 5.12.fb branches. It should be the same as the version in the latest release branch (or should it be one more?). I noticed this when trying to run some upgrade/downgrade tests from 5.11 to some new code on master. Closes https://github.com/facebook/rocksdb/pull/3611 Differential Revision: D7282917 Pulled By: sagar0 fbshipit-source-id: 205ee75b77c5b6bbcea95a272760b427025a4aba	8 years ago
Amy Tai	e476d0e252	Adding stat to count cancelled compactions Summary: Added a stat that counts the number of cancelled compactions. Closes https://github.com/facebook/rocksdb/pull/3574 Differential Revision: D7190259 Pulled By: amytai fbshipit-source-id: d5ce82dc9398da6d6d34023ad4ed8cec909852a3	8 years ago
Bruce Mitchener	a3a3f5497c	Fix some typos in comments and docs. Summary: Closes https://github.com/facebook/rocksdb/pull/3568 Differential Revision: D7170953 Pulled By: siying fbshipit-source-id: 9cfb8dd88b7266da920c0e0c1e10fb2c5af0641c	8 years ago
Sagar Vemuri	e69f6e8629	Fix API name in a comment in db.h Summary: ... so that people are not confused. Closes https://github.com/facebook/rocksdb/pull/3580 Differential Revision: D7187175 Pulled By: sagar0 fbshipit-source-id: bce70093d52e38cd24c9432fd708885d7c2c013e	8 years ago
Bruce Mitchener	0de710f5b8	Use nullptr instead of NULL / 0 more consistently. Summary: Closes https://github.com/facebook/rocksdb/pull/3569 Differential Revision: D7170968 Pulled By: yiwu-arbug fbshipit-source-id: 308a6b7dd358a04fd9a7de3d927bfd8abd57d348	8 years ago
Stuart	f021f1d9e1	Add rocksdb_open_with_ttl function in C API Summary: Change-Id: Ie6f9b10bce459f6bf0ade0e5877264b4e10da3f5 Signed-off-by: Stuart <Stuart.Hu@emc.com> Closes https://github.com/facebook/rocksdb/pull/3553 Differential Revision: D7144833 Pulled By: sagar0 fbshipit-source-id: 815225fa6e560d8a5bc47ffd0a98118b107ce264	8 years ago
amytai	0a3db28d98	Disallow compactions if there isn't enough free space Summary: This diff handles cases where compaction causes an ENOSPC error. This does not handle corner cases where another background job is started while compaction is running, and the other background job triggers ENOSPC, although we do allow the user to provision for these background jobs with SstFileManager::SetCompactionBufferSize. It also does not handle the case where compaction has finished and some other background job independently triggers ENOSPC. Usage: Functionality is inside SstFileManager. In particular, users should set SstFileManager::SetMaxAllowedSpaceUsage, which is the reference highwatermark for determining whether to cancel compactions. Closes https://github.com/facebook/rocksdb/pull/3449 Differential Revision: D7016941 Pulled By: amytai fbshipit-source-id: 8965ab8dd8b00972e771637a41b4e6c645450445	8 years ago
Dmitri Smirnov	c364eb42b5	Windows cumulative patch Summary: This patch addressed several issues. Portability including db_test std::thread -> port::Thread Cc: @ and %z to ROCKSDB portable macro. Cc: maysamyabandeh Implement Env::AreFilesSame Make the implementation of file unique number more robust Get rid of C-runtime and go directly to Windows API when dealing with file primitives. Implement GetSectorSize() and aling unbuffered read on the value if available. Adjust Windows Logger for the new interface, implement CloseImpl() Cc: anand1976 Fix test running script issue where $status var was of incorrect scope so the failures were swallowed and not reported. DestroyDB() creates a logger and opens a LOG file in the directory being cleaned up. This holds a lock on the folder and the cleanup is prevented. This fails one of the checkpoin tests. We observe the same in production. We close the log file in this change. Fix DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure where the test attempts to open a directory with NewRandomAccessFile which does not work on Windows. Fix DBTest.SoftLimit as it is dependent on thread timing. CC: yiwu-arbug Closes https://github.com/facebook/rocksdb/pull/3552 Differential Revision: D7156304 Pulled By: siying fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b	8 years ago
Yi Wu	b864bc9b5b	Blob DB: Improve FIFO eviction Summary: Improving blob db FIFO eviction with the following changes, * Change blob_dir_size to max_db_size. Take into account SST file size when computing DB size. * FIFO now only take into account live sst files and live blob files. It is normal for disk usage to go over max_db_size because there are obsolete sst files and blob files pending deletion. * FIFO eviction now also evict TTL blob files that's still open. It doesn't evict non-TTL blob files. * If FIFO is triggered, it will pass an expiration and the current sequence number to compaction filter. Compaction filter will then filter inlined keys to evict those with an earlier expiration and smaller sequence number. So call LSM FIFO. * Compaction filter also filter those blob indexes where corresponding blob file is gone. * Add an event listener to listen compaction/flush event and update sst file size. * Implement DB::Close() to make sure base db, as well as event listener and compaction filter, destruct before blob db. * More blob db statistics around FIFO. * Fix some locking issue when accessing a blob file. Closes https://github.com/facebook/rocksdb/pull/3556 Differential Revision: D7139328 Pulled By: yiwu-arbug fbshipit-source-id: ea5edb07b33dfceacb2682f4789bea61de28bbfa	8 years ago
Andrew Kryczka	5d68243e61	Comment out unused variables Summary: Submitting on behalf of another employee. Closes https://github.com/facebook/rocksdb/pull/3557 Differential Revision: D7146025 Pulled By: ajkr fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e	8 years ago
Maysam Yabandeh	680864ae54	WritePrepared Txn: Fix bug with duplicate keys during recovery Summary: Fix the following bugs: - During recovery a duplicate key was inserted twice into the write batch of the recovery transaction, once when the memtable returns false (because it was duplicates) and once for the 2nd attempt. This would result into different SubBatch count measured when the recovered transactions is committing. - If a cf is flushed during recovery the memtable is not available to assist in detecting the duplicate key. This could result into not advancing the sequence number when iterating over duplicate keys of a flushed cf and hence inserting the next key with the wrong sequence number. - SubBacthCounter would reset the comparator to default comparator after the first duplicate key. The 2nd duplicate key hence would have gone through a wrong comparator and not being detected. Closes https://github.com/facebook/rocksdb/pull/3562 Differential Revision: D7149440 Pulled By: maysamyabandeh fbshipit-source-id: 91ec317b165f363f5d11ff8b8c47c81cebb8ed77	8 years ago
Yi Wu	1209b6db5c	Blob DB: remove existing garbage collection implementation Summary: Red diff to remove existing implementation of garbage collection. The current approach is reference counting kind of approach and require a lot of effort to get the size counter right on compaction and deletion. I'm going to go with a simple mark-sweep kind of approach and will send another PR for that. CompactionEventListener was added solely for blob db and it adds complexity and overhead to compaction iterator. Removing it as well. Closes https://github.com/facebook/rocksdb/pull/3551 Differential Revision: D7130190 Pulled By: yiwu-arbug fbshipit-source-id: c3a375ad2639a3f6ed179df6eda602372cc5b8df	8 years ago
Yi Wu	bf937cf15b	Add "rocksdb.live-sst-files-size" DB property Summary: Add "rocksdb.live-sst-files-size" DB property which only include files of latest version. Existing "rocksdb.total-sst-files-size" include files from all versions and thus include files that's obsolete but not yet deleted. I'm going to use this new property to cap blob db sst + blob files size. Closes https://github.com/facebook/rocksdb/pull/3548 Differential Revision: D7116939 Pulled By: yiwu-arbug fbshipit-source-id: c6a52e45ce0f24ef78708156e1a923c1dd6bc79a	8 years ago
Andrew Kryczka	3ae0047278	skip CompactRange flush based on memtable contents Summary: CompactRange has a call to Flush because we guarantee that, at the time it's called, all existing keys in the range will be pushed through the user's compaction filter. However, previously the flush was done blindly, so it'd happen even if the memtable does not contain keys in the range specified by the user. This caused unnecessarily many L0 files to be created, leading to write stalls in some cases. This PR checks the memtable's contents, and decides to flush only if it overlaps with `CompactRange`'s range. - Move the memtable overlap check logic from `ExternalSstFileIngestionJob` to `ColumnFamilyData::RangesOverlapWithMemtables` - Reuse the above logic in `CompactRange` and skip flushing if no overlap Closes https://github.com/facebook/rocksdb/pull/3520 Differential Revision: D7018897 Pulled By: ajkr fbshipit-source-id: a3c6b1cfae56687b49dd89ccac7c948e53545934	8 years ago
Siying Dong	c287c098a4	Update comments in DB::Close() Summary: Closes https://github.com/facebook/rocksdb/pull/3543 Differential Revision: D7093251 Pulled By: siying fbshipit-source-id: 4066b82c95ecb65866c5842d68ab13ab9f85d567	8 years ago
Anand Ananthabhotla	dfbe52e099	Fix the Logger::Close() and DBImpl::Close() design pattern Summary: The recent Logger::Close() and DBImpl::Close() implementation rely on calling the CloseImpl() virtual function from the destructor, which will not work. Refactor the implementation to have a private close helper function in derived classes that can be called by both CloseImpl() and the destructor. Closes https://github.com/facebook/rocksdb/pull/3528 Reviewed By: gfosco Differential Revision: D7049303 Pulled By: anand1976 fbshipit-source-id: 76a64cbf403209216dfe4864ecf96b5d7f3db9f4	8 years ago
Maysam Yabandeh	90eca1e616	WritePrepared Txn: optimize SubBatchCnt Summary: Make use of the index in WriteBatchWithIndex to also count the number of sub-batches. This eliminates the need to separately scan the batch to count the number of sub-batches once a duplicate key is detected. Closes https://github.com/facebook/rocksdb/pull/3529 Differential Revision: D7049947 Pulled By: maysamyabandeh fbshipit-source-id: 81cbf12c4e662541c772c7265a8f91631e25c7cd	8 years ago
Igor Sugak	aba3409740	Back out "[codemod] - comment out unused parameters" Reviewed By: igorsugak fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37	8 years ago
David Lai	f4a030ce81	- comment out unused parameters Reviewed By: everiq, igorsugak Differential Revision: D7046710 fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461	8 years ago
Sagar Vemuri	8ada876dfe	Add rocksdb.iterator.internal-key property Summary: Added a new iterator property: `rocksdb.iterator.internal-key` to get the internal-key (converted to user key) at which the iterator stopped. Closes https://github.com/facebook/rocksdb/pull/3525 Differential Revision: D7033694 Pulled By: sagar0 fbshipit-source-id: d51e6c00f5e9d766c6276ef79774b81c6c5216f8	8 years ago
Maysam Yabandeh	8eb1d445c3	Unbreak MemTableRep API change Summary: The MemTableRep API was broken by this commit: `813719e952` This patch reverts the changes and instead adds InsertKey (and etc.) overloads to extend the MemTableRep API without breaking the existing classes that inherit from it. Closes https://github.com/facebook/rocksdb/pull/3513 Differential Revision: D7004134 Pulled By: maysamyabandeh fbshipit-source-id: e568d91fe1e17dd76c0c1f6c7dd51a18633b1c4f	8 years ago
Maysam Yabandeh	8a04ee4fd1	WritePrepared Txn: use TransactionDBWriteOptimizations (2nd attempt) Summary: TransactionDB::Write can receive some optimization hints from the user. One is to skip the concurrency control mechanism. WritePreparedTxnDB is currently ignoring such hints. This patch optimizes WritePreparedTxnDB::Write for skip_concurrency_control and skip_duplicate_key_check hints. Closes https://github.com/facebook/rocksdb/pull/3496 Differential Revision: D6971784 Pulled By: maysamyabandeh fbshipit-source-id: cbab10ad538fa2b8bcb47e37c77724afe6e30f03	8 years ago
Andrew Kryczka	ee1c802675	Add delay before flush in CompactRange to avoid write stalling Summary: - Refactored logic for checking write stall condition to a helper function: `GetWriteStallConditionAndCause`. Now it is decoupled from the logic for updating WriteController / stats in `RecalculateWriteStallConditions`, so we can reuse it for predicting whether write stall will occur. - Updated `CompactRange` to first check whether the one additional immutable memtable / L0 file would cause stalling before it flushes. If so, it waits until that is no longer true. - Updated `bg_cv_` to be signaled on `SetOptions` calls. The stall conditions `CompactRange` cares about can change when (1) flush finishes, (2) compaction finishes, or (3) options dynamically change. The cv was already signaled for (1) and (2) but not yet for (3). Closes https://github.com/facebook/rocksdb/pull/3381 Differential Revision: D6754983 Pulled By: ajkr fbshipit-source-id: 5613e03f1524df7192dc6ae885d40fd8f091d972	8 years ago
jsteemann	d62af7fa4f	fix a typo (of a potential vi user) Summary: Closes https://github.com/facebook/rocksdb/pull/3481 Differential Revision: D6939089 Pulled By: siying fbshipit-source-id: ccce3ae10cc5ff50a74b85804afd044b21a3c3e2	8 years ago
Zhongyi Xie	945f618ba5	log flush reason for better debugging experience Summary: It's always a mystery from the logs why flush was triggered -- user triggered it manually, WriteBufferManager triggered it, logs were full, write buffer was full, etc. This PR logs Flush reason whenever a flush is scheduled. Closes https://github.com/facebook/rocksdb/pull/3401 Differential Revision: D6788142 Pulled By: miasantreble fbshipit-source-id: a867e54d493c06adf5172bd36a180fb3faae3511	8 years ago
Maysam Yabandeh	8feee28020	Add skip_cc option to TransactionDB::Write Summary: Compared to DB::Write, TransactionDB::Write has the additional overhead of creating and initializing an internal transaction object, as well as the overhead of locking/unlocking the keys. This patch extends the TransactionDB::Write with an skip_cc option to allow the users to indicate that the write batch do not conflict with others and the concurrency control and its overhead can be skipped. TransactionDB::Write by default calls DB::Write when skip_cc is set, which works for WriteCommitted WritePolicy. Any other flavor of TransactionDB that is not compatible with this default behavior (such as WritePreparedTxnDB) can extend ::Write and implement their own approach for taking into account the skip_cc optimization. Closes https://github.com/facebook/rocksdb/pull/3457 Differential Revision: D6877318 Pulled By: maysamyabandeh fbshipit-source-id: 56f4e21db87ff71492db4e376fb7c2b03dfeab6b	8 years ago
Maysam Yabandeh	88d8b2a2f5	WritePrepared Txn: Duplicate Keys, Txn Part Summary: This patch takes advantage of memtable being able to detect duplicate <key,seq> and returning TryAgain to handle duplicate keys in WritePrepared Txns. Through WriteBatchWithIndex's index it detects existence of at least a duplicate key in the write batch. If duplicate key was reported, it then pays the cost of counting the number of sub-patches by iterating over the write batch and pass it to DBImpl::Write. DB will make use of the provided batch_count to assign proper sequence numbers before sending them to the WAL. When later inserting the batch to the memtable, it increases the seq each time memtbale reports a duplicate (a sub-patch in our counting) and tries again. Closes https://github.com/facebook/rocksdb/pull/3455 Differential Revision: D6873699 Pulled By: maysamyabandeh fbshipit-source-id: db8487526c3a5dc1ddda0ea49f0f979b26ae648d	8 years ago
Anand Ananthabhotla	4b124fb9d3	Handle error return from WriteBuffer() Summary: There are a couple of places where we swallow any error from WriteBuffer() - in SwitchMemtable() and DBImpl::CloseImpl(). Propagate the error up in those cases rather than ignoring it. Closes https://github.com/facebook/rocksdb/pull/3404 Differential Revision: D6879954 Pulled By: anand1976 fbshipit-source-id: 2ef88b554be5286b0a8bad7384ba17a105395bdb	8 years ago
Maysam Yabandeh	813719e952	WritePrepared Txn: Duplicate Keys, Memtable part Summary: Currently DB does not accept duplicate keys (keys with the same user key and the same sequence number). If Memtable returns false when receiving such keys, we can benefit from this signal to properly increase the sequence number in the rare cases when we have a duplicate key in the write batch written to DB under WritePrepared transactions. Closes https://github.com/facebook/rocksdb/pull/3418 Differential Revision: D6822412 Pulled By: maysamyabandeh fbshipit-source-id: adea3ce5073131cd38ed52b16bea0673b1a19e77	8 years ago
Yi Wu	e62a763752	Blob DB: miscellaneous changes Summary: * Expose garbage collection related options * Minor logging and counter name update * Remove unused constants. Closes https://github.com/facebook/rocksdb/pull/3451 Differential Revision: D6867077 Pulled By: yiwu-arbug fbshipit-source-id: 6c3272a9c9d78b125a0bd6b2e56d00d087cdd6c8	8 years ago
Huachao Huang	ab43ff58b5	Delete files in multiple ranges at once Summary: Using `DeleteFilesInRange` to delete files in a lot of ranges can be slow, because `VersionSet::LogAndApply` is expensive. This PR adds a new `DeleteFilesInRange` function to delete files in multiple ranges at once. Close https://github.com/facebook/rocksdb/issues/2951 Closes https://github.com/facebook/rocksdb/pull/3431 Differential Revision: D6849228 Pulled By: ajkr fbshipit-source-id: daeedcabd8def4b1d9ee95a58266dee77b5d68cb	8 years ago
Ben Darnell	65cd6cd4b6	Rewrite comments on use_fsync option Summary: This replaces a vague warning about the mostly-obsolete ext3 filesystem with a more detailed note about a historical bug in the still-relevant ext4. Fixes #3410 Closes https://github.com/facebook/rocksdb/pull/3421 Differential Revision: D6834881 Pulled By: siying fbshipit-source-id: 7771ef5c89a54c0ac17821680779c48178d0b400	8 years ago

... 20 21 22 23 24 ...

2540 Commits (eac6b6d0cd5b73c9ca4f24fce25492282c528908)