rocksdb

Commit Graph

Author	SHA1	Message	Date
Anand Ananthabhotla	72712f4e28	Allow dynamic modification of window size and deletion trigger (#4403 ) Summary: Make the CompactOnDeletionCollectorFactory class public, and provide methods to update the window size and deletion trigger params. These will take effect on subsequent created SST files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4403 Differential Revision: D9976857 Pulled By: anand1976 fbshipit-source-id: 31dbf0511c12fa2bb9b2a7ba620079e0ee09cf48	6 years ago
Anand Ananthabhotla	a27fce408e	Auto recovery from out of space errors (#4164 ) Summary: This commit implements automatic recovery from a Status::NoSpace() error during background operations such as write callback, flush and compaction. The broad design is as follows - 1. Compaction errors are treated as soft errors and don't put the database in read-only mode. A compaction is delayed until enough free disk space is available to accomodate the compaction outputs, which is estimated based on the input size. This means that users can continue to write, and we rely on the WriteController to delay or stop writes if the compaction debt becomes too high due to persistent low disk space condition 2. Errors during write callback and flush are treated as hard errors, i.e the database is put in read-only mode and goes back to read-write only fater certain recovery actions are taken. 3. Both types of recovery rely on the SstFileManagerImpl to poll for sufficient disk space. We assume that there is a 1-1 mapping between an SFM and the underlying OS storage container. For cases where multiple DBs are hosted on a single storage container, the user is expected to allocate a single SFM instance and use the same one for all the DBs. If no SFM is specified by the user, DBImpl::Open() will allocate one, but this will be one per DB and each DB will recover independently. The recovery implemented by SFM is as follows - a) On the first occurance of an out of space error during compaction, subsequent compactions will be delayed until the disk free space check indicates enough available space. The required space is computed as the sum of input sizes. b) The free space check requirement will be removed once the amount of free space is greater than the size reserved by in progress compactions when the first error occured c) If the out of space error is a hard error, a background thread in SFM will poll for sufficient headroom before triggering the recovery of the database and putting it in write-only mode. The headroom is calculated as the sum of the write_buffer_size of all the DB instances associated with the SFM 4. EventListener callbacks will be called at the start and completion of automatic recovery. Users can disable the auto recov ery in the start callback, and later initiate it manually by calling DB::Resume() Todo: 1. More extensive testing 2. Add disk full condition to db_stress (follow-on PR) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164 Differential Revision: D9846378 Pulled By: anand1976 fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a	6 years ago
Vitaly Isaev	0bd2ede10e	Memory usage stats in C API (#4340 ) Summary: Please consider this small PR providing access to the `MemoryUsage::GetApproximateMemoryUsageByType` function in plain C API. Actually I'm working on Go application and now trying to investigate the reasons of high memory consumption (#4313). Go [wrappers](https://github.com/tecbot/gorocksdb) are built on the top of Rocksdb C API. According to the #706, `MemoryUsage::GetApproximateMemoryUsageByType` is considered as the best option to get database internal memory usage stats, but it wasn't supported in C API yet. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4340 Differential Revision: D9655135 Pulled By: ajkr fbshipit-source-id: a3d2f3f47c143ae75862fbcca2f571ea1b49e14a	6 years ago
Maysam Yabandeh	3f5282268f	Skip concurrency control during recovery of pessimistic txn (#4346 ) Summary: TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This could be as an optimization if the application knows that the transaction would not have any conflict with concurrent transactions. It is currently used during recovery assuming (i) application guarantees no conflict between prepared transactions in the WAL (ii) application guarantees that recovered transactions will be rolled back/commit before new transactions start. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346 Differential Revision: D9759149 Pulled By: maysamyabandeh fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b	6 years ago
Kefu Chai	faf529fd7c	env_librados.h: drop redundant #endif (#4354 ) Summary: without this change, rocksdb_env_librados_test fails to build. it's a regression introduced by `64324e32` Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/4354 Differential Revision: D9702665 Pulled By: riversand963 fbshipit-source-id: 65134eaff0543733210edfc77f89c96709da7a3f	6 years ago
Maysam Yabandeh	655ef7d77f	Inline doc for format_version 4 (#4350 ) Summary: Fixes #4337 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4350 Differential Revision: D9700871 Pulled By: maysamyabandeh fbshipit-source-id: fe1e07803783f34588dc14aba66d51117ca4a180	6 years ago
cngzhnp	64324e329e	Support pragma once in all header files and cleanup some warnings (#4339 ) Summary: As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited. Besides this, try to fix some warnings about loss of data. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339 Differential Revision: D9654990 Pulled By: ajkr fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848	6 years ago
Yi Wu	462ed70d64	BlobDB: GetLiveFiles and GetLiveFilesMetadata return relative path (#4326 ) Summary: `GetLiveFiles` and `GetLiveFilesMetadata` should return path relative to db path. It is a separate issue when `path_relative` is false how can we return relative path. But `DBImpl::GetLiveFiles` don't handle it as well when there are multiple `db_paths`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4326 Differential Revision: D9545904 Pulled By: yiwu-arbug fbshipit-source-id: 6762d879fcb561df2b612e6fdfb4a6b51db03f5d	6 years ago
Mikhail Antonov	927f274939	Avoiding write stall caused by manual flushes (#4297 ) Summary: Basically at the moment it seems it's possible to cause write stall by calling flush (either manually vis DB::Flush(), or from Backup Engine directly calling FlushMemTable() while background flush may be already happening. One of the ways to fix it is that in DBImpl::CompactRange() we already check for possible stall and delay flush if needed before we actually proceed to call FlushMemTable(). We can simply move this delay logic to separate method and call it from FlushMemTable. This is draft patch, for first look; need to check tests/update SyncPoints and most certainly would need to add allow_write_stall method to FlushOptions(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4297 Differential Revision: D9420705 Pulled By: mikhail-antonov fbshipit-source-id: f81d206b55e1d7b39e4dc64242fdfbceeea03fcc	6 years ago
Gauresh Rane	ad789e4e0d	Adding a method for memtable class for memtable getting flushed. (#4304 ) Summary: Memtables are selected for flushing by the flush job. Currently we have listener which is invoked when memtables for a column family are flushed. That listener does not indicate which memtable was flushed in the notification. If clients want to know if particular data in the memtable was retired, there is no straight forward way to know this. This method will help users who implement memtablerep factory and extend interface for memtablerep, to know if the data in the memtable was retired. Another option that was tried, was to depend on memtable destructor to be called after flush to mark that data was persisted. This works all the time but sometimes there can huge delays between actual flush happening and memtable getting destroyed. Hence, if anyone who is waiting for data to persist will have to wait that longer. It is expected that anyone who is implementing this method to have return quickly as it blocks RocksDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4304 Reviewed By: riversand963 Differential Revision: D9472312 Pulled By: gdrane fbshipit-source-id: 8e693308dee749586af3a4c5d4fcf1fa5276ea4d	6 years ago
Yanqin Jin	bb5dcea98e	Add path to WritableFileWriter. (#4039 ) Summary: We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039 Differential Revision: D8670178 Pulled By: riversand963 fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a	6 years ago
jsteemann	90f744941d	adds missing PopSavePoint method to Transaction (#4256 ) Summary: Transaction has had methods to deal with SavePoints already, but was missing the PopSavePoint method provided by WriteBatch and WriteBatchWithIndex. This PR adds PopSavePoint to Transaction as well. Having the method on Transaction-level too is useful for applications that repeatedly execute a sequence of operations that normally succeed, but infrequently need to get rolled back. Using SavePoints here is sensible, but as operations normally succeed the application may pile up a lot of useless SavePoints inside a Transaction, leading to slightly increased memory usage for managing the unneeded SavePoints. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4256 Differential Revision: D9326932 Pulled By: yiwu-arbug fbshipit-source-id: 53a0af18a6c7e87feff8a56f1f3eab9df7f371d6	6 years ago
Fenggang Wu	9d646a6311	Add db_bench options of data block hash index (#4281 ) Summary: Add `--data_block_index_type` and `--data_block_hash_table_util_ratio` option to `db_bench`. `--data_block_index_type` can be either of `binary` (default) or `binary_and_hash`; `--data_block_hash_table_util_ratio` will be a double. The default value is `0.75`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4281 Differential Revision: D9361476 Pulled By: fgwu fbshipit-source-id: dc53e01acef9db81b9eec5e8a96f3bc8ed718c10	6 years ago
Siying Dong	9c0c8f5ff6	GetAllKeyVersions() to take an extra argument of `max_num_ikeys`. (#4271 ) Summary: Right now, `ldb idump` may have memory out of control if there is a big range of tombstones. Add an option to cut maxinum number of keys in GetAllKeyVersions(), and push down --max_num_ikeys from ldb. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4271 Differential Revision: D9369149 Pulled By: siying fbshipit-source-id: 7cbb797b7d2fa16573495a7e84937456d3ff25bf	6 years ago
Andrey Zagrebin	aeed4f0749	#3865 fix performance regression introduced by MergeOperator.ShouldMerge (#4266 ) Summary: This PR addresses issue #3865 and implements the following approach to fix it: - adds `MergeContext::GetOperandsDirectionForward` and `MergeContext::GetOperandsDirectionBackward` to query merge operands in a specific order - `MergeContext::GetOperands` becomes a shortcut for `MergeContext::GetOperandsDirectionForward` - pass `MergeContext::GetOperandsDirectionBackward` to `MergeOperator::ShouldMerge` and document the order Pull Request resolved: https://github.com/facebook/rocksdb/pull/4266 Differential Revision: D9360750 Pulled By: sagar0 fbshipit-source-id: 20cb73ff017760b062ecdcf4382560767086e092	6 years ago
Fenggang Wu	19ec44fd39	Improve point-lookup performance using a data block hash index (#4174 ) Summary: Add hash index support to data blocks, which helps to reduce the CPU utilization of point-lookup operations. This feature is backward compatible with the data block created without the hash index. It is disabled by default unless `BlockBasedTableOptions::data_block_index_type` is set to `data_block_index_type = kDataBlockBinaryAndHash.` The DB size would be bigger with the hash index option as a hash table is added at the end of each data block. If the hash utilization ratio is 1:1, the space overhead is one byte per key. The hash table utilization ratio is adjustable using `BlockBasedTableOptions::data_block_hash_table_util_ratio`. A lower utilization ratio will improve more on the point-lookup efficiency, but take more space too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4174 Differential Revision: D8965914 Pulled By: fgwu fbshipit-source-id: 1c6bae5d1fc39c80282d8890a72e9e67bc247198	6 years ago
Huachao Huang	d916a1105a	c-api: add some missing options Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4267 Differential Revision: D9309505 Pulled By: anand1976 fbshipit-source-id: eb9fee8037f4ff24dc1cdd5cc5ef41c231a03e1f	6 years ago
Maysam Yabandeh	caf0f53a74	Index value delta encoding (#3983 ) Summary: Given that index value is a BlockHandle, which is basically an <offset, size> pair we can apply delta encoding on the values. The first value at each index restart interval encoded the full BlockHandle but the rest encode only the size. Refer to IndexBlockIter::DecodeCurrentValue for the detail of the encoding. This reduces the index size which helps using the block cache more efficiently. The feature is enabled with using format_version 4. The feature comes with a bit of cpu overhead which should be paid back by the higher cache hits due to smaller index block size. Results with sysbench read-only using 4k blocks and using 16 index restart interval: Format 2: 19585 rocksdb read-only range=100 Format 3: 19569 rocksdb read-only range=100 Format 4: 19352 rocksdb read-only range=100 Pull Request resolved: https://github.com/facebook/rocksdb/pull/3983 Differential Revision: D8361343 Pulled By: maysamyabandeh fbshipit-source-id: f882ee082322acac32b0072e2bdbb0b5f854e651	6 years ago
Georgios Bitzes	1b813a9b2e	Make rocksdb::Slice more interoperable with std::string_view (#4242 ) Summary: This change allows using std::string_view objects directly in the DB API: db->Get(some_string_view_object, ...); The conversion from std::string_view to rocksdb::Slice is done automatically, thanks to the added constructor. I'm stopping short of adding an implicit conversion operator from rocksdb::Slice to std::string_view, as I don't think that's a good idea for PinnableSlices. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4242 Differential Revision: D9224134 Pulled By: anand1976 fbshipit-source-id: f50aad04dd0b01737907c0fb88d495c83a81f4e4	6 years ago
Yanqin Jin	de7f423a82	Add SST ingestion to ldb (#4205 ) Summary: We add two subcommands `write_extern_sst` and `ingest_extern_sst` to ldb. This PR avoids changing existing code because we hope to cherry-pick to earlier releases to support compatibility check for external SST file ingestion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4205 Differential Revision: D9112711 Pulled By: riversand963 fbshipit-source-id: 7cae88380d4de86da8440230e87eca66755648e4	6 years ago
Huachao Huang	badfd70a3e	types: add kEntryBlobIndex for TablePropertiesCollector (#4233 ) Summary: So that we can act accordingly on blob index entries Pull Request resolved: https://github.com/facebook/rocksdb/pull/4233 Differential Revision: D9190205 Pulled By: yiwu-arbug fbshipit-source-id: e5b84d5b41e44fa7a76762f1f7b0305369bb3a0c	6 years ago
Gustav Davidsson	a15354d04e	Expose GetTotalTrashSize in SstFileManager interface (#4206 ) Summary: Hi, it would be great if we could expose this API, so that LogDevice can use it to track the total size of trash files and alarm if it grows too large in relation to disk size. There's probably other customers that would be interested in this as well. :) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4206 Differential Revision: D9115516 Pulled By: gdavidsson fbshipit-source-id: f34993a940e39cb0a0b544ae8298546499b7e047	6 years ago
Sagar Vemuri	12b6cdeed3	Trace and Replay for RocksDB (#3837 ) Summary: A framework for tracing and replaying RocksDB operations. A binary trace file is created by capturing the DB operations, and it can be replayed back at the same rate using db_bench. - Column-families are supported - Multi-threaded tracing is supported. - TraceReader and TraceWriter are exposed to the user, so that tracing to various destinations can be enabled (say, to other messaging/logging services). By default, a FileTraceReader and FileTraceWriter are implemented to capture to a file and replay from it. - This is not yet ideal to be enabled in production due to large performance overhead, but it can be safely tried out in a shadow setup, say, for analyzing RocksDB operations. Currently supported DB operations: - Writes: -- Put -- Merge -- Delete -- SingleDelete -- DeleteRange -- Write - Reads: -- Get (point lookups) Pull Request resolved: https://github.com/facebook/rocksdb/pull/3837 Differential Revision: D7974837 Pulled By: sagar0 fbshipit-source-id: 8ec65aaf336504bc1f6ed0feae67f6ed5ef97a72	6 years ago
Fenggang Wu	ee7617167f	DataBlockHashIndex: Specify that DataBlockHashIndex is not yet implemented in the comment Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4203 Differential Revision: D9090912 Pulled By: fgwu fbshipit-source-id: 6a68be83693ddf2a5c060290382141f0d2fb400b	6 years ago
Yanqin Jin	54de56844d	Remove random writes from SST file ingestion (#4172 ) Summary: RocksDB used to store global_seqno in external SST files written by SstFileWriter. During file ingestion, RocksDB uses `pwrite` to update the `global_seqno`. Since random write is not supported in some non-POSIX compliant file systems, external SST file ingestion is not supported on these file systems. To address this limitation, we no longer update `global_seqno` during file ingestion. Later RocksDB uses the MANIFEST and other information in table properties to deduce global seqno for externally-ingested SST files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4172 Differential Revision: D8961465 Pulled By: riversand963 fbshipit-source-id: 4382ec85270a96be5bc0cf33758ca2b167b05071	6 years ago
Fenggang Wu	a11df583ec	Add DataBlockIndexType option in BlockBasedTableOptions (#4150 ) Summary: Added DataBlockIndexType option in BlockBasedTableOptions. ``` enum DataBlockIndexType : char { kDataBlockBinarySearch = 0, // traditional block type kDataBlockHashIndex = 1, // additional hash index appended to the end. }; ``` The default type is the traditional binary seek option: `kDataBlockBinarySearch`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4150 Differential Revision: D8895958 Pulled By: fgwu fbshipit-source-id: 480adef48104cf11d30db3bad9a73f98b4a80c10	6 years ago
Maysam Yabandeh	c33b32671e	Correct description of GetColumnFamilyMetaData (#4196 ) Summary: The inline doc was incorrectly mentioned a return status while the function does not return a value. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4196 Differential Revision: D9030927 Pulled By: maysamyabandeh fbshipit-source-id: 07c34dc6bf521021bf790ac1bfedb676171129ec	6 years ago
Maysam Yabandeh	e0906eb785	Clarify max_total_wal_size's scope (#4194 ) Summary: max_total_wal_size takes effect only when there are more than one column families. The patch clarify that in the inline docs Closes https://github.com/facebook/rocksdb/issues/4180 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4194 Differential Revision: D9028767 Pulled By: maysamyabandeh fbshipit-source-id: 8d730ca7f15e76e7ee9ff88b2b48030b2d1b7078	6 years ago
Yanqin Jin	18f538038a	Increase version number to 5.16 (#4176 ) Summary: Given that we have cut 5.15, we should bump the version number to the next version, i.e. 5.16. Also update HISTORY.md cc sagar0 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4176 Differential Revision: D8977965 Pulled By: riversand963 fbshipit-source-id: 481d75d2f446946f0eb2afb7e94ef894c8c87e1e	6 years ago
Manuel Ung	ea212e5316	WriteUnPrepared: Implement unprepared batches for transactions (#4104 ) Summary: This adds support for writing unprepared batches based on size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding methods that modify data (Put/Delete/SingleDelete/Merge) and checking first if write batch size has exceeded threshold. If so, the write batch is written to DB as an unprepared batch. Support for Commit/Rollback for unprepared batch is added as well. This has been done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers either when updating prepare heap, or adding to commit map. For updating the commit map, this logic exists inside `WriteUnpreparedCommitEntryPreReleaseCallback`. A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may be unprepared entries (which act similarly to prepared entries) already when a commit is done without prepare. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104 Differential Revision: D8785717 Pulled By: lth fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91	6 years ago
Chang Su	374c37da5b	move static msgs out of Status class (#4144 ) Summary: The member msgs of class Status contains all types of status messages. When users dump a Status object, msgs will confuse users. So move it out of class Status by making it as file-local static variable. Closes #3831 . Pull Request resolved: https://github.com/facebook/rocksdb/pull/4144 Differential Revision: D8941419 Pulled By: sagar0 fbshipit-source-id: 56b0510258465ff26db15aa6b04e01532e053e3d	6 years ago
Yanqin Jin	79f009f22e	Release 5.15. (#4148 ) Summary: Cut 5.15.fb Pull Request resolved: https://github.com/facebook/rocksdb/pull/4148 Differential Revision: D8886802 Pulled By: riversand963 fbshipit-source-id: 6b6427ce97f5b323a7eebf92458fda8b24b0cece	6 years ago
Siying Dong	ddc07b40fc	Remove managed iterator Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4124 Differential Revision: D8829910 Pulled By: siying fbshipit-source-id: f3e952ccf3a631071a5d77c48e327046f8abb560	6 years ago
Sagar Vemuri	991120fa10	Allow ttl to be changed dynamically (#4133 ) Summary: Allow ttl to be changed dynamically. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4133 Differential Revision: D8845440 Pulled By: sagar0 fbshipit-source-id: c8c87ae643b3a8c4123e4c037c4645efc094a2d3	6 years ago
Nathan VanBenschoten	ef7815b803	Support range deletion tombstones in IngestExternalFile SSTs (#3778 ) Summary: Fixes #3391. This change adds a `DeleteRange` method to `SstFileWriter` and adds support for ingesting SSTs with range deletion tombstones. This is important for applications that need to atomically ingest SSTs while clearing out any existing keys in a given key range. Pull Request resolved: https://github.com/facebook/rocksdb/pull/3778 Differential Revision: D8821836 Pulled By: anand1976 fbshipit-source-id: ca7786c1947ff129afa703dab011d524c7883844	6 years ago
Tamir Duberstein	7bee48bdbd	Add GCC 8 to Travis (#3433 ) Summary: - Avoid `strdup` to use jemalloc on Windows - Use `size_t` for consistency - Add GCC 8 to Travis - Add CMAKE_BUILD_TYPE=Release to Travis Pull Request resolved: https://github.com/facebook/rocksdb/pull/3433 Differential Revision: D6837948 Pulled By: sagar0 fbshipit-source-id: b8543c3a4da9cd07ee9a33f9f4623188e233261f	6 years ago
Siying Dong	35b38a232c	Update comments of WriteBatchWithIndex Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4113 Differential Revision: D8814172 Pulled By: siying fbshipit-source-id: cabc31db2c74803af9b2f99329155a1086eb1b22	6 years ago
Siying Dong	926f3a78a6	In delete scheduler, before ftruncate file for slow delete, check whether there is other hard links (#4093 ) Summary: Right now slow deletion with ftruncate doesn't work well with checkpoints because it ruin hard linked files in checkpoints. To fix it, check the file has no other hard link before ftruncate it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4093 Differential Revision: D8730360 Pulled By: siying fbshipit-source-id: 756eea5bce8a87b9a2ea3a5bfa190b2cab6f75df	6 years ago
Manuel Ung	b9846370e9	WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078 ) Summary: This adds support for recovering WriteUnprepared transactions through the following changes: - The information in `RecoveredTransaction` is extended so that it can reference multiple batches. - `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not. - `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway. Commit/Rollback of live transactions is still unimplemented and will come later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078 Differential Revision: D8703382 Pulled By: lth fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a	6 years ago
Yanqin Jin	d4d9fe8e57	Fix a bug caused by not copying the block trailer. (#4096 ) Summary: This was caught by crash test, and the following is a simple way to reproduce it and verify the fix. One way to trigger this code path is to use the following configuration: - Compress SST file - Enable direct IO and prefetch buffer - Do NOT use compressed block cache Closes https://github.com/facebook/rocksdb/pull/4096 Differential Revision: D8742009 Pulled By: riversand963 fbshipit-source-id: f13381078bbb0dce92f60bd313a78ab602bcacd2	6 years ago
Siying Dong	17027aeffc	Change default value of `bytes_max_delete_chunk` to 0 in NewSstFileManager() (#4092 ) Summary: Now by default, with NewSstFileManager, checkpoints may be corrupted. Disable this feature to avoid this issue. Closes https://github.com/facebook/rocksdb/pull/4092 Differential Revision: D8729856 Pulled By: siying fbshipit-source-id: 914c321d6eaf52d8c5981171322d85dd29088307	6 years ago
Manuel Ung	8ad63a4b86	WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069 ) Summary: This adds a new WAL marker of type kTypeBeginUnprepareXID. Also, DBImpl now contains a field called batch_per_txn (meaning one WriteBatch per transaction, or possibly multiple WriteBatches). This would also indicate that this DB is using WriteUnprepared policy. Recovery code would be able to make use of this extra field on DBImpl in a separate diff. For now, it is just used to determine whether the WAL is compatible or not. Closes https://github.com/facebook/rocksdb/pull/4069 Differential Revision: D8675099 Pulled By: lth fbshipit-source-id: ca27cae1738e46d65f2bb92860fc759deb874749	6 years ago
Anand Ananthabhotla	52d4c9b7f6	Allow DB resume after background errors (#3997 ) Summary: Currently, if RocksDB encounters errors during a write operation (user requested or BG operations), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of 3 parts - 1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not 2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place 3. Provide an API for the user to clear the error and resume the DB instance This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors. Closes https://github.com/facebook/rocksdb/pull/3997 Differential Revision: D8653831 Pulled By: anand1976 fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd	6 years ago
Yanqin Jin	26d67e357e	Support group commits of version edits (#3944 ) Summary: This PR supports the group commit of multiple version edit entries corresponding to different column families. Column family drop/creation still cannot be grouped. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752). Closes https://github.com/facebook/rocksdb/pull/3944 Differential Revision: D8432536 Pulled By: riversand963 fbshipit-source-id: 8f11bd05193b6c0d9272d82e44b676abfac113cb	6 years ago
Zhichao Cao	1f6efabe23	Add bottommost_compression_opts to for bottommost_compression (#3985 ) Summary: …ression For `CompressionType` we have options `compression` and `bottommost_compression`. Thus, to make the compression options consitent with the compression type when bottommost_compression is enabled, we add the bottommost_compression_opts Closes https://github.com/facebook/rocksdb/pull/3985 Reviewed By: riversand963 Differential Revision: D8385911 Pulled By: zhichao-cao fbshipit-source-id: 07bc533dd61bcf1cef5927d8d62901c13d38d5fc	6 years ago
chouxi	818c84e116	Store timestamp in deadlock detection (#4060 ) Summary: - Summary Add timestamp into the DeadlockInfo to store the timestamp when deadlock detected on the rocksdb side. - Testplan: `make check -j64` Closes https://github.com/facebook/rocksdb/pull/4060 Differential Revision: D8655380 Pulled By: chouxi fbshipit-source-id: f58e1aa5e09eb1d1eed0a181d4e2304aaf01efe8	6 years ago
Daniel Black	e5ae1bb465	Remove bogus gcc-8.1 warning (#3870 ) Summary: Various rearrangements of the cch maths failed or replacing = '\0' with memset failed to convince the compiler it was nul terminated. So took the perverse option of changing strncpy to strcpy. Return null if memory couldn't be allocated. util/status.cc: In static member function ‘static const char* rocksdb::Status::CopyState(const char)’: util/status.cc:28:15: error: ‘char strncpy(char, const char, size_t)’ output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation] std::strncpy(result, state, cch - 1); ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ util/status.cc:19:18: note: length computed here std::strlen(state) + 1; // +1 for the null terminator ~~~~~~~~~~~^~~~~~~ cc1plus: all warnings being treated as errors make: *** [Makefile:645: shared-objects/util/status.o] Error 1 closes #2705 Closes https://github.com/facebook/rocksdb/pull/3870 Differential Revision: D8594114 Pulled By: anand1976 fbshipit-source-id: ab20f3a456a711e4d29144ebe630e4fe3c99ec25	6 years ago
Manuel Ung	a16e00b7b9	WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955 ) Summary: This is implemented by extending ReadCallback with another function `MaxUnpreparedSequenceNumber` which returns the largest visible sequence number for the current transaction, if there is uncommitted data written to DB. Otherwise, it returns zero, indicating no uncommitted data. There are the places where reads had to be modified. - Get and Seek/Next was just updated to seek to max(snapshot_seq, MaxUnpreparedSequenceNumber()) instead, and iterate until a key was visible. - Prev did not need need updates since it did not use the Seek to sequence number optimization. Assuming that locks were held when writing unprepared keys, and ValidateSnapshot runs, there should only be committed keys and unprepared keys of the current transaction, all of which are visible. Prev will simply iterate to get the last visible key. - Reseeking to skip keys optimization was also disabled for write unprepared, since it's possible to hit the max_skip condition even while reseeking. There needs to be some way to resolve infinite looping in this case. Closes https://github.com/facebook/rocksdb/pull/3955 Differential Revision: D8286688 Pulled By: lth fbshipit-source-id: 25e42f47fdeb5f7accea0f4fd350ef35198caafe	6 years ago
Nikhil Benesch	17339dc2f3	Add table property tracking number of range deletions (#4016 ) Summary: Add a new table property, rocksdb.num.range-deletions, which tracks the number of range deletions in a block-based table. Range deletions are no longer counted in rocksdb.num.entries; as discovered in PR #3778, there are various code paths that implicitly assume that rocksdb.num.entries counts only true keys, not range deletions. /cc ajkr nvanbenschoten Closes https://github.com/facebook/rocksdb/pull/4016 Differential Revision: D8527575 Pulled By: ajkr fbshipit-source-id: 92e7edbe78fda53756a558013c9fb496e7764fd7	6 years ago
Zhongyi Xie	408205a36b	use user_key and iterate_upper_bound to determine compatibility of bloom filters (#3899 ) Summary: Previously in https://github.com/facebook/rocksdb/pull/3601 bloom filter will only be checked if `prefix_extractor` in the mutable_cf_options matches the one found in the SST file. This PR relaxes the requirement by checking if all keys in the range [user_key, iterate_upper_bound) all share the same prefix after transforming using the BF in the SST file. If so, the bloom filter is considered compatible and will continue to be looked at. Closes https://github.com/facebook/rocksdb/pull/3899 Differential Revision: D8157459 Pulled By: miasantreble fbshipit-source-id: 18d17cba56a1005162f8d5db7a27aba277089c41	6 years ago

... 16 17 18 19 20 ...

2427 Commits (27f3af596609f3b86af5080cc870c869e836e7f2)