rocksdb

Commit Graph

Author	SHA1	Message	Date
Peter Dillinger	02f2b20864	Add BackupEngine feature to exclude files (#11030 ) Summary: We have a request for RocksDB to essentially support disconnected incremental backup. In other words, if there is limited or no connectivity to the primary backup dir, we should still be able to take an incremental backup relative to that primary backup dir, assuming some metadata about that primary dir is available (and obviously anticipating primary backup dir will be fully available if restore is needed). To support that, this feature allows the API user to "exclude" DB files from backup. This only applies to files that can be shared between backups (sst and blob files), and excluded files are tracked in the backup metadata sufficiently to ensure they are restored at restore time. At restore time, the user provides a set of alternate backup directories (as open BackupEngines, which can be read-only), and excluded files must be found in one of the backup directories ("included" in some backup). This feature depends on backup schema version 2 features, though schema version 2.0 support is not sufficient to read / restore a backup with exclusions. This change updates the schema version to 2.1 because of this feature, so that it's easy to recognize whether a RocksDB release supports this feature, while backups not using the feature are fully compatible with 2.0. Also in this PR: * Stacked on https://github.com/facebook/rocksdb/pull/11029 * Allow progress_callback to be empty, not just no-op function, and recover from exceptions thrown by BackupEngine callbacks. * The internal-only `AsBackupEngine()` function is working around the diamond hierarchy of `BackupEngineImplThreadSafe` to get to the internals, without using confusing features like virtual inheritance. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11030 Test Plan: unit tests added / updated Reviewed By: ajkr Differential Revision: D42004388 Pulled By: pdillinger fbshipit-source-id: 31b6e533d308a5462e528d9012d650482d974077	3 years ago
Yanqin Jin	7d26e4c5a3	Basic Support for Merge with user-defined timestamp (#10819 ) Summary: This PR implements the originally disabled `Merge()` APIs when user-defined timestamp is enabled. Simplest usage: ```cpp // assume string append merge op is used with '.' as delimiter. // ts1 < ts2 db->Put(WriteOptions(), "key", ts1, "v0"); db->Merge(WriteOptions(), "key", ts2, "1"); ReadOptions ro; ro.timestamp = &ts2; db->Get(ro, "key", &value); ASSERT_EQ("v0.1", value); ``` Some code comments are added for clarity. Note: support for timestamp in `DB::GetMergeOperands()` will be done in a follow-up PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10819 Test Plan: make check Reviewed By: ltamasi Differential Revision: D40603195 Pulled By: riversand963 fbshipit-source-id: f96d6f183258f3392d80377025529f7660503013	3 years ago
sdong	b0d9776b70	clang format files under include/ (#10850 ) Summary: Run clang-format against files under include/ Pull Request resolved: https://github.com/facebook/rocksdb/pull/10850 Test Plan: Watch existing CI to pass. Reviewed By: ajkr Differential Revision: D40646158 fbshipit-source-id: 8ce04b107c837630f4000a478d0c871577090263	3 years ago
akankshamahajan	0e7b27bfcf	Refactor block cache tracing APIs (#10811 ) Summary: Refactor the classes, APIs and data structures for block cache tracing to allow a user provided trace writer to be used. Currently, only a TraceWriter is supported, with a default built-in implementation of FileTraceWriter. The TraceWriter, however, takes a flat trace record and is thus only suitable for file tracing. This PR introduces an abstract BlockCacheTraceWriter class that takes a structured BlockCacheTraceRecord. The BlockCacheTraceWriter implementation can then format and log the record in whatever way it sees fit. The default BlockCacheTraceWriterImpl does file tracing using a user provided TraceWriter. `DB::StartBlockTrace` will internally redirect to changed `BlockCacheTrace::StartBlockCacheTrace`. New API `DB::StartBlockTrace` is also added that directly takes `BlockCacheTraceWriter` pointer. This same philosophy can be applied to KV and IO tracing as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10811 Test Plan: existing unit tests Old API DB::StartBlockTrace checked with db_bench tool create database ``` ./db_bench --benchmarks="fillseq" \ --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \ --cache_index_and_filter_blocks --cache_size=1048576 \ --disable_auto_compactions=1 --disable_wal=1 --compression_type=none \ --min_level_to_compress=-1 --compression_ratio=1 --num=10000000 ``` To trace block cache accesses when running readrandom benchmark: ``` ./db_bench --benchmarks="readrandom" --use_existing_db --duration=60 \ --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \ --cache_index_and_filter_blocks --cache_size=1048576 \ --disable_auto_compactions=1 --disable_wal=1 --compression_type=none \ --min_level_to_compress=-1 --compression_ratio=1 --num=10000000 \ --threads=16 \ -block_cache_trace_file="/tmp/binary_trace_test_example" \ -block_cache_trace_max_trace_file_size_in_bytes=1073741824 \ -block_cache_trace_sampling_frequency=1 ``` Reviewed By: anand1976 Differential Revision: D40435289 Pulled By: akankshamahajan15 fbshipit-source-id: fa2755f4788185e19f4605e731641cfd21ab3282	3 years ago
sdong	9509003503	Option migration tool to break down files for FIFO compaction (#10600 ) Summary: Right now, when the option migration tool migrates to FIFO compaction, it compacts all the data into one single SST file and move to L0. Although it creates a valid LSM-tree for FIFO, for any data to be deleted for FIFO, the giant file will be deleted, which might make the DB almost empty. There is not good solution for it, because usually we don't have enough information to reconstruct the FIFO LSM-tree. This change changes to a solution that compromises the FIFO condition. We hope the solution is more useable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10600 Test Plan: Add unit tests for that. Reviewed By: jay-zhuang Differential Revision: D39106424 fbshipit-source-id: bdfd852c3b343373765b8d9716fefc08fd27145c	3 years ago
Levi Tamasi	81388b36e0	Add support for wide-column point lookups (#10540 ) Summary: The patch adds a new API `GetEntity` that can be used to perform wide-column point lookups. It also extends the `Get` code path and the `MemTable` / `MemTableList` and `Version` / `GetContext` logic accordingly so that wide-column entities can be served from both memtables and SSTs. If the result of a lookup is a wide-column entity (`kTypeWideColumnEntity`), it is passed to the application in deserialized form; if it is a plain old key-value (`kTypeValue`), it is presented as a wide-column entity with a single default (anonymous) column. (In contrast, regular `Get` returns plain old key-values as-is, and returns the value of the default column for wide-column entities, see https://github.com/facebook/rocksdb/issues/10483 .) The result of `GetEntity` is a self-contained `PinnableWideColumns` object. `PinnableWideColumns` contains a `PinnableSlice`, which either stores the underlying data in its own buffer or holds on to a cache handle. It also contains a `WideColumns` instance, which indexes the contents of the `PinnableSlice`, so applications can access the values of columns efficiently. There are several pieces of functionality which are currently not supported for wide-column entities: there is currently no `MultiGetEntity` or wide-column iterator; also, `Merge` and `GetMergeOperands` are not supported, and there is no `GetEntity` implementation for read-only and secondary instances. We plan to implement these in future PRs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10540 Test Plan: `make check` Reviewed By: akankshamahajan15 Differential Revision: D38847474 Pulled By: ltamasi fbshipit-source-id: 42311a34ccdfe88b3775e847a5e2a5296e002b5b	3 years ago
Gang Liao	ec4ebeff30	Support prepopulating/warming the blob cache (#10298 ) Summary: Many workloads have temporal locality, where recently written items are read back in a short period of time. When using remote file systems, this is inefficient since it involves network traffic and higher latencies. Because of this, we would like to support prepopulating the blob cache during flush. This task is a part of https://github.com/facebook/rocksdb/issues/10156 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10298 Reviewed By: ltamasi Differential Revision: D37908743 Pulled By: gangliao fbshipit-source-id: 9feaed234bc719d38f0c02975c1ad19fa4bb37d1	3 years ago
Jay Zhuang	dcb6a3be4e	Add helper function to get debug type name (#10243 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10243 Reviewed By: riversand963 Differential Revision: D37370236 Pulled By: jay-zhuang fbshipit-source-id: 6e7a6fadf45fdfb5afe97b3f6fe4acf1260d4a86	3 years ago
Levi Tamasi	c73d2a9d18	Add API for writing wide-column entities (#10242 ) Summary: The patch builds on https://github.com/facebook/rocksdb/pull/9915 and adds a new API called `PutEntity` that can be used to write a wide-column entity to the database. The new API is added to both `DB` and `WriteBatch`. Note that currently there is no way to retrieve these entities; more precisely, all read APIs (`Get`, `MultiGet`, and iterator) return `NotSupported` when they encounter a wide-column entity that is required to answer a query. Read-side support (as well as other missing functionality like `Merge`, compaction filter, and timestamp support) will be added in later PRs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10242 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D37369748 Pulled By: ltamasi fbshipit-source-id: 7f5e412359ed7a400fd80b897dae5599dbcd685d	3 years ago
Andrew Kryczka	5d6005c780	Add WriteOptions::protection_bytes_per_key (#10037 ) Summary: Added an option, `WriteOptions::protection_bytes_per_key`, that controls how many bytes per key we use for integrity protection in `WriteBatch`. It takes effect when `WriteBatch::GetProtectionBytesPerKey() == 0`. Currently the only supported value is eight. Invoking a user API with it set to any other nonzero value will result in `Status::NotSupported` returned to the user. There is also a bug fix for integrity protection with `inplace_callback`, where we forgot to take into account the possible change in varint length when calculating KV checksum for the final encoded buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10037 Test Plan: - Manual - Set default value of `WriteOptions::protection_bytes_per_key` to eight and ran `make check -j24` - Enabled in MyShadow for 1+ week - Automated - Unit tests have a `WriteMode` that enables the integrity protection via `WriteOptions` - Crash test - in most cases, use `WriteOptions::protection_bytes_per_key` to enable integrity protection Reviewed By: cbi42 Differential Revision: D36614569 Pulled By: ajkr fbshipit-source-id: 8650087ceac9b61b560f1e5fafe5e1baf9c725fb	3 years ago
Yanqin Jin	1777e5f7e9	Snapshots with user-specified timestamps (#9879 ) Summary: In RocksDB, keys are associated with (internal) sequence numbers which denote when the keys are written to the database. Sequence numbers in different RocksDB instances are unrelated, thus not comparable. It is nice if we can associate sequence numbers with their corresponding actual timestamps. One thing we can do is to support user-defined timestamp, which allows the applications to specify the format of custom timestamps and encode a timestamp with each key. More details can be found at https://github.com/facebook/rocksdb/wiki/User-defined-Timestamp-%28Experimental%29. This PR provides a different but complementary approach. We can associate rocksdb snapshots (defined in https://github.com/facebook/rocksdb/blob/7.2.fb/include/rocksdb/snapshot.h#L20) with user-specified timestamps. Since a snapshot is essentially an object representing a sequence number, this PR establishes a bi-directional mapping between sequence numbers and timestamps. In the past, snapshots are usually taken by readers. The current super-version is grabbed, and a `rocksdb::Snapshot` object is created with the last published sequence number of the super-version. You can see that the reader actually has no good idea of what timestamp to assign to this snapshot, because by the time the `GetSnapshot()` is called, an arbitrarily long period of time may have already elapsed since the last write, which is when the last published sequence number is written. This observation motivates the creation of "timestamped" snapshots on the write path. Currently, this functionality is exposed only to the layer of `TransactionDB`. Application can tell RocksDB to create a snapshot when a transaction commits, effectively associating the last sequence number with a timestamp. It is also assumed that application will ensure any two snapshots with timestamps should satisfy the following: ``` snapshot1.seq < snapshot2.seq iff. snapshot1.ts < snapshot2.ts ``` If the application can guarantee that when a reader takes a timestamped snapshot, there is no active writes going on in the database, then we also allow the user to use a new API `TransactionDB::CreateTimestampedSnapshot()` to create a snapshot with associated timestamp. Code example ```cpp // Create a timestamped snapshot when committing transaction. txn->SetCommitTimestamp(100); txn->SetSnapshotOnNextOperation(); txn->Commit(); // A wrapper API for convenience Status Transaction::CommitAndTryCreateSnapshot( std::shared_ptr<TransactionNotifier> notifier, TxnTimestamp ts, std::shared_ptr<const Snapshot>* ret); // Create a timestamped snapshot if caller guarantees no concurrent writes std::pair<Status, std::shared_ptr<const Snapshot>> snapshot = txn_db->CreateTimestampedSnapshot(100); ``` The snapshots created in this way will be managed by RocksDB with ref-counting and potentially shared with other readers. We provide the following APIs for readers to retrieve a snapshot given a timestamp. ```cpp // Return the timestamped snapshot correponding to given timestamp. If ts is // kMaxTxnTimestamp, then we return the latest timestamped snapshot if present. // Othersise, we return the snapshot whose timestamp is equal to `ts`. If no // such snapshot exists, then we return null. std::shared_ptr<const Snapshot> TransactionDB::GetTimestampedSnapshot(TxnTimestamp ts) const; // Return the latest timestamped snapshot if present. std::shared_ptr<const Snapshot> TransactionDB::GetLatestTimestampedSnapshot() const; ``` We also provide two additional APIs for stats collection and reporting purposes. ```cpp Status TransactionDB::GetAllTimestampedSnapshots( std::vector<std::shared_ptr<const Snapshot>>& snapshots) const; // Return timestamped snapshots whose timestamps fall in [ts_lb, ts_ub) and store them in `snapshots`. Status TransactionDB::GetTimestampedSnapshots( TxnTimestamp ts_lb, TxnTimestamp ts_ub, std::vector<std::shared_ptr<const Snapshot>>& snapshots) const; ``` To prevent the number of timestamped snapshots from growing infinitely, we provide the following API to release timestamped snapshots whose timestamps are older than or equal to a given threshold. ```cpp void TransactionDB::ReleaseTimestampedSnapshotsOlderThan(TxnTimestamp ts); ``` Before shutdown, RocksDB will release all timestamped snapshots. Comparison with user-defined timestamp and how they can be combined: User-defined timestamp persists every key with a timestamp, while timestamped snapshots maintain a volatile mapping between snapshots (sequence numbers) and timestamps. Different internal keys with the same user key but different timestamps will be treated as different by compaction, thus a newer version will not hide older versions (with smaller timestamps) unless they are eligible for garbage collection. In contrast, taking a timestamped snapshot at a certain sequence number and timestamp prevents all the keys visible in this snapshot from been dropped by compaction. Here, visible means (seq < snapshot and most recent). The timestamped snapshot supports the semantics of reading at an exact point in time. Timestamped snapshots can also be used with user-defined timestamp. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9879 Test Plan: ``` make check TEST_TMPDIR=/dev/shm make crash_test_with_txn ``` Reviewed By: siying Differential Revision: D35783919 Pulled By: riversand963 fbshipit-source-id: 586ad905e169189e19d3bfc0cb0177a7239d1bd4	3 years ago
Gang Liao	e6432dfd4c	Make it possible to enable blob files starting from a certain LSM tree level (#10077 ) Summary: Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed. In order to achieve this, we would like to do the following: - Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions` and extend the related logic) - Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level` - Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` ) - Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh` - Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool) - Ideally extend the C and Java bindings with the new option - Update the BlobDB wiki to document the new option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077 Reviewed By: ltamasi Differential Revision: D36884156 Pulled By: gangliao fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d	3 years ago
Jay Zhuang	c6d326d3d7	Track SST unique id in MANIFEST and verify (#9990 ) Summary: Start tracking SST unique id in MANIFEST, which is used to verify with SST properties to make sure the SST file is not overwritten or misplaced. A DB option `try_verify_sst_unique_id` is introduced to enable/disable the verification, if enabled, it opens all SST files during DB-open to read the unique_id from table properties (default is false), so it's recommended to use it with `max_open_files = -1` to pre-open the files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9990 Test Plan: unittests, format-compatible test, mini-crash Reviewed By: anand1976 Differential Revision: D36381863 Pulled By: jay-zhuang fbshipit-source-id: 89ea2eb6b35ed3e80ead9c724eb096083eaba63f	3 years ago
mrambacher	204a42ca97	Added GetFactoryCount/Names/Types to ObjectRegistry (#9358 ) Summary: These methods allow for more thorough testing of the ObjectRegistry and Customizable infrastructure in a simpler manner. With this change, the Customizable tests can now check what factories are registered and attempt to create each of them in a systematic fashion. With this change, I think all of the factories registered with the ObjectRegistry/CreateFromString are now tested via the customizable_test classes. Note that there were a few other minor changes. There was a "posix://*" register with the ObjectRegistry which was missed during the PatternEntry conversion -- these changes found that. The nickname and default names for the FileSystem classes was also inverted. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9358 Reviewed By: pdillinger Differential Revision: D33433542 Pulled By: mrambacher fbshipit-source-id: 9a32da74e6620745b4eeffb2712be70eeeadfa7e	3 years ago
mrambacher	bfc6a8ee4a	Option type info functions (#9411 ) Summary: Add methods to set the various functions (Parse, Serialize, Equals) to the OptionTypeInfo. These methods simplify the number of constructors required for OptionTypeInfo and make the code a little clearer. Add functions to the OptionTypeInfo for Prepare and Validate. These methods allow types other than Configurable and Customizable to have Prepare and Validate logic. These methods could be used by an option to guarantee that its settings were in a range or that a value was initialized. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9411 Reviewed By: pdillinger Differential Revision: D36174849 Pulled By: mrambacher fbshipit-source-id: 72517d8c6bab4723788a4c1a9e16590bff870125	3 years ago
Jay Zhuang	270179bb12	Default `try_load_options` to true when DB is specified (#9937 ) Summary: If the DB path is specified, the user would expect ldb loads the options from the path, but it's not: ``` $ ldb list_live_files_metadata --db=`pwd` ``` Default `try_load_options` to true in that case. The user can still disable that by: ``` $ ldb list_live_files_metadata --db=`pwd` --try_load_options=false ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9937 Test Plan: `ldb list_live_files_metadata --db=`pwd`` is able to work for a db generated with different options.num_levels. Reviewed By: ajkr Differential Revision: D36106708 Pulled By: jay-zhuang fbshipit-source-id: 2732fdc027a4d172436b2c9b6a9787b56b10c710	3 years ago
yuzhangyu	fff28a7725	Add blob dump support to the dump command (#9881 ) Summary: This patch is the first part of adding blob dump support. It only adds blob dump support to the dump command. A follow up patch will add blob dump support to the dump_live_files command. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9881 Reviewed By: ltamasi Differential Revision: D35796731 Pulled By: jowlyzhang fbshipit-source-id: 2cc5973b222d505a331ac7b969edcf992b47c5ee	3 years ago
Yanqin Jin	d13825e586	Add rollback_deletion_type_callback to TxnDBOptions (#9873 ) Summary: This PR does not affect write-committed. Add a member, `rollback_deletion_type_callback` to TransactionDBOptions so that a write-prepared transaction, when rolling back, can call this callback to decide if a `Delete` or `SingleDelete` should be used to cancel a prior `Put` written to the database during prepare phase. The purpose of this PR is to prevent mixing `Delete` and `SingleDelete` for the same key, causing undefined behaviors. Without this PR, the following can happen: ``` // The application always issues SingleDelete when deleting keys. txn1->Put('a'); txn1->Prepare(); // writes to memtable and potentially gets flushed/compacted to Lmax txn1->Rollback(); // inserts DELETE('a') txn2->Put('a'); txn2->Commit(); // writes to memtable and potentially gets flushed/compacted ``` In the database, we may have ``` L0: [PUT('a', s=100)] L1: [DELETE('a', s=90)] Lmax: [PUT('a', s=0)] ``` If a compaction compacts L0 and L1, then we have ``` L1: [PUT('a', s=100)] Lmax: [PUT('a', s=0)] ``` If a future transaction issues a SingleDelete, we have ``` L0: [SD('a', s=110)] L1: [PUT('a', s=100)] Lmax: [PUT('a', s=0)] ``` Then, a compaction including L0, L1 and Lmax leads to ``` Lmax: [PUT('a', s=0)] ``` which is incorrect. Similar bugs reported and addressed in https://github.com/cockroachdb/pebble/issues/1255. Based on our team's current priority, we have decided to take this approach for now. We may come back and revisit in the future. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9873 Test Plan: make check Reviewed By: ltamasi Differential Revision: D35762170 Pulled By: riversand963 fbshipit-source-id: b28d56eefc786b53c9844b9ef4a7807acdd82c8d	3 years ago
sdong	4f9c0fd083	Add Aggregation Merge Operator (#9780 ) Summary: Add a merge operator that allows users to register specific aggregation function so that they can does aggregation based per key using different aggregation types. See comments of function CreateAggMergeOperator() for actual usage. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9780 Test Plan: Add a unit test to coverage various cases. Reviewed By: ltamasi Differential Revision: D35267444 fbshipit-source-id: 5b02f31c4f3e17e96dd4025cdc49fca8c2868628	3 years ago
yuzhangyu	082eb04200	Add option --decode_blob_index to dump_live_files command (#9842 ) Summary: This change only add decode blob index support to dump_live_files command, which is part of a task to add blob support to a few commands. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9842 Reviewed By: ltamasi Differential Revision: D35650167 Pulled By: jowlyzhang fbshipit-source-id: a78151b98bc38ac6f52c6e01ca6927a3429ddd14	3 years ago
mrambacher	b7db7eae26	Plugin Registry (#7949 ) Summary: Added a Plugin class to the ObjectRegistry. Enabled compile-time and program-time addition of plugins to the Registry. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7949 Reviewed By: mrambacher Differential Revision: D33517674 Pulled By: pdillinger fbshipit-source-id: c3e3270aab76a489bfa9e85d78cdfca951912557	3 years ago
Yanqin Jin	1a1c5bda23	Disallow commit-time-batch for write-prepared/write-unprepared txn conditionally (#9794 ) Summary: For write-prepared/write-unprepared transactions, GetCommitTimeWriteBatch() can be used only if the transaction is started with `TransactionOptions::use_only_the_last_commit_time_batch_for_recovery` set to true. Otherwise, it is possible that multiple uncommitted versions of the same key exist in the database. During bottommost compaction, RocksDB may set the sequence numbers of both to zero once they become committed, causing output SST file to have two identical internal keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9794 Test Plan: make check pay special attention to the following ``` transaction_test --gtest_filter=MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest/* ``` Reviewed By: lth Differential Revision: D35327214 Pulled By: riversand963 fbshipit-source-id: 3bae00a28359c10e96e4c6f676d20de5610d8a0f	4 years ago
yaphet	fcd32e687b	remove some break line (#9716 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9716 Reviewed By: mrambacher Differential Revision: D35026096 Pulled By: jay-zhuang fbshipit-source-id: 296c38418e2bb7948d7802e439a08c6621bdb49b	4 years ago
Peter Dillinger	cff0d1e8e6	New backup meta schema, with file temperatures (#9660 ) Summary: The primary goal of this change is to add support for backing up and restoring (applying on restore) file temperature metadata, without committing to either the DB manifest or the FS reported "current" temperatures being exclusive "source of truth". To achieve this goal, we need to add temperature information to backup metadata, which requires updated backup meta schema. Fortunately I prepared for this in https://github.com/facebook/rocksdb/issues/8069, which began forward compatibility in version 6.19.0 for this kind of schema update. (Previously, backup meta schema was not extensible! Making this schema update public will allow some other "nice to have" features like taking backups with hard links, and avoiding crc32c checksum computation when another checksum is already available.) While schema version 2 is newly public, the default schema version is still 1. Until we change the default, users will need to set to 2 to enable features like temperature data backup+restore. New metadata like temperature information will be ignored with a warning in versions before this change and since 6.19.0. The metadata is considered ignorable because a functioning DB can be restored without it. Some detail: * Some renaming because "future schema" is now just public schema 2. * Initialize some atomics in TestFs (linter reported) * Add temperature hint support to SstFileDumper (used by BackupEngine) Pull Request resolved: https://github.com/facebook/rocksdb/pull/9660 Test Plan: related unit test majorly updated for the new functionality, including some shared testing support for tracking temperatures in a FS. Some other tests and testing hooks into production code also updated for making the backup meta schema change public. Reviewed By: ajkr Differential Revision: D34686968 Pulled By: pdillinger fbshipit-source-id: 3ac1fa3e67ee97ca8a5103d79cc87d872c1d862a	4 years ago
Yanqin Jin	5894761056	Improve stress test for transactions (#9568 ) Summary: Test only, no change to functionality. Extremely low risk of library regression. Update test key generation by maintaining existing and non-existing keys. Update db_crashtest.py to drive multiops_txn stress test for both write-committed and write-prepared. Add a make target 'blackbox_crash_test_with_multiops_txn'. Running the following commands caught the bug exposed in https://github.com/facebook/rocksdb/issues/9571. ``` $rm -rf /tmp/rocksdbtest/* $./db_stress -progress_reports=0 -test_multi_ops_txns -use_txn -clear_column_family_one_in=0 \ -column_families=1 -writepercent=0 -delpercent=0 -delrangepercent=0 -customopspercent=60 \ -readpercent=20 -prefixpercent=0 -iterpercent=20 -reopen=0 -ops_per_thread=1000 -ub_a=10000 \ -ub_c=100 -destroy_db_initially=0 -key_spaces_path=/dev/shm/key_spaces_desc -threads=32 -read_fault_one_in=0 $./db_stress -progress_reports=0 -test_multi_ops_txns -use_txn -clear_column_family_one_in=0 -column_families=1 -writepercent=0 -delpercent=0 -delrangepercent=0 -customopspercent=60 -readpercent=20 \ -prefixpercent=0 -iterpercent=20 -reopen=0 -ops_per_thread=1000 -ub_a=10000 -ub_c=100 -destroy_db_initially=0 \ -key_spaces_path=/dev/shm/key_spaces_desc -threads=32 -read_fault_one_in=0 ``` Running the following command caught a bug which will be fixed in https://github.com/facebook/rocksdb/issues/9648 . ``` $TEST_TMPDIR=/dev/shm make blackbox_crash_test_with_multiops_wc_txn ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9568 Reviewed By: jay-zhuang Differential Revision: D34308154 Pulled By: riversand963 fbshipit-source-id: 99ff1b65c19b46c471d2f2d3b47adcd342a1b9e7	4 years ago
Changneng Chen	9ed96703d1	Add support for BlobDB to ldb (#9630 ) Summary: Add the configuration options and help messages of BlobDB to `ldb` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9630 Test Plan: `python ./tools/ldb_test.py` Reviewed By: ltamasi Differential Revision: D34443176 Pulled By: changneng fbshipit-source-id: 5b3f185cdfc2561e06dd37215c7edfbca07dbe80	4 years ago
Yanqin Jin	6f12599863	Support WBWI for keys having timestamps (#9603 ) Summary: This PR supports inserting keys to a `WriteBatchWithIndex` for column families that enable user-defined timestamps and reading the keys back. The index does not have timestamps. Writing a key to WBWI is unchanged, because the underlying WriteBatch already supports it. When reading the keys back, we need to make sure to distinguish between keys with and without timestamps before comparison. When user calls `GetFromBatchAndDB()`, no timestamp is needed to query the batch, but a timestamp has to be provided to query the db. The assumption is that data in the batch must be newer than data from the db. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9603 Test Plan: make check Reviewed By: ltamasi Differential Revision: D34354849 Pulled By: riversand963 fbshipit-source-id: d25d1f84e2240ce543e521fa30595082fb8db9a0	4 years ago
mrambacher	30b08878d8	Make FilterPolicy Customizable (#9590 ) Summary: Make FilterPolicy into a Customizable class. Allow new FilterPolicy to be discovered through the ObjectRegistry Pull Request resolved: https://github.com/facebook/rocksdb/pull/9590 Reviewed By: pdillinger Differential Revision: D34327367 Pulled By: mrambacher fbshipit-source-id: 37e7edac90ec9457422b72f359ab8ef48829c190	4 years ago
mrambacher	c42d0cf862	Add support for decimals to PatternEntry (#9577 ) Summary: Add support for doubles to ObjectLibrary::PatternEntry. This support will allow patterns containing a non-integer number to be parsed correctly. Added appropriate test cases to cover this new option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9577 Reviewed By: pdillinger Differential Revision: D34269763 Pulled By: mrambacher fbshipit-source-id: b5ce16cbd3665c2974ec0f3412ef2b403ef8b155	4 years ago
Yanqin Jin	d6e1e6f37a	Add commit_timestamp and read_timestamp to Pessimistic transaction (#9537 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9537 Add `Transaction::SetReadTimestampForValidation()` and `Transaction::SetCommitTimestamp()` APIs with default implementation returning `Status::NotSupported()`. Currently, calling these two APIs do not have any effect. Also add checks to `PessimisticTransactionDB` to enforce that column families in the same db either - disable user-defined timestamp - enable 64-bit timestamp Just to clarify, a `PessimisticTransactionDB` can have some column families without timestamps as well as column families that enable timestamp. Each `PessimisticTransaction` can have two optional timestamps, `read_timestamp_` used for additional validation and `commit_timestamp_` which denotes when the transaction commits. For now, we are going to support `WriteCommittedTxn` (in a series of subsequent PRs) Once set, we do not allow decreasing `read_timestamp_`. The `commit_timestamp_` must be greater than `read_timestamp_` for each transaction and must be set before commit, unless the transaction does not involve any column family that enables user-defined timestamp. TransactionDB builds on top of RocksDB core `DB` layer. Though `DB` layer assumes that user-defined timestamps are byte arrays, `TransactionDB` uses uint64_t to store timestamps. When they are passed down, they are still interpreted as byte-arrays by `DB`. Reviewed By: ltamasi Differential Revision: D31567959 fbshipit-source-id: b0b6b69acab5d8e340cf174f33e8b09f1c3d3502	4 years ago
mrambacher	81ada95bd7	Add STATIC_AVOID_DESTRUCTION for ObjectLibrary/Registry (#9464 ) Summary: This change should guarantee that the default ObjectLibrary/Registry are long-lived and not destroyed while the process is running. This will prevent some issues of them being referenced after they were destroyed via the static destruction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9464 Reviewed By: pdillinger Differential Revision: D33849876 Pulled By: mrambacher fbshipit-source-id: 7a69177d7c58c81be293fc7ef8e600d47ddbc14b	4 years ago
mrambacher	fe9d495112	Return different Status based on ObjectRegistry::NewObject calls (#9333 ) Summary: This fix addresses https://github.com/facebook/rocksdb/issues/9299. If attempting to create a new object via the ObjectRegistry and a factory is not found, the ObjectRegistry will return a "NotSupported" status. This is the same behavior as previously. If the factory is found but could not successfully create the object, an "InvalidArgument" status is returned. If the factory returned a reason why (in the errmsg), this message will be in the returned status. In practice, there are two options in the ConfigOptions that control how these errors are propagated: - If "ignore_unknown_options=true", then both InvalidArgument and NotSupported status codes will be swallowed internally. Both cases will return success - If "ignore_unsupported_options=true", then having no factory will return success but a failing factory will return an error - If both options are false, both cases (no and failing factory) will return errors. In practice this likely only changes Customizable that may be partially available. For example, the JEMallocMemoryAllocator is a built-in allocator that is registered with the system but may not be compiled in. In this case, the status code for this allocator changed from NotSupported("JEMalloc not available") to InvalidArgumen("JEMalloc not available"). Other Customizable builtins/plugins would have the same semantics. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9333 Reviewed By: pdillinger Differential Revision: D33517681 Pulled By: mrambacher fbshipit-source-id: 8033052d4a4a7b88c2d9f90147b1b4467e51f6fd	4 years ago
Yanqin Jin	3122cb4358	Revise APIs related to user-defined timestamp (#8946 ) Summary: ajkr reminded me that we have a rule of not including per-kv related data in `WriteOptions`. Namely, `WriteOptions` should not include information about "what-to-write", but should just include information about "how-to-write". According to this rule, `WriteOptions::timestamp` (experimental) is clearly a violation. Therefore, this PR removes `WriteOptions::timestamp` for compliance. After the removal, we need to pass timestamp info via another set of APIs. This PR proposes a set of overloaded functions `Put(write_opts, key, value, ts)`, `Delete(write_opts, key, ts)`, and `SingleDelete(write_opts, key, ts)`. Planned to add `Write(write_opts, batch, ts)`, but its complexity made me reconsider doing it in another PR (maybe). For better checking and returning error early, we also add a new set of APIs to `WriteBatch` that take extra `timestamp` information when writing to `WriteBatch`es. These set of APIs in `WriteBatchWithIndex` are currently not supported, and are on our TODO list. Removed `WriteBatch::AssignTimestamps()` and renamed `WriteBatch::AssignTimestamp()` to `WriteBatch::UpdateTimestamps()` since this method require that all keys have space for timestamps allocated already and multiple timestamps can be updated. The constructor of `WriteBatch` now takes a fourth argument `default_cf_ts_sz` which is the timestamp size of the default column family. This will be used to allocate space when calling APIs that do not specify a column family handle. Also, updated `DB::Get()`, `DB::MultiGet()`, `DB::NewIterator()`, `DB::NewIterators()` methods, replacing some assertions about timestamp to returning Status code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8946 Test Plan: make check ./db_bench -benchmarks=fillseq,fillrandom,readrandom,readseq,deleterandom -user_timestamp_size=8 ./db_stress --user_timestamp_size=8 -nooverwritepercent=0 -test_secondary=0 -secondary_catch_up_one_in=0 -continuous_verification_interval=0 Make sure there is no perf regression by running the following ``` ./db_bench_opt -db=/dev/shm/rocksdb -use_existing_db=0 -level0_stop_writes_trigger=256 -level0_slowdown_writes_trigger=256 -level0_file_num_compaction_trigger=256 -disable_wal=1 -duration=10 -benchmarks=fillrandom ``` Before this PR ``` DB path: [/dev/shm/rocksdb] fillrandom : 1.831 micros/op 546235 ops/sec; 60.4 MB/s ``` After this PR ``` DB path: [/dev/shm/rocksdb] fillrandom : 1.820 micros/op 549404 ops/sec; 60.8 MB/s ``` Reviewed By: ltamasi Differential Revision: D33721359 Pulled By: riversand963 fbshipit-source-id: c131561534272c120ffb80711d42748d21badf09	4 years ago
Yanqin Jin	d10c5c08d3	Remove iter_start_seqnum and preserve_deletes (#9430 ) Summary: According to https://github.com/facebook/rocksdb/blob/6.27.fb/db/db_impl/db_impl.cc#L2896:L2911 and https://github.com/facebook/rocksdb/blob/6.27.fb/db/db_impl/db_impl_open.cc#L203:L208, we are going to remove `iter_start_seqnum` and `preserve_deletes` starting from RocksDB 7.0 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9430 Test Plan: make check and CI Reviewed By: ajkr Differential Revision: D33753639 Pulled By: riversand963 fbshipit-source-id: c80aab8e8d8fc33e52472fed524ed703d0ffc8b6	4 years ago
anand76	e58cc56fb5	Use == operator for shared_ptr nullptr comparison (#9465 ) Summary: From C++ 20 onwards, the != operator is not supported for a shared_ptr. So switch to using ==. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9465 Test Plan: make check Reviewed By: riversand963 Differential Revision: D33850596 Pulled By: anand1976 fbshipit-source-id: eec16d1aa6c39a315ec2d44d233d7518f9c1ddcb	4 years ago
Peter Dillinger	78aee6fedc	Remove obsolete backupable_db.h, utility_db.h (#9438 ) Summary: This also removes the obsolete names BackupableDBOptions and UtilityDB. API users must now use BackupEngineOptions and DBWithTTL::Open. In C API, `rocksdb_backupable_db_` is replaced `rocksdb_backup_engine_`. Similar renaming in Java API. In reference to https://github.com/facebook/rocksdb/issues/9389 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9438 Test Plan: CI Reviewed By: mrambacher Differential Revision: D33780269 Pulled By: pdillinger fbshipit-source-id: 4a6cfc5c1b4c78bcad790b9d3dd13c5fdf4a1fac	4 years ago
Peter Dillinger	449029f865	Remove deprecated ObjectLibrary::Register() (and Regex public API) (#9439 ) Summary: Regexes are considered potentially problematic for use in registering RocksDB extensions, so we are removing ObjectLibrary::Register() and the Regex public API it depended on (now unused). In reference to https://github.com/facebook/rocksdb/issues/9389 Why? * The power of Regexes can make it hard to reason about which extension will match what. (The replacement API isn't perfect, but we are at least "holding the line" on patterns we have seen in practice.) * It is easy to make regexes that don't quite mean what you think they mean, such as forgetting that the `.` in `foo.bar` can match any character or that matching is nondeterministic, as in `a🅱️42` matching `.:[0-9]+`. Some regexes and implementations can have disastrously bad performance. This might not be much practical concern for ObjectLibray here, but we don't want to encourage potentially dangerous further use in production code. (Testing code is fine. See TestRegex.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/9439 Test Plan: CI Reviewed By: mrambacher Differential Revision: D33792342 Pulled By: pdillinger fbshipit-source-id: 4f64dcb04764e639162c8977a5fa196f67754cec	4 years ago
Jay Zhuang	022b400cba	Make `bottommost_temperature` dynamically changeable (#9402 ) Summary: Make `AdvancedColumnFamilyOptions.bottommost_temperature` dynamically changeable with `SetOptions` API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9402 Test Plan: added unittest Reviewed By: siying Differential Revision: D33674487 Pulled By: jay-zhuang fbshipit-source-id: 8943768156aa6197c63850a64238a8092527d517	4 years ago
Yanqin Jin	fa52376117	Move RADOS support to separate repo (#9206 ) Summary: This PR moves RADOS support from RocksDB repo to a separate repo. The new (temporary?) repo in this PR serves as an example before we finalize the decision on where and who to host RADOS support. At this point, people can start from the example repo and fork. The goal is to include this commit in RocksDB 7.0 release. Reference: https://github.com/ajkr/dedupfs by ajkr Pull Request resolved: https://github.com/facebook/rocksdb/pull/9206 Test Plan: Follow instructions in https://github.com/riversand963/rocksdb-rados-env/blob/main/README.md and build test binary `env_librados_test` and run it. Also, make check Reviewed By: ajkr Differential Revision: D33751690 Pulled By: riversand963 fbshipit-source-id: 30466c62afa9e4619847a48567ed158e62835e35	4 years ago
Yanqin Jin	50135c1bf3	Move HDFS support to separate repo (#9170 ) Summary: This PR moves HDFS support from RocksDB repo to a separate repo. The new (temporary?) repo in this PR serves as an example before we finalize the decision on where and who to host hdfs support. At this point, people can start from the example repo and fork. Java/JNI is not included yet, and needs to be done later if necessary. The goal is to include this commit in RocksDB 7.0 release. Reference: https://github.com/ajkr/dedupfs by ajkr Pull Request resolved: https://github.com/facebook/rocksdb/pull/9170 Test Plan: Follow the instructions in https://github.com/riversand963/rocksdb-hdfs-env/blob/master/README.md. Build and run db_bench and db_stress. make check Reviewed By: ajkr Differential Revision: D33751662 Pulled By: riversand963 fbshipit-source-id: 22b4db7f31762ed417a20239f5a08dcd1696244f	4 years ago
Sergei Petrunia	c9042db619	Range Locking: add support for escalation barriers (#9290 ) Summary: Range Locking supports Lock Escalation. Lock Escalation is invoked when lock memory is nearly exhausted and it reduced the amount of memory used by joining adjacent locks. Bridging the gap between certain locks has adverse effects. For example, in MyRocks it is not a good idea to bridge the gap between locks in different indexes, as that get the lock to cover large portions of indexes, or even entire indexes. Resolve this by introducing Escalation Barrier. The escalation process will call the user-provided barrier callback function: bool(const Endpoint& a, const Endpoint& b) If the function returns true, there's a barrier between a and b and Lock Escalation will not try to bridge the gap between a and b. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9290 Reviewed By: akankshamahajan15 Differential Revision: D33486753 Pulled By: riversand963 fbshipit-source-id: f97910b67aba0579ea1d35f523ca6863d3dd018e	4 years ago
mrambacher	1973fcba11	Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362 ) Summary: In order to support old-style regex function registration, restored the original "Register<T>(string, Factory)" method using regular expressions. The PatternEntry methods were left in place but renamed to AddFactory. The goal is to allow for the deprecation of the original regex Registry method in an upcoming release. Added modes to the PatternEntry kMatchZeroOrMore and kMatchAtLeastOne to match * or +, respectively (kMatchAtLeastOne was the original behavior). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9362 Reviewed By: pdillinger Differential Revision: D33432562 Pulled By: mrambacher fbshipit-source-id: ed88ab3f9a2ad0d525c7bd1692873f9bb3209d02	4 years ago
mrambacher	1c39b7952b	Remove/Reduce use of Regex in ObjectRegistry/Library (#9264 ) Summary: Added new ObjectLibrary::Entry classes to replace/reduce the use of Regex. For simple factories that only do name matching, there are "StringEntry" and "AltStringEntry" classes. For classes that use some semblance of regular expressions, there is a PatternEntry class that can match a name and prefixes. There is also a class for Customizable::IndividualId format matches. Added tests for the new derivative classes and got all unit tests to pass. Resolves https://github.com/facebook/rocksdb/issues/9225. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9264 Reviewed By: pdillinger Differential Revision: D33062001 Pulled By: mrambacher fbshipit-source-id: c2d2143bd2d38bdf522705c8280c35381b135c03	4 years ago
slk	2e5f764294	Make IncreaseFullHistoryTsLow to a public API (#9221 ) Summary: As (https://github.com/facebook/rocksdb/issues/9210) discussed, the full_history_ts_low is a member of CompactRangeOptions currently, which means a CF's fullHistoryTsLow is advanced only when users submit a CompactRange request. However, users may want to advance the fllHistoryTsLow without an immediate compact. This merge make IncreaseFullHistoryTsLow to a public API so users can advance each CF's fullHistoryTsLow seperately. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9221 Reviewed By: akankshamahajan15 Differential Revision: D33201106 Pulled By: riversand963 fbshipit-source-id: 9cb1d013ba93260f72e16353e693ffee167b47ee	4 years ago
Sergei Petrunia	1b076e82db	Expose locktree's wait count in RangeLockManagerHandle::Counters (#9289 ) Summary: locktree is a module providing Range Locking. It has a counter for the number of times a lock acquisition request was blocked by an existing conflicting lock and had to wait for it to be released. Expose this counter in RangeLockManagerHandle::Counters::lock_wait_count. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9289 Reviewed By: jay-zhuang Differential Revision: D33079182 Pulled By: riversand963 fbshipit-source-id: 25b1a362d9da247536ab5007bd15900b319f139e	4 years ago
Peter Dillinger	82afa01815	Some API clarifications (#9080 ) Summary: * Clarify that RocksDB is not exception safe on many of our callback and extension interfaces * Clarify FSRandomAccessFile::MultiRead implementations must accept non-sorted inputs (see https://github.com/facebook/rocksdb/issues/8953) * Clarify ConcurrentTaskLimiter and SstFileManager are not (currently) extensible interfaces * Mark WriteBufferManager as `final`, so it is then clearly not a callback interface, even though it smells like one * Clarify TablePropertiesCollector Status returns are mostly ignored Pull Request resolved: https://github.com/facebook/rocksdb/pull/9080 Test Plan: comments only (except WriteBufferManager final) Reviewed By: ajkr Differential Revision: D31968782 Pulled By: pdillinger fbshipit-source-id: 11b648ce3ce3c5e5bdc02d2eafc7ea4b864bd1d2	4 years ago
Peter Dillinger	92e2399669	Fix EnvLibrados and add to CI (#9088 ) Summary: This feature was not part of any common or CI build, so no surprise it broke. Now we can at least ensure compilation. I don't know how to run the test successfully (missing config file) so it is bypassed for now. Fixes https://github.com/facebook/rocksdb/issues/9078 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9088 Test Plan: CI Reviewed By: mrambacher Differential Revision: D32009467 Pulled By: pdillinger fbshipit-source-id: 3e0d1e5fde7f0ece703d48a81479e1cc7392c25c	4 years ago
Peter Dillinger	3ffb3baa0b	Add (Live)FileStorageInfo API (#8968 ) Summary: New classes FileStorageInfo and LiveFileStorageInfo and 'experimental' function DB::GetLiveFilesStorageInfo, which is intended to largely replace several fragmented DB functions needed to create checkpoints and backups. This function is now used to create checkpoints and backups, because it fixes many (probably not all) of the prior complexities of checkpoint not having atomic access to DB metadata. This also ensures strong functional test coverage of the new API. Specifically, much of the old CheckpointImpl::CreateCustomCheckpoint has been migrated to and updated in DBImpl::GetLiveFilesStorageInfo, with the former now calling the latter. Also, the class FileStorageInfo in metadata.h compatibly replaces BackupFileInfo and serves as a new base class for SstFileMetaData. Some old fields of SstFileMetaData are still provided (for now) but deprecated. Although FileStorageInfo::directory is accurate when using db_paths and/or cf_paths, these have never been supported by Checkpoint nor BackupEngine and still are not. This change does now detect these cases and return NotSupported when appropriate. (More work needed for support.) Somehow this change broke ProgressCallbackDuringBackup, but the progress_callback logic was dubious to begin with because it would call the callback based on copy buffer size, not size actually copied. Logic and test updated to track size actually copied per-thread. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8968 Test Plan: tests updated. DB::GetLiveFilesStorageInfo mostly tested by use in CheckpointImpl. DBTest.SnapshotFiles updated to also test GetLiveFilesStorageInfo, including reading the data after DB close. Added CheckpointTest.CheckpointWithDbPath (NotSupported). Reviewed By: siying Differential Revision: D31242045 Pulled By: pdillinger fbshipit-source-id: b183d1ce9799e220daaefd6b3b5365d98de676c0	4 years ago
Zhichao Cao	699f45049d	Introduce a mechanism to dump out blocks from block cache and re-insert to secondary cache (#8912 ) Summary: Background: Cache warming up will cause potential read performance degradation due to reading blocks from storage to the block cache. Since in production, the workload and access pattern to a certain DB is stable, it is a potential solution to dump out the blocks belonging to a certain DB to persist storage (e.g., to a file) and bulk-load the blocks to Secondary cache before the DB is relaunched. For example, when migrating a DB form host A to host B, it will take a short period of time, the access pattern to blocks in the block cache will not change much. It is efficient to dump out the blocks of certain DB, migrate to the destination host and insert them to the Secondary cache before we relaunch the DB. Design: we introduce the interface of CacheDumpWriter and CacheDumpRead for user to store the blocks dumped out from block cache. RocksDB will encode all the information and send the string to the writer. User can implement their own writer it they want. CacheDumper and CacheLoad are introduced to save the blocks and load the blocks respectively. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8912 Test Plan: add new tests to lru_cache_test and pass make check. Reviewed By: pdillinger Differential Revision: D31452871 Pulled By: zhichao-cao fbshipit-source-id: 11ab4f5d03e383f476947116361d54188d36ec48	4 years ago
mrambacher	7fd68b7c39	Make WalFilter, SstPartitionerFactory, FileChecksumGenFactory, and TableProperties Customizable (#8638 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8638 Reviewed By: zhichao-cao Differential Revision: D31024729 Pulled By: mrambacher fbshipit-source-id: 954c04ccab0b8dee64050a27aadf78ed119106c0	4 years ago

1 2 3 4 5 ...

429 Commits (f24ef5d6ab8bcde242508536ff504f2c3e304600)