Summary:
A framework of trace analyzing for RocksDB
After collecting the trace by using the tool of [PR #3837](https://github.com/facebook/rocksdb/pull/3837). User can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
**Input:**
1. trace file
2. Whole keys space file
**Statistics:**
1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family.
2. Key hotness (access count) of each one
3. Key space separation based on given prefix
4. Key size distribution
5. Value size distribution if appliable
6. Top K accessed keys
7. QPS statistics including the average QPS and peak QPS
8. Top K accessed prefix
9. The query correlation analyzing, output the number of X after Y and the corresponding average time
intervals
**Output:**
1. key access heat map (either in the accessed key space or whole key space)
2. trace sequence file (interpret the raw trace file to line base text file for future use)
3. Time serial (The key space ID and its access time)
4. Key access count distritbution
5. Key size distribution
6. Value size distribution (in each intervals)
7. whole key space separation by the prefix
8. Accessed key space separation by the prefix
9. QPS of each operation and each column family
10. Top K QPS and their accessed prefix range
**Test:**
1. Added the unit test of analyzing Get, Put, Delete, SingleDelete, DeleteRange, Merge
2. Generated the trace and analyze the trace
**Implemented but not tested (due to the limitation of trace_replay):**
1. Analyzing Iterator, supporting Seek() and SeekForPrev() analyzing
2. Analyzing the number of Key found by Get
**Future Work:**
1. Support execution time analyzing of each requests
2. Support cache hit situation and block read situation of Get
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091
Differential Revision: D9256157
Pulled By: zhichao-cao
fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
Summary:
Added DataBlockIndexType option in BlockBasedTableOptions.
```
enum DataBlockIndexType : char {
kDataBlockBinarySearch = 0, // traditional block type
kDataBlockHashIndex = 1, // additional hash index appended to the end.
};
```
The default type is the traditional binary seek option: `kDataBlockBinarySearch`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4150
Differential Revision: D8895958
Pulled By: fgwu
fbshipit-source-id: 480adef48104cf11d30db3bad9a73f98b4a80c10
Summary:
We forgot to dump FIFO and Universal compaction options to the LOG when any option was dynamically changed via `SetOptions` API. Now added those options also to `MutableCFOptions::Dump`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4140
Differential Revision: D8865634
Pulled By: sagar0
fbshipit-source-id: 05a93e26ab8e72fec6249acccd09b0eb3e1ef0ac
Summary:
…ression
For `CompressionType` we have options `compression` and `bottommost_compression`. Thus, to make the compression options consitent with the compression type when bottommost_compression is enabled, we add the bottommost_compression_opts
Closes https://github.com/facebook/rocksdb/pull/3985
Reviewed By: riversand963
Differential Revision: D8385911
Pulled By: zhichao-cao
fbshipit-source-id: 07bc533dd61bcf1cef5927d8d62901c13d38d5fc
Summary:
Previously in https://github.com/facebook/rocksdb/pull/3601 bloom filter will only be checked if `prefix_extractor` in the mutable_cf_options matches the one found in the SST file.
This PR relaxes the requirement by checking if all keys in the range [user_key, iterate_upper_bound) all share the same prefix after transforming using the BF in the SST file. If so, the bloom filter is considered compatible and will continue to be looked at.
Closes https://github.com/facebook/rocksdb/pull/3899
Differential Revision: D8157459
Pulled By: miasantreble
fbshipit-source-id: 18d17cba56a1005162f8d5db7a27aba277089c41
Summary:
Top-level index in partitioned index/filter blocks are small and could be pinned in memory. So far we use that by cache_index_and_filter_blocks to false. This however make it difficult to keep account of the total memory usage. This patch introduces pin_top_level_index_and_filter which in combination with cache_index_and_filter_blocks=true keeps the top-level index in cache and yet pinned them to avoid cache misses and also cache lookup overhead.
Closes https://github.com/facebook/rocksdb/pull/4037
Differential Revision: D8596218
Pulled By: maysamyabandeh
fbshipit-source-id: 3a5f7f9ca6b4b525b03ff6bd82354881ae974ad2
Summary:
By using WritableFileWriter rather than WritableFile directly, we can buffer multiple Append() calls to one write() file system call, which will be expensive to underlying Env without its own write buffering.
Closes https://github.com/facebook/rocksdb/pull/3882
Differential Revision: D8080673
Pulled By: siying
fbshipit-source-id: e0db900cb3c178166aa738f3985db65e3ae2cf1b
Summary:
Currently it is not possible to change bloom filter config without restart the db, which is causing a lot of operational complexity for users.
This PR aims to make it possible to dynamically change bloom filter config.
Closes https://github.com/facebook/rocksdb/pull/3601
Differential Revision: D7253114
Pulled By: miasantreble
fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
Summary:
Currently manual_wal_flush if set in the options will be used only for the wal files created during wal switch. The configuration thus does not affect the first wal file. The patch fixes that and also update the related unit tests.
This PR is built on top of https://github.com/facebook/rocksdb/pull/3756
Closes https://github.com/facebook/rocksdb/pull/3824
Differential Revision: D7909153
Pulled By: maysamyabandeh
fbshipit-source-id: 024ed99d2555db06bf096c902b998e432bb7b9ce
Summary:
In `cf_options_type_info`, the deprecated options are all considered to have offset zero in the `MutableCFOptions` struct. Previously we weren't checking in `GetMutableOptionsFromStrings` whether the provided option was deprecated or not and simply writing the provided value to the offset specified by `cf_options_type_info`. That meant setting any deprecated option would overwrite the first element in the struct, which is `write_buffer_size`. `db_stress` hit this often since it calls `SetOptions` with `soft_rate_limit=0` and `hard_rate_limit=0`, which are both deprecated so cause `write_buffer_size` to be set to zero, which causes it to crash on the following assertion:
```
db_stress: db/memtable.cc:106: rocksdb::MemTable::MemTable(const rocksdb::InternalKeyComparator&, const rocksdb::ImmutableCFOptions&, const rocksdb::MutableCFOptions&, rocksdb::WriteBufferManager*, rocksdb::SequenceNumber, uint32_t): Assertion `!ShouldScheduleFlush()' failed.
```
We fix it by skipping deprecated options (and logging a warning) when users provide them to `SetOptions`. I didn't want to fail the call for compatibility reasons.
Closes https://github.com/facebook/rocksdb/pull/3700
Differential Revision: D7572596
Pulled By: ajkr
fbshipit-source-id: bd5d84e14c0c39f30c5d4c6df7c1503d2c28ecf1
Summary:
1. Remove redundant text.
2. Make terminology consistent across all comments and doc of RocksDB. Also do
our best to conform to conventions. Specifically, use 'callback' instead of
'call-back' [wikipedia](https://en.wikipedia.org/wiki/Callback_(computer_programming)).
Closes https://github.com/facebook/rocksdb/pull/3693
Differential Revision: D7560396
Pulled By: riversand963
fbshipit-source-id: ba8c251c487f4e7d1872a1a8dc680f9e35a6ffb8
Summary:
In this change, an option to set different paths for different column families is added.
This option is set via cf_paths setting of ColumnFamilyOptions. This option will work in a similar fashion to db_paths setting. Cf_paths is a vector of Dbpath values which contains a pair of the absolute path and target size. Multiple levels in a Column family can go to different paths if cf_paths has more than one path.
To maintain backward compatibility, if cf_paths is not specified for a column family, db_paths setting will be used. Note that, if db_paths setting is also not specified, RocksDB already has code to use db_name as the only path.
Changes :
1) A new member "cf_paths" is added to ImmutableCfOptions. This is set, based on cf_paths setting of ColumnFamilyOptions and db_paths setting of ImmutableDbOptions. This member is used to identify the path information whenever files are accessed.
2) Validation checks are added for cf_paths setting based on existing checks for db_paths setting.
3) DestroyDB, PurgeObsoleteFiles etc. are edited to support multiple cf_paths.
4) Unit tests are added appropriately.
Closes https://github.com/facebook/rocksdb/pull/3102
Differential Revision: D6951697
Pulled By: ajkr
fbshipit-source-id: 60d2262862b0a8fd6605b09ccb0da32bb331787d
Summary:
Level Compaction with TTL.
As of today, a file could exist in the LSM tree without going through the compaction process for a really long time if there are no updates to the data in the file's key range. For example, in certain use cases, the keys are not actually "deleted"; instead they are just set to empty values. There might not be any more writes to this "deleted" key range, and if so, such data could remain in the LSM for a really long time resulting in wasted space.
Introducing a TTL could solve this problem. Files (and, in turn, data) older than TTL will be scheduled for compaction when there is no other background work. This will make the data go through the regular compaction process and get rid of old unwanted data.
This also has the (good) side-effect of all the data in the non-bottommost level being newer than ttl, and all data in the bottommost level older than ttl. It could lead to more writes while reducing space.
This functionality can be controlled by the newly introduced column family option -- ttl.
TODO for later:
- Make ttl mutable
- Extend TTL to Universal compaction as well? (TTL is already supported in FIFO)
- Maybe deprecate CompactionOptionsFIFO.ttl in favor of this new ttl option.
Closes https://github.com/facebook/rocksdb/pull/3591
Differential Revision: D7275442
Pulled By: sagar0
fbshipit-source-id: dcba484717341200d419b0953dafcdf9eb2f0267
Summary:
Provide a block_align option in BlockBasedTableOptions to allow
alignment of SST file data blocks. This will avoid higher
IOPS/throughput load due to < 4KB data blocks spanning 2 4KB pages.
When this option is set to true, the block alignment is set to lower of
block size and 4KB.
Closes https://github.com/facebook/rocksdb/pull/3502
Differential Revision: D7400897
Pulled By: anand1976
fbshipit-source-id: 04cc3bd144e88e3431a4f97604e63ad7a0f06d44
Summary:
enable_pipelined_write was not set in BuildDBOptions() causing its default
value to be dumped in the OPTIONS file
Closes https://github.com/facebook/rocksdb/pull/3585
Differential Revision: D7226395
Pulled By: yiwu-arbug
fbshipit-source-id: 45a659a48d18103ac9ee74bb8805dd0a6ec12474
Summary:
RocksDB should always be able to parse an option file generated using the same or lower version. Unknown option should only happen if it is from a higher version. Change the behavior of RocksDBOptionsParser::Parse()'s behavior with ignore_unknown_options=true so that unknown option from a lower or the same version will never be skipped.
Closes https://github.com/facebook/rocksdb/pull/3527
Differential Revision: D7048851
Pulled By: siying
fbshipit-source-id: e261caea12f6515611a4a29f39acf2b619df2361
Summary:
options/cf_options.cc:
77 memtable_insert_with_hint_prefix_extractor(
CID 1396208 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member info_log_level is not initialized in this constructor nor in any functions that it calls.
Closes https://github.com/facebook/rocksdb/pull/3106
Differential Revision: D6874689
Pulled By: sagar0
fbshipit-source-id: b5cd2d13915fd86d87260050f9c5d117615bbe30
Summary:
FIFO and Universal compaction options were recently made dynamic, but I forgot to update these comments. These would mislead anyone who is reading the code.
Closes https://github.com/facebook/rocksdb/pull/3399
Differential Revision: D6786358
Pulled By: sagar0
fbshipit-source-id: 57cfc412f63deaee29bbd82b863304821d60057d
Summary:
Hello and thank you for RocksDB,
While looking into the buffered io used when an `OPTIONS` file is read I noticed the `OPTIONS` files produced by RocksDB 5.8.8 (and head of master) were just over 4096 bytes in size, resulting in the version of glibc I am using (glibc-2.17-196.el7) (on the filesystem used) being passed a 4K buffer for the `fread_unlocked` call and 2 system call reads using a 4096 buffer being used to read the contents of the `OPTIONS` file.
If the buffer size is increased to 8192 then 1 system call read is used to read the contents.
As I think the buffer size is just used for reading `OPTIONS` files, and I thought it likely that `OPTIONS` files have increased in size (as more options are added), I thought I would suggest an increase.
[ If the comments from the top of the `OPTIONS` file are removed, and white space from the start of lines is removed then the size can be reduced to be under 4K, but as more options are added the size seems likely to grow again. ]
Create a new database:
```
> ./ldb --create_if_missing --db=/tmp/rdb_tmp put 1 1
OK
```
The OPTIONS file is 4252 bytes:
```
> stat /tmp/rdb_tmp/OPTIONS* | head -n 2
File: ‘/tmp/rdb_tmp/OPTIONS-000005’
Size: 4252 Blocks: 16 IO Block: 4096 regular file
```
Before, the 4096 byte buffer is used from 2 system read calls:
```
> strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
grep -A 1 'RocksDB option file'
read(3, "# This is a RocksDB option file."..., 4096) = 4096
read(3, "e\n metadata_block_size=4096\n c"..., 4096) = 156
```
ltrace shows 4096 passed to fread_unlocked
```
> ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 |
grep -C 3 'RocksDB option file'
[pid 51013] fread_unlocked(0x7ffd5fbf2d50, 1, 4096, 0x7fd2e084e780 <unfinished ...>
[pid 51013] fstat@SYS(3, 0x7ffd5fbf28f0) = 0
[pid 51013] mmap@SYS(nil, 4096, 3, 34, -1, 0) = 0x7fd2e318c000
[pid 51013] read@SYS(3, "# This is a RocksDB option file."..., 4096) = 4096
[pid 51013] <... fread_unlocked resumed> ) = 4096
...
```
After, the 8192 byte buffer is used from 1 system read call:
```
> strace -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -A 1 'RocksDB option file'
read(3, "# This is a RocksDB option file."..., 8192) = 4252
read(3, "", 4096) = 0
```
ltrace shows 8192 passed to fread_unlocked
```
> ltrace -S -f ./ldb --try_load_options --db=/tmp/rdb_tmp get DOES_NOT_EXIST 2>&1 | grep -C 3 'RocksDB option file'
[pid 146611] fread_unlocked(0x7ffcfba382f0, 1, 8192, 0x7fc4e844e780 <unfinished ...>
[pid 146611] fstat@SYS(3, 0x7ffcfba380f0) = 0
[pid 146611] mmap@SYS(nil, 4096, 3, 34, -1, 0) = 0x7fc4eaee0000
[pid 146611] read@SYS(3, "# This is a RocksDB option file."..., 8192) = 4252
[pid 146611] read@SYS(3, "", 4096) = 0
[pid 146611] <... fread_unlocked resumed> ) = 4252
[pid 146611] feof(0x7fc4e844e780) = 1
```
Closes https://github.com/facebook/rocksdb/pull/3294
Differential Revision: D6653684
Pulled By: ajkr
fbshipit-source-id: 222f25f5442fefe1dcec18c700bd9e235bb63491
Summary:
Let me know if more test coverage is needed
Closes https://github.com/facebook/rocksdb/pull/3213
Differential Revision: D6457165
Pulled By: miasantreble
fbshipit-source-id: 3f944abff28aa7775237f1c4f61c64ccbad4eea9
Summary:
The Options header file recommends using max_background_jobs rather than
directly setting max_background_compactions or max_background_flushes.
I've personally seen a performance problem where stalls were happening
because the one background flushing thread was blocked that was fixed
by this change -
https://github.com/cockroachdb/cockroach/issues/19699#issuecomment-347672485
Closes https://github.com/facebook/rocksdb/pull/3208
Differential Revision: D6473178
Pulled By: ajkr
fbshipit-source-id: 67c892ceb7b1909d251492640cb15a0f2262b7ed
Summary:
I started adding gflags support for cmake on linux and got frustrated that I'd need to duplicate the build_detect_platform logic, which determines namespace based on attempting compilation. We can do it differently -- use the GFLAGS_NAMESPACE macro if available, and if not, that indicates it's an old gflags version without configurable namespace so we can simply hardcode "google".
Closes https://github.com/facebook/rocksdb/pull/3212
Differential Revision: D6456973
Pulled By: ajkr
fbshipit-source-id: 3e6d5bde3ca00d4496a120a7caf4687399f5d656
Summary:
Problem: Option string accepts only cache_size as parameter for block_cache which is specified as "block_cache=1M".
It doesn't accept other parameters like num_shards etc.
Changes :
1) ParseBlockBasedTableOption in block_based_table_factory is edited to accept cache options in the format "block_cache=<cache_size>:<num_shard_bits>:<strict_capacity_limit>:<high_pri_pool_ratio>".
Options other than cache_size are optional to maintain backward compatibility. The changes are valid for block_cache_compressed as well.
For example, "block_cache=1M:6:true:0.5", "block_cache=1M:6:true", "block_cache=1M:6" and "block_cache=1M" are all valid option strings.
2) Corresponding unit tests are added.
Closes https://github.com/facebook/rocksdb/pull/3108
Differential Revision: D6420997
Pulled By: sagar0
fbshipit-source-id: cdea8b785688d2802907974af27225ccc1c0cd43
Summary:
Static variables in header files will be instantiated in every file that includes the header file. This patch moves some of them from options_helper.h to its .cc files. It also moves the static variable out of the offset_of since the template function could also lead to multiple instantiation perhaps due to inlining.
Fixes https://github.com/facebook/rocksdb/issues/3176
Closes https://github.com/facebook/rocksdb/pull/3178
Differential Revision: D6363794
Pulled By: maysamyabandeh
fbshipit-source-id: d0a07f061b4d992ab4e0de2706e622131d258fdd
Summary:
The motivation for this PR is to add to RocksDB support for differential (incremental) snapshots, as snapshot of the DB changes between two points in time (one can think of it as diff between to sequence numbers, or the diff D which can be thought of as an SST file or just set of KVs that can be applied to sequence number S1 to get the database to the state at sequence number S2).
This feature would be useful for various distributed storages layers built on top of RocksDB, as it should help reduce resources (time and network bandwidth) needed to recover and rebuilt DB instances as replicas in the context of distributed storages.
From the API standpoint that would like client app requesting iterator between (start seqnum) and current DB state, and reading the "diff".
This is a very draft PR for initial review in the discussion on the approach, i'm going to rework some parts and keep updating the PR.
For now, what's done here according to initial discussions:
Preserving deletes:
- We want to be able to optionally preserve recent deletes for some defined period of time, so that if a delete came in recently and might need to be included in the next incremental snapshot it would't get dropped by a compaction. This is done by adding new param to Options (preserve deletes flag) and new variable to DB Impl where we keep track of the sequence number after which we don't want to drop tombstones, even if they are otherwise eligible for deletion.
- I also added a new API call for clients to be able to advance this cutoff seqnum after which we drop deletes; i assume it's more flexible to let clients control this, since otherwise we'd need to keep some kind of timestamp < -- > seqnum mapping inside the DB, which sounds messy and painful to support. Clients could make use of it by periodically calling GetLatestSequenceNumber(), noting the timestamp, doing some calculation and figuring out by how much we need to advance the cutoff seqnum.
- Compaction codepath in compaction_iterator.cc has been modified to avoid dropping tombstones with seqnum > cutoff seqnum.
Iterator changes:
- couple params added to ReadOptions, to optionally allow client to request internal keys instead of user keys (so that client can get the latest value of a key, be it delete marker or a put), as well as min timestamp and min seqnum.
TableCache changes:
- I modified table_cache code to be able to quickly exclude SST files from iterators heep if creation_time on the file is less then iter_start_ts as passed in ReadOptions. That would help a lot in some DB settings (like reading very recent data only or using FIFO compactions), but not so much for universal compaction with more or less long iterator time span.
What's left:
- Still looking at how to best plug that inside DBIter codepath. So far it seems that FindNextUserKeyInternal only parses values as UserKeys, and iter->key() call generally returns user key. Can we add new API to DBIter as internal_key(), and modify this internal method to optionally set saved_key_ to point to the full internal key? I don't need to store actual seqnum there, but I do need to store type.
Closes https://github.com/facebook/rocksdb/pull/2999
Differential Revision: D6175602
Pulled By: mikhail-antonov
fbshipit-source-id: c779a6696ee2d574d86c69cec866a3ae095aa900
Summary:
GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery. So it is not need to be stored in memtable right after each commit.
This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to memtable during recovery when the WAL is replayed. To cover the case when WAL is deleted after memtable flush, the batch is also buffered and written to memtable right before each memtable flush.
Closes https://github.com/facebook/rocksdb/pull/3071
Differential Revision: D6148023
Pulled By: maysamyabandeh
fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7
Summary:
228MutableDBOptions::MutableDBOptions()
229 : max_background_jobs(2),
230 base_background_compactions(-1),
231 max_background_compactions(-1),
232 avoid_flush_during_shutdown(false),
233 delayed_write_rate(2 * 1024U * 1024U),
234 max_total_wal_size(0),
235 delete_obsolete_files_period_micros(6ULL * 60 * 60 * 1000000),
236 stats_dump_period_sec(600),
2. uninit_member: Non-static class member bytes_per_sync is not initialized in this constructor nor in any functions that it calls.
CID 1419857 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member wal_bytes_per_sync is not initialized in this constructor nor in any functions that it calls.
237 max_open_files(-1) {}
Closes https://github.com/facebook/rocksdb/pull/3069
Differential Revision: D6170424
Pulled By: ajkr
fbshipit-source-id: 1f94e86b87611ad2330b8b1707911150978d68b8
Summary:
- for `SeekToFirst()`, just convert it to a regular `Seek()` if lower bound is specified
- for operations that iterate backwards over user keys (`SeekForPrev`, `SeekToLast`, `Prev`), change `PrevInternal` to check whether user key went below lower bound every time the user key changes -- same approach we use to ensure we stay within a prefix when `prefix_same_as_start=true`.
Closes https://github.com/facebook/rocksdb/pull/3074
Differential Revision: D6158654
Pulled By: ajkr
fbshipit-source-id: cb0e3a922e2650d2cd4d1c6e1c0f1e8b729ff518
Summary:
ColumnFamilyOptions::compaction_options_fifo and all its sub-fields can be set dynamically now.
Some of the ways in which the fifo compaction options can be set are:
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024}"}})`
- `SetOptions({{"compaction_options_fifo", "{ttl=600;}"}})`
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024;ttl=600;}"}})`
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=51;ttl=49;allow_compaction=true;}"}})`
Most of the code has been made generic enough so that it could be reused later to make universal options (and other such nested defined-types) dynamic with very few lines of parsing/serializing code changes.
Introduced a few new functions like `ParseStruct`, `SerializeStruct` and `GetStringFromStruct`.
The duplicate code in `GetStringFromDBOptions` and `GetStringFromColumnFamilyOptions` has been moved into `GetStringFromStruct`. So they become just simple wrappers now.
Closes https://github.com/facebook/rocksdb/pull/3006
Differential Revision: D6058619
Pulled By: sagar0
fbshipit-source-id: 1e8f78b3374ca5249bb4f3be8a6d3bb4cbc52f92
Summary:
* make `checksum_type_string_map` available for lite
* comment out `FilesPerLevel` in lite mode.
* travis and legocastle lite build also build `all` target and run tests
Closes https://github.com/facebook/rocksdb/pull/3015
Differential Revision: D6069822
Pulled By: yiwu-arbug
fbshipit-source-id: 9fe92ac220e711e9e6ed4e921bd25ef4314796a0
Summary:
Currently, RocksDB does not allow reopening a preexisting DB with no merge operator defined, with a merge operator defined. This means that if a DB ever want to add a merge operator, there's no way to do so currently.
Fix this by adding a new verification type `kByNameAllowFromNull` which will allow old values to be nullptr, and new values to be non-nullptr.
Closes https://github.com/facebook/rocksdb/pull/2958
Differential Revision: D5961131
Pulled By: lth
fbshipit-source-id: 06179bebd0d90db3d43690b5eb7345e2d5bab1eb
Summary:
SUMMARY
Moves the bytes_per_sync and wal_bytes_per_sync options from immutableoptions to mutable options. Also if wal_bytes_per_sync is changed, the wal file and memtables are flushed.
TEST PLAN
ran make check
all passed
Two new tests SetBytesPerSync, SetWalBytesPerSync check that after issuing setoptions with a new value for the var, the db options have the new value.
Closes https://github.com/facebook/rocksdb/pull/2893
Reviewed By: yiwu-arbug
Differential Revision: D5845814
Pulled By: TheRushingWookie
fbshipit-source-id: 93b52d779ce623691b546679dcd984a06d2ad1bd
Summary:
By default the seq number in DB is increased once per written key. WritePrepared txns requires the seq to be increased once per the entire batch so that the seq would be used as the prepare timestamp by which the transaction is identified. Also we need to increase seq for the commit marker since it would give a unique id to the commit timestamp of transactions.
Two unit tests are added to verify our understanding of how the seq should be increased. The recovery path requires much more work and is left to another patch.
Closes https://github.com/facebook/rocksdb/pull/2885
Differential Revision: D5837843
Pulled By: maysamyabandeh
fbshipit-source-id: a08960b93d727e1cf438c254d0c2636fb133cc1c
Summary:
Our current implementation of (semi-)copy constructor of DBOptions and ColumnFamilyOptions seems to intend value by value copy, which is what the default copy constructor does anyway. Moreover not using the default constructor has the risk of forgetting to add newly added options.
As an example, allow_2pc seems to be forgotten in the copy constructor which was causing one of the unit tests not seeing its effect.
Closes https://github.com/facebook/rocksdb/pull/2888
Differential Revision: D5846368
Pulled By: maysamyabandeh
fbshipit-source-id: 1ee92a2aeae93886754b7bc039c3411ea2458683
Summary:
test the `DBOptions(const Options&)` and `ColumnFamilyOptions(const Options&)` constructors. Actually this'll work better once we refactor `RandomInitDBOptions` / `RandomInitCFOptions` to use the authoritative sources of struct members: `db_options_type_info` / `cf_options_type_info` (internal task T21804189 for this).
Closes https://github.com/facebook/rocksdb/pull/2873
Differential Revision: D5817141
Pulled By: ajkr
fbshipit-source-id: 8567c20feced9d1751fdf1f4383e2af30f7e3591
Summary:
currently `ImmutableDBOptions::Dump` use default value for `concurrent_prepare` and `manual_wal_flush`, because DBOptions ctor does not init those member variables.
so in LOG file, it will be
```
Options.concurrent_prepare: 0
Options.manual_wal_flush: 0
```
Closes https://github.com/facebook/rocksdb/pull/2864
Differential Revision: D5816240
Pulled By: ajkr
fbshipit-source-id: 82335e8bcae3dceedc6a99224e7998de5fad1e50