Summary:
We recently removed the dependencies of core components on gtest. Add a Travis test to make sure it doesn't regress. Change cmake setting so that the gtest related components are only included when tests, benchmarks or stress tools are included in the build. Add this build setting in Travis to confirm it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6921
Test Plan: See Travis passes
Reviewed By: ajkr
Differential Revision: D21863564
fbshipit-source-id: df26f50a8305a04ff19ffa8069a1857ecee10289
Summary:
This reverts commit 8d87e9cea1.
Based on offline discussions, it's too early to upgrade to gtest 1.10, as it prevents some developers from using an older version of gtest to integrate to some other systems. Revert it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6923
Reviewed By: pdillinger
Differential Revision: D21864799
fbshipit-source-id: d0726b1ff649fc911b9378f1763316200bd363fc
Summary:
GetTestDirectory implies a file system operation (it creates the
default test directory if missing), so it should be routed to
the FileSystem rather than the Env.
Also remove the GetTestDirectory implementation in the PosixEnv,
since it overrides GetTestDirectory in CompositeEnv making it
impossible to override with a custom FileSystem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6896
Reviewed By: cheng-chang
Differential Revision: D21868984
Pulled By: ajkr
fbshipit-source-id: e79bfef758d06dacef727c54b96abe62e78726fd
Summary:
Added setting of zstd_max_train_bytes compression option parameter to c interop.
rocksdb_options_set_bottommost_compression_options was using bool parameter and thus not exported, updated it to unsigned char and added to c.h as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6796
Reviewed By: cheng-chang
Differential Revision: D21611471
Pulled By: ajkr
fbshipit-source-id: caaaf153de934837ad9af283c7f8c025ff0b0cf5
Summary:
The OptionTypeInfo::Vector method allows a vector<T> to be converted to/from strings via the options.
The kVectorInt and kVectorCompressionType vectors were replaced with this methodology.
As part of this change, the NextToken method was added to the OptionTypeInfo. This method was refactored from code within the StringToMap function.
Future types that could use this functionality include the EventListener vectors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6424
Reviewed By: cheng-chang
Differential Revision: D21832368
Pulled By: zhichao-cao
fbshipit-source-id: e1ca766faff139d54e6e8407a9ec09ece6517439
Summary:
Rocksdb is using the c++11 std::threads feature. The issue is that
MINGW only supports it when using Posix threads.
This change will allow rocksdb::port::WindowsThread to be replaced
with std::thread, which in turn will allow Rocksdb to be cross
compiled using MINGW.
At the same time, we'll have to use GetCurrentProcessId instead of _getpid.
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6865
Reviewed By: cheng-chang
Differential Revision: D21864285
Pulled By: ajkr
fbshipit-source-id: 0982eed313e7d34d351b1364c1ccc722da473205
Summary:
People keep breaking the gcc 4.8 compilation due to different
warnings for shadowing member functions with locals. Adding to Travis
to keep compatibility. (gcc 4.8 is default on CentOS 7.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6915
Test Plan: local and Travis
Reviewed By: siying
Differential Revision: D21842894
Pulled By: pdillinger
fbshipit-source-id: bdcd4385127ee5d1cc222d87e53fb3695c32a9d4
Summary:
The patch cleans up the code and improves the consistency checks around
adding/deleting table files in `VersionBuilder`. Namely, it makes the checks
stricter and improves them in the following ways:
1) A table file can now only be deleted from the LSM tree using the level it
resides on. Earlier, there was some unnecessary wiggle room for
trivially moved files (they could be deleted using a lower level number than
the actual one).
2) A table file cannot be added to the tree if it is already present in the tree
on any level (not just the target level). The earlier code only had an assertion
(which is a no-op in release builds) that the newly added file is not already
present on the target level.
3) The above consistency checks around state transitions are now mandatory,
as opposed to the earlier `CheckConsistencyForDeletes`, which was a no-op
in release mode unless `force_consistency_checks` was set to `true`. The rationale
here is that assuming that the initial state is consistent, a valid transition leads to a
next state that is also consistent; however, an *invalid* transition offers no such
guarantee. Hence it makes sense to validate the transitions unconditionally,
and save `force_consistency_checks` for the paranoid checks that re-validate
the entire state.
4) The new checks build on the mechanism introduced in https://github.com/facebook/rocksdb/pull/6862,
which enables us to efficiently look up the location (level and position within level)
of files in a `Version` by file number. This makes the consistency checks much more
efficient than the earlier `CheckConsistencyForDeletes`, which essentially
performed a linear search.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6901
Test Plan:
Extended the unit tests and ran:
`make check`
`make whitebox_crash_test`
Reviewed By: ajkr
Differential Revision: D21822714
Pulled By: ltamasi
fbshipit-source-id: e2b29c8b6da1bf0f59004acc889e4870b2d18215
Summary:
Because ARM and some other platforms have a larger cache line
size, they have a larger minimum filter size, which causes recently
added PartitionedMultiGet test in db_bloom_filter_test to fail on those
platforms. The code would actually end up using larger partitions,
because keys_per_partition_ would be 0 and never == number of keys
added.
The code now attempts to get as close as possible to the small target
size, while fully utilizing that filter size, if the target partition
size is smaller than the minimum filter size.
Also updated the test to break more uniformly across platforms
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6905
Test Plan: updated test, tested on ARM
Reviewed By: anand1976
Differential Revision: D21840639
Pulled By: pdillinger
fbshipit-source-id: 11684b6d35f43d2e98b85ddb2c8dcfd59d670817
Summary:
x.size() -1 or y - 1 can overflow to an extremely large value when x.size() pr y is 0 when they are unsigned type. The end condition of i in the for loop will be extremely large, potentially causes segment fault. Fix them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6902
Test Plan: pass make asan_check
Reviewed By: ajkr
Differential Revision: D21843767
Pulled By: zhichao-cao
fbshipit-source-id: 5b8b88155ac5a93d86246d832e89905a783bb5a1
Summary:
Replace Status with IOStatus in CopyFile and CreateFile.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6916
Test Plan: pass make asan_check
Reviewed By: cheng-chang
Differential Revision: D21843775
Pulled By: zhichao-cao
fbshipit-source-id: 524d4a0fcf47f0941b923da0346e0de71607f5f6
Summary:
production code under utilities/cassandra depends on gtest.h. Remove them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6908
Test Plan: Run all existing tests.
Reviewed By: ajkr
Differential Revision: D21842606
fbshipit-source-id: a098e0b49c9aeac51cc90a79562ad9897a36122c
Summary:
The implementation of GetApproximateSizes was inconsistent in
its treatment of the size of non-data blocks of SST files, sometimes
including and sometimes now. This was at its worst with large portion
of table file used by filters and querying a small range that crossed
a table boundary: the size estimate would include large filter size.
It's conceivable that someone might want only to know the size in terms
of data blocks, but I believe that's unlikely enough to ignore for now.
Similarly, there's no evidence the internal function AppoximateOffsetOf
is used for anything other than a one-sided ApproximateSize, so I intend
to refactor to remove redundancy in a follow-up commit.
So to fix this, GetApproximateSizes (and implementation details
ApproximateSize and ApproximateOffsetOf) now consistently include in
their returned sizes a portion of table file metadata (incl filters
and indexes) based on the size portion of the data blocks in range. In
other words, if a key range covers data blocks that are X% by size of all
the table's data blocks, returned approximate size is X% of the total
file size. It would technically be more accurate to attribute metadata
based on number of keys, but that's not computationally efficient with
data available and rarely a meaningful difference.
Also includes miscellaneous comment improvements / clarifications.
Also included is a new approximatesizerandom benchmark for db_bench.
No significant performance difference seen with this change, whether ~700 ops/sec with cache_index_and_filter_blocks and small cache or ~150k ops/sec without cache_index_and_filter_blocks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6784
Test Plan:
Test added to DBTest.ApproximateSizesFilesWithErrorMargin.
Old code running new test...
[ RUN ] DBTest.ApproximateSizesFilesWithErrorMargin
db/db_test.cc:1562: Failure
Expected: (size) <= (11 * 100), actual: 9478 vs 1100
Other tests updated to reflect consistent accounting of metadata.
Reviewed By: siying
Differential Revision: D21334706
Pulled By: pdillinger
fbshipit-source-id: 6f86870e45213334fedbe9c73b4ebb1d8d611185
Summary:
Release code now depends on gtest, indirectly through including "test_util/testharness.h". This creates multiple problems. One important reason is the definition of IGNORE_STATUS_IF_ERROR() in test_util/testharness.h. Move it to sync_point.h instead.
Note that utilities/cassandra/format.h still depends on "test_util/testharness.h". This will be resolved in a separate diff.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6907
Test Plan: Run all existing tests.
Reviewed By: ajkr
Differential Revision: D21829884
fbshipit-source-id: 9253c19ffde2936f3ae68998210f8e54f645a6e6
Summary:
We may sometimes read the uncompression dictionary when its not
necessary, when we lookup a key in an SST file but the index indicates
the key is not present. This can happen with index_type 3.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6906
Test Plan: make check
Reviewed By: cheng-chang
Differential Revision: D21828944
Pulled By: anand1976
fbshipit-source-id: 7aef4f0a39548d0874eafefd2687006d2652f9bb
Summary:
Right now in FB environment, wrong gcov is used. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6904
Test Plan: "make coverage" and watch results.
Reviewed By: riversand963
Differential Revision: D21824291
fbshipit-source-id: 666011fd86c36adafa09ebd9eb97742f94fb90bb
Summary:
Currently we rely on `BlockContents` to implicitly free the allocated scratch buffer, but when IO error happens, it doesn't make sense to construct the `BlockContents` which might be corrupted. In the stress test, we find that `assert(req.result.size() == block_size(handle));` fails because of potential IO errors.
In this PR, we explicitly free the scratch buffer on error without constructing `BlockContents`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6903
Test Plan: watch stress test
Reviewed By: anand1976
Differential Revision: D21823869
Pulled By: cheng-chang
fbshipit-source-id: 5603fc80e9bf3f44a9d7250ddebd871afe1eb89f
Summary:
**Summary**
Remove the extraneous newline when using ldb tool. For example, the subcommand list_column_families will print an empty line to stderr even if there are no errors.
**Test plan**
Passed make check; manually tested a few ldb subcommands.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6897
Reviewed By: pdillinger
Differential Revision: D21819352
Pulled By: gg814
fbshipit-source-id: 5a16a6431bb96684fe97647f4d3ac5bf0ec7fc90
Summary:
RocksDB Makefile was assuming existence of 'python' command,
which is not present in CentOS 8. We avoid using 'python' if 'python3' is available.
Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even with Python3 only (as some CentOS 8 FB machines come equipped)
Also, now use just 'python3' for PYTHON if not found so that an informative
"command not found" error will result rather than something weird.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883
Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2.
Reviewed By: gg814
Differential Revision: D21767029
Pulled By: pdillinger
fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9
Summary:
`IterKey::UpdateInternalKey()` is an error-prone API as it's
incompatible with `IterKey::TrimAppend()`, which is used for
decoding delta-encoded internal keys. This PR stops using it in
`BlockIter`. Instead, it assigns global seqno in a separate `IterKey`'s
buffer when needed. The logic for safely getting a Slice with global
seqno properly assigned is encapsulated in `GlobalSeqnoAppliedKey`.
`BinarySeek()` is also migrated to use this API (previously it ignored
global seqno entirely).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6843
Test Plan:
benchmark setup -- single file DBs, in-memory, no compression. "normal_db"
created by regular flush; "ingestion_db" created by ingesting a file. Both
DBs have same contents.
```
$ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000
$ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}')
$ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/
```
benchmark run command:
```
TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=10 -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=1048576000 -threads=1 -reads=40000000
```
results:
| DB | code | throughput |
|---|---|---|
| normal_db | master | 267.9 |
| normal_db | PR6843 | 254.2 (-5.1%) |
| ingestion_db | master | 259.6 |
| ingestion_db | PR6843 | 250.5 (-3.5%) |
Reviewed By: pdillinger
Differential Revision: D21562604
Pulled By: ajkr
fbshipit-source-id: 937596f836930515da8084d11755e1f247dcb264
Summary:
Preliminary user-timestamp support for delete.
If ["a", ts=100] exists, you can delete it by calling `DB::Delete(write_options, key)` in which `write_options.timestamp` points to a `ts` higher than 100.
Implementation
A new ValueType, i.e. `kTypeDeletionWithTimestamp` is added for deletion marker with timestamp.
The reason for a separate `kTypeDeletionWithTimestamp`: RocksDB may drop tombstones (keys with kTypeDeletion) when compacting them to the bottom level. This is OK and useful if timestamp is disabled. When timestamp is enabled, should we still reuse `kTypeDeletion`, we may drop the tombstone with a more recent timestamp, causing deleted keys to re-appear.
Test plan (dev server)
```
make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6253
Reviewed By: ltamasi
Differential Revision: D20995328
Pulled By: riversand963
fbshipit-source-id: a9e5c22968ad76f98e3dc6ee0151265a3f0df619
Summary:
Does what it says on the can: the patch adds a hash map to `VersionStorageInfo`
that maps file numbers to file locations, i.e. (level, position in level) pairs. This
will enable stricter consistency checks in `VersionBuilder`. The patch also fixes
all the unit tests that used duplicate file numbers in a version (which would trigger
an assertion with the new code).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6862
Test Plan:
`make check`
`make whitebox_crash_test`
Reviewed By: riversand963
Differential Revision: D21670446
Pulled By: ltamasi
fbshipit-source-id: 2eac249945cf33d8fb8597b26bfff5221e1a861a
Summary:
1. Add a value_size in read options which limits the cumulative value size of keys read in batches. Once the size exceeds read_options.value_size, all the remaining keys are returned with status Abort without further fetching any key.
2. Add a unit test case MultiGetBatchedValueSizeSimple the reads keys from memory and sst files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6826
Test Plan:
1. make check -j64
2. Add a new unit test case
Reviewed By: anand1976
Differential Revision: D21471483
Pulled By: akankshamahajan15
fbshipit-source-id: dea51b8e76d5d1df38ece8cdb29933b1d798b900
Summary:
If `req.scratch` is an internally allocated buffer, but `raw_block_contents` is not constructed to own `req.scratch`, then `req.scratch` will be leaked.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6879
Test Plan: make asan_check
Reviewed By: anand1976
Differential Revision: D21728498
Pulled By: cheng-chang
fbshipit-source-id: 8fc6a4f2543918c565ddc16ecfad1807eb9a42cf
Summary:
otherwise we have FTBFS like:
2020-05-18T15:12:06.400 INFO:tasks.workunit.client.0.smithi032.stdout:[100%] Linking CXX executable env_librados_test
2020-05-18T15:12:06.620 INFO:tasks.workunit.client.0.smithi032.stderr:/usr/bin/ld: CMakeFiles/rocksdb_env_librados_test.dir/utilities/env_librados_test.cc.o: undefined reference to symbol
'_ZN8librados7v14_2_05Rados4initEPKc@LIBRADOS_14.2.0'
2020-05-18T15:12:06.620 INFO:tasks.workunit.client.0.smithi032.stderr:/usr/bin/ld: /lib/librados.so.2: error adding symbols: DSO missing from command line
2020-05-18T15:12:06.620 INFO:tasks.workunit.client.0.smithi032.stderr:collect2: error: ld returned 1 exit status
this addresses the regression introduced by 07204837ce,
which hides the symbols exposed by `${THIRDPARTY_LIBS}` from
consumers of librocksdb
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6855
Reviewed By: riversand963
Differential Revision: D21621904
fbshipit-source-id: 7022ba4dc0003504401fce6f06547e4d74a32ac0
Summary:
somehow the windows-server-2019-vs2019 image changed in a way that made
VS 14 2015 the default. This caused an error when we specify VS 16 2019
as the cmake generator. I could not figure out the right arguments/env
vars to get the latest VS working so pinned the image to the previous
version instead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6876
Reviewed By: pdillinger
Differential Revision: D21709679
Pulled By: ajkr
fbshipit-source-id: 2d16819ad239b4611fa199547744e1c101dc9da0
Summary:
* Print stack trace on status checked failure
* Make folly_synchronization_distributed_mutex_test a parallel test
* Disable ldb_test.py and rocksdb_dump_test.sh with
ASSERT_STATUS_CHECKED (broken)
* Fix shadow warning in random_access_file_reader.h reported by gcc
4.8.5 (ROCKSDB_NO_FBCODE), also https://github.com/facebook/rocksdb/issues/6866
* Work around compiler bug on max_align_t for gcc < 4.9
* Remove an apparently wrong comment in status.h
* Use check_some in Travis config (for proper diagnostic output)
* Fix ignored Status in loop in options_helper.cc
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6871
Test Plan: manual, CI
Reviewed By: ajkr
Differential Revision: D21706619
Pulled By: pdillinger
fbshipit-source-id: daf6364173d6689904eb394461a69a11f5bee2cb
Summary:
Fixed some option handling code that recently broke the
ASSERT_STATUS_CHECKED build for options_test.
Added all other existing tests that pass under ASSERT_STATUS_CHECKED to
the whitelist.
Added a Travis configuration to run all whitelisted tests with
ASSERT_STATUS_CHECKED. (Someday we might enable this check by default in
debug builds.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6870
Test Plan: ASSERT_STATUS_CHECKED=1 make check, Travis
Reviewed By: ajkr
Differential Revision: D21704374
Pulled By: pdillinger
fbshipit-source-id: 15daef98136a19d7a6843fa0c9ec08738c2ac693
Summary:
Previously in LITE mode, an autovector did not have a reserved size. When
elements were added to the vector, the underlying array could be reallocated.
There was a set of code that never expands the autovector and was doing &autovector::back(). When the vector is resized, the old addresses may become invalid, causing a later exception to be thrown.
By reserving space in the autovector up front, this problem is eliminated for those uses where the vector will never exceed the initial size.
the resize happens, these pointers become invalid, leading to SEGV or other exceptions.
This change allows the autovector to be fully populated before we take the address of any of its elements, thereby elminating the potential for a resize.
There is comparable code to this change in Version::MultiGet for dealing with the context objects.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6868
Reviewed By: ajkr
Differential Revision: D21693505
Pulled By: cheng-chang
fbshipit-source-id: e71d516b15e08f202593cb80f2a42f048fc95768
Summary:
Fix a couple places where direct I/O was used even though it is
unsupported in lite builds.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6867
Test Plan: `LITE=1 make check -j48`
Reviewed By: pdillinger
Differential Revision: D21689185
Pulled By: ajkr
fbshipit-source-id: 3eaa3abf69cd7d0bcaabbcad3bb5a26fb8dd7301
Summary:
Added code for generically handing structs to OptionTypeInfo. A struct is a collection of variables handled by their own map of OptionTypeInfos. Examples of structs include Compaction and Cache options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6425
Reviewed By: siying
Differential Revision: D21668789
Pulled By: zhichao-cao
fbshipit-source-id: 064b110de39dadf82361ed4663f7ac1a535b0b07
Summary:
* Add missing unit test for schema stability of FileChecksumGenCrc32c
(previously was only comparing to itself)
* A lot of clarifying comments
* Add some assertions for preconditions
* Rename WritableFileWriter::CalculateFileChecksum -> UpdateFileChecksum
* Simplify FileChecksumGenCrc32c with shared functions
* Implement EndianSwapValue to replace unused EndianTransform
And incidentally since I had trouble with 'make check-format' GitHub action disagreeing with local run,
* Output full diagnostic information when 'make check-format' fails in CI
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6861
Test Plan: new unit test passes before & after other changes
Reviewed By: zhichao-cao
Differential Revision: D21667115
Pulled By: pdillinger
fbshipit-source-id: 6a99970f87605aa024fa540c78cd519ff322c3e6
Summary:
In NoBatchedOpsStress::TestMultiGet, call txn->Get() when transactions
are in use.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6860
Test Plan: make crash_test_with_txn
Reviewed By: pdillinger
Differential Revision: D21667249
Pulled By: anand1976
fbshipit-source-id: 194bd7b9630a8efc3ae29d85422a61214e9e200e
Summary:
If Option.file_checksum_gen_factory is set, rocksdb generates the file checksum during flush and compaction based on the checksum generator created by the factory and store the checksum and function name in vstorage and Manifest.
This PR enable file checksum generation in SstFileWrite and store the checksum and checksum function name in the ExternalSstFileInfo, such that application can use them for other purpose, for example, ingest the file checksum with files in IngestExternalFile().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6859
Test Plan: add unit test and pass make asan_check.
Reviewed By: ajkr
Differential Revision: D21656247
Pulled By: zhichao-cao
fbshipit-source-id: 78a3570c76031d8832e3d2de3d6c79cdf2b675d0
Summary:
... so that we have freedom to upgrade it (see https://github.com/facebook/rocksdb/issues/6808).
As a side benefit, gtest will no longer be linked into main library in
buck build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6858
Test Plan: fb internal build & link
Reviewed By: riversand963
Differential Revision: D21652061
Pulled By: pdillinger
fbshipit-source-id: 6018104af944debde576b5beda6c134e737acedb
Summary:
Add MultiGet to VerifyDb and check consistency with Get in TestMultiGet.
Test plan -
make crash_test
ASAN crash test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6849
Reviewed By: pdillinger
Differential Revision: D21635011
Pulled By: anand1976
fbshipit-source-id: deb5a79d08fefd8d8010204f1f20b83adc92310e
Summary:
This patch is groundwork for an upcoming change to store the set of
linked SSTs in `BlobFileMetaData`. With the current code, a new
`BlobFileMetaData` object is created each time a `VersionEdit` touches
a certain blob file. This is fine as long as these objects are lightweight
and cheap to create; however, with the addition of the linked SST set, it would
be very inefficient since the set would have to be copied over and over again.
Note that this is the same kind of problem that `VersionBuilder` is solving
w/r/t `Version`s and files, and we can apply the same solution; that is, we can
accumulate the changes in a different mutable object, and apply the delta in
one shot when the changes are committed. The patch does exactly that by
adding a new `BlobFileMetaDataDelta` class to `VersionBuilder`. In addition,
it turns the existing `GetBlobFileMetaData` helper into `IsBlobFileInVersion`
(which is fine since that's the only thing the method's clients care about now),
and adds a couple of helper methods that can create a `BlobFileMetaData`
object from the `BlobFileMetaData` in the base (if applicable) and the delta
when the `Version` is saved.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6835
Test Plan: `make check`
Reviewed By: riversand963
Differential Revision: D21505187
Pulled By: ltamasi
fbshipit-source-id: d81a48c5f2ca7b79d7124c935332a6bcf3d5d988
Summary:
Under MacOS when running with make -j 8 check, the temporary directory generated was > 100 characters. This caused the tests to do nothing under MacOS. Most of them still reported success for doing nothing, but ReadaheadSize was expecting the test to run.
By making the option name longer, the tests will no run successfully (and do something!)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6846
Reviewed By: ajkr
Differential Revision: D21576032
fbshipit-source-id: b089cde0d598137b572aa8527cc5459085252af7
Summary:
If both direct IO and IO uring are enabled, when IO uring returns partial result, we'll try to read the remaining part of the request, but the starting address/offset of the remaining part might not be aligned to the block size, in direct IO mode, the unaligned offset causes bug.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6853
Test Plan: run make check with both direct IO and IO uring enabled, this is covered by one of the continuous tests.
Reviewed By: anand1976
Differential Revision: D21603023
Pulled By: cheng-chang
fbshipit-source-id: 942f6a11ff21e1892af6c4464e02bab4c707787c
Summary:
In level compaction, if the total size (even if compensated after taking account of the deletions) of a level hasn't exceeded the limit, but there are lots of deletion entries in some SST files of the level, these files should also be good candidates for compaction. Otherwise, queries for the deleted keys might be slow because they need to go over all the tombstones.
This PR adds an option `deletion_ratio` to the factory of `CompactOnDeletionCollector` to configure it to trigger compaction when the ratio of tombstones >= `deletion_ratio`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6806
Test Plan:
Added new unit test in `compact_on_deletion_collector_test.cc`.
make compact_on_deletion_collector_test && ./compact_on_deletion_collector_test
Reviewed By: ajkr
Differential Revision: D21511981
Pulled By: cheng-chang
fbshipit-source-id: 65a9d0150e8c9c00337787686475252e4535a3e1