Summary:
TransactionDB uses read callback to filter out un-committed data before
a snapshot. But `MultiGet()` API doesn't use that at all, which causes
returning unwanted data.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7963
Test Plan: Added unittest to reproduce
Reviewed By: anand1976
Differential Revision: D26455851
Pulled By: jay-zhuang
fbshipit-source-id: 265276698cf9d8c4cd79e3250ef10d14375bac55
Summary:
Bug fix for status returned being overridden by Status::NotFound in
DBImpl::OpenForReadOnlyCheckExistence. This was casuing some service
owners to misinterpret the actual error and take appropriate steps.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7972
Reviewed By: riversand963
Differential Revision: D26499598
Pulled By: akankshamahajan15
fbshipit-source-id: 05e9fedbe2a2e0e53135760f8ff578a2816d2b8e
Summary:
The patch adds checkpoint support to BlobDB. Blob files are hard linked or
copied, depending on whether the checkpoint directory is on the same filesystem
or not, similarly to table files.
TODO: Add support for blob files to `ExportColumnFamily` and to the checksum
verification logic used by backup/restore.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7959
Test Plan: Ran `make check` and the crash test for a while.
Reviewed By: riversand963
Differential Revision: D26434768
Pulled By: ltamasi
fbshipit-source-id: 994be55a8dc08133028250760fca440d2c7c4dc5
Summary:
The patch adds the configuration options of the new BlobDB implementation
to `db_bench` and adjusts the help messages of the old (`StackableDB`-based)
BlobDB's options to make it clear which implementation they pertain to.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7956
Test Plan: Ran `make check` and `db_bench` with the new options.
Reviewed By: jay-zhuang
Differential Revision: D26384808
Pulled By: ltamasi
fbshipit-source-id: b4405bb2c56cfd3506d4c32e3329c08dfdf69c94
Summary:
Add support for IOTracing in blob files
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7958
Test Plan:
Add a new test and checked manually the trace_file for blob
files being recorded during read and write.
Reviewed By: ltamasi
Differential Revision: D26415950
Pulled By: akankshamahajan15
fbshipit-source-id: 49c2859b3a4f8307e7cb69a92704403a4da46d44
Summary:
in PR https://github.com/facebook/rocksdb/issues/7419 , we introduce the new Append and PositionedAppend APIs to WritableFile at File System, which enable RocksDB to pass the data verification information (e.g., checksum of the data) to the lower layer. In this PR, we use the new API in WritableFileWriter, such that the file created via WritableFileWrite can pass the checksum to the storage layer. To control which types file should apply the checksum handoff, we add checksum_handoff_file_types to DBOptions. User can use this option to control which file types (Currently supported file tyes: kLogFile, kTableFile, kDescriptorFile.) should use the new Append and PositionedAppend APIs to handoff the verification information.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7523
Test Plan: add new unit test, pass make check/ make asan_check
Reviewed By: pdillinger
Differential Revision: D24313271
Pulled By: zhichao-cao
fbshipit-source-id: aafd69091ae85c3318e3e17cbb96fe7338da11d0
Summary:
Adds support for prefetching data in Ribbon queries,
which especially optimizes batched Ribbon queries for MultiGet
(~222ns/key to ~97ns/key) but also single key queries on cold memory
(~333ns to ~226ns) because many queries span more than one cache line.
This required some refactoring of the query algorithm, and there
does not appear to be a noticeable regression in "hot memory" query
times (perhaps from 48ns to 50ns).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7889
Test Plan:
existing unit tests, plus performance validation with
filter_bench:
Each data point is the best of two runs. I saturated the machine
CPUs with other filter_bench runs in the background.
Before:
$ ./filter_bench -impl=3 -m_keys_total_max=200 -average_keys_per_filter=100000 -m_queries=50
WARNING: Assertions are enabled; benchmarks unnecessarily slow
Building...
Build avg ns/key: 125.86
Number of filters: 1993
Total size (MB): 168.166
Reported total allocated memory (MB): 183.211
Reported internal fragmentation: 8.94626%
Bits/key stored: 7.05341
Prelim FP rate %: 0.951827
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 48.0111
Batched, prepared net ns/op: 222.384
Batched, unprepared net ns/op: 343.908
Skewed 50% in 1% net ns/op: 252.916
Skewed 80% in 20% net ns/op: 320.579
Random filter net ns/op: 332.957
After:
$ ./filter_bench -impl=3 -m_keys_total_max=200 -average_keys_per_filter=100000 -m_queries=50
WARNING: Assertions are enabled; benchmarks unnecessarily slow
Building...
Build avg ns/key: 128.117
Number of filters: 1993
Total size (MB): 168.166
Reported total allocated memory (MB): 183.211
Reported internal fragmentation: 8.94626%
Bits/key stored: 7.05341
Prelim FP rate %: 0.951827
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 49.8812
Batched, prepared net ns/op: 97.1514
Batched, unprepared net ns/op: 222.025
Skewed 50% in 1% net ns/op: 197.48
Skewed 80% in 20% net ns/op: 212.457
Random filter net ns/op: 226.464
Bloom comparison, for reference:
$ ./filter_bench -impl=2 -m_keys_total_max=200 -average_keys_per_filter=100000 -m_queries=50
WARNING: Assertions are enabled; benchmarks unnecessarily slow
Building...
Build avg ns/key: 35.3042
Number of filters: 1993
Total size (MB): 238.488
Reported total allocated memory (MB): 262.875
Reported internal fragmentation: 10.2255%
Bits/key stored: 10.0029
Prelim FP rate %: 0.965327
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 9.09931
Batched, prepared net ns/op: 34.21
Batched, unprepared net ns/op: 88.8564
Skewed 50% in 1% net ns/op: 139.75
Skewed 80% in 20% net ns/op: 181.264
Random filter net ns/op: 173.88
Reviewed By: jay-zhuang
Differential Revision: D26378710
Pulled By: pdillinger
fbshipit-source-id: 058428967c55ed763698284cd3b4bbe3351b6e69
Summary:
With M1 macs being available, it is possible that RocksDB will be built on them, without the resulting artifacts to be intended for iOS, where a non-lite RocksDB is needed.
It is not clear to me why the ROCKSDB_LITE cmake option isn't used for iOS consumer, so sending this pull request as a way to foster discussion and to find a path forward to get a full RocksDB build on M1.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7943
Test Plan:
Applied the following patch:
```
diff --git a/fbcode/opensource/fbcode_builder/manifests/rocksdb b/fbcode/opensource/fbcode_builder/manifests/rocksdb
--- a/fbcode/opensource/fbcode_builder/manifests/rocksdb
+++ b/fbcode/opensource/fbcode_builder/manifests/rocksdb
@@ -2,8 +2,8 @@
name = rocksdb
[download]
-url = https://github.com/facebook/rocksdb/archive/v6.8.1.tar.gz
-sha256 = ca192a06ed3bcb9f09060add7e9d0daee1ae7a8705a3d5ecbe41867c5e2796a2
+url = https://github.com/xavierd/rocksdb/archive/master.zip
+sha256 = f93f3f92df66a8401659e35398749d5910b92bd9c14b8354a35ea8852865c422
[dependencies]
lz4
@@ -11,7 +11,7 @@
[build]
builder = cmake
-subdir = rocksdb-6.8.1
+subdir = rocksdb-master
[cmake.defines]
WITH_SNAPPY=ON
```
And ran `getdeps build eden` on an M1 macbook. The build used to fail at link time due to some RocksDB symbols not being found, it now fails for another reason (x86_64 Rust symbols).
Reviewed By: jay-zhuang
Differential Revision: D26324049
Pulled By: xavierd
fbshipit-source-id: 12d86f3395709c4c323f440844e3ae65672aef2d
Summary:
Due to offline discussion, we use actual url of the clang-format-diff.py and add a note.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7950
Reviewed By: pdillinger
Differential Revision: D26370822
Pulled By: riversand963
fbshipit-source-id: 7508e23c002d56d5c1649090438ef5f8ff2cdbe7
Summary:
Added support for detecting plugins linked in the "plugin/" directory and building them from our Makefile in a standardized way. See "plugin/README.md" for details. An example of a plugin that can be built in this way can be found in https://github.com/ajkr/dedupfs.
There will be more to do in terms of making this process more convenient and adding support for CMake.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7918
Test Plan: my own plugin (https://github.com/ajkr/dedupfs) and also heard this patch worked with ZenFS.
Reviewed By: pdillinger
Differential Revision: D26189969
Pulled By: ajkr
fbshipit-source-id: 6624d4357d0ffbaedb42f0d12a3fcb737c78f758
Summary:
Recent Github actions of format checking fail due to invalid location
from where clang-format-diff.py is downloaded. Update the path to point
to a stable, archived location.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7944
Test Plan: manually check the result of Github action.
Reviewed By: ltamasi
Differential Revision: D26345066
Pulled By: riversand963
fbshipit-source-id: 2b1a58c2e59c2f1eb11202d321d2ea002cb0917e
Summary:
Right now, stress test cannot be configured to use memtable whole key filter without prefix filter. It doesn't appear to be necessary. remove this constraint.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7937
Test Plan: "make crash_test" to be able to run.
Reviewed By: ltamasi
Differential Revision: D26295532
fbshipit-source-id: 30c874a9dc2b672a460603a4ee32368674e0face
Summary:
Explicitly reject all range deletions on `TransactionDB` or `OptimisticTransactionDB`, except when the user provides sufficient promises that allow us to proceed safely. The necessary promises are described in the API doc for `TransactionDB::DeleteRange()`. There is currently no way to provide enough promises to make it safe in `OptimisticTransactionDB`.
Fixes https://github.com/facebook/rocksdb/issues/7913.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7929
Test Plan: unit tests covering the cases it's permitted/rejected
Reviewed By: ltamasi
Differential Revision: D26240254
Pulled By: ajkr
fbshipit-source-id: 2834a0ce64cc3e4c3799e35b885a5e79c2f4f6d9
Summary:
The patch fixes the build for `db_bench_tool_test` and makes the tests pass.
Namely, it fixes the following issues:
* https://github.com/facebook/rocksdb/issues/7703 removed the member variable `fs_` but the test case `OptionsFileMultiLevelUniversal`
was not updated.
* https://github.com/facebook/rocksdb/issues/7344 fixed the `OptionsFile` test case for the case when Snappy is *not* available but at the
same time broke it for the case when it *is* available. (The test used a default-constructed
`ColumnFamilyOptions` object, and the default value of the `compression` option is either
Snappy or no compression depending on whether Snappy is supported.)
* The test used `google::ParseCommandLineFlags` instead of
`GFLAGS_NAMESPACE::ParseCommandLineFlags`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7935
Test Plan: Ran the test both with and without Snappy support.
Reviewed By: zhichao-cao
Differential Revision: D26269765
Pulled By: ltamasi
fbshipit-source-id: b7303d8a981ab299d22ab540e0cbd12d149ed9bb
Summary:
Memtable bloom filter is useful in many use cases. A default value on with conservative 1.5% memory can benefit more use cases than use cases impacted.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6584
Test Plan: Run all existing tests.
Reviewed By: pdillinger
Differential Revision: D20626739
fbshipit-source-id: 1dd45532b932139552519b8c2682bd954550c2f9
Summary:
The patch adds support for the options related to the new BlobDB implementation
to `db_stress`, including support for dynamically adjusting them using `SetOptions`
when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically`
are specified. (The latter is used to prevent the options from being enabled when
incompatible features are in use.)
The patch also updates the `db_stress` help messages of the existing stacked BlobDB
related options to clarify that they pertain to the old implementation. In addition, it
adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion
of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN),
and to also test BlobDB in conjunction with atomic flush and transactions, the script sets
the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900
Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options.
Reviewed By: jay-zhuang
Differential Revision: D26094913
Pulled By: ltamasi
fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814
Summary:
The unimplemented handler will return Status::InvalidArgument() and caused issues when using trace analyzer for write batch record. Override with returning Status::OK()
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7910
Test Plan: tested with real trace, make check
Reviewed By: siying
Differential Revision: D26154327
Pulled By: zhichao-cao
fbshipit-source-id: bcdefd4891f839b2e89e4c079f9f430245f482fb
Summary:
(Fixes a regression introduced in the build_version generation PR https://github.com/facebook/rocksdb/issues/7866 )
In the Makefile case, needed to ignore stderr on the tag (everywhere else was fine).
In the CMAKE case, no GIT implies "changes" so that we use the system date rather than the empty GIT date.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7916
Test Plan: Built in a tree that did not contain the ".git" directory. Validated that no errors appeared during the build process and that the build version date was not empty.
Reviewed By: jay-zhuang
Differential Revision: D26169203
Pulled By: mrambacher
fbshipit-source-id: 3288a23b48d97efed5e5b38c9aefb3ef1153fa16
Summary:
There is a small `SingleDelete` related optimization in the
`CompactionIterator` code: when a `SingleDelete`-`Put` pair is preserved
solely for the purposes of transaction conflict checking, the value
itself gets cleared. (This is referred to as "optimization 3" in the
`CompactionIterator` code.) Though the rest of the code got updated to
support `SingleDelete`'ing blob indexes, this chunk was apparently
missed, resulting in an assertion failure (or `ROCKS_LOG_FATAL` in release
builds) when triggered. Note: in addition to clearing the value, we also
need to update the type of the KV to regular value when dealing with
blob indexes here.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7904
Test Plan: `make check`
Reviewed By: ajkr
Differential Revision: D26118009
Pulled By: ltamasi
fbshipit-source-id: 6bf78043d20265e2b15c2e1ab8865025040c42ae
Summary:
This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).
The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.
When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748
Test Plan:
- an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
- add to stress/crash test to verify it works in variety of configs/operations without intentional corruption
- [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.
Reviewed By: pdillinger
Differential Revision: D25754492
Pulled By: ajkr
fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
Summary:
Removed the uses of the Legacy FileWrapper classes from the source code. The wrappers were creating an additional layer of indirection/wrapping, as the Env already has a FileSystem.
Moved the Custom FileWrapper classes into the CustomEnv, as these classes are really for the private use the the CustomEnv class.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7851
Reviewed By: anand1976
Differential Revision: D26114816
Pulled By: mrambacher
fbshipit-source-id: db32840e58d969d3a0fa6c25aaf13d6dcdc74150
Summary:
Closes https://github.com/facebook/rocksdb/issues/7035
Changed how build_version.cc was generated:
- Included the GIT tag/branch in the build_version file
- Changed the "Build Date" to be:
- If the GIT branch is "clean" (no changes), the date of the last git commit
- If the branch is not clean, the current date
- Added APIs to access the "build information", rather than accessing the strings directly.
The build_version.cc file is now regenerated whenever the library objects are rebuilt.
Verified that the built files remain the same size across builds on a "clean build" and the same information is reported by sst_dump --version
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7866
Reviewed By: pdillinger
Differential Revision: D26086565
Pulled By: mrambacher
fbshipit-source-id: 6fcbe47f6033989d5cf26a0ccb6dfdd9dd239d7f
Summary:
During recovery, RocksDB performs a kind of dummy flush; namely, entries
from the WAL are added to memtables, which then get written to SSTs and
blob files (if enabled) just like during a regular flush. Note that
multiple memtables might be flushed during recovery for the same column
family, for example, if the DB is reopened with a lower write buffer size,
and therefore, we need to make sure to collect all SST and blob file
additions. The patch fixes a bug in the earlier logic which resulted in
later blob file additions overwriting earlier ones.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7903
Test Plan: Added a unit test and ran `db_stress`.
Reviewed By: jay-zhuang
Differential Revision: D26110847
Pulled By: ltamasi
fbshipit-source-id: eddb50a608a88f54f3cec3a423de8235aba951fd
Summary:
When retryable IO error occurs during compaction, it is mapped to soft error and set the BG error. However, auto resume is not called to clean the soft error since compaction will reschedule by itself. In this change, When retryable IO error occurs during compaction, BG error is not set. User will be informed the error via EventHelper.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7899
Test Plan: tested with error_handler_fs_test
Reviewed By: anand1976
Differential Revision: D26094097
Pulled By: zhichao-cao
fbshipit-source-id: c53424f11d237405592cd762f43cbbdf8da8234f
Summary:
TIL we have different versions of TARGETS file generated with
options passed to buckifier. Someone thought they were totally fine to
squash the file by re-running the command to generate (pretty reasonable
assumption) but the command was incorrect due to missing the extra
argument used to generate THAT TARGETS file.
This change includes in the command written in the TARGETS header the
extra argument passed to buckify (when used).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7902
Test Plan:
manual, as in the (now fixed) comments at the top of
buckify_rocksdb.py
Reviewed By: ajkr
Differential Revision: D26108317
Pulled By: pdillinger
fbshipit-source-id: 46e93dc1465e27bd18e0e0baa8eeee1b591c765d
Summary:
this is a trivial PR for rocksdb java samples, I think it is a typo about write options. to do sync write, WAL should not be disabled
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7894
Reviewed By: jay-zhuang
Differential Revision: D26047128
Pulled By: mrambacher
fbshipit-source-id: a06ce54cb61af0d3f2578a709c34a0b1ccecb0b2
Summary:
The recovery thread could hold the db.mutex, which is needed from sync
write in main thread.
Make sure the write is done before recovery thread starts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7897
Test Plan: `gtest-parallel ./error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.WALWriteRetryableErrorAutoRecover1 -r 10000 --workers=200`
Reviewed By: zhichao-cao
Differential Revision: D26082933
Pulled By: jay-zhuang
fbshipit-source-id: 226fc49228c0e5903f86ff45cc3fed3080abdb1f
Summary:
Currently, db_bench cleanup only deletes the main DB, if there's one.
Multiple DBs that are opened when --num_multi_db is specified are not
deleted, which can lead to crashes due to running compaction threads on
process exit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7891
Test Plan: Run regression test
Reviewed By: jay-zhuang
Differential Revision: D26049914
Pulled By: anand1976
fbshipit-source-id: acef2821001ca5e208a96a6a273c724e56353316
Summary:
The error recovery thread may out-live DBImpl object, which causing
access released DBImpl.mutex. Close SstFileManager before closing DB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7896
Test Plan:
the issue can be reproduced by adding sleep in recovery code.
Pass the tests with sleep.
Reviewed By: zhichao-cao
Differential Revision: D26076655
Pulled By: jay-zhuang
fbshipit-source-id: 0d9cc5639c12fcfc001427015e75a9736f33cd96
Summary:
Introduces and uses a SystemClock class to RocksDB. This class contains the time-related functions of an Env and these functions can be redirected from the Env to the SystemClock.
Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead. There are likely more places that can be changed, but this is a start to show what can/should be done. Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.
There are several Env classes that implement these functions. Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR. It would be good to unify many of the Mock Timer implementations, so that they behave similarly and be tested similarly (some override Sleep, some use a MockSleep, etc).
Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858
Reviewed By: pdillinger
Differential Revision: D26006406
Pulled By: mrambacher
fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
Summary:
1. In IOTracing, add filename with each IOTrace record. Filename is stored in file object (Tracing Wrappers).
2. Change the logic of figuring out which additional information (file_size,
length, offset etc) needs to be store with each operation
which is different for different operations.
When new information will be added in future (depends on operation),
this change would make the future additions simple.
Logic: In IOTraceRecord, io_op_data is added and its
bitwise positions represent which additional information need
to added in the record from enum IOTraceOp. Values in IOTraceOp represent bitwise positions.
So if length and offset needs to be stored (IOTraceOp::kIOLen
is 1 and IOTraceOp::kIOOffset is 2), position 1 and 2 (from rightmost bit) will be set
and io_op_data will contain 110.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7885
Test Plan: Updated io_tracer_test and verified the trace file manually.
Reviewed By: anand1976
Differential Revision: D25982353
Pulled By: akankshamahajan15
fbshipit-source-id: ebfc5539cc0e231d7794a6b42b73f5403e360b22
Summary:
In the original stacked BlobDB implementation, which writes blobs to blob files
immediately and treats blob files as logs, it makes sense to flush the file after
writing each blob to protect against process crashes; however, in the integrated
implementation, which builds blob files in the background jobs, this unnecessarily
reduces performance. This patch fixes this by simply adding a `do_flush` flag to
`BlobLogWriter`, which is set to `true` by the stacked implementation and to `false`
by the new code. Note: the change itself is trivial but the tests needed some work;
since in the new implementation, blobs are now buffered, adding a blob to
`BlobFileBuilder` is no longer guaranteed to result in an actual I/O. Therefore, we can
no longer rely on `FaultInjectionTestEnv` when testing failure cases; instead, we
manipulate the return values of I/O methods directly using `SyncPoint`s.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7892
Test Plan: `make check`
Reviewed By: jay-zhuang
Differential Revision: D26022814
Pulled By: ltamasi
fbshipit-source-id: b3dce419f312137fa70d84cdd9b908fd5d60d8cd
Summary:
…when unused. Causes many calls to clock_gettime, impacting performance.
Was looking for something else via Linux "perf" command when I spotted heavy usage of clock_gettime during a compaction. Our product heavily uses the rocksdb::Options::merge_operator. MergeHelper::FilterMerge() properly tests if timing is enabled/disabled upon entry, but not on exit. This patch fixes the exit.
Note: the entry test also verifies if "nullptr!=stats_". This test is redundant to code within ShouldReportDetailedTime(). Therefore I omitted it in my change.
merge_test.cc updated with test that shows failure before merge_helper.cc change ... and fix after change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7867
Reviewed By: jay-zhuang
Differential Revision: D25960175
Pulled By: ajkr
fbshipit-source-id: 56e66d7eb6ae5eae89c8e0d5a262bd2905a226b6
Summary:
This provides a workaround for two race conditions that will be fixed in
a more sophisticated way later. This PR:
(1) Makes the client serialize calls to `Timer::Start()` and `Timer::Shutdown()` (see https://github.com/facebook/rocksdb/issues/7711). The long-term fix will be to make those functions thread-safe.
(2) Makes `PeriodicWorkScheduler` atomically add/cancel work together with starting/shutting down its `Timer`. The long-term fix will be for `Timer` API to offer more specialized APIs so the client will not need to synchronize.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7888
Test Plan: ran the repro provided in https://github.com/facebook/rocksdb/issues/7881
Reviewed By: jay-zhuang
Differential Revision: D25990891
Pulled By: ajkr
fbshipit-source-id: a97fdaebbda6d7db7ddb1b146738b68c16c5be38