rocksdb

Commit Graph

Author	SHA1	Message	Date
Levi Tamasi	b8fe7df2e5	Fix LITE build (#10106 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10106 Reviewed By: cbi42 Differential Revision: D36891284 Pulled By: ltamasi fbshipit-source-id: 304ffa84549201659feb0b74d6ba54a83f08906b	3 years ago
zczhu	e88d8935ae	Add comments/permit unchecked error to close_db_dir pull requests (#10093 ) Summary: In [close_db_dir](https://github.com/facebook/rocksdb/pull/10049) pull request, some merging conflicts occurred (some comments and one line `s.PermitUncheckedError()` are missing). This pull request aims to put them back. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10093 Reviewed By: ajkr Differential Revision: D36884117 Pulled By: littlepig2013 fbshipit-source-id: 8c0e2a8793fc52804067c511843bd1ff4912c1c3	3 years ago
Yanqin Jin	ed50ccd19a	Install zstd on CircleCI linux (#10102 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10102 Reviewed By: jay-zhuang Differential Revision: D36885468 Pulled By: riversand963 fbshipit-source-id: 6ed5b62dda8fe0f4be4b66d09bdec0134cf4500c	3 years ago
Gang Liao	e6432dfd4c	Make it possible to enable blob files starting from a certain LSM tree level (#10077 ) Summary: Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed. In order to achieve this, we would like to do the following: - Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions` and extend the related logic) - Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level` - Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` ) - Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh` - Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool) - Ideally extend the C and Java bindings with the new option - Update the BlobDB wiki to document the new option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077 Reviewed By: ltamasi Differential Revision: D36884156 Pulled By: gangliao fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d	3 years ago
Jay Zhuang	a020031552	Add kLastTemperature as temperature high bound (#10044 ) Summary: Only used as temperature high bound for current code, may increase with more temperatures added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10044 Test Plan: ci Reviewed By: siying Differential Revision: D36633410 Pulled By: jay-zhuang fbshipit-source-id: eecdfa7623c31778c31d789902eacf78aad7b482	3 years ago
Gang Liao	3dc6ebaf74	Support specifying blob garbage collection parameters when CompactRange() (#10073 ) Summary: Garbage collection is generally controlled by the BlobDB configuration options `enable_blob_garbage_collection` and `blob_garbage_collection_age_cutoff`. However, there might be use cases where we would want to temporarily override these options while performing a manual compaction. (One use case would be doing a full key-space manual compaction with full=100% garbage collection age cutoff in order to minimize the space occupied by the database.) Our goal here is to make it possible to override the configured GC parameters when using the `CompactRange` API to perform manual compactions. This PR would involve: - Extending the `CompactRangeOptions` structure so clients can both force-enable and force-disable GC, as well as use a different cutoff than what's currently configured - Storing whether blob GC should actually be enabled during a certain manual compaction and the cutoff to use in the `Compaction` object (considering the above overrides) and passing it to `CompactionIterator` via `CompactionProxy` - Updating the BlobDB wiki to document the new options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10073 Test Plan: Adding unit tests and adding the new options to the stress test tool. Reviewed By: ltamasi Differential Revision: D36848700 Pulled By: gangliao fbshipit-source-id: c878ef101d1c612429999f513453c319f75d78e9	3 years ago
Zichen Zhu	65893ad959	Explicitly closing all directory file descriptors (#10049 ) Summary: Currently, the DB directory file descriptor is left open until the deconstruction process (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (line 512, 513, 514, 515 in ldb_cmd.cc) to leak the ``db_'' object, build `ldb` tool and run ``` strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing ``` There is one directory file descriptor that is not closed in the strace log. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049 Test Plan: Add a new unit test DBBasicTest.DBCloseAllDirectoryFDs: Open a database with different WAL directory and three different data directories, and all directory file descriptors should be closed after calling Close(). Explicitly call Close() after a directory file descriptor is not used so that the counter of directory open and close should be equivalent. Reviewed By: ajkr, hx235 Differential Revision: D36722135 Pulled By: littlepig2013 fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff	3 years ago
Guido Tagliavini Ponce	b4d0e041d0	Add support for FastLRUCache in stress and crash tests. (#10081 ) Summary: Stress tests can run with the experimental FastLRUCache. Crash tests randomly choose between LRUCache and FastLRUCache. Since only LRUCache supports a secondary cache, we validate the `--secondary_cache_uri` and `--cache_type` flags---when `--secondary_cache_uri` is set, the `--cache_type` is set to `lru_cache`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10081 Test Plan: - To test that the FastLRUCache is used and the stress test runs successfully, run `make -j24 CRASH_TEST_EXT_ARGS=—duration=960 blackbox_crash_test_with_atomic_flush`. The cache type should sometimes be `fast_lru_cache`. - To test the flag validation, run `make -j24 CRASH_TEST_EXT_ARGS="--duration=960 --secondary_cache_uri=x" blackbox_crash_test_with_atomic_flush` multiple times. The test will always be aborted (which is okay). Check that the cache type is always `lru_cache`. Reviewed By: anand1976 Differential Revision: D36839908 Pulled By: guidotag fbshipit-source-id: ebcdfdcd12ec04c96c09ae5b9c9d1e613bdd1725	3 years ago
Akanksha Mahajan	45b1c788c4	Update History.md for #9922 (#10092 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10092 Reviewed By: riversand963 Differential Revision: D36832311 Pulled By: akankshamahajan15 fbshipit-source-id: 8fb1cf90b1d4dddebbfbeebeddb15f6905968e9b	3 years ago
Jay Zhuang	5864900cf4	Get current LogFileNumberSize the same as log_writer (#10086 ) Summary: `db_impl.alive_log_files_` is used to track the WAL size in `db_impl.logs_`. Get the `LogFileNumberSize` obj in `alive_log_files_` the same time as `log_writer` to keep them consistent. For this issue, it's not safe to do `deque::reverse_iterator::operator*` and `deque::pop_front()` concurrently, so remove the tail cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10086 Test Plan: ``` # on Windows gtest-parallel ./db_test --gtest_filter=DBTest.FileCreationRandomFailure -r 1000 -w 100 ``` Reviewed By: riversand963 Differential Revision: D36822373 Pulled By: jay-zhuang fbshipit-source-id: 5e738051dfc7bcf6a15d85ba25e6365df6b6a6af	3 years ago
Changyu Bi	463873f1bb	Add bug fix to HISTORY.md (#10091 ) Summary: Add to HISTORY.md the bug fixed in https://github.com/facebook/rocksdb/issues/10051 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10091 Reviewed By: ajkr Differential Revision: D36821861 Pulled By: cbi42 fbshipit-source-id: 598812fab88f65c0147ece53cff55cf4ea73aac6	3 years ago
Peter Dillinger	a00cffaf69	Reduce risk of backup or checkpoint missing a WAL file (#10083 ) Summary: We recently saw a case in crash test in which a WAL file in the middle of the list of live WALs was not included in the backup, so the DB was not openable due to missing WAL. We are not sure why, but this change should at least turn that into a backup-time failure by ensuring all the WAL files expected by the manifest (according to VersionSet) are included in `GetSortedWalFiles()` (used by `GetLiveFilesStorageInfo()`, `BackupEngine`, and `Checkpoint`) Related: to maximize the effectiveness of track_and_verify_wals_in_manifest with GetSortedWalFiles() during checkpoint/backup, we will now sync WAL in GetLiveFilesStorageInfo() when track_and_verify_wals_in_manifest=true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10083 Test Plan: added new unit test for the check in GetSortedWalFiles() Reviewed By: ajkr Differential Revision: D36791608 Pulled By: pdillinger fbshipit-source-id: a27bcf0213fc7ab177760fede50d4375d579afa6	3 years ago
Akanksha Mahajan	d04df2752a	Persist the new MANIFEST after successfully syncing the new WAL during recovery (#9922 ) Summary: In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't flush the data from WAL to L0 for all column families if possible. As a result, not all column families can increase their log_numbers, and min_log_number_to_keep won't change. For transaction DB (.allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions and min_log_number_to_keep won't change. If we persist a new MANIFEST with advanced log_numbers for some column families, then during a second crash after persisting the MANIFEST, RocksDB will see some column families' log_numbers larger than the corrupted wal, and the "column family inconsistency" error will be hit, causing recovery to fail. As a solution, RocksDB will persist the new MANIFEST after successfully syncing the new WAL. If a future recovery starts from the new MANIFEST, then it means the new WAL is successfully synced. Due to the sentinel empty write batch at the beginning, kPointInTimeRecovery of WAL is guaranteed to go after this point. If future recovery starts from the old MANIFEST, it means the writing the new MANIFEST failed. We won't have the "SST ahead of WAL" error. Currently, RocksDB DB::Open() may creates and writes to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST after recovery is successful Pull Request resolved: https://github.com/facebook/rocksdb/pull/9922 Test Plan: 1. Update unit tests to fail without this change 2. make crast_test -j Branch with unit test and no fix https://github.com/facebook/rocksdb/pull/9942 to keep track of unit test (without fix) Reviewed By: riversand963 Differential Revision: D36043701 Pulled By: akankshamahajan15 fbshipit-source-id: 5760970db0a0920fb73d3c054a4155733500acd9	3 years ago
Yanqin Jin	7c8c803938	Remove unused variable `single_column_family_mode_` (#10078 ) Summary: This variable is actually not being used for anything meaningful, thus remove it. This can make https://github.com/facebook/rocksdb/issues/7516 slightly simpler by reducing the amount of state that must be made lock-free. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10078 Test Plan: make check Reviewed By: ajkr Differential Revision: D36779817 Pulled By: riversand963 fbshipit-source-id: ffb0d9ad6149616917ae5e02bb28102cb90fc406	3 years ago
Jay Zhuang	151dc0038a	Bypass tests instead of skipping (#10076 ) Summary: Make fb test infra happy, more details: https://github.com/facebook/rocksdb/issues/8048 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10076 Test Plan: CI Reviewed By: ajkr Differential Revision: D36768766 Pulled By: jay-zhuang fbshipit-source-id: 4f039a5c623abb6d4a7d09bbf97077618e7ec2c8	3 years ago
Yanqin Jin	5ab5537d79	Deflake unit test BackupEngineTest.Concurrency (#10069 ) Summary: After https://github.com/facebook/rocksdb/issues/9984, BackupEngineTest.Concurrency becomes flaky. During DB::Open(), someone else can rename/remove the LOG file, causing this thread's `CreateLoggerFromOptions()` to fail. The reason is that the operation sequence of "FileExists -> Rename" is not atomic. It's possible that a FileExists() returns OK, but the file gets deleted before Rename(), causing the latter to return IOError with PathNotFound subcode. Although it's not encouraged to concurrently modify the contents of the directories managed by the database instance in this case, we can still perform some simple handling to make DB::Open() more robust. In this case, we can check if a racing thread has deleted the original LOG file, we can allow this thread to continue creating a new LOG file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10069 Test Plan: ~/gtest-parallel/gtest-parallel -r 100 ./backup_engine_test --gtest_filter=BackupEngineTest.Concurrency Reviewed By: jay-zhuang Differential Revision: D36736913 Pulled By: riversand963 fbshipit-source-id: 3cbe92d77ca175e55e586bdb1a32ac8107217ae6	3 years ago
Changyu Bi	9baeef712f	Fix unittest ExternalSSTFileBasicTest.StableSnapshotWhileLoggingToManifest (#10066 ) Summary: Fix the unittest `ExternalSSTFileBasicTest.StableSnapshotWhileLoggingToManifest` introduced in https://github.com/facebook/rocksdb/issues/10051 that is failing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10066 Test Plan: CI Reviewed By: ajkr Differential Revision: D36720669 Pulled By: cbi42 fbshipit-source-id: 47a6d2c161f27b605ede5c62d1776eecaf0d5363	3 years ago
Andrea Pappacoda	a0f391cafc	build: fix pkg-config file generation (#9953 ) Summary: - Instead of hardcoding "lib" and "include" in `libdir` and `includedir`, use the values from [`GNUInstallDirs`](https://cmake.org/cmake/help/latest/module/GNUInstallDirs.html). - Use `PROJECT_DESCRIPTION` and `PROJECT_HOMEPAGE_URL` instead of their `CMAKE_` conterparts to fix pkg-config generation when rocksdb is not the top-level project (see [`project()`](https://cmake.org/cmake/help/latest/command/project.html)). - Drop explicit `CMAKE_CURRENT_SOURCE_DIR` and `CMAKE_CURRENT_BINARY_DIR` in [`configure_file()`](https://cmake.org/cmake/help/latest/command/configure_file.html) as that's implied by default (and quite intuitive). See https://github.com/facebook/rocksdb/issues/9945 CC: trynity Pull Request resolved: https://github.com/facebook/rocksdb/pull/9953 Reviewed By: ajkr Differential Revision: D36716373 Pulled By: jay-zhuang fbshipit-source-id: 57840eeb4453099fa1fe861dc03366085dbca704	3 years ago
Jay Zhuang	0adac6f88e	Deflake Transaction stress tests (#10063 ) Summary: TSAN test is slower, for `TransactionStressTest` and `DeadlockStress`, they're reaching the timeout limit of 600 seconds. Decreasing the transaction test number. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10063 Test Plan: CI Reviewed By: ajkr Differential Revision: D36711727 Pulled By: jay-zhuang fbshipit-source-id: 600f82a6d32108f52fbe5572fcc7497607b7fe98	3 years ago
Jay Zhuang	460b44c07f	Deflake column_family_test to avoid hang (#10060 ) Summary: Tests could hang because of flags are not test and set atomiclly, so it's waiting for a sync point forever. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10060 Test Plan: CI Reviewed By: ajkr Differential Revision: D36706311 Pulled By: jay-zhuang fbshipit-source-id: d54b8053ce51b2de74162b28f496c048519b6cde	3 years ago
Jaepil Jeong	4eb7b35f6d	Fix compile error in Clang 13 (#10033 ) Summary: This PR fixes the following compilation error in Clang 13, which was tested on macOS 12.4. ``` ❯ ninja clean && ninja [1/1] Cleaning all built files... Cleaning... 0 files. [198/315] Building CXX object CMakeFiles/rocksdb.dir/util/cleanable.cc.o FAILED: CMakeFiles/rocksdb.dir/util/cleanable.cc.o ccache /opt/homebrew/opt/llvm/bin/clang++ -DGFLAGS=1 -DGFLAGS_IS_A_DLL=0 -DHAVE_FULLFSYNC -DJEMALLOC_NO_DEMANGLE -DLZ4 -DOS_MACOSX -DROCKSDB_JEMALLOC -DROCKSDB_LIB_IO_POSIX -DROCKSDB_NO_DYNAMIC_EXTENSION -DROCKSDB_PLATFORM_POSIX -DSNAPPY -DTBB -DZLIB -DZSTD -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb/include -I/Users/jaepil/app/include -I/opt/homebrew/include -I/opt/homebrew/opt/llvm/include -W -Wextra -Wall -pthread -Wsign-compare -Wshadow -Wno-unused-parameter -Wno-unused-variable -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-strict-aliasing -Wno-invalid-offsetof -fno-omit-frame-pointer -momit-leaf-frame-pointer -march=armv8-a+crc+crypto -Wno-unused-function -Werror -O3 -DNDEBUG -DROCKSDB_USE_RTTI -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk -std=gnu++17 -MD -MT CMakeFiles/rocksdb.dir/util/cleanable.cc.o -MF CMakeFiles/rocksdb.dir/util/cleanable.cc.o.d -o CMakeFiles/rocksdb.dir/util/cleanable.cc.o -c /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc:24:65: error: no member named 'move' in namespace 'std' Cleanable::Cleanable(Cleanable&& other) noexcept { this = std::move(other); } ~~~~~^ /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc:126:16: error: no member named 'move' in namespace 'std' this = std::move(from); ~~~~~^ 2 errors generated. [209/315] Building CXX object CMakeFiles/rocksdb.dir/tools/block_cache_analyzer/block_cache_trace_analyzer.cc.o ninja: build stopped: subcommand failed. ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10033 Reviewed By: jay-zhuang Differential Revision: D36580562 Pulled By: ajkr fbshipit-source-id: 0f6b241d186ed528ad62d259af2857d2c2b4ded1	3 years ago
Yanqin Jin	514f0b0937	Fail DB::Open() if logger cannot be created (#9984 ) Summary: For regular db instance and secondary instance, we return error and refuse to open DB if Logger creation fails. Our current code allows it, but it is really difficult to debug because there will be no LOG files. The same for OPTIONS file, which will be explored in another PR. Furthermore, Arena::AllocateAligned(size_t bytes, size_t huge_page_size, Logger* logger) has an assertion as the following: ```cpp #ifdef MAP_HUGETLB if (huge_page_size > 0 && bytes > 0) { assert(logger != nullptr); } #endif ``` It can be removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9984 Test Plan: make check Reviewed By: jay-zhuang Differential Revision: D36347754 Pulled By: riversand963 fbshipit-source-id: 529798c0511d2eaa2f0fd40cf7e61c4cbc6bc57e	3 years ago
Gang Liao	e228515740	Pass the size of blob files to SstFileManager during DB open (#10062 ) Summary: RocksDB uses the (no longer aptly named) SST file manager (see https://github.com/facebook/rocksdb/wiki/Managing-Disk-Space-Utilization) to track and potentially limit the space used by SST and blob files (as well as to rate-limit the deletion of these data files). The SST file manager tracks the SST and blob file sizes in an in-memory hash map, which has to be rebuilt during DB open. File sizes can be generally obtained by querying the file system; however, there is a performance optimization possibility here since the sizes of SST and blob files are also tracked in the RocksDB MANIFEST, so we can simply pass the file sizes stored there instead of consulting the file system for each file. Currently, this optimization is only implemented for SST files; we would like to extend it to blob files as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10062 Test Plan: Add unit tests for the change to the test suite ltamasi riversand963 akankshamahajan15 Reviewed By: ltamasi Differential Revision: D36726621 Pulled By: gangliao fbshipit-source-id: 4010dc46ef7306142f1c2e0d1c3bf75b196ef82a	3 years ago
Yu Zhang	8c4ea7b851	Add timestamp support to secondary instance (#10061 ) Summary: This PR adds timestamp support to the secondary DB instance. With this, these timestamp related APIs are supported: ReadOptions.timestamp : read should return the latest data visible to this specified timestamp Iterator::timestamp() : returns the timestamp associated with the key, value DB:Get(..., std::string* timestamp) : returns the timestamp associated with the key, value in timestamp Test plan (on devserver): ``` $COMPILE_WITH_ASAN=1 make -j24 all $./db_secondary_test --gtest_filter=DBSecondaryTestWithTimestamp* ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10061 Reviewed By: riversand963 Differential Revision: D36722915 Pulled By: jowlyzhang fbshipit-source-id: 644ada39e4e51164a759593478c38285e0c1a666	3 years ago
Andrew Kryczka	f6e45382e9	Disable file ingestion in crash test for CF consistency (#10067 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10067 Reviewed By: jay-zhuang Differential Revision: D36727948 Pulled By: ajkr fbshipit-source-id: a3502730412c01ba63d822a5d4bf56f8bae8fcb2	3 years ago
tagliavini	6c50082654	Remove code that only compiles for Visual Studio versions older than 2015 (#10065 ) Summary: There are currently some preprocessor checks that assume support for Visual Studio versions older than 2015 (i.e., 0 < _MSC_VER < 1900), although we don't support them any more. We removed all code that only compiles on those older versions, except third-party/ files. The ROCKSDB_NOEXCEPT symbol is now obsolete, since it now always gets replaced by noexcept. We removed it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10065 Reviewed By: pdillinger Differential Revision: D36721901 Pulled By: guidotag fbshipit-source-id: a2892d365ef53cce44a0a7d90dd6b72ee9b5e5f2	3 years ago
Andrew Kryczka	91ba7837b7	Enable IngestExternalFile() in crash test (#9357 ) Summary: Thanks to https://github.com/facebook/rocksdb/issues/9919 and https://github.com/facebook/rocksdb/issues/10051 the known bugs in file ingestion (besides mmap read + file checksum) are fixed. Now we can try again to enable file ingestion in crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9357 Test Plan: stress file ingestion heavily for an hour: `$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=1000000 --ingest_external_file_one_in=100 --duration=3600 --interval=20 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152` Reviewed By: riversand963 Differential Revision: D33410746 Pulled By: ajkr fbshipit-source-id: d276431390995a67f68390d61c06a40945fdd280	3 years ago
Muthu Krishnan	c9c58a320f	Add C API for User Defined Timestamp (#9914 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/9889 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9914 Reviewed By: akankshamahajan15 Differential Revision: D36599983 Pulled By: riversand963 fbshipit-source-id: 39000fb473f850d88359e90b287035257854af0d	3 years ago
Jie Liang Ang	4cf2f6723a	Expose DisableManualCompaction and EnableManualCompaction to C api (#10052 ) Summary: Add `rocksdb_disable_manual_compaction` and `rocksdb_enable_manual_compaction`. Note that `rocksdb_enable_manual_compaction` should be used with care and must not be called more times than `rocksdb_disable_manual_compaction` has been called. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10052 Reviewed By: ajkr Differential Revision: D36665496 Pulled By: jay-zhuang fbshipit-source-id: a4ae6e34694066feb21302ca1a5c365fb9de0ec7	3 years ago
Akanksha Mahajan	28ea1fb44a	Provide support for IOTracing for ReadAsync API (#9833 ) Summary: Same as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/9833 Test Plan: Add unit test and manually check the output of tracing logs For fixed readahead_size it logs as: ``` Access Time : 193352113447923 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 15075 , IO Status: OK, Length: 12288, Offset: 659456 Access Time : 193352113465232 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 14425 , IO Status: OK, Length: 12288, Offset: 671744 Access Time : 193352113481539 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13062 , IO Status: OK, Length: 12288, Offset: 684032 Access Time : 193352113497692 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13649 , IO Status: OK, Length: 12288, Offset: 696320 Access Time : 193352113520043 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 19384 , IO Status: OK, Length: 12288, Offset: 708608 Access Time : 193352113538401 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 15406 , IO Status: OK, Length: 12288, Offset: 720896 Access Time : 193352113554855 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13670 , IO Status: OK, Length: 12288, Offset: 733184 Access Time : 193352113571624 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13855 , IO Status: OK, Length: 12288, Offset: 745472 Access Time : 193352113587924 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13953 , IO Status: OK, Length: 12288, Offset: 757760 Access Time : 193352113603285 , File Name: 000026.sst , File Operation: Prefetch , Latency: 59 , IO Status: Not implemented: Prefetch not supported, Length: 8868, Offset: 898349 ``` For implicit readahead: ``` Access Time : 193351865156587 , File Name: 000026.sst , File Operation: Prefetch , Latency: 48 , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 391174 Access Time : 193351865160354 , File Name: 000026.sst , File Operation: Prefetch , Latency: 51 , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 395248 Access Time : 193351865164253 , File Name: 000026.sst , File Operation: Prefetch , Latency: 49 , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 399322 Access Time : 193351865165461 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 222871 , IO Status: OK, Length: 135168, Offset: 401408 ``` Reviewed By: anand1976 Differential Revision: D35601634 Pulled By: akankshamahajan15 fbshipit-source-id: 5a4f32a850af878efa0767bd5706380152a1f26e	3 years ago
Jay Zhuang	5490da20a5	Fix flaky db_basic_bench caused by unreleased iterator (#10058 ) Summary: Iterator is not freed after test is done (after the main for loop), which could cause db close failed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10058 Test Plan: Able to reproduce consistently with higher thread number, like 100, make sure it passes after the fix Reviewed By: ajkr Differential Revision: D36685823 Pulled By: jay-zhuang fbshipit-source-id: 4c98b8758d106bfe40cae670e689c3d284765bcf	3 years ago
Peter Dillinger	bd170dda03	Abort RocksDB performance regression test on failure in test setup (#10053 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10053 Need to exit if ldb command fails, to avoid running db_bench on empty/bad DB and considering the results valid. Reviewed By: jay-zhuang Differential Revision: D36673200 fbshipit-source-id: e0d78a0d397e0e335d82d9349bfd612d38ffb552	3 years ago
sdong	356f8c5d81	FindObsoleteFiles() to directly check whether candidate files are live (#10040 ) Summary: Right now, in FindObsoleteFiles() we build a list of all live SST files from all existing Versions. This is all done in DB mutex, and is O(mn) where m is number of versions and n is number of files. In some extereme cases, it can take very long. The list is used to see whether a candidate file still shows up in a version. With this commit, every candidate file is directly check against all the versions for file existance. This operation would be O(mk) where k is number of candidate files. Since is usually small (except perhaps full compaction in universal compaction), this is usually much faster than the previous solution. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10040 Test Plan: TBD Reviewed By: riversand963 Differential Revision: D36613391 fbshipit-source-id: 3f13b090f755d9b3ae417faec62cd6e798bac1eb	3 years ago
Changyu Bi	b0e190604b	Update VersionSet last seqno after LogAndApply (#10051 ) Summary: This PR fixes the issue of unstable snapshot during external SST file ingestion. Credit ajkr for the following walk through: consider these relevant steps for of IngestExternalFile(): (1) increase seqno while holding mutex -- `677d2b4a8f/db/db_impl/db_impl.cc (L4768)` (2) LogAndApply() -- `677d2b4a8f/db/db_impl/db_impl.cc (L4797-L4798)` (a) write to MANIFEST with mutex released `a96a4a2f7b/db/version_set.cc (L4407)` (b) apply to in-memory state with mutex held A snapshot taken during (2a) will be unstable. In particular, queries against that snapshot will not include data from the ingested file before (2b), and will include data from the ingested file after (2b). Pull Request resolved: https://github.com/facebook/rocksdb/pull/10051 Test Plan: Added a new unit test: `ExternalSSTFileBasicTest.WriteAfterReopenStableSnapshotWhileLoggingToManifest`. ``` make external_sst_file_basic_test ./external_sst_file_basic_test ``` Reviewed By: ajkr Differential Revision: D36654033 Pulled By: cbi42 fbshipit-source-id: bf720cca313e0cf211585960f3aff04853a31b96	3 years ago
Yiyuan Liu	b71466e982	Improve transaction C-API (#9252 ) Summary: This PR wants to improve support for transaction in C-API: * Support two-phase commit. * Support `get_pinned` and `multi_get` in transaction. * Add `rocksdb_transactiondb_flush` * Support get writebatch from transaction and rebuild transaction from writebatch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9252 Reviewed By: jay-zhuang Differential Revision: D36459007 Pulled By: riversand963 fbshipit-source-id: 47371d527be821c496353a7fe2fd18d628069a98	3 years ago
Yanqin Jin	9901e7f681	Enable checkpoint and backup in db_stress when timestamp is enabled (#10047 ) Summary: After https://github.com/facebook/rocksdb/issues/10030 and https://github.com/facebook/rocksdb/issues/10004, we can enable checkpoint and backup in stress tests when user-defined timestamp is enabled. This PR has no production risk. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10047 Test Plan: ``` TEST_TMPDIR=/dev/shm make crash_test_with_ts ``` Reviewed By: jowlyzhang Differential Revision: D36641565 Pulled By: riversand963 fbshipit-source-id: d86c9d87efcc34c32d1aa176af691d32b897644a	3 years ago
Levi Tamasi	af7ae912e2	Fix potential ambiguities in/around port/sys_time.h (#10045 ) Summary: There are some time-related POSIX APIs that are not available on Windows (e.g. `localtime_r`), which we have worked around by providing our own implementations in `port/sys_time.h`. This workaround actually relies on some ambiguity: on Windows, a call to `localtime_r` calls `ROCKSDB_NAMESPACE::port::localtime_r` (which is pulled into `ROCKSDB_NAMESPACE` by a using-declaration), while on other platforms it calls the global `localtime_r`. This works fine as long as there is only one candidate function; however, it breaks down when there is more than one `localtime_r` visible in a scope. The patch fixes this by introducing `ROCKSDB_NAMESPACE::port::{TimeVal, GetTimeOfDay, LocalTimeR}` to eliminate any ambiguity. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10045 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D36639372 Pulled By: ltamasi fbshipit-source-id: fc13dbfa421b7c8918111a6d9e24ce77e91a7c50	3 years ago
Jay Zhuang	a96a4a2f7b	Fix ApproximateOffsetOfCompressed test (#10048 ) Summary: https://github.com/facebook/rocksdb/issues/9857 introduced new an option `use_zstd_dict_trainer`, which is stored in SST as text, e.g.: ``` ... zstd_max_train_bytes=0; enabled=0;... ``` it increased the sst size a little bit and cause `ApproximateOffsetOfCompressed` test to fail: ``` Value 7053 is not in range [4000, 7050] table/table_test.cc:4019: Failure Value of: Between(c.ApproximateOffsetOf("xyz"), 4000, 7050) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10048 Test Plan: verified the test pass after the change Reviewed By: cbi42 Differential Revision: D36643688 Pulled By: jay-zhuang fbshipit-source-id: bf12d211f6ae71937259ef21b1226bd06e8da717	3 years ago
Jay Zhuang	23f34c7ae5	Skip ZSTD dict tests if the version doesn't support it (#10046 ) Summary: For example, the default ZSTD version for ubuntu20 is 1.4.4, which will fail the test `PresetCompressionDict`: ``` db/db_test_util.cc:607: Failure Invalid argument: zstd finalizeDictionary cannot be used because ZSTD 1.4.5+ is not linked with the binary. terminate called after throwing an instance of 'testing::internal::GoogleTestFailureException' ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10046 Test Plan: test pass with old zstd Reviewed By: cbi42 Differential Revision: D36640067 Pulled By: jay-zhuang fbshipit-source-id: b1c49fb7295f57f4515ce4eb3a52ae7d7e45da86	3 years ago
sdong	c78a87cd71	Avoid malloc_usable_size() call inside LRU Cache mutex (#10026 ) Summary: In LRU Cache mutex, we sometimes call malloc_usable_size() to calculate memory used by the metadata object. We prevent it by saving the charge + metadata size, rather than charge, inside the metadata itself. Within the mutex, usually only total charge is needed so we don't need to repeat. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10026 Test Plan: Run existing tests. Reviewed By: pdillinger Differential Revision: D36556253 fbshipit-source-id: f60c96d13cde3af77732e5548e4eac4182fa9801	3 years ago
Yu Zhang	d4081bf0be	Add timestamp support to CompactedDBImpl (#10030 ) Summary: This PR is the second and last part for adding user defined timestamp support to read only DB. Specifically, the change in this PR includes: - `options.timestamp` respected by `CompactedDBImpl::Get` and `CompactedDBImpl::MultiGet` to return results visible up till that timestamp. - `CompactedDBImpl::Get(...,std::string* timestsamp)` and `CompactedDBImpl::MultiGet(std::vector<std::string>* timestamps)` return the timestamp(s) associated with the key(s). Pull Request resolved: https://github.com/facebook/rocksdb/pull/10030 Test Plan: ``` $COMPILE_WITH_ASAN=1 make -j24 all $./db_readonly_with_timestamp_test --gtest_filter="DBReadOnlyTestWithTimestamp.CompactedDB" $./db_basic_test --gtest_filter="DBBasicTest.CompactedDB" $make all check ``` Reviewed By: riversand963 Differential Revision: D36613926 Pulled By: jowlyzhang fbshipit-source-id: 5b7ed7fef822708c12e2caf7a8d2deb6a696f0f0	3 years ago
Changyu Bi	8515bd50c9	Support read rate-limiting in SequentialFileReader (#9973 ) Summary: Added rate limiter and read rate-limiting support to SequentialFileReader. I've updated call sites to SequentialFileReader::Read with appropriate IO priority (or left a TODO and specified IO_TOTAL for now). The PR is separated into four commits: the first one added the rate-limiting support, but with some fixes in the unit test since the number of request bytes from rate limiter in SequentialFileReader are not accurate (there is overcharge at EOF). The second commit fixed this by allowing SequentialFileReader to check file size and determine how many bytes are left in the file to read. The third commit added benchmark related code. The fourth commit moved the logic of using file size to avoid overcharging the rate limiter into backup engine (the main user of SequentialFileReader). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9973 Test Plan: - `make check`, backup_engine_test covers usage of SequentialFileReader with rate limiter. - Run db_bench to check if rate limiting is throttling as expected: Verified that reads and writes are together throttled at 2MB/s, and at 0.2MB chunks that are 100ms apart. - Set up: `./db_bench --benchmarks=fillrandom -db=/dev/shm/test_rocksdb` - Benchmark: ``` strace -ttfe read,write ./db_bench --benchmarks=backup -db=/dev/shm/test_rocksdb --backup_rate_limit=2097152 --use_existing_db strace -ttfe read,write ./db_bench --benchmarks=restore -db=/dev/shm/test_rocksdb --restore_rate_limit=2097152 --use_existing_db ``` - db bench on backup and restore to ensure no performance regression. - backup (avg over 50 runs): pre-change: 1.90443e+06 micros/op; post-change: 1.8993e+06 micros/op (improve by 0.2%) - restore (avg over 50 runs): pre-change: 1.79105e+06 micros/op; post-change: 1.78192e+06 micros/op (improve by 0.5%) ``` # Set up ./db_bench --benchmarks=fillrandom -db=/tmp/test_rocksdb -num=10000000 # benchmark TEST_TMPDIR=/tmp/test_rocksdb NUM_RUN=50 for ((j=0;j<$NUM_RUN;j++)) do ./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=backup -use_existing_db \| egrep 'backup' # Restore #./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=restore -use_existing_db done > rate_limit.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' rate_limit.txt >> rate_limit_2.txt ``` Reviewed By: hx235 Differential Revision: D36327418 Pulled By: cbi42 fbshipit-source-id: e75d4307cff815945482df5ba630c1e88d064691	3 years ago
Jay Zhuang	fd24e4479b	Fix failed VerifySstUniqueIds unittests (#10043 ) Summary: which should use UniqueId64x2 instead of string. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10043 Test Plan: unittest Reviewed By: pdillinger Differential Revision: D36620366 Pulled By: jay-zhuang fbshipit-source-id: cf937a1da362018472fa4396848225e48893848b	3 years ago
Akanksha Mahajan	700d597bd8	Expose unix time in rocksdb::Snapshot (#9923 ) Summary: RocksDB snapshot already has a member unix_time_ set after snapshot is taken. It is now exposed through GetSnapshotTime() API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9923 Test Plan: Update unit tests Reviewed By: riversand963 Differential Revision: D36048275 Pulled By: akankshamahajan15 fbshipit-source-id: 825210ec287deb0bc3aaa9b8e1f079f07ad686fa	3 years ago
anand76	8e9d9156b0	Fix fbcode internal build failure (#10041 ) Summary: The build failed due to different namespaces for coroutines (std::experimental vs std) based on compiler version. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10041 Reviewed By: ltamasi Differential Revision: D36617212 Pulled By: anand1976 fbshipit-source-id: dfb25320788d32969317d5651173059e2cbd8bd5	3 years ago
Levi Tamasi	253ae017fa	Update version on main to 7.4 and add 7.3 to the format compatibility checks (#10038 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10038 Reviewed By: riversand963 Differential Revision: D36604533 Pulled By: ltamasi fbshipit-source-id: 54ccd0a4b32a320b5640a658ea6846ee897065d1	3 years ago
Akanksha Mahajan	a479c2c2b2	Fix stress test failure "Corruption: checksum mismatch" or "Iterator Diverged" with async_io enabled (#10032 ) Summary: In case of non sequential reads with `async_io`, `FilePRefetchBuffer::TryReadFromCacheAsync` can be called for previous blocks with `offset < bufs_[curr_].offset_` which wasn't handled correctly resulting wrong data being returned from buffer. Since `FilePRefetchBuffer::PrefetchAsync` can be called for any data block, it sets `prev_len_` to 0 indicating `FilePRefetchBuffer::TryReadFromCacheAsync` to go for the prefetching even though offset < bufs_[curr_].offset_ This is because async prefetching is always done in second buffer (to avoid mutex) even though curr_ is empty leading to offset < bufs_[curr_].offset_ in some cases. If prev_len_ is non zero then `TryReadFromCacheAsync` returns false if `offset < bufs_[curr_].offset_ && prev_len != 0` indicating reads are not sequential and previous call wasn't PrefetchAsync. - This PR also simplifies `FilePRefetchBuffer::TryReadFromCacheAsync` as it was getting complicated covering different scenarios based on `async_io` enabled/disabled. If `for_compaction` is set true, it now calls `FilePRefetchBufferTryReadFromCache` following synchronous flow as before. Its decided in BlockFetcher.cc Pull Request resolved: https://github.com/facebook/rocksdb/pull/10032 Test Plan: 1. export CRASH_TEST_EXT_ARGS=" --async_io=1" make crash_test -j completed successfully locally 2. make crash_test -j completed successfully locally 3. Reran CircleCi mini crashtest job 4 - 5 times. 4. Updated prefetch_test for more coverage. Reviewed By: anand1976 Differential Revision: D36579858 Pulled By: akankshamahajan15 fbshipit-source-id: 0c428d62b45e12e082a83acf533a5e37a584bedf	3 years ago
sdong	bea5831bff	Move three info logging within DB Mutex to use log buffer (#10029 ) Summary: info logging with DB Mutex could potentially invoke I/O and cause performance issues. Move three of the cases to use log buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10029 Test Plan: Run existing tests. Reviewed By: jay-zhuang Differential Revision: D36561694 fbshipit-source-id: cabb93fea299001a6b4c2802fcba3fde27fa062c	3 years ago
Peter Dillinger	1e4850f626	Java build: finish compiling before testing (etc) (#10034 ) Summary: Lack of ordering dependencies could lead to random build-linux-java failures with "Truncated class file" because tests started before compilation was finished. (Fix to java/Makefile) Also: * export SHA256_CMD to save copy-paste * Actually fail if Java sample build fails--which it was in CircleCI * Don't require Snappy for Java sample build (for more compatibility) * Remove check_all_python from jtest because it's running in `make check` builds in CircleCI Pull Request resolved: https://github.com/facebook/rocksdb/pull/10034 Test Plan: CI, some manual Reviewed By: jay-zhuang Differential Revision: D36596541 Pulled By: pdillinger fbshipit-source-id: 230d79db4b7ae93a366871ff09d0a88e8e1c8af3	3 years ago
Tom Blamer	cb8586003d	Add plugin header install in CMakeLists.txt (#10025 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/9987. - Plugin specific headers can be specified by setting ${PLUGIN_NAME}_HEADERS in ${PLUGIN_NAME}.mk in the plugin directory. - This is supported by the Makefile based build, but was missing from CMakeLists.txt. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10025 Test Plan: - Add a plugin with ${PLUGIN_NAME}_HEADERS set in both ${PLUGIN_NAME}.mk and CMakeLists.txt - Run Makefile based install and CMake based install and verify installed headers match Reviewed By: riversand963 Differential Revision: D36584908 Pulled By: ajkr fbshipit-source-id: 5ea0137205ccbf0d36faacf45d712c5604065bb5	3 years ago

1 2 3 4 5 ...

11100 Commits (b8fe7df2e5e0a020e910c08909bbccb99c483fd3) All Branches Search

11100 Commits (b8fe7df2e5e0a020e910c08909bbccb99c483fd3)

All Branches