rocksdb

Commit Graph

Author	SHA1	Message	Date
Andrew Kryczka	b71b4597e7	Permit stdout "fail"/"error" in whitebox crash test (#8272 ) Summary: In https://github.com/facebook/rocksdb/issues/8268, the `db_stress` stdout began containing both the strings "fail" and "error" (case-insensitive). The whitebox crash test failed upon seeing either of those strings. I checked that all other occurrences of "fail" and "error" (case-insensitive) that `db_stress` produces are printed to `stderr`. So this PR separates the handling of `db_stress`'s stdout and stderr, and only fails when one those bad strings are found in stderr. The downside of this PR is `db_stress`'s original interleaving of stdout/stderr is not preserved in `db_crashtest.py`'s output. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8272 Test Plan: run it; see it succeeds for several runs until encountering a real error ``` $ python3 tools/db_crashtest.py whitebox --simple --random_kill_odd=8887 --max_key=1000000 --value_size_mult=33 ... db_stress: cache/clock_cache.cc:483: bool rocksdb::{anonymous}::ClockCacheShard::Unref(rocksdb::{anonymous}::CacheHandle, bool, rocksdb::{anonymous}::CleanupContext): Assertion `CountRefs(flags) > 0' failed. TEST FAILED. Output has 'fail'!!! ``` Reviewed By: zhichao-cao Differential Revision: D28239233 Pulled By: ajkr fbshipit-source-id: 3b8602a0d570466a7e2c81bb9c49468f7716091e	4 years ago
sdong	7f3a0f5bc6	db_stress: wait for compaction to finish after open with failure injection (#8270 ) Summary: When injecting in DB open, error can happen in background threads, causing DB open succeed, but DB is soon made read-only and subsequence writes will fail, which is not expected. To prevent it from happening, wait for compaction to finish before serving the traffic. If there is a failure, reopen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8270 Test Plan: Run the test. Reviewed By: ajkr Differential Revision: D28230537 fbshipit-source-id: e2e97888904f9b9bb50c35ccf95b88c2319ef5c3	4 years ago
sdong	e19908cba6	Refactor kill point (#8241 ) Summary: Refactor kill point to one single class, rather than several extern variables. The intention was to drop unflushed data before killing to simulate some job, and I tried to a pointer to fault ingestion fs to the killing class, but it ended up with harder than I thought. Perhaps we'll need to do this in another way. But I thought the refactoring itself is good so I send it out. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8241 Test Plan: make release and run crash test for a while. Reviewed By: anand1976 Differential Revision: D28078486 fbshipit-source-id: f9182c1455f52e6851c13f88a21bade63bcec45f	4 years ago
mrambacher	8948dc8524	Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262 ) Summary: The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions. This change cleans that up by introducing an ImmutableOptions struct. Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form). Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR. All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes. Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262 Reviewed By: pdillinger Differential Revision: D28226540 Pulled By: mrambacher fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf	4 years ago
Andrew Kryczka	0f42e50fec	Fix `GetLiveFiles()` returning OPTIONS-000000 (#8268 ) Summary: See release note in HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8268 Test Plan: unit test repro Reviewed By: siying Differential Revision: D28227901 Pulled By: ajkr fbshipit-source-id: faf61d13b9e43a761e3d5dcf8203923126b51339	4 years ago
Peter Dillinger	3b981eaa1d	Fix use-after-free threading bug in ClockCache (#8261 ) Summary: In testing for https://github.com/facebook/rocksdb/issues/8225 I found cache_bench would crash with -use_clock_cache, as well as db_bench -use_clock_cache, but not single-threaded. Smaller cache size hits failure much faster. ASAN reported the failuer as calling malloc_usable_size on the `key` pointer of a ClockCache handle after it was reportedly freed. On detailed inspection I found this bad sequence of operations for a cache entry: state=InCache=1,refs=1 [thread 1] Start ClockCacheShard::Unref (from Release, no mutex) [thread 1] Decrement ref count state=InCache=1,refs=0 [thread 1] Suspend before CalcTotalCharge (no mutex) [thread 2] Start UnsetInCache (from Insert, mutex held) [thread 2] clear InCache bit state=InCache=0,refs=0 [thread 2] Calls RecycleHandle (based on pre-updated state) [thread 2] Returns to Insert which calls Cleanup which deletes `key` [thread 1] Resume ClockCacheShard::Unref [thread 1] Read `key` in CalcTotalCharge To fix this, I've added a field to the handle to store the metadata charge so that we can efficiently remember everything we need from the handle in Unref. We must not read from the handle again if we decrement the count to zero with InCache=1, which means we don't own the entry and someone else could eject/overwrite it immediately. Note before this change, on amd64 sizeof(Handle) == 56 even though there are only 48 bytes of data. Grouping together the uint32_t fields would cut it down to 48, but I've added another uint32_t, which takes it back up to 56. Not a big deal. Also fixed DisownData to cooperate with ASAN as in LRUCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8261 Test Plan: Manual + adding use_clock_cache to db_crashtest.py Base performance ./cache_bench -use_clock_cache Complete in 17.060 s; QPS = 2458513 New performance ./cache_bench -use_clock_cache Complete in 17.052 s; QPS = 2459695 Any difference is easily buried in small noise. Crash test shows still more bug(s) in ClockCache, so I'm expecting to disable ClockCache from production code in a follow-up PR (if we can't find and fix the bug(s)) Reviewed By: mrambacher Differential Revision: D28207358 Pulled By: pdillinger fbshipit-source-id: aa7a9322afc6f18f30e462c75dbbe4a1206eb294	4 years ago
Andrew Kryczka	c70bae1b05	Fix ConcurrentTaskLimiter token release for shutdown (#8253 ) Summary: Previously the shutdown process did not properly wait for all `compaction_thread_limiter` tokens to be released before proceeding to delete the DB's C++ objects. When this happened, we saw tests like "DBCompactionTest.CompactionLimiter" flake with the following error: ``` virtual rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl(): Assertion `outstanding_tasks_ == 0' failed. ``` There is a case where a token can still be alive even after the shutdown process has waited for BG work to complete. In particular, this happens because the shutdown process only waits for flush/compaction scheduled/unscheduled counters to all reach zero. These counters are decremented in `BackgroundCallCompaction()` functions. However, tokens are released in `BGWorkCompaction()` functions, which actually wrap the `BackgroundCallCompaction()` function. A simple sleep could repro the race condition: ``` $ diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc index 806bc548a..ba59efa89 100644 --- a/db/db_impl/db_impl_compaction_flush.cc +++ b/db/db_impl/db_impl_compaction_flush.cc @@ -2442,6 +2442,7 @@ void DBImpl::BGWorkCompaction(void arg) { static_cast<PrepickedCompaction*>(ca.prepicked_compaction); static_cast_with_check<DBImpl>(ca.db)->BackgroundCallCompaction( prepicked_compaction, Env::Priority::LOW); + sleep(1); delete prepicked_compaction; } $ ./db_compaction_test --gtest_filter=DBCompactionTest.CompactionLimiter db_compaction_test: util/concurrent_task_limiter_impl.cc:24: virtual rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl(): Assertion `outstanding_tasks_ == 0' failed. Received signal 6 (Aborted) #0 /usr/local/fbcode/platform007/lib/libc.so.6(gsignal+0xcf) [0x7f02673c30ff] ?? ??:0 https://github.com/facebook/rocksdb/issues/1 /usr/local/fbcode/platform007/lib/libc.so.6(abort+0x134) [0x7f02673ac934] ?? ??:0 ... ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8253 Test Plan: sleeps to expose race conditions Reviewed By: akankshamahajan15 Differential Revision: D28168064 Pulled By: ajkr fbshipit-source-id: 9e5167c74398d323e7975980c5cc00f450631160	4 years ago
Andrew Kryczka	c2a3424de5	Deflake DBTest.L0L1L2AndUpHitCounter (#8259 ) Summary: Previously we saw flakes on platforms like arm on CircleCI, such as the following: ``` Note: Google Test filter = DBTest.L0L1L2AndUpHitCounter [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from DBTest [ RUN ] DBTest.L0L1L2AndUpHitCounter db/db_test.cc:5345: Failure Expected: (TestGetTickerCount(options, GET_HIT_L0)) > (100), actual: 30 vs 100 [ FAILED ] DBTest.L0L1L2AndUpHitCounter (150 ms) [----------] 1 test from DBTest (150 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (150 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] DBTest.L0L1L2AndUpHitCounter ``` The test was totally non-deterministic, e.g., flush/compaction timing would affect how many files on each level. Furthermore, it depended heavily on platform-specific details, e.g., by having a 32KB memtable, it could become full with a very different number of entries depending on the platform. This PR rewrites the test to build a deterministic LSM with one file per level. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8259 Reviewed By: mrambacher Differential Revision: D28178100 Pulled By: ajkr fbshipit-source-id: 0a03b26e8d23c29d8297c1bccb1b115dce33bdcd	4 years ago
Jay Zhuang	8a92564a82	Update CircleCI MacOS Xcode version to 11.3.0 (#8256 ) Summary: To fix CircleCI pyenv installation failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8256 Reviewed By: ajkr Differential Revision: D28191772 Pulled By: jay-zhuang fbshipit-source-id: 2bbb1d5ded473e510c11c8ed27884c4ad073973f	4 years ago
sdong	c3ff14e2c1	Hint temperature of bottommost level files to FileSystem (#8222 ) Summary: As the first part of the effort of having placing different files on different storage types, this change introduces several things: (1) An experimental interface in FileSystem that specify temperature to a new file created. (2) A test FileSystemWrapper, SimulatedHybridFileSystem, that simulates HDD for a file of "warm" temperature. (3) A simple experimental feature ColumnFamilyOptions.bottommost_temperature. RocksDB would pass this value to FileSystem when creating any bottommost file. (4) A db_bench parameter that applies the (2) and (3) to db_bench. The motivation of the change is to introduce minimal changes that allow us to evolve tiered storage development. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8222 Test Plan: ./db_bench --benchmarks=fillrandom --write_buffer_size=2000000 -max_bytes_for_level_base=20000000 -level_compaction_dynamic_level_bytes --reads=100 -compaction_readahead_size=20000000 --reads=100000 -num=10000000 followed by ./db_bench --benchmarks=readrandom,stats --write_buffer_size=2000000 -max_bytes_for_level_base=20000000 -simulate_hybrid_fs_file=/tmp/warm_file_list -level_compaction_dynamic_level_bytes -compaction_readahead_size=20000000 --reads=500 --threads=16 -use_existing_db --num=10000000 and see results as expected. Reviewed By: ajkr Differential Revision: D28003028 fbshipit-source-id: 4724896d5205730227ba2f17c3fecb11261744ce	4 years ago
Peter Dillinger	d2ca04e3ed	Add more LSM info to FilterBuildingContext (#8246 ) Summary: Add `num_levels`, `is_bottommost`, and table file creation `reason` to `FilterBuildingContext`, in anticipation of more powerful Bloom-like filter support. To support this, added `is_bottommost` and `reason` to `TableBuilderOptions`, which allowed removing `reason` parameter from `rocksdb::BuildTable`. I attempted to remove `skip_filters` from `TableBuilderOptions`, because filter construction decisions should arise from options, not one-off parameters. I could not completely remove it because the public API for SstFileWriter takes a `skip_filters` parameter, and translating this into an option change would mean awkwardly replacing the table_factory if it is BlockBasedTableFactory with new filter_policy=nullptr option. I marked this public skip_filters option as deprecated because of this oddity. (skip_filters on the read side probably makes sense.) At least `skip_filters` is now largely hidden for users of `TableBuilderOptions` and is no longer used for implementing the optimize_filters_for_hits option. Bringing the logic for that option closer to handling of FilterBuildingContext makes it more obvious that hese two are using the same notion of "bottommost." (Planned: configuration options for Bloom-like filters that generalize `optimize_filters_for_hits`) Recommended follow-up: Try to get away from "bottommost level" naming of things, which is inaccurate (see VersionStorageInfo::RangeMightExistAfterSortedRun), and move to "bottommost run" or just "bottommost." Pull Request resolved: https://github.com/facebook/rocksdb/pull/8246 Test Plan: extended an existing unit test to exercise and check various filter building contexts. Also, existing tests for optimize_filters_for_hits validate some of the "bottommost" handling, which is now closely connected to FilterBuildingContext::is_bottommost through TableBuilderOptions::is_bottommost Reviewed By: mrambacher Differential Revision: D28099346 Pulled By: pdillinger fbshipit-source-id: 2c1072e29c24d4ac404c761a7b7663292372600a	4 years ago
Peter Dillinger	85becd94c1	Refactor: use TableBuilderOptions to reduce parameter lists (#8240 ) Summary: Greatly reduced the not-quite-copy-paste giant parameter lists of rocksdb::NewTableBuilder, rocksdb::BuildTable, BlockBasedTableBuilder::Rep ctor, and BlockBasedTableBuilder ctor. Moved weird separate parameter `uint32_t column_family_id` of TableFactory::NewTableBuilder into TableBuilderOptions. Re-ordered parameters to TableBuilderOptions ctor, so that `uint64_t target_file_size` is not randomly placed between uint64_t timestamps (was easy to mix up). Replaced a couple of fields of BlockBasedTableBuilder::Rep with a FilterBuildingContext. The motivation for this change is making it easier to pass along more data into new fields in FilterBuildingContext (follow-up PR). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8240 Test Plan: ASAN make check Reviewed By: mrambacher Differential Revision: D28075891 Pulled By: pdillinger fbshipit-source-id: fddb3dbb8260a0e8bdcbb51b877ebabf9a690d4f	4 years ago
Akanksha Mahajan	a0e0feca62	Improve BlockPrefetcher to prefetch only for sequential scans (#7394 ) Summary: BlockPrefetcher is used by iterators to prefetch data if they anticipate more data to be used in future and this is valid for forward sequential scans. But BlockPrefetcher tracks only num_file_reads_ and not if reads are sequential. This presents problem for MultiGet with large number of keys when it reseeks index iterator and data block. FilePrefetchBuffer can end up doing large readahead for reseeks as readahead size increases exponentially once readahead is enabled. Same issue is with BlockBasedTableIterator. Add previous length and offset read as well in BlockPrefetcher (creates FilePrefetchBuffer) and FilePrefetchBuffer (does prefetching of data) to determine if reads are sequential and then prefetch. Update the last block read after cache hit to take reads from cache also in account. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7394 Test Plan: Add new unit test case Reviewed By: anand1976 Differential Revision: D23737617 Pulled By: akankshamahajan15 fbshipit-source-id: 8e6917c25ed87b285ee495d1b68dc623d71205a3	4 years ago
anand76	0db4cde6e2	Fix a memory leak in c_test (#8237 ) Summary: Don't call ```rocksdb_cache_disown_data()``` as it causes the memory allocated for ```shards_``` to be leaked. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8237 Reviewed By: jay-zhuang Differential Revision: D28039061 Pulled By: anand1976 fbshipit-source-id: c3464efe2c006b93b4be87030116a12a124598c4	4 years ago
anand76	8fe33a0a9f	Change CircleCI Windows to previous known good image (#8220 ) Summary: This is to try to resolve the VS2015 install failure in CircleCI Windows builds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8220 Reviewed By: jay-zhuang Differential Revision: D28061834 Pulled By: anand1976 fbshipit-source-id: b2663eb60babee603669a2c2cb55f182df1cc7b1	4 years ago
sdong	cde69a7cfd	db_stress to add --open_metadata_write_fault_one_in (#8235 ) Summary: DB Stress to add --open_metadata_write_fault_one_in which would randomly fail in some file metadata modification operations during DB Open, including file creation, close, renaming and directory sync. Some operations can fail before and after the operations take place. If DB open fails, db_stress would retry without the failure ingestion, and DB is expected to open successfully. This option is enabled in crash test in half of the time. Some follow up changes would allow write failures in open time, and ingesting those failures in non-DB open cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8235 Test Plan: Run stress tests for a while and see failures got triggered. This can reproduce the bug fixed by https://github.com/facebook/rocksdb/pull/8192 and a similar one that fails when fsyncing parent directory. Reviewed By: anand1976 Differential Revision: D28010944 fbshipit-source-id: 36a96da4dc3633e5f7680cef3ea0a900fcdb5558	4 years ago
Duarte Nunes	3949731de3	Add WAL flush API to C client (#8226 ) Summary: The C client is missing the`manual_wal_flush` option and the `flush_wal` API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8226 Reviewed By: ajkr Differential Revision: D28000869 Pulled By: jay-zhuang fbshipit-source-id: ed44937e7e7e75bc0dfa870a14147fbeef0c38f8	4 years ago
Akanksha Mahajan	65abb0cf71	Add 6.18, 6.19 and 6.20 to check_format_compatible.sh (#8236 ) Summary: Add 6.18, 6.19 and 6.20 to check_format_compatible.sh Pull Request resolved: https://github.com/facebook/rocksdb/pull/8236 Test Plan: ./tools/check_format_compatible.sh (tested without 2.7.fb as it was failing as mentioned in the script) Reviewed By: mrambacher Differential Revision: D28019160 Pulled By: akankshamahajan15 fbshipit-source-id: b59a7c5c14cb4c115926e9ae7c74ea586b22c9ed	4 years ago
Sahir Hoda	13c655a887	New C API to expose NewCompactOnDeletionCollectorFactory (#8233 ) Summary: New C API rocksdb_options_add_compact_on_deletion_collector_factory to expose NewCompactOnDeletionCollectorFactory Pull Request resolved: https://github.com/facebook/rocksdb/pull/8233 Reviewed By: mrambacher Differential Revision: D28018381 Pulled By: anand1976 fbshipit-source-id: 674c9ed902c91ff0d9f09e7a60c5f37b907604c6	4 years ago
mrambacher	0ca6d6297f	Rename variables in ImmutableCFOptions to avoid conflicts with ImmutableDBOptions (#8227 ) Summary: Renaming ImmutableCFOptions::info_log and statistics to logger and stats. This is stage 2 in creating an ImmutableOptions class. It is necessary because the names match those in ImmutableOptions and have different types. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8227 Reviewed By: jay-zhuang Differential Revision: D28000967 Pulled By: mrambacher fbshipit-source-id: 3bf2aa04e8f1e8724d825b7deacf41080c14420b	4 years ago
Mr-Leshiy	c2c7d5e916	Fix cast-function-type warning (#8230 ) Summary: Fixing cast-function-type which is appears during the following build: ```bash cmake .. -DFAIL_ON_WARNINGS=ON -DCMAKE_C_COMPILER=x86_64-w64-mingw32-gcc -DCMAKE_CXX_COMPILER=x86_64-w64-mingw32-g++ -DCMAKE_SYSTEM_NAME=Windows make rocksdb ``` Here is the log: ``` /home/leshiy/Work/rocksdb/port/win/env_win.cc: In constructor ‘rocksdb::port::WinClock::WinClock()’: /home/leshiy/Work/rocksdb/port/win/env_win.cc:92:9: error: cast between incompatible function types from ‘FARPROC’ {aka ‘long long int ()()’} to ‘rocksdb::port::WinClock::FnGetSystemTimePreciseAsFileTime’ {aka ‘void ()(_FILETIME)’} [-Werror=cast-function-type] 92 \| (FnGetSystemTimePreciseAsFileTime)GetProcAddress( \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 93 \| module, "GetSystemTimePreciseAsFileTime"); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors make[2]: [CMakeFiles/rocksdb.dir/build.make:4337: CMakeFiles/rocksdb.dir/port/win/env_win.cc.obj] Error 1 make[1]: * [CMakeFiles/Makefile2:83: CMakeFiles/rocksdb.dir/all] Error 2 make: *** [Makefile:91: all] Error 2 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8230 Reviewed By: jay-zhuang Differential Revision: D28000215 Pulled By: mrambacher fbshipit-source-id: 874782cf48f70470e3fbd9097585bf42e810ca61	4 years ago
Adam Retter	2760c2aef8	WBWI Internal Move implementation from .h into .cpp (#8229 ) Summary: Moves some of the structural refactoring from https://github.com/facebook/rocksdb/pull/8135 into this PR. This just cleans up the code by moving implementation out of the .h file and into the .cc file. Should be considered for merge before both https://github.com/facebook/rocksdb/pull/7214 and https://github.com/facebook/rocksdb/pull/8135 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8229 Reviewed By: jay-zhuang Differential Revision: D27999669 Pulled By: mrambacher fbshipit-source-id: 6eccecbf1f11bb9f5a173e86d1e7bc448bc96071	4 years ago
Adam Retter	69c986825e	Fix javadoc for keyMayExist (#8232 ) Summary: Closes https://github.com/facebook/rocksdb/issues/6985 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8232 Reviewed By: jay-zhuang Differential Revision: D27999779 Pulled By: mrambacher fbshipit-source-id: a37c88d93bde2692b8be9e46e673dda7bea701b2	4 years ago
mrambacher	6bab3a34e9	Move RegisterOptions into the Configurable API (#8223 ) Summary: As previously coded, a Configurable extension would need access to code not in the public API. This change moves RegisterOptions into the Configurable class and therefore available to public extensions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8223 Reviewed By: anand1976 Differential Revision: D27960188 Pulled By: mrambacher fbshipit-source-id: ac88b19397183df633902def5b5701b9b65fbf40	4 years ago
Saketh Are	cc1c3ee54e	Eliminate double-buffering of keys in block_based_table_builder (#8219 ) Summary: The block_based_table_builder buffers some blocks in memory to construct a good compression dictionary. Before this commit, the keys from each block were buffered separately for convenience. However, the buffered block data implicitly contains all keys. This commit eliminates the redundant key buffers and reduces memory usage. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8219 Reviewed By: ajkr Differential Revision: D27945851 Pulled By: saketh-are fbshipit-source-id: caf3cac1217201e080a1e24b542bedf20973afee	4 years ago
Sahir Hoda	d65d7d657d	Expose JemallocNodumpAllocator to C API (#8178 ) Summary: Add new C APIs to create the JemallocNodumpAllocator and set it on a Cache object. `make test` passes with and without `DISABLE_JEMALLOC=1`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8178 Reviewed By: jay-zhuang Differential Revision: D27944631 Pulled By: ajkr fbshipit-source-id: 2531729aa285a8985c58f22f093c4d53029c4a7b	4 years ago
mrambacher	01e460d538	Make types of Immutable/Mutable Options fields match that of the underlying Option (#8176 ) Summary: This PR is a first step at attempting to clean up some of the Mutable/Immutable Options code. With this change, a DBOption and a ColumnFamilyOption can be reconstructed from their Mutable and Immutable equivalents, respectively. readrandom tests do not show any performance degradation versus master (though both are slightly slower than the current 6.19 release). There are still fields in the ImmutableCFOptions that are not CF options but DB options. Eventually, I would like to move those into an ImmutableOptions (= ImmutableDBOptions+ImmutableCFOptions). But that will be part of a future PR to minimize changes and disruptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8176 Reviewed By: pdillinger Differential Revision: D27954339 Pulled By: mrambacher fbshipit-source-id: ec6b805ba9afe6e094bffdbd76246c2d99aa9fad	4 years ago
Jay Zhuang	f0fca2b1d5	Add internal compaction API for Secondary instance (#8171 ) Summary: Add compaction API for secondary instance, which compact the files to a secondary DB path without installing to the LSM tree. The API will be used to remote compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8171 Test Plan: `make check` Reviewed By: ajkr Differential Revision: D27694545 Pulled By: jay-zhuang fbshipit-source-id: 8ff3ec1bffdb2e1becee994918850c8902caf731	4 years ago
Hans Holmberg	e85d8a6517	Add ZenFS to plugin list (#8218 ) Summary: Add ZenFS, a file system for zoned block devices, to PLUGINS.md Pull Request resolved: https://github.com/facebook/rocksdb/pull/8218 Reviewed By: jay-zhuang Differential Revision: D27944376 Pulled By: ajkr fbshipit-source-id: c9ea2e9814001ccd7c56d7ef4d38e20dfeb48d1e	4 years ago
Zhichao Cao	09a9ec3ac0	Fix the false positive alert of CF consistency check in WAL recovery (#8207 ) Summary: In current RocksDB, in recover the information form WAL, we do the consistency check for each column family when one WAL file is corrupted and PointInTimeRecovery is set. However, it will report a false positive alert on "SST file is ahead of WALs" when one of the CF current log number is greater than the corrupted WAL number (CF contains the data beyond the corrupted WAl) due to a new column family creation during flush. In this case, a new WAL is created (it is empty) during a flush. Also, due to some reason (e.g., storage issue or crash happens before SyncCloseLog is called), the old WAL is corrupted. The new CF has no data, therefore, it does not have the consistency issue. Fix: when checking cfd->GetLogNumber() > corrupted_wal_number also check cfd->GetLiveSstFilesSize() > 0. So the CFs with no SST file data will skip the check here. Note potential ignored inconsistency caused due to fix: empty CF can also be caused by write+delete. In this case, after flush, there is no SST files being generated. However, this CF still have the log in the WAL. When the WAL is corrupted, the DB might be inconsistent. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8207 Test Plan: added unit test, make crash_test Reviewed By: riversand963 Differential Revision: D27898839 Pulled By: zhichao-cao fbshipit-source-id: 931fc2d8b92dd00b4169bf84b94e712fd688a83e	4 years ago
mrambacher	47b424f4bd	Add check to cmake to see if we need to link against -latomic (#8183 ) Summary: For some compilers/environments (e.g. Clang, riscv64), we need to link against -latomic. Check if this is a requirement and add the library to the third-party libs if it is. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8183 Reviewed By: pdillinger Differential Revision: D27773564 Pulled By: mrambacher fbshipit-source-id: 68e15d823144f83fb02221c7bf5b1e43323419bf	4 years ago
Yanqin Jin	314352761f	Ignore comparator name mismatch in ldb manifest dump (#8216 ) Summary: RocksDB allows user-specified custom comparators which may not be known to `ldb`, a built-in tool for checking/mutating the database. Therefore, column family comparator names mismatch encountered during manifest dump should not prevent the dumping from proceeding. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8216 Test Plan: ``` make check ``` Also manually do the following ``` KEEP_DB=1 ./db_with_timestamp_basic_test ./ldb --db=<db> manifest_dump --verbose ``` The ldb should succeed and print something like: ``` ... --------------- Column family "default" (ID 0) -------------- log number: 6 comparator: <TestComparator>, but the comparator object is not available. ... ``` Reviewed By: ltamasi Differential Revision: D27927581 Pulled By: riversand963 fbshipit-source-id: f610b2c842187d17f575362070209ee6b74ec6d4	4 years ago
sdong	4985cea141	Add comment to DisableManualCompaction() (#8186 ) Summary: Add comment to DisableManualCompaction() which was missing. Also explictly return from DBImpl::CompactRange() to avoid memtable flush when manual compaction is disabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8186 Test Plan: Run existing unit tests. Reviewed By: jay-zhuang Differential Revision: D27744517 fbshipit-source-id: 449548a48905903b888dc9612bd17480f6596a71	4 years ago
Akanksha Mahajan	596e9008e4	Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898 ) Summary: When WriteBufferManager is shared across DBs and column families to maintain memory usage under a limit, OOMs have been observed when flush cannot finish but writes continuously insert to memtables. In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write, this change will stall incoming writers until flush is completed and memory_usage drops. Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_ (memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true. DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall() (which adds dummy stall object to the writer's queue). Then DB is blocked on a state State::Blocked (current write doesn't go through). WBStallInterface object maintained by every DB instance is added to the queue of WriteBufferManager. If multiple DBs tries to write during this stall, they will also be blocked when check WriteBufferManager::ShouldStall() returns true. End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory waiting to be flushed is less than buffer_size/2. This lower limit will give time for flush to complete and avoid continous stalling if memory usage remains close to buffer_size. WriterBufferManager::EndWriteStall() is called, which removes all instances from its queue and signal them to continue. Their state is changed to State::Running and they are unblocked. DBImpl then signal all incoming writers of that DB to continue by calling write_thread_.EndWriteStall() (which removes dummy stall object from the queue). DB instance creates WBMStallInterface which is an interface to block and signal DBs during stall. When DB needs to be blocked or signalled by WriteBufferManager, state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7898 Test Plan: Added a new test db/db_write_buffer_manager_test.cc Reviewed By: anand1976 Differential Revision: D26093227 Pulled By: akankshamahajan15 fbshipit-source-id: 2bbd982a3fb7033f6de6153aa92a221249861aae	4 years ago
Peter Dillinger	95f6add746	Revert Ribbon starting level support from #8198 (#8212 ) Summary: This partially reverts commit `10196d7edc`. The problem with this change is because of important filter use cases: FIFO compaction and SST writer. FIFO "compaction" always uses level 0 so would only use Ribbon filters if specifically including level 0 for the Ribbon filter policy. SST writer sets level_at_creation=-1 to indicate unknown level, and this would be treated the same as level 0 unless fixed. We are keeping the part about committing to permanent schema, which is only changes to API comments and HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8212 Test Plan: CI Reviewed By: jay-zhuang Differential Revision: D27896468 Pulled By: pdillinger fbshipit-source-id: 50a775f7cba5d64fb729d9b982e355864020596e	4 years ago
Andrew Gallagher	2e5de5a2c3	Cleanup include (#8208 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8208 Make include of "file_system.h" use the same include path as everywhere else. Reviewed By: riversand963, akankshamahajan15 Differential Revision: D27881606 fbshipit-source-id: fc1e076229fde21041a813c655ce017b5070c8b3	4 years ago
Andrew Kryczka	905dd17b35	Fix seqno in ingested file boundary key metadata (#8209 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/6245. Adapted from https://github.com/facebook/rocksdb/issues/8201 and https://github.com/facebook/rocksdb/issues/8205. Previously we were writing the ingested file's smallest/largest internal keys with sequence number zero, or `kMaxSequenceNumber` in case of range tombstone. The former (sequence number zero) is incorrect and can lead to files being incorrectly ordered. The fix in this PR is to overwrite boundary keys that have sequence number zero with the ingested file's assigned sequence number. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8209 Test Plan: repro unit test Reviewed By: riversand963 Differential Revision: D27885678 Pulled By: ajkr fbshipit-source-id: 4a9f2c6efdfff81c3a9923e915ea88b250ee7b6a	4 years ago
Levi Tamasi	1b99947e99	Mention PR 8206 in HISTORY.md (#8210 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8210 Reviewed By: akankshamahajan15 Differential Revision: D27887612 Pulled By: ltamasi fbshipit-source-id: 0db8d0b6047334dc47fe30a98804449043454386	4 years ago
Jay Zhuang	a89740fbc6	Fix unittest no space issue (#8204 ) Summary: Unittest reports no space from time to time, which can be reproduced on a small memory machine with SHM. It's caused by large WAL files generated during the test, which is preallocated, but didn't truncate during close(). Adding the missing APIs to set preallocation. It added arm test as nightly build, as the test runs more than 1 hour. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8204 Test Plan: test on small memory arm machine Reviewed By: mrambacher Differential Revision: D27873145 Pulled By: jay-zhuang fbshipit-source-id: f797c429d6bc13cbcc673bc03fcc72adda55f506	4 years ago
Jay Zhuang	a345b4d60d	Move arm build from travis to circleci (#8203 ) Summary: Moving ARM build from travis to CircleCI. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8203 Test Plan: CI Reviewed By: ajkr Differential Revision: D27861753 Pulled By: jay-zhuang fbshipit-source-id: 5e36a67f6fbb921c2ed80b284ba2de485411937b	4 years ago
Yanqin Jin	a376c22066	Handle rename() failure in non-local FS (#8192 ) Summary: In a distributed environment, a file `rename()` operation can succeed on server (remote) side, but the client can somehow return non-ok status to RocksDB. Possible reasons include network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a new MANIFEST. We currently always delete the new MANIFEST if an error occurs. This is problematic in distributed world. If the server-side successfully updates the CURRENT file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail. As a fix, we can track the execution result of IO operations on the new MANIFEST. - If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original MANIFEST. Therefore, it is safe to remove the new MANIFEST. - If IO operations on the new MANIFEST all succeed, but somehow we end up in the clean up code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the new MANIFEST.) Therefore, we keep the new MANIFEST. - Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT. - If process reopens the db immediately after the failure, then the CURRENT file can point to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can succeed and ignore the other. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192 Test Plan: make check Reviewed By: zhichao-cao Differential Revision: D27804648 Pulled By: riversand963 fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4	4 years ago
Levi Tamasi	0c6e4674a6	Fix a data race related to DB properties (#8206 ) Summary: Historically, the DB properties `rocksdb.cur-size-active-mem-table`, `rocksdb.cur-size-all-mem-tables`, and `rocksdb.size-all-mem-tables` called the method `MemTable::ApproximateMemoryUsage` for mutable memtables, which is not safe without synchronization. This resulted in data races with memtable inserts. The patch changes the code handling these properties to use `MemTable::ApproximateMemoryUsageFast` instead, which returns a cached value backed by an atomic variable. Two test cases had to be updated for this change. `MemoryTest.MemTableAndTableReadersTotal` was fixed by increasing the value size used so each value ends up in its own memtable, which was the original intention (note: the test has been broken in the sense that the test code didn't consider that memtable sizes below 64 KB get increased to 64 KB by `SanitizeOptions`, and has been passing only by accident). `DBTest.MemoryUsageWithMaxWriteBufferSizeToMaintain` relies on completely up-to-date values and thus was changed to use `ApproximateMemoryUsage` directly instead of going through the DB properties. Note: this should be safe in this case since there's only a single thread involved. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8206 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D27866811 Pulled By: ltamasi fbshipit-source-id: 7bd754d0565e0a65f1f7f0e78ffc093beef79394	4 years ago
Yanqin Jin	b0e20194ea	Handle blob files when options.best_efforts_recovery is true (#8180 ) Summary: If `options.best_efforts_recovery == true`, RocksDB currently tolerates missing table files and recovers to the latest version without missing table files (not considering WAL). It is necessary to handle blob files as well to make the feature more complete. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8180 Test Plan: make check Reviewed By: ltamasi Differential Revision: D27840556 Pulled By: riversand963 fbshipit-source-id: 041685d0dc2e7779ac4f0374c07a8a327704aa5e	4 years ago
Akanksha Mahajan	c377c2ba15	Fix flaky test BackupableDBTest.FileSizeForIncremental (#8197 ) Summary: Test was flaky because for kUseDbSessionId naming, blob files use naming scheme kLegacyCrc32cAndFileSize. So expected number of files because of collision can vary. So disabling blobdb for this test case. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8197 Reviewed By: pdillinger Differential Revision: D27836997 Pulled By: akankshamahajan15 fbshipit-source-id: 5eb21a5f4acae3d6b730a9e1b207264fbc18cb80	4 years ago
Akanksha Mahajan	531a5f88a1	Update release version to 6.20 (#8199 ) Summary: Update release version to 6.20 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8199 Test Plan: No code change Reviewed By: ajkr Differential Revision: D27838750 Pulled By: akankshamahajan15 fbshipit-source-id: f02f722fc6bdd37d626d47a0e932bbecea3507a8	4 years ago
Peter Dillinger	10196d7edc	Ribbon long-term support, starting level support (#8198 ) Summary: Since the Ribbon filter schema seems good (compatible back to 6.15.0), this change commits to long term support of the SST schema, even though we expect the API for enabling Ribbon to change (still called NewExperimentalRibbonFilterPolicy). This also adds support for "hybrid" configuration in which some levels use Bloom (higher levels, lower numbered) for speed and the rest use Ribbon (lower levels, higher numbered) for memory space efficiency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8198 Test Plan: unit test added, crash test support Reviewed By: jay-zhuang Differential Revision: D27831232 Pulled By: pdillinger fbshipit-source-id: 90e528677689474d293ed6710b42ba89fbd5b5ab	4 years ago
Adam Retter	90e245697f	Fix Windows strcmp for Unicode (#8190 ) Summary: The code for strcmp that was present does work when compiled for Windows unicode file paths. Needs backporting to: * 6.17.fb * 6.18.fb * 6.19.fb Pull Request resolved: https://github.com/facebook/rocksdb/pull/8190 Reviewed By: akankshamahajan15 Differential Revision: D27765588 Pulled By: jay-zhuang fbshipit-source-id: 89f8a5ac61fd7edc758340dfd335b0a5f96dae6e	4 years ago
mrambacher	c871142988	Fix Makefile when multiple targets are invoked (#8195 ) Summary: - Fixes the makefile to do the right thing when invoking multiple targets (e.g. make shared_lib install-shared). - Fixes the building of db_stress in shared lib mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8195 Reviewed By: pdillinger Differential Revision: D27803452 Pulled By: mrambacher fbshipit-source-id: 7c285d267770a359eb47f25855affdf58687e0e4	4 years ago
mrambacher	4c41e51c07	Add Blob Options to C API (#8148 ) Summary: Added the Blob option settings from the AdvancedColmnFamilyOptions to the C API. There are no tests for getting/setting options in the C API currently, hence no specific test plans. Should there be a some? Pull Request resolved: https://github.com/facebook/rocksdb/pull/8148 Reviewed By: ltamasi Differential Revision: D27568495 Pulled By: mrambacher fbshipit-source-id: 3a52b784467ea2c4bc58be5f75c5d41f0a5c55d6	4 years ago
Akanksha Mahajan	00803d619c	Fix flaky failure in DBSSTest.DBWithSstFileManagerForBlobFilesWithGC (#8196 ) Summary: Updated the test to wait until all trash files are deleted by SSTFileManager in the background. Since deletion runs in background so number of files deleted might not always be as expected. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8196 Reviewed By: jay-zhuang Differential Revision: D27812273 Pulled By: akankshamahajan15 fbshipit-source-id: d3ace1db34f91254b52fa455e09844d02801f58e	4 years ago

... 3 4 5 6 7 ...

10174 Commits (d6006f9c9bcf196c1825f4ed92b3bd922a453bd8) All Branches Search

10174 Commits (d6006f9c9bcf196c1825f4ed92b3bd922a453bd8)

All Branches