rocksdb

Author	SHA1	Message	Date
Peter Dillinger	1ba92b8582	Only search specific directories in Python check (#6225 ) Summary: The new Python syntax check could fail if external entities were cloned or symlinked to a subdir in a rocksdb git clone. (E.g. Facebook internal LITE build.) Only look for Python files in specific subdirs Pull Request resolved: https://github.com/facebook/rocksdb/pull/6225 Test Plan: python tools/check_all_python.py (still 34 files checked) Reviewed By: gfosco Differential Revision: D19186110 Pulled By: pdillinger fbshipit-source-id: 1fefa54e36b32cd5d96d3d1a43e8a2a694c22ea5	6 years ago
sdong	f295b099f6	BlockBasedTable::ApproximateSize() should use total order seek (#6222 ) Summary: Right now BlockBasedTable::ApproximateSize() uses the default setting for whether to use total order seek. There is no reason for that: no filtering is needed for the approximate size boundary keys, and filtering may introduce bugs. Disable it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6222 Test Plan: Run existing tests Differential Revision: D19184787 fbshipit-source-id: 64180660bd2800914fff75104172b61c06f0b1c9	6 years ago
Peter Dillinger	873331fe49	Refactor pulling out parts of StressTest::OperateDb (#6195 ) Summary: Complete some refactoring called for in https://github.com/facebook/rocksdb/issues/6148. Somehow I got some 'make format' in here for files I didn't change, but that should be OK. (I'm not sure why "hide whitespace changes" doesn't seem to help in review.) Not addressed in this PR: some operations simply print to stdout rather than failing on discovering a bad status or inconsistency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6195 Differential Revision: D19131067 Pulled By: pdillinger fbshipit-source-id: 4f416e6b792023989ef119f385fe122426cb825d	6 years ago
Maysam Yabandeh	77d5ba7887	Revert "Add kHashSearch to stress tests (#6210 )" (#6220 ) Summary: This reverts commit `54f9092b0c`. It is making our daily stress tests fail. Revert it until the issues are fixed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6220 Differential Revision: D19179881 Pulled By: maysamyabandeh fbshipit-source-id: 99de0eaf776567fa81110b9ad2608234a16083ce	6 years ago
Peter Dillinger	9ff569bdfc	Temporarily disable level_compaction_dynamic_level_bytes in crash test (#6217 ) Summary: We're seeing assertion violations like this in crash test: db_stress: table/block_based/block_based_table_reader.cc:4129: virtual uint64_t rocksdb::BlockBasedTable::ApproximateSize(const rocksdb::Slice&, const rocksdb::Slice&, rocksdb::TableReaderCaller): Assertion `end_offset >= start_offset' failed. ApproximateSize appears to be called only with the level_compaction_dynamic_level_bytes option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6217 Test Plan: temporarily put an assert(false) in ApproximateSize and briefly run 'make crash_test' Differential Revision: D19179174 Pulled By: pdillinger fbshipit-source-id: 506e6549aea0da19b363a1a6da04373c364d92e4	6 years ago
Peter Dillinger	5b18729d7d	Syntax check python files on testing (#6209 ) Summary: Adds a python script to syntax check all python files in the repository and report any errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6209 Test Plan: 'make check' with and without seeded syntax errors. Also look for "No syntax errors in 34 .py files" on success, and in java_test CI output Differential Revision: D19166756 Pulled By: pdillinger fbshipit-source-id: 537df464b767260d66810b4cf4c9808a026c58a4	6 years ago
Maysam Yabandeh	54f9092b0c	Add kHashSearch to stress tests (#6210 ) Summary: Besides extending index_type to kHashSearch, it makes explicit in the code base that this feature is incompatible with index_block_restart_interval > 1. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6210 Test Plan: `make -j32 crash_test` Differential Revision: D19166567 Pulled By: maysamyabandeh fbshipit-source-id: 3aaf75a70a8b462d372d43aac69dbd10df303ec7	6 years ago
Levi Tamasi	130e710056	Add BlobDB GC cutoff parameter to db_bench (#6211 ) Summary: The patch makes it possible to set the BlobDB configuration option `garbage_collection_cutoff` on the command line. In addition, it changes the `db_bench` code so that the default values of BlobDB related parameters are taken from the defaults of the actual BlobDB configuration options (note: this changes the default of `blob_db_bytes_per_sync`). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6211 Test Plan: Ran `db_bench` with various values of the new parameter. Differential Revision: D19166895 Pulled By: ltamasi fbshipit-source-id: 305ccdf0123b9db032b744715810babdc3e3b7d5	6 years ago
sdong	ef91894798	Fix potential overflow in CalculateSSTWriteHint() (#6212 ) Summary: The level passed into ColumnFamilyData::CalculateSSTWriteHint() can be smaller than base_level in the current version, which would cause an overflow. We see UBSan complaints: db/compaction/compaction_job.cc:1511:39: runtime error: load of value 4294967295, which is not a valid value for type 'Env::WriteLifeTimeHint' and I hope this commit fixes it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6212 Test Plan: Run existing tests and see them pass. Differential Revision: D19168442 fbshipit-source-id: bf8fd86f85478ecfa7556db46dc3242de8c83dc9	6 years ago
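A minimal C++ sketch of the kind of guard this fix describes. It uses a hypothetical `LifetimeHint` enum and a made-up level-to-hint mapping: the base level maps to medium, one level deeper to long, and anything deeper to extreme. It is not the actual `CalculateSSTWriteHint()` code. For reference, 4294967295 is the bit pattern of -1 read as an unsigned 32-bit value, which is consistent with a negative `level - base_level` offset leaking into the enum.

```cpp
#include <cassert>

// Hypothetical stand-in for a write-lifetime-hint enum (not RocksDB's definition).
enum class LifetimeHint : int { kNotSet = 0, kNone, kShort, kMedium, kLong, kExtreme };

// Maps an LSM level to a hint relative to base_level (assumed mapping).
// Without the `level < base_level` guard, `level - base_level` would be
// negative and the final cast would produce a value outside the enum's range.
LifetimeHint HintForLevel(int level, int base_level) {
  assert(level >= 0 && base_level >= 0);
  if (level == 0 || level < base_level) {
    return LifetimeHint::kMedium;  // L0, or a level shallower than base_level
  }
  int offset = level - base_level;  // guaranteed non-negative here
  if (offset >= 2) {
    return LifetimeHint::kExtreme;
  }
  return static_cast<LifetimeHint>(static_cast<int>(LifetimeHint::kMedium) + offset);
}
```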
Peter Dillinger	7da8c067a2	Avoid heading tags in javadocs; fix EnvironmentTest (#6208 ) Summary: Should fix Travis build error that randomly showed up upon using Java 13 version of javadoc. AdvancedColumnFamilyOptionsInterface.java:257: error: unexpected heading used: <H2>, compared to implicit preceding heading: <H3> According to this reference https://bugs.openjdk.java.net/browse/JDK-8220379 it should work to start at h4, but that didn't work, so avoiding headings should be fine. Also fix Java EnvironmentTest for JDK13. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6208 Test Plan: Travis run on PR (don't have Java 13 handy) Differential Revision: D19163105 Pulled By: pdillinger fbshipit-source-id: 4a9419cbe7ef780fba771b8a1508e1ea80d17b3e	6 years ago
Jermy Li	f453bcb40d	Add unit tests for concurrent CF iteration and drop (#6180 ) Summary: improve https://github.com/facebook/rocksdb/issues/6147 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6180 Differential Revision: D19148936 fbshipit-source-id: f691c9879fd51d54e96c1a99670cf85ca4485a89	6 years ago
sdong	02193ce406	Prevent file prefetch when mmap is enabled. (#6206 ) Summary: Right now, file prefetching is sometimes still on when mmap is enabled, which causes a bug where wrong data is read. This commit removes all such code paths. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6206 Test Plan: make crash_test with compaction_readahead_size, which used to fail. Run all existing tests. Differential Revision: D19149429 fbshipit-source-id: 9e18ea8c566e416aac9647bdd05afe596634791b	6 years ago
Peter Dillinger	dfb259e48d	Fix syntax error (!) in db_crashtest.py (#6207 ) Summary: Fixes syntax error from https://github.com/facebook/rocksdb/pull/6203 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6207 Test Plan: make blackbox_crash_test -> no more syntax error Differential Revision: D19161752 Pulled By: pdillinger fbshipit-source-id: b3032f296041ab56307762622b9ef6c03a8379aa	6 years ago
Zhichao Cao	c399704c7a	Fix: remove the potential dead store variable in block_based_table_reader.cc (#6204 ) Summary: buf_offset does not need to be updated from req.len for the final block. This dead store can make the clang_analyze test fail. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6204 Test Plan: make asan_check passes Differential Revision: D19145335 Pulled By: zhichao-cao fbshipit-source-id: 8f6e74565746381b5c5ef598b97d746517b36e5b	6 years ago
anand76	2afea29762	Add VerifyChecksum() to db_stress (#6203 ) Summary: Add an option to db_stress, verify_checksum_one_in, to call DB::VerifyChecksum() once every N ops. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6203 Differential Revision: D19145753 Pulled By: anand1976 fbshipit-source-id: d09edf21f309ad53aa40dd25b7a563d50665fd8b	6 years ago
Mike Kolupaev	ce63eda6f0	Fix use-after-free and double-deleting files in BackgroundCallPurge() (#6193 ) Summary: The bad code was: ``` mutex.Lock(); // `mutex` protects `container` for (auto& x : container) { mutex.Unlock(); // do stuff to x mutex.Lock(); } ``` It's incorrect because both `x` and the iterator may become invalid if another thread modifies the container while this thread is not holding the mutex. Broken by https://github.com/facebook/rocksdb/pull/5796 - it replaced a `while (!container.empty())` loop with a `for (auto x : container)`. (RocksDB code does a lot of such unlocking+re-locking of mutexes, and this type of bugs comes up a lot :/ ) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6193 Test Plan: Ran some logdevice integration tests that were crashing without this fix. Differential Revision: D19116874 Pulled By: al13n321 fbshipit-source-id: 9672bc4227c1b68f46f7436db2b96811adb8c703	6 years ago
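Here is a standalone C++ sketch of the safe shape the commit message alludes to: a `while (!container.empty())` loop. `PurgeQueue` and `Drain` are hypothetical names, not RocksDB classes. The key point is that each element is moved out and popped while the mutex is held, so nothing that refers into the container survives `unlock()`.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical queue of files awaiting purge, guarded by a mutex.
class PurgeQueue {
 public:
  void Add(std::string file) {
    std::lock_guard<std::mutex> guard(mu_);
    pending_.push_back(std::move(file));
  }

  // Calls work(file) for every pending file, releasing the mutex while work
  // runs. Each file is moved out and popped under the lock, so no iterator or
  // reference into pending_ is used after unlock().
  template <typename Fn>
  void Drain(Fn work) {
    std::unique_lock<std::mutex> lock(mu_);
    while (!pending_.empty()) {
      std::string file = std::move(pending_.front());
      pending_.pop_front();
      lock.unlock();
      work(file);  // other threads (or work itself) may call Add() here
      lock.lock();
    }
  }

 private:
  std::mutex mu_;
  std::deque<std::string> pending_;
};
```

Because `work` runs with the mutex released, it can even enqueue more files. The loop simply picks them up on a later iteration.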
Levi Tamasi	cbd58af9c3	Update HISTORY.md with recent BlobDB related changes Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6202 Differential Revision: D19144158 Pulled By: ltamasi fbshipit-source-id: 3e2522ced458568e3a2a045663704e30ab0ac223	6 years ago
sdong	9f250dd88e	crash_test: two fixes (#6200 ) Summary: Fix two crash test issues: 1. sync mode should not run with disable_wal=true 2. disable "compaction_readahead_size" for now. With it on, some block checksum verification failures happen in compaction paths. Not sure why, but disable it for now to keep the test clean. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6200 Test Plan: Run "make crash_test" and "make crash_test_with_atomic_flush" and see it runs way longer than before the fix without failing. Differential Revision: D19143493 fbshipit-source-id: 438fad52fbda60aafd142e1b65578addbe7d72b1	6 years ago
Adam Retter	2d16709487	Small tidy and speed up of the travis build (#6181 ) Summary: Cuts about 30-60 seconds from each Travis Linux build, and about 15 minutes from each macOS build Pull Request resolved: https://github.com/facebook/rocksdb/pull/6181 Differential Revision: D19098357 Pulled By: pdillinger fbshipit-source-id: 863dd1ab09076ad9b03c2b7914908359628315ae	6 years ago
解轶伦	39fcaf8246	delete superversions in BackgroundCallPurge (#6146 ) Summary: I found that CleanupSuperVersion() may block Get() for 30ms+ (each MemTable is 256MB). Then I found that "delete sv" in ~SuperVersion() takes the time. The backtrace looks like this: DBImpl::GetImpl() -> DBImpl::ReturnAndCleanupSuperVersion() -> DBImpl::CleanupSuperVersion() : delete sv; -> ~SuperVersion(). I think it's better to delete in a background thread; please review it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6146 Differential Revision: D18972066 fbshipit-source-id: 0f7b0b70b9bb1e27ad6fc1c8a408fbbf237ae08c	6 years ago
Levi Tamasi	02aa22957a	Set CompactionIterator::valid_ to false when PrepareBlobOutput indicates error Summary: With https://github.com/facebook/rocksdb/pull/6121, errors returned by `PrepareBlobValue` result in `CompactionIterator::status_` being set to `Corruption` or `IOError` as appropriate; however, `valid_` is not set to `false`. The error is eventually propagated in `CompactionJob::ProcessKeyValueCompaction` but only after the main loop completes. Setting `valid_` to `false` upon errors enables us to terminate the loop early and fail the compaction sooner. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6170 Test Plan: Ran `make check` and used `db_bench` in BlobDB mode. fbshipit-source-id: a2ca88a3ca71115e2605bd34a4c795d8a28bef27	6 years ago
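A toy illustration of why clearing `valid_` matters. It uses a hypothetical `ToyIterator` rather than `CompactionIterator`. On error, the iterator records a non-OK status and also becomes invalid, so a `for (...; Valid(); Next())` loop stops at the first failure instead of running to completion. A negative input value stands in for a failed read; that convention is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical iterator over ints; a negative value simulates a failed read.
class ToyIterator {
 public:
  explicit ToyIterator(std::vector<int> input) : input_(std::move(input)) {}

  void SeekToFirst() {
    pos_ = 0;
    Next();
  }

  void Next() {
    if (pos_ >= input_.size()) {
      valid_ = false;  // exhausted
      return;
    }
    int v = input_[pos_++];
    if (v < 0) {
      ok_ = false;     // record the error...
      valid_ = false;  // ...and stop iteration immediately
      return;
    }
    current_ = v;
    valid_ = true;
  }

  bool Valid() const { return valid_; }
  bool ok() const { return ok_; }
  int value() const { return current_; }

 private:
  std::vector<int> input_;
  std::size_t pos_ = 0;
  int current_ = 0;
  bool valid_ = false;
  bool ok_ = true;
};
```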
anand1976	1be48cb895	Fix crash in Transaction::MultiGet() when num_keys > 32 Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6192 Test Plan: Add a unit test that fails without the fix and passes now make check Differential Revision: D19124781 Pulled By: anand1976 fbshipit-source-id: 8c8cb6fa16c3fc23ec011e168561a13f76bbd783	6 years ago
Yanqin Jin	7678cf2df7	Use Env::LoadEnv to create custom Env objects (#6196 ) Summary: As title. Previous assumption was that the underlying lib can always return a shared_ptr<Env>. This is too strong. Therefore, we use Env::LoadEnv to relax it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6196 Test Plan: make check Differential Revision: D19133199 Pulled By: riversand963 fbshipit-source-id: c83a0c02a42610d077054f2de1acfc45126b3a75	6 years ago
Maysam Yabandeh	68d5d82d1f	Wait for CancelAllBackgroundWork before Close in db stress (#6191 ) Summary: In https://github.com/facebook/rocksdb/issues/6174 we fixed the stress test to respect the CancelAllBackgroundWork + Close order for WritePrepared transactions. The fix did not take into account that some invocations of CancelAllBackgroundWork pass wait=false, which essentially breaks the order. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6191 Differential Revision: D19102709 Pulled By: maysamyabandeh fbshipit-source-id: f4e7b5fdae47ff1c1ac284ba1cf67d5d3f3d03eb	6 years ago
Zhichao Cao	cddd637997	Merge adjacent file block reads in RocksDB MultiGet() and Add uncompressed block to cache (#6089 ) Summary: In the current MultiGet, if the KV-pairs are not in data blocks already in the block cache, multiple blocks are read from an SST: one block read is triggered per block request, and the reads run in parallel. When some of these data blocks are adjacent in the SST, their reads can be combined into a single large read, which reduces system calls and may reduce read latency. To fill the block cache, data blocks that share one memory buffer must each be copied to the heap separately. Therefore, a combined read is done only when 1) data block compression is enabled and 2) the compressed block cache is null; otherwise, the extra memory copy may add overhead. In that case, data blocks are uncompressed into new memory anyway. Also, when 1) data block compression is enabled and 2) the compressed block cache is null, a data block may actually not be compressed, and the existing logic does not add such blocks to the uncompressed_cache. So if the memory buffer is shared and the data block is not compressed, the data block is copied to the heap and used to fill the cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6089 Test Plan: Added a test case to ParallelIO.MultiGet. make asan_check passes. Differential Revision: D18734668 Pulled By: zhichao-cao fbshipit-source-id: 67c5615ed373e51e42635fd74b36f8f3a66d5da4	6 years ago
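A minimal sketch of the coalescing step on its own. It assumes requests arrive sorted by offset and that only exactly contiguous ranges are merged. `ReadRange` and `CoalesceAdjacent` are hypothetical names. The sketch leaves out the compression and block-cache conditions the commit places on combined reads, and it does not split the merged buffer back into blocks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical byte-range read request within one file.
struct ReadRange {
  uint64_t offset;
  uint64_t len;
};

// Merges requests that are exactly contiguous (one ends where the next
// begins) into a single larger read. Input must be sorted by offset.
std::vector<ReadRange> CoalesceAdjacent(const std::vector<ReadRange>& reqs) {
  std::vector<ReadRange> merged;
  for (const ReadRange& r : reqs) {
    if (!merged.empty() && merged.back().offset + merged.back().len == r.offset) {
      merged.back().len += r.len;  // extend the previous read
    } else {
      merged.push_back(r);  // gap before r: start a new read
    }
  }
  return merged;
}
```

For example, block reads at offsets 0, 4096 and 8192 (the last 100 bytes long) plus one at 16384 collapse into two reads: one of 8292 bytes and one of 4096 bytes.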
sdong	bcc372c0c3	Add some new options to crash_test (#6176 ) Summary: Several options are added to the crash test in a straightforward way, with randomly picked values. Made the simple test run with non-dynamic levels and the normal test run with dynamic levels. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176 Test Plan: Run crash_test and watch the printing Differential Revision: D19053955 fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502	6 years ago
Levi Tamasi	2d095b4dbc	Update HISTORY.md with the recent memtable trimming fixes Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6194 Differential Revision: D19125292 Pulled By: ltamasi fbshipit-source-id: d41aca2755ec4bec07feedd6b561e8d18606a931	6 years ago
sdong	35126dd874	db_stress: preserve all historic manifest files (#6142 ) Summary: Compaction history is stored in manifest files, so preserving all of them in db_stress helps debugging. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6142 Test Plan: Run db_stress and observe that manifest files are preserved. Run the whole crash_test and check what the DB directory looks like. Differential Revision: D19047026 fbshipit-source-id: f83c3e0bb5332b1b4768be5dcee56a24f9b760a9	6 years ago
Zhichao Cao	fbda25f57a	db_stress: generate the key based on Zipfian distribution (hot key) (#6163 ) Summary: In the current db_stress, all keys are generated randomly and follow a uniform distribution. To test corner cases in which some keys are always updated or read, we need to generate keys from other distributions. In this PR, keys are generated from a Zipfian distribution, and the skewness can be controlled by setting hot_key_alpha (0.8 to 1.5 is suggested). The larger hot_key_alpha is, the more skewed the distribution. Note that if hot_key_alpha is larger than 2, usually only 1 or 2 keys end up being generated. If hot_key_alpha is 0, keys follow the uniform distribution (random keys). Testing plan: db_stress passes, and the printed keys were checked to follow the distribution. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6163 Differential Revision: D18978480 Pulled By: zhichao-cao fbshipit-source-id: e123b4865477f7478e83fb581f9576bada334680	6 years ago
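One common way to draw Zipfian keys is inverse-transform sampling over a precomputed CDF. The sketch below assumes key i (0-based) has weight 1 / (i + 1)^alpha. It illustrates the technique and is not db_stress's actual generator; `ZipfianKeyPicker` is a hypothetical name. With alpha = 0 every weight is 1, which gives the uniform case the commit mentions. Larger alpha concentrates draws on the first few keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical Zipfian key picker: P(key i) is proportional to 1 / (i + 1)^alpha.
class ZipfianKeyPicker {
 public:
  ZipfianKeyPicker(uint64_t num_keys, double alpha, uint32_t seed)
      : cdf_(num_keys), rng_(seed) {
    assert(num_keys > 0);
    double sum = 0.0;
    for (uint64_t i = 0; i < num_keys; ++i) {
      sum += 1.0 / std::pow(static_cast<double>(i + 1), alpha);
      cdf_[i] = sum;  // running (unnormalized) cumulative weight
    }
    for (double& c : cdf_) c /= sum;  // normalize so the last entry is 1.0
  }

  // Draws u in [0, 1) and returns the first key whose CDF value reaches u.
  uint64_t Next() {
    double u = dist_(rng_);
    auto it = std::lower_bound(cdf_.begin(), cdf_.end(), u);
    if (it == cdf_.end()) --it;  // defensive guard against rounding
    return static_cast<uint64_t>(it - cdf_.begin());
  }

 private:
  std::vector<double> cdf_;
  std::mt19937 rng_;
  std::uniform_real_distribution<double> dist_{0.0, 1.0};
};
```

The CDF costs one `double` per key, so this approach suits key spaces that fit comfortably in memory.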
Levi Tamasi	db7c687523	Fix a data race related to memtable trimming (#6187 ) Summary: https://github.com/facebook/rocksdb/pull/6177 introduced a data race involving `MemTableList::InstallNewVersion` and `MemTableList::NumFlushed`. The patch fixes this by caching whether the current version has any memtable history (i.e. flushed memtables that are kept around for transaction conflict checking) in an `std::atomic<bool>` member called `current_has_history_`, similarly to how `current_memory_usage_excluding_last_` is handled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6187 Test Plan: ``` make clean COMPILE_WITH_TSAN=1 make db_test -j24 ./db_test ``` Differential Revision: D19084059 Pulled By: ltamasi fbshipit-source-id: 327a5af9700fb7102baea2cc8903c085f69543b9	6 years ago
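This commit caches a derived property of the current version in an `std::atomic<bool>`, so readers never need to touch the version itself. A standalone C++ sketch of that pattern follows. `MemTableHistory`, `InstallNewVersion`, and `HasHistory` are hypothetical names, and plain `int`s stand in for flushed memtables.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

class MemTableHistory {
 public:
  // Writer side: swap in the new version under the mutex, then refresh the
  // cached flag so it always matches the installed version.
  void InstallNewVersion(std::vector<int> flushed_memtables) {
    std::lock_guard<std::mutex> guard(mu_);
    flushed_ = std::move(flushed_memtables);
    has_history_.store(!flushed_.empty(), std::memory_order_release);
  }

  // Reader side: safe from any thread without taking mu_, because it reads
  // only the atomic and never flushed_.
  bool HasHistory() const {
    return has_history_.load(std::memory_order_acquire);
  }

 private:
  std::mutex mu_;
  std::vector<int> flushed_;  // stand-in for flushed memtables kept as history
  std::atomic<bool> has_history_{false};
};
```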
Peter Dillinger	a92bd0a183	Optimize memory and CPU for building new Bloom filter (#6175 ) Summary: The filter bits builder collects all the hashes to add in memory before adding them (because the number of keys is not known until we've walked over all the keys). Existing code uses a std::vector for this, which can mean allocating (and not freeing) up to 2x the necessary space and up to ~2x write amplification in memory. std::deque uses close to minimal space (for large filters, the only time it matters), has no write amplification, frees memory while building, and needs no large contiguous memory area. The only cost is more calls to the allocator, which does not appear to matter, at least in benchmark testing. For now, this change only applies to the new (format_version=5) Bloom filter implementation, to ease before-and-after comparison downstream. Temporary memory use during build is about the only way the new Bloom filter could regress vs. the old (because of the upgrade to a 64-bit hash), and that should only matter for full filters. This change should largely mitigate that potential regression. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6175 Test Plan: Using filter_bench with the -new_builder option and 6M keys per filter is like a large full filter (improvement). 10k keys and no -new_builder is like partitioned filters (about the same). (Corresponding configurations run simultaneously on devserver.) 
std::vector impl (before): $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -average_keys_per_filter=6000000 gives Build avg ns/key: 52.2027, Maximum resident set size (kbytes): 1105016; $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -average_keys_per_filter=10000 gives Build avg ns/key: 30.5694, Maximum resident set size (kbytes): 1208152. std::deque impl (after): $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -average_keys_per_filter=6000000 gives Build avg ns/key: 39.0697, Maximum resident set size (kbytes): 1087196; $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -average_keys_per_filter=10000 gives Build avg ns/key: 30.9348, Maximum resident set size (kbytes): 1207980. Differential Revision: D19053431 Pulled By: pdillinger fbshipit-source-id: 2888e748723a19d9ea40403934f13cbb8483430c	6 years ago
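Here is a sketch of the collection strategy this commit describes. Hashes are appended to an `std::deque<uint64_t>`, which grows in fixed-size chunks instead of reallocating and copying the way `std::vector` does. The filter is sized only once the key count is known. `HashCollector` is a hypothetical name, and a toy single-probe bit array stands in for the real Bloom filter. Popping from the front while building lets typical deque implementations release chunks as they are consumed, although the C++ standard does not require that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

class HashCollector {
 public:
  // Called once per key; the final key count is unknown until Finish().
  void AddHash(uint64_t h) { hashes_.push_back(h); }

  // Sizes a toy bit array at bits_per_key bits per collected hash and sets
  // one bit per hash, consuming the deque front to back.
  std::vector<bool> Finish(int bits_per_key) {
    assert(bits_per_key > 0);
    std::size_t num_bits = hashes_.size() * static_cast<std::size_t>(bits_per_key);
    std::vector<bool> bits(num_bits, false);
    while (!hashes_.empty()) {  // loop never runs when num_bits == 0
      bits[hashes_.front() % num_bits] = true;
      hashes_.pop_front();
    }
    return bits;
  }

 private:
  std::deque<uint64_t> hashes_;
};
```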
anand76	ad34faba15	Fix unity test (#6178 ) Summary: Fix the test failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6178 Differential Revision: D19071208 Pulled By: maysamyabandeh fbshipit-source-id: 71622832ac93ff2663946c546d9642d5b9e3d194	6 years ago
Maysam Yabandeh	4b97812da8	Add long-running snapshots to stress tests (#6171 ) Summary: The current implementation holds on to 10% of snapshots for 10x longer, and to 1% of snapshots for 100x longer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171 Test Plan: `make -j32 crash_test` Differential Revision: D19038399 Pulled By: maysamyabandeh fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0	6 years ago
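One reading of "10% of snapshots for 10x longer, and 1% of snapshots for 100x longer" is nested: one in ten snapshots gets 10x, and one in ten of those gets a further 10x. That yields about 1% at 100x overall. The sketch below follows that assumption, using a hypothetical `PickSnapshotHoldOps` helper on top of `std::mt19937`.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: how many operations to hold a snapshot before release.
// About 10% of snapshots are held at least 10x the base duration, and about
// 1% (one in ten of those) are held 100x.
uint64_t PickSnapshotHoldOps(uint64_t base_ops, std::mt19937& rng) {
  std::bernoulli_distribution one_in_ten(0.1);
  uint64_t hold = base_ops;
  if (one_in_ten(rng)) {
    hold *= 10;
    if (one_in_ten(rng)) {
      hold *= 10;
    }
  }
  return hold;
}
```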
Levi Tamasi	bd8404feff	Do not schedule memtable trimming if there is no history (#6177 ) Summary: We have observed an increase in CPU load caused by frequent calls to `ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory` when using `max_write_buffer_size_to_maintain` to limit the amount of memtable history maintained for transaction conflict checking. Part of the issue is that trimming can potentially be scheduled even if there is no memtable history. The patch adds a check that fixes this. See also https://github.com/facebook/rocksdb/pull/6169. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6177 Test Plan: Compared `perf` output for ``` ./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32 ``` before and after the change. There is a significant reduction for the call chain `rocksdb::DBImpl::TrimMemtableHistory` -> `rocksdb::ColumnFamilyData::InstallSuperVersion` -> `rocksdb::ThreadLocalPtr::StaticMeta::Scrape` even without https://github.com/facebook/rocksdb/pull/6169. Differential Revision: D19057445 Pulled By: ltamasi fbshipit-source-id: dff81882d7b280e17eda7d9b072a2d4882c50f79	6 years ago
Maysam Yabandeh	349bd3ed82	CancelAllBackgroundWork before Close in db stress (#6174 ) Summary: Close asserts that there are no unreleased snapshots. For WritePrepared transactions, this means that background work that holds on to a snapshot must be canceled first. Update the stress tests to respect this sequence. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6174 Test Plan: `make -j32 crash_test` Differential Revision: D19057322 Pulled By: maysamyabandeh fbshipit-source-id: c9e9e24f779bbfb0ab72c2717e34576c01bc6362	6 years ago
Adam Retter	edbf0e2d90	Env should also load the native library (#6167 ) Summary: Closes https://github.com/facebook/rocksdb/issues/6118 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6167 Differential Revision: D19053577 Pulled By: pdillinger fbshipit-source-id: 86aca9a5bec0947a641649b515da17b3cb12bdde	6 years ago
Levi Tamasi	0d2172f128	Make it possible to enable periodic compactions for BlobDB (#6172 ) Summary: Periodic compactions ensure that even SSTs that do not get picked up otherwise eventually go through compaction; used in conjunction with BlobDB's garbage collection, they enable BlobDB to reclaim space when old blob files are used by such straggling SSTs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6172 Test Plan: Ran `make check` and used the BlobDB mode of `db_bench`. Differential Revision: D19045045 Pulled By: ltamasi fbshipit-source-id: 04636ecc4b6cfe8d495bf656faa65d54a5eb1a93	6 years ago
anand76	afa2420c2b	Introduce a new storage specific Env API (#5761 ) Summary: The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether it is retryable or not, the scope (i.e. fault domain) of the error, etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc. This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO. The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before. This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection. The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically. 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761 Differential Revision: D18868376 Pulled By: anand1976 fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f	6 years ago
Peter Dillinger	58d46d1915	Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154 ) Summary: And clean up related code, especially in stress test. (More clean up of db_stress_test_base.cc coming after this.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154 Test Plan: make check, make blackbox_crash_test for a bit Differential Revision: D18938180 Pulled By: pdillinger fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e	6 years ago
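A sketch of how the two idioms could look on top of `std::mt19937`. `RandomIdioms` is a hypothetical class, not a copy of RocksDB's `Random`. The semantics are inferred from the names. `OneInOpt(n)` is true with probability 1/n when n > 0, and never true otherwise, so 0 can mean "disabled". `PercentTrue(p)` is true with probability p/100.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

class RandomIdioms {
 public:
  explicit RandomIdioms(uint32_t seed) : rng_(seed) {}

  // True with probability 1/n; requires n > 0.
  bool OneIn(int n) {
    assert(n > 0);
    return std::uniform_int_distribution<int>(0, n - 1)(rng_) == 0;
  }

  // Like OneIn, but n <= 0 means "never" (an optional feature turned off).
  bool OneInOpt(int n) { return n > 0 && OneIn(n); }

  // True with probability percentage / 100.
  bool PercentTrue(int percentage) {
    return std::uniform_int_distribution<int>(0, 99)(rng_) < percentage;
  }

 private:
  std::mt19937 rng_;
};
```

The `Opt` variant would suit stress-test options phrased as "once every N ops", where 0 switches the operation off without a separate enable flag.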
Levi Tamasi	6d54eb3dc2	Do not create/install new SuperVersion if nothing was deleted during memtable trim (#6169 ) Summary: We have observed an increase in CPU load caused by frequent calls to `ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory` when using `max_write_buffer_size_to_maintain` to limit the amount of memtable history maintained for transaction conflict checking. As it turns out, this is caused by the code creating and installing a new `SuperVersion` even if no memtables were actually trimmed. The patch adds a check to avoid this. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6169 Test Plan: Compared `perf` output for ``` ./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32 ``` before and after the change. With the fix, the call chain `rocksdb::DBImpl::TrimMemtableHistory` -> `rocksdb::ColumnFamilyData::InstallSuperVersion` -> `rocksdb::ThreadLocalPtr::StaticMeta::Scrape` no longer registers in the `perf` report. Differential Revision: D19031509 Pulled By: ltamasi fbshipit-source-id: 02686fce594e5b50eba0710e4b28a9b808c8aa20	6 years ago
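This change and the related "Do not schedule memtable trimming if there is no history" commit add two checks, sketched together below. The first skips the trim when there is no history. The second installs a new version only when something was actually removed. `HistoryTrimmer` is a hypothetical class. Memtables are modeled by their byte sizes, and the limit plays a simplified version of the role of `max_write_buffer_size_to_maintain`. A counter stands in for the expensive `InstallSuperVersion` call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

class HistoryTrimmer {
 public:
  explicit HistoryTrimmer(std::size_t max_bytes_to_maintain)
      : limit_(max_bytes_to_maintain) {}

  // Records a flushed memtable kept around as history.
  void AddFlushed(std::size_t bytes) {
    history_.push_back(bytes);
    total_ += bytes;
  }

  // Returns true if a new version was installed.
  bool TrimAndMaybeInstall() {
    if (history_.empty()) return false;  // no history: nothing to trim
    bool removed = false;
    // Drop the oldest memtables until the kept history fits the limit.
    while (!history_.empty() && total_ > limit_) {
      total_ -= history_.front();
      history_.pop_front();
      removed = true;
    }
    if (!removed) return false;  // nothing trimmed: skip the expensive install
    ++installs_;
    return true;
  }

  int installs() const { return installs_; }

 private:
  std::size_t limit_;
  std::size_t total_ = 0;
  std::deque<std::size_t> history_;
  int installs_ = 0;
};
```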
Kefu Chai	ac304adf46	cmake: do not build tests for Release build and cleanups (#5916 ) Summary: fixes https://github.com/facebook/rocksdb/issues/2445 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5916 Differential Revision: D19031236 fbshipit-source-id: bc3107b6b25a01958677d7cb411b1f381aae91c6	6 years ago
Maysam Yabandeh	fec7302a9d	Enable unordered_write in stress tests (#6164 ) Summary: With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164 Test Plan: `make -j32 crash_test_with_txn` Differential Revision: D18991899 Pulled By: maysamyabandeh fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1	6 years ago
Levi Tamasi	583c6953d8	Move out valid blobs from the oldest blob files during compaction (#6121 ) Summary: The patch adds logic that relocates live blobs from the oldest N non-TTL blob files as they are encountered during compaction (assuming the BlobDB configuration option `enable_garbage_collection` is `true`), where N is defined as the number of immutable non-TTL blob files multiplied by the value of a new BlobDB configuration option called `garbage_collection_cutoff`. (The default value of this parameter is 0.25, that is, by default the valid blobs residing in the oldest 25% of immutable non-TTL blob files are relocated.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6121 Test Plan: Added unit test and tested using the BlobDB mode of `db_bench`. Differential Revision: D18785357 Pulled By: ltamasi fbshipit-source-id: 8c21c512a18fba777ec28765c88682bb1a5e694e	6 years ago
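The cutoff arithmetic is N = (number of immutable non-TTL blob files) × `garbage_collection_cutoff`; a sketch follows. Two choices here are assumptions of the sketch: rounding down, and treating ascending blob file numbers as the age order. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// N = immutable non-TTL blob file count * cutoff, rounded down (assumed).
std::size_t NumOldestFilesToRelocate(std::size_t num_immutable_non_ttl_files,
                                     double cutoff) {
  assert(cutoff >= 0.0 && cutoff <= 1.0);
  return static_cast<std::size_t>(
      static_cast<double>(num_immutable_non_ttl_files) * cutoff);
}

// Assumes file numbers grow over time, so `sorted_files` (ascending) lists
// the oldest first. Returns true if `file` is among the oldest N files.
bool ShouldRelocate(const std::vector<uint64_t>& sorted_files, uint64_t file,
                    double cutoff) {
  std::size_t n = NumOldestFilesToRelocate(sorted_files.size(), cutoff);
  return n > 0 && file <= sorted_files[n - 1];
}
```

With the default cutoff of 0.25 and 8 immutable non-TTL blob files, N = 2, so live blobs in the two oldest files would be relocated.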
Jermy Li	c2029f9716	Support concurrent CF iteration and drop (#6147 ) Summary: It's easy to cause a core dump when closing a ColumnFamilyHandle with unreleased iterators, especially when iterator release is controlled by the Java GC through JNI. This patch fixes concurrent CF iteration and drop: iterators (actually SuperVersion) hold a ColumnFamilyData reference to prevent the CF from being released too early. Fixes https://github.com/facebook/rocksdb/issues/5982 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6147 Differential Revision: D18926378 fbshipit-source-id: 1dff6d068c603d012b81446812368bfee95a5e15	6 years ago
myasuka	4b74035e40	Correct java docs of RocksDB options (#6123 ) Summary: Correct javadocs of several RocksDB option classes to not mislead RocksJava users. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6123 Differential Revision: D18989044 Pulled By: pdillinger fbshipit-source-id: a5ac6a415e5311084b10d973d354e6925788f01e	6 years ago
奏之章	c4ce8e637f	Fix RangeDeletion bug (#6062 ) Summary: When reading keys from a snapshot, if a range deletion was added after the snapshot was created and that range deletion was in an immutable memtable, we got the wrong set of keys. More details are in the code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6062 Differential Revision: D18966785 Pulled By: pdillinger fbshipit-source-id: 38a60bb1e2d0a1dbfc8ec641617200b6a02b86c3	6 years ago
Connor	a844591201	wait pending memtable writes on file ingestion or compact range (#6113 ) Summary: This PR fixes two unordered_write related issues: - ingestion job may skip the necessary memtable flush https://github.com/facebook/rocksdb/issues/6026 - compact range may flush the memtable before pending unordered writes have finished: 1. `CompactRange` triggers a memtable flush but doesn't wait for pending writes 2. there are some pending writes, but the memtable is already flushed 3. the WAL for that memtable is removed (note that the pending writes were recorded in that WAL) 4. the pending writes go to a newly created memtable 5. there is a restart 6. the previous pending writes are lost, because the WAL is removed but they aren't included in an SST. How to solve: - Wait for pending memtable writes before the ingestion job checks the memtable key range - Wait for pending memtable writes before flushing the memtable. Note that `CompactRange` also calls `RangesOverlapWithMemtables` without waiting for pending writes, but I'm not sure whether it affects correctness. Test Plan: make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6113 Differential Revision: D18895674 Pulled By: maysamyabandeh fbshipit-source-id: da22b4476fc7e06c176020e7cc171eb78189ecaf	6 years ago
sdong	814d4e7ce0	Improve instructions to install formatter (#6162 ) Summary: While the instructions for installing "make format" dependencies work on some platforms, they are hard to use on others. Improve them a little. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6162 Test Plan: Run "make format" in an environment missing the dependencies and see the instructions printed out Differential Revision: D18970773 fbshipit-source-id: fd21b31053407cc171a6675f781a556a1c3e8945	6 years ago
Maysam Yabandeh	a796c06fef	Fix build breakage from lock_guard error (#6161 ) Summary: This change fixes a source issue that caused a compile-time error, breaking the build for many fbcode services in that setup. The size() member function of channel is a const member, so member variables accessed within it are implicitly const as well. This caused an error because clang could not resolve a constructor that takes std::mutex: the suitable constructor, which takes a non-const reference, was rejected because its argument was const. The fix is to add the mutable modifier to the lock_ member of channel. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6161 Differential Revision: D18967685 Pulled By: maysamyabandeh fbshipit-source-id: 698b6a5153c3c92eeacb842c467aa28cc350d432	6 years ago
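A minimal reproduction of this fix in standalone C++, using a hypothetical `Channel` template rather than the actual class. Inside the `const` member `size()`, a non-`mutable` `lock_` would be a `const std::mutex`. `std::lock_guard<std::mutex>` cannot bind to that, because its constructor takes a non-const reference. Declaring the member `mutable` resolves it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical thread-safe queue; only the parts needed to show the fix.
template <typename T>
class Channel {
 public:
  void Write(T item) {
    std::lock_guard<std::mutex> guard(lock_);
    buffer_.push_back(std::move(item));
  }

  // A const member: `this` is const here, so only `mutable` members can be
  // modified (locking a mutex modifies it).
  std::size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);  // compiles because lock_ is mutable
    return buffer_.size();
  }

 private:
  mutable std::mutex lock_;  // without `mutable`, size() fails to compile
  std::deque<T> buffer_;
};
```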
Adam Retter	b433bbefe9	Add missing mutable DBOptions to RocksJava (#6152 ) Summary: As requested in https://github.com/facebook/rocksdb/issues/6127 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6152 Differential Revision: D18955608 Pulled By: pdillinger fbshipit-source-id: 3e1367d944e44d5f1675a422f7dd2451c86feb6f	6 years ago

8907 Commits (eeb3cf3f58385eac17654fcfeaf288e568673db8)
