rocksdb

Commit Graph

Author	SHA1	Message	Date
feilongliu	0b0cb6f1a2	Fix segfalut in ~DBWithTTLImpl() when called after Close() (#5485 ) Summary: ~DBWithTTLImpl() fails after calling Close() function (will invoke the Close() function of DBImpl), because the Close() function deletes default_cf_handle_ which is used in the GetOptions() function called in ~DBWithTTLImpl(), hence lead to segfault. Fix by creating a Close() function for the DBWithTTLImpl class and do the close and the work originally in ~DBWithTTLImpl(). If the Close() function is not called, it will be called in the ~DBWithTTLImpl() function. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5485 Test Plan: make clean; USE_CLANG=1 make all check -j Differential Revision: D15924498 fbshipit-source-id: 567397fb972961059083a1ae0f9f99ff74872b78	6 years ago
Zhongyi Xie	24f73436fb	sanitize and limit block_size under 4GB (#5492 ) Summary: `Block::restart_index_`, `Block::restarts_`, and `Block::current_` are defined as uint32_t but `BlockBasedTableOptions::block_size` is defined as a size_t so user might see corruption as in https://github.com/facebook/rocksdb/issues/5486. This PR adds a check in `BlockBasedTableFactory::SanitizeOptions` to disallow such configurations. yiwu-arbug Pull Request resolved: https://github.com/facebook/rocksdb/pull/5492 Differential Revision: D15914047 Pulled By: miasantreble fbshipit-source-id: c943f153d967e15aee7f2795730ab8259e2be201	6 years ago
Sagar Vemuri	68614a9608	Fix AlignedBuffer's usage in Encryption Env (#5396 ) Summary: The usage of `AlignedBuffer` in env_encryption.cc writes and reads to/from the AlignedBuffer's internal buffer directly without going through AlignedBuffer's APIs (like `Append` and `Read`), causing encapsulation to break in some cases. The writes are especially problematic as after the data is written to the buffer (directly using either memmove or memcpy), the size of the buffer is not updated ... causing the AlignedBuffer to lose track of the encapsulated buffer's current size. Fixed this by updating the buffer size after every write. Todo for later: Add an overloaded method to AlignedBuffer to support a memmove in addition to a memcopy. Encryption env does a memmove, and hence I couldn't switch to using `AlignedBuffer.Append()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5396 Test Plan: `make check` Differential Revision: D15764756 Pulled By: sagar0 fbshipit-source-id: 2e24b52bd3b4b5056c5c1da157f91ddf89370183	6 years ago
Jurriaan Mous	5830c619d5	Java: Make the generics of the Options interfaces more strict (#5461 ) Summary: Make the generics of the Options interfaces more strict so they are usable in a Kotlin Multiplatform expect/actual typealias implementation without causing a Violation of Finite Bound Restriction. This fix would enable the creation of a generic Kotlin multiplatform library by just typealiasing the JVM implementation to the current Java implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5461 Differential Revision: D15903288 Pulled By: sagar0 fbshipit-source-id: 75e83fdf5d2fcede40744a17e767563d6a4b0696	6 years ago
Vijay Nadimpalli	24b118ad98	Combine the read-ahead logic for user reads and compaction reads (#5431 ) Summary: Currently the read-ahead logic for user reads and compaction reads go through different code paths where compaction reads create new table readers and use `ReadaheadRandomAccessFile`. This change is to unify read-ahead logic to use read-ahead in BlockBasedTableReader::InitDataBlock(). As a result of the change `ReadAheadRandomAccessFile` class and `new_table_reader_for_compaction_inputs` option will no longer be used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5431 Test Plan: make check Here is the benchmarking - https://gist.github.com/vjnadimpalli/083cf423f7b6aa12dcdb14c858bc18a5 Differential Revision: D15772533 Pulled By: vjnadimpalli fbshipit-source-id: b71dca710590471ede6fb37553388654e2e479b9	6 years ago
Simon Grätzer	fe90ed7a70	Replace Corruption with TryAgain status when new tail is not visible to TransactionLogIterator (#5474 ) Summary: When tailing the WAL with TransactionLogIterator, it used to return Corruption status to indicate that the WAL has new tail that is not visible to the iterator, which is a misleading status. The patch replaces it with TryAgain which is more descriptive of a status, indicating that the user needs to create a new iterator to fetch the recent tail. Fixes https://github.com/facebook/rocksdb/issues/5455 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5474 Differential Revision: D15898953 Pulled By: maysamyabandeh fbshipit-source-id: 40966f6457cb539e1aeb104daeada6b0e46059fc	6 years ago
Levi Tamasi	5355e527d9	Make the 'block read count' performance counters consistent (#5484 ) Summary: The patch brings the semantics of per-block-type read performance context counters in sync with the generic block_read_count by only incrementing the counter if the block was actually read from the file. It also fixes index_block_read_count, which fell victim to the refactoring in PR https://github.com/facebook/rocksdb/issues/5298. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5484 Test Plan: Extended the unit tests. Differential Revision: D15887431 Pulled By: ltamasi fbshipit-source-id: a3889759d0ac5759d56625d692cd828d1b9207a6	6 years ago
haoyuhuang	2e8ad03ab3	Add more stats in the block cache trace analyzer (#5482 ) Summary: This PR adds more stats in the block cache trace analyzer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5482 Differential Revision: D15883553 Pulled By: HaoyuHuang fbshipit-source-id: 6d440e4f657af75690420102d532d0ee1ed4e9cf	6 years ago
Vaibhav Gogte	f46a2a0375	Export Cache::GetCharge (#5476 ) Summary: Exporting GetCharge to cache.hh Pull Request resolved: https://github.com/facebook/rocksdb/pull/5476 Differential Revision: D15881882 Pulled By: riversand963 fbshipit-source-id: 3d99084d10059b4fcaaaba240606ed50bc23351c	6 years ago
Huisheng Liu	92f631da33	replace sprintf with its safe version snprintf (#5475 ) Summary: sprintf is unsafe and has buffer overrun risk. Replace it with the safer version snprintf where buffer size is supplied to avoid overrun. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5475 Differential Revision: D15879481 Pulled By: sagar0 fbshipit-source-id: 7ae1958ffc9727fa50261dfbb98ddd74e70a72d8	6 years ago
Levi Tamasi	d0c6aea192	Revert to respecting only the read_tier read option for index blocks (#5481 ) Summary: PR https://github.com/facebook/rocksdb/issues/5298 subtly changed how read options are applied to the index block during a Get, MultiGet, or iteration. Earlier, only the read_tier option applied to the index block read; since PR https://github.com/facebook/rocksdb/issues/5298, fill_cache and verify_checksums also have an effect. This patch restores the earlier behavior to prevent surprise memory increases for clients due to the index block not being cached. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5481 Test Plan: make check Differential Revision: D15883082 Pulled By: ltamasi fbshipit-source-id: 9a065ec3a6db5a365cf6dd5e95190a20c5756356	6 years ago
Andrew Kryczka	220870523c	Fix compilation with USE_HDFS (#5444 ) Summary: The changes in `8272a6de57` were untested with `USE_HDFS=1`. There were a couple compiler errors. This PR fixes them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5444 Test Plan: ``` $ EXTRA_LDFLAGS="-L/tmp/hadoop-3.1.2/lib/native/" EXTRA_CXXFLAGS="-I/tmp/hadoop-3.1.2/include" USE_HDFS=1 make -j12 check ``` Differential Revision: D15885009 fbshipit-source-id: 2a0a63739e0b9a2819b461ad63ce1292c4833fe2	6 years ago
Adam Retter	5dc9fbd117	Update the version of ZStd for the Rocks Java static build Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5228 Differential Revision: D15880451 Pulled By: sagar0 fbshipit-source-id: 84da6f42cac15367d95bffa5336ebd002e7c3308	6 years ago
siddontang	4bd0cf541d	build on ARM64 (#5450 ) Summary: Support building RocksDB on AWS ARM64 ``` uname -m aarch64 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5450 Differential Revision: D15879851 fbshipit-source-id: a9b56520a2cd9921338305a06d7103a40a3300b8	6 years ago
Yanqin Jin	f287f8dc93	Fix a bug caused by secondary not skipping the beginning of new MANIFEST (#5472 ) Summary: While the secondary is replaying after the primary, the primary may switch to a new MANIFEST. The secondary is already able to detect and follow the primary to the new MANIFEST. However, the current implementation has a bug, described as follows. The new MANIFEST's first records have been generated by VersionSet::WriteSnapshot to describe the current state of the column families and the db as of the MANIFEST creation. Since the secondary instance has already finished recovering upon start, there is no need for the secondary to process these records. Actually, if the secondary were to replay these records, the secondary may end up adding the same SST files again to each column family, causing consistency checks done by VersionBuilder to fail. Therefore, we record the number of records to skip at the beginning of the new MANIFEST and ignore them. Test plan (on dev server) ``` $make clean && make -j32 all $./db_secondary_test ``` All existing unit tests must pass as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5472 Differential Revision: D15866771 Pulled By: riversand963 fbshipit-source-id: a1eec4837fb2ad13059398efb0f437e74fd53bed	6 years ago
Zhongyi Xie	ddd088c8b9	fix rocksdb lite and clang contrun test failures (#5477 ) Summary: recent commit `671d15cbdd` introduced some test failures: ``` ===== Running stats_history_test [==========] Running 9 tests from 1 test case. [----------] Global test environment set-up. [----------] 9 tests from StatsHistoryTest [ RUN ] StatsHistoryTest.RunStatsDumpPeriodSec monitoring/stats_history_test.cc:63: Failure dbfull()->SetDBOptions({{"stats_dump_period_sec", "0"}}) Not implemented: Not supported in ROCKSDB LITE db/db_options_test.cc:28:11: error: unused variable 'kMicrosInSec' [-Werror,-Wunused-const-variable] const int kMicrosInSec = 1000000; ``` This PR fixes these failures Pull Request resolved: https://github.com/facebook/rocksdb/pull/5477 Differential Revision: D15871814 Pulled By: miasantreble fbshipit-source-id: 0a7023914d2c1784d9d2d3f5bfb47310d4855394	6 years ago
haoyuhuang	bcfc53b436	Block cache tracing: Fix minor bugs with downsampling and some benchmark results. (#5473 ) Summary: As the code changes for block cache tracing are almost complete, I did a benchmark to compare the performance when block cache tracing is enabled/disabled. With 1% downsampling ratio, the performance overhead of block cache tracing is negligible. When we trace all block accesses, the throughput drops by 6 folds with 16 threads issuing random reads and all reads are served in block cache. Setup: RocksDB: version 6.2 Date: Mon Jun 17 17:11:13 2019 CPU: 24 * Intel Core Processor (Skylake) CPUCache: 16384 KB Keys: 20 bytes each Values: 100 bytes each (100 bytes after compression) Entries: 10000000 Prefix: 20 bytes Keys per prefix: 0 RawSize: 1144.4 MB (estimated) FileSize: 1144.4 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: NoCompression Compression sampling rate: 0 Memtablerep: skip_list Perf Level: 1 I ran the readrandom workload for 1 minute. Detailed throughput results: (ops/second) Sample rate 0: no block cache tracing. Sample rate 1: trace all block accesses. Sample rate 100: trace accesses 1% blocks. 1 thread \| \| \| -- \| -- \| -- \| -- Sample rate \| 0 \| 1 \| 100 1 MB block cache size \| 13,094 \| 13,166 \| 13,341 10 GB block cache size \| 202,243 \| 188,677 \| 229,182 16 threads \| \| \| -- \| -- \| -- \| -- Sample rate \| 0 \| 1 \| 100 1 MB block cache size \| 208,761 \| 178,700 \| 201,872 10 GB block cache size \| 2,645,996 \| 426,295 \| 2,587,605 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5473 Differential Revision: D15869479 Pulled By: HaoyuHuang fbshipit-source-id: 7ae802abe84811281a6af8649f489887cd7c4618	6 years ago
haoyuhuang	2d1dd5bce7	Support computing miss ratio curves using sim_cache. (#5449 ) Summary: This PR adds a BlockCacheTraceSimulator that reports the miss ratios given different cache configurations. A cache configuration contains "cache_name,num_shard_bits,cache_capacities". For example, "lru, 1, 1K, 2K, 4M, 4G". When we replay the trace, we also perform lookups and inserts on the simulated caches. In the end, it reports the miss ratio for each tuple <cache_name, num_shard_bits, cache_capacity> in a output file. This PR also adds a main source block_cache_trace_analyzer so that we can run the analyzer in command line. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5449 Test Plan: Added tests for block_cache_trace_analyzer. COMPILE_WITH_ASAN=1 make check -j32. Differential Revision: D15797073 Pulled By: HaoyuHuang fbshipit-source-id: aef0c5c2e7938f3e8b6a10d4a6a50e6928ecf408	6 years ago
Yanqin Jin	7d8d56413d	Override check consistency for DBImplSecondary (#5469 ) Summary: `DBImplSecondary` calls `CheckConsistency()` during open. In the past, `DBImplSecondary` did not override this function thus `DBImpl::CheckConsistency()` is called. The following can happen. The secondary instance is performing consistency check which calls `GetFileSize(file_path)` but the file at `file_path` is deleted by the primary instance. `DBImpl::CheckConsistency` does not account for this and fails the consistency check. This is undesirable. The solution is that, we call `DBImpl::CheckConsistency()` first. If it passes, then we are good. If not, we give it a second chance and handles the case of file(s) being deleted. Test plan (on dev server): ``` $make clean && make -j20 all $./db_secondary_test ``` All other existing unit tests must pass as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5469 Differential Revision: D15861845 Pulled By: riversand963 fbshipit-source-id: 507d72392508caed3cd003bb2e2aa43f993dd597	6 years ago
Zhongyi Xie	671d15cbdd	Persistent Stats: persist stats history to disk (#5046 ) Summary: This PR continues the work in https://github.com/facebook/rocksdb/pull/4748 and https://github.com/facebook/rocksdb/pull/4535 by adding a new DBOption `persist_stats_to_disk` which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled, and both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from in-memory data structure or from the hidden column family on disk. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5046 Differential Revision: D15863138 Pulled By: miasantreble fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b	6 years ago
Maysam Yabandeh	ee294c24ed	Make db_bloom_filter_test parallel (#5467 ) Summary: When run under TSAN it sometimes goes over 10m and times out. The slowest ones are `DBBloomFilterTestWithParam.BloomFilter` which we have 6 of them. Making the tests run in parallel should take care of the timeout issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5467 Differential Revision: D15856912 Pulled By: maysamyabandeh fbshipit-source-id: 26c43c55312974c1b809c070342dee037d0219f4	6 years ago
haoyuhuang	d43b4cd570	Integrate block cache tracing into db_bench (#5459 ) Summary: This PR integrates the block cache tracing into db_bench. It adds three command line arguments. -block_cache_trace_file (Block cache trace file path.) type: string default: "" -block_cache_trace_max_trace_file_size_in_bytes (The maximum block cache trace file size in bytes. Block cache accesses will not be logged if the trace file size exceeds this threshold. Default is 64 GB.) type: int64 default: 68719476736 -block_cache_trace_sampling_frequency (Block cache trace sampling frequency, termed s. It uses spatial downsampling and samples accesses to one out of s blocks.) type: int32 default: 1 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5459 Differential Revision: D15832031 Pulled By: HaoyuHuang fbshipit-source-id: 0ecf2f2686557251fe741a2769b21170777efa3d	6 years ago
Adam Retter	d1ae67bdb9	Switch Travis to Xenial build (#4789 ) Summary: I think this should now also run on Travis's new virtualised infrastructure which affords more memory and CPU. We also need to think about migrating from travis-ci.org to travis-ci.com. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4789 Differential Revision: D15856272 fbshipit-source-id: 10b41d21924e8a362bc9646a63ccd1a5dfc437c6	6 years ago
haoyuhuang	7a8d7358bb	Integrate block cache tracer in block based table reader. (#5441 ) Summary: This PR integrates the block cache tracer into block based table reader. The tracer will write the block cache accesses using the trace_writer. The tracer is null in this PR so that nothing will be logged. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5441 Differential Revision: D15772029 Pulled By: HaoyuHuang fbshipit-source-id: a64adb92642cd23222e0ba8b10d86bf522b42f9b	6 years ago
Sagar Vemuri	f1219644ec	Validate CF Options when creating a new column family (#5453 ) Summary: It seems like CF Options are not properly validated when creating a new column family with `CreateColumnFamily` API; only a selected few checks are done. Calling `ColumnFamilyData::ValidateOptions`, which is the single source for all CFOptions validations, will help fix this. (`ColumnFamilyData::ValidateOptions` is already called at the time of `DB::Open`). Test Plan: Added a new test: `DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions` ``` TEST_TMPDIR=/dev/shm ./db_test --gtest_filter=DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions ``` Also ran gtest-parallel to make sure the new test is not flaky. ``` TEST_TMPDIR=/dev/shm ~/gtest-parallel/gtest-parallel ./db_test --gtest_filter=DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions --repeat=10000 [10000/10000] DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions (15 ms) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5453 Differential Revision: D15816851 Pulled By: sagar0 fbshipit-source-id: 9e702b9850f5c4a7e0ef8d39e1e6f9b81e7fe1e5	6 years ago
Huisheng Liu	b47cfec5d0	fix compilation error on MSVC (#5458 ) Summary: "__attribute__((__weak__))" was introduced in port\jemalloc_helper.h. It's not supported by Microsoft VS 2015, resulting in compile error. This fix adds a #if branch to work around the compile issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5458 Differential Revision: D15827285 fbshipit-source-id: 8c5f7ad31de1ac677bd96f16c4450767de834beb	6 years ago
Maysam Yabandeh	58c78358ef	Set executeLocal on child lego jobs (#5456 ) Summary: This property is needed to run the child jobs on the same host and thus propagate the child job status back to the parent's. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5456 Reviewed By: yancouto Differential Revision: D15824382 Pulled By: maysamyabandeh fbshipit-source-id: 42f2efbedaa3a8b399281105f0ce793c1c9a6191	6 years ago
haoyuhuang	89695bfbaa	Remove unused variable (#5457 ) Summary: This PR removes the unused variable that causes CLANG build to fail. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5457 Differential Revision: D15825027 Pulled By: HaoyuHuang fbshipit-source-id: 72c847c39ca310560efcbc5938cffa6f31164068	6 years ago
haoyuhuang	bb4178066d	Integrate block cache tracer into db_impl (#5433 ) Summary: This PR integrates the block cache tracer class into db_impl.cc. db_impl.cc contains a member variable of AtomicBlockCacheTraceWriter class and passes its reference to the block_based_table_reader. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5433 Differential Revision: D15728016 Pulled By: HaoyuHuang fbshipit-source-id: 23d5659e8c82d556833dcc1a5558aac8c1f7db71	6 years ago
Levi Tamasi	a3b8c76d8e	Add missing check before calling PurgeObsoleteFiles in EnableFileDeletions (#5448 ) Summary: Calling PurgeObsoleteFiles with a JobContext for which HaveSomethingToDelete is false is a precondition violation. This would trigger an assertion in debug builds; however, in release builds with assertions disabled, this can result in the pending_purge_obsolete_files_ counter in DBImpl underflowing, which in turn can lead to the process hanging during database close. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5448 Differential Revision: D15792569 Pulled By: ltamasi fbshipit-source-id: 82d92c9b4f6a9efcdc69dbb3d5a52a1ae2dd2472	6 years ago
Andrew Kryczka	2c9df9f9e5	Dynamic test whether sync_file_range returns ENOSYS (#5416 ) Summary: `sync_file_range` returns `ENOSYS` on Windows Subsystem for Linux even when using a supposedly supported filesystem like ext4. To handle this case we can do a dynamic check that a no-op `sync_file_range` invocation, which is accomplished by passing zero for the `flags` argument, succeeds. Also I rearranged the function and comments to hopefully make it more easily understandable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5416 Differential Revision: D15807061 fbshipit-source-id: d31d94e1f228b7850ea500e6199f8b5daf8cfbd3	6 years ago
Bin Fan	ec8111c5a4	Add Alluxio to USERS.md (#5434 ) Summary: Add Alluxio's use case of RocksDB to `USERS.md` for metadata service Pull Request resolved: https://github.com/facebook/rocksdb/pull/5434 Differential Revision: D15766559 Pulled By: riversand963 fbshipit-source-id: b68ef851f8f92e0925c31e55296260225fdf849e	6 years ago
Patrick Zhang	5c76ba9dc4	Support rocksdbjava aarch64 build and test (#5258 ) Summary: Verified with an Ampere Computing eMAG aarch64 system. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5258 Differential Revision: D15807309 Pulled By: maysamyabandeh fbshipit-source-id: ab85d2fd3fe40e6094430ab0eba557b1e979510d	6 years ago
Maysam Yabandeh	60f3ec2ca5	Fix appveyor compliant about passing const to thread (#5447 ) Summary: CLANG would complain if we pass const to lambda function and appveyor complains if we don't (https://github.com/facebook/rocksdb/pull/5443). The patch fixes that by using the default capture mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5447 Differential Revision: D15788722 Pulled By: maysamyabandeh fbshipit-source-id: 47e7f49264afe31fdafe42cb8bf93da126abfca9	6 years ago
Maysam Yabandeh	f9842869cf	Disable pipeline writes in stress test (#5445 ) Summary: The tsan crash tests are failing with a data race compliant with pipelined write option. Temporarily disable it until its concurrency issue are fixed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5445 Differential Revision: D15783824 Pulled By: maysamyabandeh fbshipit-source-id: 413a0c3230b86f524fc7eeea2cf8e8375406e65b	6 years ago
Maysam Yabandeh	f43edff9ac	Disable kPipelinedWrite in MultiThreaded (#5442 ) Summary: TSAN tests report a race condition. We temporarily exclude kPipelinedWrite from MultiThreaded until the race condition is fixed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5442 Differential Revision: D15782349 Pulled By: maysamyabandeh fbshipit-source-id: 42b4f9b3fa9137f0675e13ad132c0a06800c1bdd	6 years ago
Maysam Yabandeh	4a285d0dd3	Remove passing const variable to thread (#5443 ) Summary: CLANG complains that passing const to thread is not necessary. The patch removes it form PreparedHeap::Concurrent test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5443 Differential Revision: D15781598 Pulled By: maysamyabandeh fbshipit-source-id: 3aceb05d96182fa4726d6d37eed45fd3aac4c016	6 years ago
Maysam Yabandeh	773f914a40	WritePrepared: switch PreparedHeap from priority_queue to deque (#5436 ) Summary: Internally PreparedHeap is currently using a priority_queue. The rationale was the in the initial design PreparedHeap::AddPrepared could be called in arbitrary order. With the recent optimizations, we call ::AddPrepared only from the main write queue, which results into in-order insertion into PreparedHeap. The patch thus replaces the underlying priority_queue with a more efficient deque implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5436 Differential Revision: D15752147 Pulled By: maysamyabandeh fbshipit-source-id: e6960f2b2097e13137dded1ceeff3b10b03b0aeb	6 years ago
Manuel Ung	ca1aee2a19	WriteUnprepared: commit only from the 2nd queue (#5439 ) Summary: This is a port of this PR into WriteUnprepared: https://github.com/facebook/rocksdb/pull/5014 This also reverts this test change to restore some flaky write unprepared tests: https://github.com/facebook/rocksdb/pull/5315 Tested with: $ gtest-parallel ./transaction_test --gtest_filter=MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest/9 --repeat=128 [128/128] MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest/9 (18250 ms) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5439 Differential Revision: D15761405 Pulled By: lth fbshipit-source-id: ae2581fd942d8a5b3f9278fd6bc3c1ac0b2c964c	6 years ago
Levi Tamasi	ba64a4cf52	Revert "Reduce iterator key comparison for upper/lower bound check (#5111 )" (#5440 ) Summary: This reverts commit `f3a7847598`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5440 Differential Revision: D15765967 Pulled By: ltamasi fbshipit-source-id: d027fe24132e3729289cd7c01857a7eb449d9dd0	6 years ago
Yanqin Jin	7177dc46a1	Handle missing WAL in secondary mode (#5323 ) Summary: In secondary mode, it is possible that the secondary lists the primary's WAL directory, finds a WAL and tries to open it. It is possible that the primary deletes the WAL after secondary listing dir but before the secondary opening it. Then the secondary will fail to open the WAL file with a PathNotFound status. In this case, we can return OK without replaying WAL and optionally replay more MANIFEST. Test Plan (on my dev machine): Without this PR, the following will fail several times out of 100 runs. ``` ~/gtest-parallel/gtest-parallel -r 100 -w 16 ./db_secondary_test --gtest_filter=DBSecondaryTest.SwitchToNewManifestDuringOpen ``` With this PR, the above should always succeed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5323 Differential Revision: D15763878 Pulled By: riversand963 fbshipit-source-id: c7164fa7cb8d9001abc258b6a2dc93613e4f38ff	6 years ago
haoyuhuang	9bbccda01e	First commit for block cache trace analyzer (#5425 ) Summary: This PR contains the first commit for block cache trace analyzer. It reads a block cache trace file and prints statistics of the traces. We will extend this class to provide more functionalities. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5425 Differential Revision: D15709580 Pulled By: HaoyuHuang fbshipit-source-id: 2f43bd2311f460ab569880819d95eeae217c20bb	6 years ago
sdong	58c4aee42e	TransactionUtil::CheckKey() to skip unnecessary history (#4941 ) Summary: If a memtable definitely covers a key, there isn't a need to check older memtables. We can skip them by checking the earliest sequence number. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4941 Differential Revision: D13932666 fbshipit-source-id: b9d52f234b8ad9dd3bf6547645cd457175a3ca9b	6 years ago
Levi Tamasi	a94aef6596	Fix DBTest.DynamicMiscOptions so it passes even with Snappy disabled (#5438 ) Summary: This affects our "no compression" automated tests. Since PR #5368, DBTest.DynamicMiscOptions has been failing with: db/db_test.cc:4889: Failure dbfull()->SetOptions({{"compression", "kSnappyCompression"}}) Invalid argument: Compression type Snappy is not linked with the binary. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5438 Differential Revision: D15752100 Pulled By: ltamasi fbshipit-source-id: 3f19eff7cafc03b333965be0203c5853d2a9cb71	6 years ago
Maysam Yabandeh	c8c1a549f0	Avoid deadlock between mutex_ and log_write_mutex_ (#5437 ) Summary: To avoid deadlock mutex_ should never be acquired before log_write_mutex_. The patch documents that and also fixes one case in ::FlushWAL that acquires mutex_ through ::WriteStatusCheck when it already holds lock on log_write_mutex_. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5437 Differential Revision: D15749722 Pulled By: maysamyabandeh fbshipit-source-id: f57b69c44b4b80cc6d7ddf3d3fdf4a9eb5a5a45a	6 years ago
Maysam Yabandeh	b2584577fa	Remove global locks from FlushScheduler (#5372 ) Summary: FlushScheduler's methods are instrumented with debug-time locks to check the scheduler state against a simple container definition. Since https://github.com/facebook/rocksdb/pull/2286 the scope of such locks are widened to the entire methods' body. The result is that the concurrency tested during testing (in debug mode) is stricter than the concurrency level manifested at runtime (in release mode). The patch reverts this change to reduce the scope of such locks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5372 Differential Revision: D15545831 Pulled By: maysamyabandeh fbshipit-source-id: 01d69191afb1dd807d4bdc990fc74813ae7b5426	6 years ago
Yanqin Jin	641cc8d541	Use CreateLoggerFromOptions function (#5427 ) Summary: Use `CreateLoggerFromOptions` function to reduce code duplication. Test plan (on my machine) ``` $make clean && make -j32 db_secondary_test $KEEP_DB=1 ./db_secondary_test ``` Verify all info logs of the secondary instance are properly logged. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5427 Differential Revision: D15748922 Pulled By: riversand963 fbshipit-source-id: bad7261df1b8373efc504f141efc7871e375a311	6 years ago
haoyuhuang	5efa0d6b0d	Create a BlockCacheLookupContext to enable fine-grained block cache tracing. (#5421 ) Summary: BlockCacheLookupContext only contains the caller for now. We will trace block accesses at five places: 1. BlockBasedTable::GetFilter. 2. BlockBasedTable::GetUncompressedDict. 3. BlockBasedTable::MaybeReadAndLoadToCache. (To trace access on data, index, and range deletion block.) 4. BlockBasedTable::Get. (To trace the referenced key and whether the referenced key exists in a fetched data block.) 5. BlockBasedTable::MultiGet. (To trace the referenced key and whether the referenced key exists in a fetched data block.) We create the context at: 1. BlockBasedTable::Get. (kUserGet) 2. BlockBasedTable::MultiGet. (kUserMGet) 3. BlockBasedTable::NewIterator. (either kUserIterator, kCompaction, or external SST ingestion calls this function.) 4. BlockBasedTable::Open. (kPrefetch) 5. Index/Filter::CacheDependencies. (kPrefetch) 6. BlockBasedTable::ApproximateOffsetOf. (kCompaction or kUserApproximateSize). I loaded 1 million key-value pairs into the database and ran the readrandom benchmark with a single thread. I gave the block cache 10 GB to make sure all reads hit the block cache after warmup. The throughput is comparable. Throughput of this PR: 231334 ops/s. Throughput of the master branch: 238428 ops/s. Experiment setup: RocksDB: version 6.2 Date: Mon Jun 10 10:42:51 2019 CPU: 24 * Intel Core Processor (Skylake) CPUCache: 16384 KB Keys: 20 bytes each Values: 100 bytes each (100 bytes after compression) Entries: 1000000 Prefix: 20 bytes Keys per prefix: 0 RawSize: 114.4 MB (estimated) FileSize: 114.4 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: NoCompression Compression sampling rate: 0 Memtablerep: skip_list Perf Level: 1 Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 --duration=120 TODOs: 1. Create a caller for external SST file ingestion and differentiate the callers for iterator. 2. Integrate tracer to trace block cache accesses. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5421 Differential Revision: D15704258 Pulled By: HaoyuHuang fbshipit-source-id: 4aa8a55f8cb1576ffb367bfa3186a91d8f06d93a	6 years ago
anand76	63ace8ef0e	Reuse data block iterator in BlockBasedTableReader::MultiGet() (#5314 ) Summary: Instead of creating a new DataBlockIterator for every key in a MultiGet batch, reuse it if the next key is in the same block. This results in a small 1-2% cpu improvement. TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 Without the change - multireadrandom : 3.066 micros/op 326122 ops/sec; (29375968 of 29375968 found) With the change - multireadrandom : 3.003 micros/op 332945 ops/sec; (29983968 of 29983968 found) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5314 Differential Revision: D15742108 Pulled By: anand1976 fbshipit-source-id: 220fb0b8eea9a0d602ddeb371528f7af7936d771	6 years ago
Yanqin Jin	6ce5580882	Improve memtable earliest seqno assignment for secondary instance (#5413 ) Summary: In regular RocksDB instance, `MemTable::earliest_seqno_` is "db sequence number at the time of creation". However, we cannot use the db sequence number to set the value of `MemTable::earliest_seqno_` for secondary instance, i.e. `DBImplSecondary` due to the logic of MANIFEST and WAL replay. When replaying the log files of the primary, the secondary instance first replays MANIFEST and updates the db sequence number if necessary. Next, the secondary replays WAL files, creates new memtables if necessary and inserts key-value pairs into memtables. The following can occur when the db has two or more column families. Assume the db has column family "default" and "cf1". At a certain in time, both "default" and "cf1" have data in memtables. 1. Primary triggers a flush and flushes "cf1". "default" is not flushed. 2. Secondary replays the MANIFEST updates its db sequence number to the latest value learned from the MANIFEST. 3. Secondary starts to replay WAL that contains the writes to "default". It is possible that the write batches' sequence numbers are smaller than the db sequence number. In this case, these write batches will be skipped, and these updates will not be visible to reader until "default" is later flushed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5413 Differential Revision: D15637407 Pulled By: riversand963 fbshipit-source-id: 3de3fe35cfc6f1b9f844f3f926f0df29717b6580	6 years ago

... 56 57 58 59 60 ...

10962 Commits (6d2577e5672a7abe7b41a67f1cccce3a6601b30e) All Branches Search

10962 Commits (6d2577e5672a7abe7b41a67f1cccce3a6601b30e)

All Branches