rocksdb

Commit Graph

Author	SHA1	Message	Date
Yanqin Jin	67e735dbf9	Rename BlockBasedTable::ReadMetaBlock (#6009 ) Summary: According to https://github.com/facebook/rocksdb/wiki/Rocksdb-BlockBasedTable-Format, the block read by BlockBasedTable::ReadMetaBlock is actually the meta index block. Therefore, it is better to rename the function to ReadMetaIndexBlock. This PR also applies some format change to existing code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6009 Test Plan: make check Differential Revision: D18333238 Pulled By: riversand963 fbshipit-source-id: 2c4340a29b3edba53d19c132cbfd04caf6242aed	6 years ago
Sergei Petrunia	230bcae7b6	Add a limited support for iteration bounds into BaseDeltaIterator (#5403 ) Summary: For MDEV-19670: MyRocks: key lookups into deleted data are very slow BaseDeltaIterator remembers iterate_upper_bound and will not let delta_iterator_ walk above the iterate_upper_bound if base_iterator_ is not valid anymore. == Rationale == The most straightforward way would be to make the delta_iterator (which is a rocksdb::WBWIIterator) to support iterator bounds. But checking for bounds has an extra CPU overhead. So we put the check into BaseDeltaIterator, and only make it when base_iterator_ is not valid. (note: We could take it even further, and move the check a few lines down, and only check iterator bounds ourselves if base_iterator_ is not valid AND delta_iterator_ hit a tombstone). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5403 Differential Revision: D15863092 Pulled By: maysamyabandeh fbshipit-source-id: 8da458e7b9af95ff49356666f69664b4a6ccf49b	6 years ago
Maysam Yabandeh	52733b4498	WritePrepared: Fix flaky test MaxCatchupWithNewSnapshot (#5850 ) Summary: MaxCatchupWithNewSnapshot tests that the snapshot sequence number will be larger than the max sequence number when the snapshot was taken. However since the test does not have access to the max sequence number when the snapshot was taken, it uses max sequence number after that, which could have advanced the snapshot by then, thus making the test flaky. The fix is to compare with max sequence number before the snapshot was taken, which is a lower bound for the value when the snapshot was taken. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5850 Test Plan: ~/gtest-parallel/gtest-parallel --repeat=12800 ./write_prepared_transaction_test --gtest_filter="MaxCatchupWithNewSnapshot" Differential Revision: D17608926 Pulled By: maysamyabandeh fbshipit-source-id: b122ae5a27f982b290bd60da852e28d3c5eb0136	6 years ago
sdong	0d91a981e9	Fix assertion in universal compaction periodic compaction (#6000 ) Summary: We recently added periodic compaction to universal compaction. An old assertion that we can't compact only the last sorted run was triggered. However, with periodic compaction, it is possible that we only compact the last sorted run, so the assertion is now stricter than needed. Relaxing this assertion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6000 Test Plan: This should be a low risk change. Will observe whether stress test will pass after it. Differential Revision: D18285396 fbshipit-source-id: 9a6863debdf104c40a7f6c46ab62d84cdf5d8592	6 years ago
sdong	e4e1d35cc2	Revert "Disable pre-5.5 versions in the format compatibility test (#5990 )" (#5999 ) Summary: This reverts commit `351e25401b`. All branches have been fixed to buildable on FB environments, so we can revert it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5999 Differential Revision: D18281947 fbshipit-source-id: 6deaaf1b5df2349eee5d6ed9b91208cd7e23ec8e	6 years ago
Levi Tamasi	a44670e71b	Use aggregate initialization for FlushJobInfo/CompactionJobInfo (#5997 ) Summary: FlushJobInfo and CompactionJobInfo are aggregates; we should use the aggregate initialization syntax to ensure members (specifically those of built-in types) are value-initialized. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5997 Test Plan: make check Differential Revision: D18273398 Pulled By: ltamasi fbshipit-source-id: 35b1a63ad9ca01605d288329858af72fffd7f392	6 years ago
sdong	5b656584af	crash_test: disable periodic compaction in FIFO compaction. (#5993 ) Summary: A recent commit made the periodic compaction option valid in FIFO, where it means TTL. But we failed to disable it in the crash test, causing an assert failure. Fix it by disabling it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5993 Test Plan: Restart "make crash_test" many times and make sure --periodic_compaction_seconds=0 is always the case when --compaction_style=2 Differential Revision: D18263223 fbshipit-source-id: c91a802017d83ae89ac43827d1b0012861933814	6 years ago
Peter Dillinger	18f57f5ef8	Add new persistent 64-bit hash (#5984 ) Summary: For upcoming new SST filter implementations, we will use a new 64-bit hash function (XXH3 preview, slightly modified). This change updates hash.{h,cc} for that change, adds unit tests, and out-of-lines the implementations to keep hash.h as clean/small as possible. In developing the unit tests, I discovered that the XXH3 preview always returns zero for the empty string. Zero is problematic for some algorithms (including an upcoming SST filter implementation) if it occurs more often than at the "natural" rate, so it should not be returned from trivial values using trivial seeds. I modified our fork of XXH3 to return a modest hash of the seed for the empty string. With hash function details out-of-lines in hash.h, it makes sense to enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p are inlined. To fix array-bounds warnings on some inline calls, I injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.) Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook checks are unhappy about #include on xxhash.cc file. I would fix that by rename to xxhash_cc.h, but to best preserve history I want to do that in a separate commit (PR) from the uintptr casts. Also updated filter_bench for this change, improving the performance predictability of dry run hashing and adding support for 64-bit hash (for upcoming new SST filter implementations, minor dead code in the tool for now). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984 Differential Revision: D18246567 Pulled By: pdillinger fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8	6 years ago
Peter Dillinger	685e895652	Prepare filter tests for more implementations (#5967 ) Summary: This change sets up for alternate implementations underlying BloomFilterPolicy: * Refactor BloomFilterPolicy and expose in internal .h file so that it's easy to iterate over / select implementations for testing, regardless of what the best public interface will look like. Most notably updated db_bloom_filter_test to use this. * Hide FullFilterBitsBuilder from unit tests (alternate derived classes planned); expose the part important for testing (CalculateSpace), as abstract class BuiltinFilterBitsBuilder. (Also cleaned up internally exposed interface to CalculateSpace.) * Rename BloomTest -> BlockBasedBloomTest for clarity (despite ongoing confusion between block-based table and block-based filter) * Assert that block-based filter construction interface is only used on BloomFilterPolicy appropriately constructed. (A couple of tests updated to add ", true".) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5967 Test Plan: make check Differential Revision: D18138704 Pulled By: pdillinger fbshipit-source-id: 55ef9273423b0696309e251f50b8c1b5e9ec7597	6 years ago
Levi Tamasi	351e25401b	Disable pre-5.5 versions in the format compatibility test (#5990 ) Summary: We have updated earlier release branches going back to 5.5 so they are built using gcc7 by default. Disabling ancient versions before that until we figure out a plan for them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5990 Test Plan: Ran the script locally. Differential Revision: D18252386 Pulled By: ltamasi fbshipit-source-id: a7bbb30dc52ff2eaaf31a29ecc79f7cf4e2834dc	6 years ago
sdong	aa6f7d0995	Support periodic compaction in universal compaction (#5970 ) Summary: Previously, periodic compaction was not supported in universal compaction. Add the support using the following approach: if any file is marked as qualified for periodic compaction, trigger a full compaction. If a full compaction is prevented by files being compacted, try to compact the higher levels than files currently being compacted. If in this way we can only compact the last sorted run and none of the files to be compacted qualifies for periodic compaction, skip the compaction. This is to prevent the same single level compaction from being executed again and again. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5970 Test Plan: Add several test cases. Differential Revision: D18147097 fbshipit-source-id: 8ecc308154d9aca96fb192c51fbceba3947550c1	6 years ago
sdong	2a9e5caffe	Make FIFO compaction take default 30 days TTL by default (#5987 ) Summary: Right now, by default FIFO compaction has no TTL. We believe that a default TTL of 30 days will be better. With this patch, the default will be changed to 30 days. Default of Options.periodic_compaction_seconds will mean the same as options.ttl. If Options.ttl and Options.periodic_compaction_seconds left default, a default 30 days TTL will be used. If both options are set, the stricter value of the two will be used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5987 Test Plan: Add an option sanitize test to cover the case. Differential Revision: D18237935 fbshipit-source-id: a6dcea1f36c3849e13c0a69e413d73ad8eab58c9	6 years ago
Maysam Yabandeh	dccaf9f03c	Turn compaction asserts to runtime check (#5935 ) Summary: Compaction iterator has many assert statements that are active only during test runs. Some rare bugs that show up only at runtime could violate the assert condition but go unnoticed since assert statements are not compiled in release mode. Turning the assert statements into runtime checks has some pros and cons: Pros: - A bug that would result in incorrect data would be detected early, before the incorrect data is written to disk. Cons: - Runtime overhead, which should be negligible since compaction CPU is a minority of the overall CPU usage - The assert statements might already be violated at runtime, and turning them into runtime failures might result in reliability issues. The patch takes a conservative step in this direction by logging the assert violations at runtime. If we see any violation reported in logs, we investigate. Otherwise, we can go ahead and turn them into runtime errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5935 Differential Revision: D18229697 Pulled By: maysamyabandeh fbshipit-source-id: f1890eca80ccd7cca29737f1825badb9aa8038a8	6 years ago
sdong	0337d87b42	crash_test: disable atomic flush with pipelined write (#5986 ) Summary: Recently, pipelined write was enabled even if atomic flush is enabled, which causes a sanitizing failure in db_stress. Revert this change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5986 Test Plan: Run "make crash_test_with_atomic_flush" and let it run for a while to confirm that the old sanitizing error (which showed up quickly) doesn't show up. Differential Revision: D18228278 fbshipit-source-id: 27fdf2f8e3e77068c9725a838b9bef4ab25a2553	6 years ago
sdong	15119f08e2	Add more release branches to tools/check_format_compatible.sh (#5985 ) Summary: More release branches have been created. We should include them in continuous format compatibility checks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5985 Test Plan: Let's see whether it passes. Differential Revision: D18226532 fbshipit-source-id: 75d8cad5b03ccea4ce16f00cea1f8b7893b0c0c8	6 years ago
sdong	a3960fc875	Move pipeline write waiting logic into WaitForPendingWrites() (#5716 ) Summary: In pipeline writing mode, memtable switching needs to wait for memtable writing to finish to make sure that when memtables are made immutable, inserts are not going to them. This is currently done in DBImpl::SwitchMemtable(). This is done after flush_scheduler_.TakeNextColumnFamily() is called to fetch the list of column families to switch. The function flush_scheduler_.TakeNextColumnFamily() itself, however, is not thread-safe when being called together with flush_scheduler_.ScheduleFlush(). This change provides a fix, which moves the waiting logic before flush_scheduler_.TakeNextColumnFamily(). WaitForPendingWrites() is a natural place where the logic can happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5716 Test Plan: Run all tests with ASAN and TSAN. Differential Revision: D18217658 fbshipit-source-id: b9c5e765c9989645bf10afda7c5c726c3f82f6c3	6 years ago
sdong	f22aaf8b3f	db_stress: CF Consistency check to use random CF to validate iterator results (#5983 ) Summary: Right now, in db_stress's iterator tests, we always use the same CF to validate iterator results. This commit changes it so that a randomized CF is used in Cf consistency test, where every CF should have exactly the same data. This would help catch more bugs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5983 Test Plan: Run "make crash_test_with_atomic_flush". Differential Revision: D18217643 fbshipit-source-id: 3ac998852a0378bb59790b20c5f236f6a5d681fe	6 years ago
Sagar Vemuri	4c9aa30a62	Auto enable Periodic Compactions if a Compaction Filter is used (#5865 ) Summary: - Periodic compactions are auto-enabled if a compaction filter or a compaction filter factory is set, in Level Compaction. - The default value of `periodic_compaction_seconds` is changed to UINT64_MAX, which lets RocksDB auto-tune periodic compactions as needed. An explicit value of 0 will still work as before, i.e. to disable periodic compactions completely. For now, on seeing a compaction filter along with a UINT64_MAX value for `periodic_compaction_seconds`, RocksDB will make SST files older than 30 days go through periodic compactions. Some RocksDB users make use of compaction filters to control when their data can be deleted, usually with a custom TTL logic. But it is occasionally possible that the compactions get delayed by considerable time due to factors like low writes to a key range, data reaching bottom level, etc. before the TTL expiry. The Periodic Compactions feature was originally built to help such cases. Now periodic compactions are auto-enabled by default when compaction filters or compaction filter factories are used, as it is generally helpful in all cases to collect garbage. `periodic_compaction_seconds` is set to a large value, 30 days, in `SanitizeOptions` when RocksDB sees that a `compaction_filter` or `compaction_filter_factory` is used. This is done only for Level Compaction style. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5865 Test Plan: - Added a new test `DBCompactionTest.LevelPeriodicCompactionWithCompactionFilters` to make sure that `periodic_compaction_seconds` is set if either `compaction_filter` or `compaction_filter_factory` options are set. - `COMPILE_WITH_ASAN=1 make check` Differential Revision: D17659180 Pulled By: sagar0 fbshipit-source-id: 4887b9cf2e53cf2dc93a7b658c6b15e1181217ee	6 years ago
Peter Dillinger	26dc29633e	filter_bench not needed for ROCKSDB_LITE (#5978 ) Summary: filter_bench is a specialized micro-benchmarking tool that should not be needed with ROCKSDB_LITE. This should fix the LITE build. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5978 Test Plan: make LITE=1 check Differential Revision: D18177941 Pulled By: pdillinger fbshipit-source-id: b73a171404661e09e018bc99afcf8d4bf1e2949c	6 years ago
Vijay Nadimpalli	79018ba51b	Upgrading version to 6.6.0 on Master (#5965 ) Summary: Upgrading version to 6.6.0 on Master. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5965 Differential Revision: D18119839 Pulled By: vjnadimpalli fbshipit-source-id: 4adbcbb82b108d2f626e88c786453baad8455f4e	6 years ago
Vijay Nadimpalli	1075c376ef	Fix for lite build (#5971 ) Summary: Fix for lite build Pull Request resolved: https://github.com/facebook/rocksdb/pull/5971 Test Plan: make J=1 -j64 LITE=1 all check Differential Revision: D18148306 Pulled By: vjnadimpalli fbshipit-source-id: 5b9a3edc3e73e054fee6b96e6f6e583cecc898f3	6 years ago
Peter Dillinger	3f891c40a0	More improvements to filter_bench (#5968 ) Summary: * Adds support for plain table filter. This is not critical right now, but does add a -impl flag that will be useful for new filter implementations initially targeted at block-based table (and maybe later ported to plain table) * Better mixing of inside vs. outside queries, for more realism * A -best_case option handy for implementation tuning inner loop * Option for whether to include hashing time in dry run / net timings No modifications to production code, just filter_bench. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5968 Differential Revision: D18139872 Pulled By: pdillinger fbshipit-source-id: 5b09eba963111b48f9e0525a706e9921070990e8	6 years ago
Peter Dillinger	b3dc2f3691	Update xxhash.cc to allow combined compilation (#5969 ) Summary: To fix unity_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/5969 Test Plan: make unity_test Differential Revision: D18140426 Pulled By: pdillinger fbshipit-source-id: d5516e6d665f57e3706b9f9b965b0c458e58ccef	6 years ago
Vijay Nadimpalli	ec880436c1	API to get file_creation_time of the oldest file in the DB (#5948 ) Summary: Adding a new API to db.h that allows users to get file_creation_time of the oldest file in the DB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5948 Test Plan: Added unit test. Differential Revision: D18056151 Pulled By: vjnadimpalli fbshipit-source-id: 448ec9d34cb6772e1e5a62db399ace00dcbfbb5d	6 years ago
Peter Dillinger	013babc685	Clean up some filter tests and comments (#5960 ) Summary: Some filtering tests were unfriendly to new implementations of FilterBitsBuilder because of dynamic_cast to FullFilterBitsBuilder. Most of those have now been cleaned up, worked around, or at least changed from crash on dynamic_cast failure to individual test failure. Also put some clarifying comments on filter-related APIs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5960 Test Plan: make check Differential Revision: D18121223 Pulled By: pdillinger fbshipit-source-id: e83827d9d5d96315d96f8e25a99cd70f497d802c	6 years ago
Yanqin Jin	2309fd63bf	Update column families' log number altogether after flushing during recovery (#5856 ) Summary: A bug occasionally shows up in crash test, and https://github.com/facebook/rocksdb/issues/5851 reproduces it. The bug can surface in the following way. 1. Database has multiple column families. 2. Before a DB restart, the last log file is corrupted in the middle (not the tail). 3. During restart, DB crashes between flushing two column families. Then DB will fail to be opened again with error "SST file is ahead of WALs". Solution is to update the log number associated with each column family altogether after flushing all column families' memtables. The version edits should be written to a new MANIFEST. Only after writing all these version edits succeeds does RocksDB (atomically) point the CURRENT file to the new MANIFEST. Test plan (on devserver): ``` $make all && make check ``` Specifically ``` $make db_test2 $./db_test2 --gtest_filter=DBTest2.CrashInRecoveryMultipleCF ``` Also checked for compatibility as follows. Use this branch, run DBTest2.CrashInRecoveryMultipleCF and preserve the db directory. Then checkout 5.4, build ldb, and dump the MANIFEST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5856 Differential Revision: D17620818 Pulled By: riversand963 fbshipit-source-id: b52ce5969c9a8052cacec2bd805fcfb373589039	6 years ago
Peter Dillinger	ca7ccbe2ea	Misc hashing updates / upgrades (#5909 ) Summary: - Updated our included xxhash implementation to version 0.7.2 (== the latest dev version as of 2019-10-09). - Using XXH_NAMESPACE (like other fb projects) to avoid potential name collisions. - Added fastrange64, and unit tests for it and fastrange32. These are faster alternatives to hash % range. - Use preview version of XXH3 instead of MurmurHash64A for NPHash64 -- Had to update cache_test to increase probability of passing for any given hash function. - Use fastrange64 instead of % with uses of NPHash64 -- Had to fix WritePreparedTransactionTest.CommitOfDelayedPrepared to avoid deadlock apparently caused by new hash collision. - Set default seed for NPHash64 because specifying a seed rarely makes sense for it. - Removed unnecessary include xxhash.h in a popular .h file - Rename preview version of XXH3 to XXH3p for clarity and to ease backward compatibility in case final version of XXH3 is integrated. Relying on existing unit tests for NPHash64-related changes. Each new implementation of fastrange64 passed unit tests when manipulating my local build to select it. I haven't done any integration performance tests, but I consider the improved performance of the pieces being swapped in to be well established. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5909 Differential Revision: D18125196 Pulled By: pdillinger fbshipit-source-id: f6bf83d49d20cbb2549926adf454fd035f0ecc0d	6 years ago
Peter Dillinger	ec11eff3bc	FilterPolicy consolidation, part 2/2 (#5966 ) Summary: The parts that are used to implement FilterPolicy / NewBloomFilterPolicy and not used other than for the block-based table should be consolidated under table/block_based/filter_policy*. This change is step 2 of 2: mv util/bloom.cc table/block_based/filter_policy.cc This gets its own PR so that git has the best chance of following the rename for blame purposes. Note that low-level shared implementation details of Bloom filters remain in util/bloom_impl.h, and util/bloom_test.cc remains where it is for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5966 Test Plan: make check Differential Revision: D18124930 Pulled By: pdillinger fbshipit-source-id: 823bc09025b3395f092ef46a46aa5ba92a914d84	6 years ago
Levi Tamasi	f7e7b34ebe	Propagate SST and blob file numbers through the EventListener interface (#5962 ) Summary: This patch adds a number of new information elements to the FlushJobInfo and CompactionJobInfo structures that are passed to EventListeners via the OnFlush{Begin, Completed} and OnCompaction{Begin, Completed} callbacks. Namely, for flushes, the file numbers of the new SST and the oldest blob file it references are propagated. For compactions, the new pieces of information are the file number, level, and the oldest blob file referenced by each compaction input and output file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5962 Test Plan: Extended the EventListener unit tests with logic that checks that these information elements are correctly propagated from the corresponding FileMetaData. Differential Revision: D18095568 Pulled By: ltamasi fbshipit-source-id: 6874359a6aadb53366b5fe87adcb2f9bd27a0a56	6 years ago
Peter Dillinger	dd19014a7a	FilterPolicy consolidation, part 1/2 (#5963 ) Summary: The parts that are used to implement FilterPolicy / NewBloomFilterPolicy and not used other than for the block-based table should be consolidated under table/block_based/filter_policy*. I don't foresee sharing these APIs with e.g. the Plain Table because they don't expose hashes for reuse in indexing. This change is step 1 of 2: (a) mv table/full_filter_bits_builder.h to table/block_based/filter_policy_internal.h which I expect to expand soon to internally reveal more implementation details for testing. (b) consolidate eventual contents of table/block_based/filter_policy.cc in util/bloom.cc, which has the most elaborate revision history (see step 2 ...) Step 2 soon to follow: mv util/bloom.cc table/block_based/filter_policy.cc This gets its own PR so that git has the best chance of following the rename for blame purposes. Note that low-level shared implementation details of Bloom filters are in util/bloom_impl.h. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5963 Test Plan: make check Differential Revision: D18121199 Pulled By: pdillinger fbshipit-source-id: 8f21732c3d8909777e3240e4ac3123d73140326a	6 years ago
Peter Dillinger	2837008525	Vary key size and alignment in filter_bench (#5933 ) Summary: The first version of filter_bench has selectable key size but that size does not vary throughout a test run. This artificially favors "branchy" hash functions like the existing BloomHash, MurmurHash1, probably because of optimal return for branch prediction. This change primarily varies those key sizes from -2 to +2 bytes vs. the average selected size. We also set the default key size at 24 to better reflect our best guess of typical key size. But steadily random key sizes may not be realistic either. So this change introduces a new filter_bench option: -vary_key_size_log2_interval=n where the same key size is used 2^n times and then changes to another size. I've set the default at 5 (32 times same size) as a compromise between deployments with rather consistent vs. rather variable key sizes. On my Skylake system, the performance boost to MurmurHash1 largely lies between n=10 and n=15. Also added -vary_key_alignment (bool, now default=true), though this doesn't currently seem to matter in hash functions under consideration. This change also does a "dry run" for each testing scenario, to improve the accuracy of those numbers, as there was more difference between scenarios than expected. Subtracting gross test run times from dry run times is now also embedded in the output, because these "net" times are generally the most useful. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5933 Differential Revision: D18121683 Pulled By: pdillinger fbshipit-source-id: 3c7efee1c5661a5fe43de555e786754ddf80dc1e	6 years ago
Dan Lambright	2509531123	Add test showing range tombstones can create excessively large compactions (#5956 ) Summary: For more information on the original problem see this [link](https://github.com/facebook/rocksdb/issues/3977). This change adds two new tests. They are identical other than one uses range tombstones and the other does not. Each test generates sub files at L2 which overlap with keys L3. The test that uses range tombstones generates a single file at L2. This single file will generate a very large range overlap that will in turn create excessively large compaction. 1: T001 - T005 2: 000 - 005 In contrast, the test that uses key ranges generates 3 files at L2. As a single file is compacted at a time, those 3 files will generate less work per compaction iteration. 1: 001 - 002 1: 003 - 004 1: 005 2: 000 - 005 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5956 Differential Revision: D18071631 Pulled By: dlambrig fbshipit-source-id: 12abae75fb3e0b022d228c6371698aa5e53385df	6 years ago
sdong	9f1e5a0b87	CfConsistencyStressTest to validate key consistent across CFs in TestGet() (#5863 ) Summary: Right now in the CF consistency stress test's TestGet(), keys are just fetched without validation. With this change, half of the time, check that all the CFs share the same value for the same key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5863 Test Plan: Run "make crash_test_with_atomic_flush" and see tests pass. Hack the code to generate some inconsistency and observe the test fails as expected. Differential Revision: D17934206 fbshipit-source-id: 00ba1a130391f28785737b677f80f366fb83cced	6 years ago
Peter Dillinger	6a32e3b562	Remove unused BloomFilterPolicy::hash_func_ (#5961 ) Summary: This is an internal, file-local "feature" that is not used and potentially confusing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5961 Test Plan: make check Differential Revision: D18099018 Pulled By: pdillinger fbshipit-source-id: 7870627eeed09941d12538ec55d10d2e164fc716	6 years ago
Yanqin Jin	b4ebda7a39	Make buckifier python3 compatible (#5922 ) Summary: Make buckifier/buckify_rocksdb.py run on both Python 3 and 2 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5922 Test Plan: ``` $python3 buckifier/buckify_rocksdb.py $python3 buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fakes/module:mock1"], "extra_compiler_flags": ["-DROCKSDB_LITE", "-Os"]}}' $python2 buckifier/buckify_rocksdb.py $python2 buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fakes/module:mock1"], "extra_compiler_flags": ["-DROCKSDB_LITE", "-Os"]}}' ``` Differential Revision: D17920611 Pulled By: riversand963 fbshipit-source-id: cc6e2f36013a88a710d96098f6ca18cbe85e3f62	6 years ago
Zhichao Cao	0933360644	Fix the potential memory leak in trace_replay (#5955 ) Summary: In the previous PR https://github.com/facebook/rocksdb/issues/5934 , in the while loop, if/else if is used without ending with else to free the object referenced by ra, it might cause potential memory leak (warning during compiling). Fix it by changing the last "else if" to "else". Pull Request resolved: https://github.com/facebook/rocksdb/pull/5955 Test Plan: pass make asan check, pass the USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j64 analyze. Differential Revision: D18071612 Pulled By: zhichao-cao fbshipit-source-id: 51c00023d0c97c2921507254329aed55d56e1786	6 years ago
Yanqin Jin	c0abc6bbc1	Use FLAGS_env for certain operations in db_bench (#5943 ) Summary: Since we already parse env_uri from command line and creates custom Env accordingly, we should invoke the methods of such Envs instead of using Env::Default(). Test Plan (on devserver): ``` $make db_bench db_stress $./db_bench -benchmarks=fillseq ./db_stress ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5943 Differential Revision: D18018550 Pulled By: riversand963 fbshipit-source-id: 03b61329aaae0dfd914a0b902cc677f570f102e3	6 years ago
Yanqin Jin	925250f42f	Include db_stress_tool in rocksdb tools lib (#5950 ) Summary: include db_stress_tool in rocksdb tools lib Test Plan (on devserver): ``` $make db_stress $./db_stress $make all && make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5950 Differential Revision: D18044399 Pulled By: riversand963 fbshipit-source-id: 895585abbbdfd8b954965921dba4b1400b7af1b1	6 years ago
Vijay Nadimpalli	5677f4f775	Using clang for internal ubsan tests (#5952 ) Summary: Using clang for internal ubsan tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5952 Differential Revision: D18048810 Pulled By: vjnadimpalli fbshipit-source-id: ae55677a1928397b067e972d0ecb4ac1b7e2c8dc	6 years ago
Peter Dillinger	27a124571f	Fix memory leak on error opening PlainTable (#5951 ) Summary: Several error paths in opening of a plain table would leak memory. PR https://github.com/facebook/rocksdb/issues/5940 opened the leak to one more error path, which happens to have been (mistakenly) exercised by CuckooTableDBTest.AdaptiveTable. That test has been fixed, and the exercising of plain table error cases (more than before) has been added as BadOptions1 and BadOptions2 to PlainTableDBTest. This effectively moved the memory leak to plain_table_db_test. Also here is a cheap fix for the memory leak, without (yet?) changing the signature of ReadTableProperties. This fixes ASAN on unit tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5951 Test Plan: make COMPILE_WITH_ASAN=1 check Differential Revision: D18051940 Pulled By: pdillinger fbshipit-source-id: e2952930c09a2b46c4f1ff09818c5090426929de	6 years ago
Zhichao Cao	7245fb5f63	Fix the potential memory leak of ReplayMultiThread (#5949 ) Summary: The pointer ra needs to be freed if the status s returned is not OK. In the previous PR https://github.com/facebook/rocksdb/issues/5934 , ra is not freed, which might cause a potential memory leak. Fix this issue by moving the declaration of ra inside the while loop and freeing it as desired. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5949 Test Plan: pass make asan check. Differential Revision: D18045726 Pulled By: zhichao-cao fbshipit-source-id: d5445b7b832c8bb1dafe008bafea7bfe9eb0b1ce	6 years ago
Vijay Nadimpalli	2ce6aa5f39	Making platform 007 (gcc 7) default in build_detect_platform.sh (#5947 ) Summary: Making platform 007 (gcc 7) default in build_detect_platform.sh. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5947 Differential Revision: D18038837 Pulled By: vjnadimpalli fbshipit-source-id: 9ac2ddaa93bf328a416faec028970e039886378e	6 years ago
sdong	a0cd920026	LevelIterator to avoid gap after prefix bloom filters out a file (#5861 ) Summary: Right now, when LevelIterator::Seek() is called and a file is filtered out by the prefix bloom filter, the position is moved to the beginning of the next file. This is a confusing internal interface because many keys in the level are skipped. Avoid this behavior by checking the key of the next file against the seek key, and invalidate the whole iterator if the prefix doesn't match. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5861 Test Plan: Add a new unit test to validate the behavior; run all existing tests; run crash_test Differential Revision: D17918213 fbshipit-source-id: f06b47d937c7cc8919001f18dcc3af5b28c9cdac	6 years ago
sdong	30e2dc02f0	Fix VerifyChecksum readahead with mmap mode (#5945 ) Summary: A recent change introduced readahead inside VerifyChecksum(). However, it is not compatible with mmap mode and produced spurious checksum verification failures. Fix it by not enabling readahead in mmap mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5945 Test Plan: Add a unit test that used to fail. Differential Revision: D18021443 fbshipit-source-id: 6f2eb600f81b26edb02222563a4006869d576bff	6 years ago
sdong	1a21afa789	Fix some dependency paths (#5946 ) Summary: Some dependency paths are incorrect, so ASAN cannot run with Clang. Fix them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5946 Test Plan: Run ASAN with Clang Differential Revision: D18040933 fbshipit-source-id: 1d82be9d350485cf1df1c792dad765188958641f	6 years ago
Levi Tamasi	29ccf2075c	Store the filter bits reader alongside the filter block contents (#5936 ) Summary: Amongst other things, PR https://github.com/facebook/rocksdb/issues/5504 refactored the filter block readers so that only the filter block contents are stored in the block cache (as opposed to the earlier design where the cache stored the filter block reader itself, leading to potentially dangling pointers and concurrency bugs). However, this change introduced a performance hit since with the new code, the metadata fields are re-parsed upon every access. This patch reunites the block contents with the filter bits reader to eliminate this overhead; since this is still a self-contained pure data object, it is safe to store it in the cache. (Note: this is similar to how the zstd digest is handled.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5936 Test Plan: make asan_check filter_bench results for the old code: ``` $ ./filter_bench -quick WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.7153 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 33.4258 Single filter ns/op: 42.5974 Random filter ns/op: 217.861 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.4217 Single filter ns/op: 50.9855 Random filter ns/op: 219.167 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -quick -use_full_block_reader WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.5172 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 32.3556 Single filter ns/op: 83.2239 Random filter ns/op: 370.676 ---------------------------- Outside queries... 
Dry run (25d) ns/op: 32.2265 Single filter ns/op: 93.5651 Random filter ns/op: 408.393 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) ``` With the new code: ``` $ ./filter_bench -quick WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 25.4285 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 31.0594 Single filter ns/op: 43.8974 Random filter ns/op: 226.075 ---------------------------- Outside queries... Dry run (25d) ns/op: 31.0295 Single filter ns/op: 50.3824 Random filter ns/op: 226.805 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -quick -use_full_block_reader WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.5308 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 33.2968 Single filter ns/op: 58.6163 Random filter ns/op: 291.434 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.1839 Single filter ns/op: 66.9039 Random filter ns/op: 292.828 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) ``` Differential Revision: D17991712 Pulled By: ltamasi fbshipit-source-id: 7ea205550217bfaaa1d5158ebd658e5832e60f29	6 years ago
Yanqin Jin	c53db172a1	Fix TestIterate for HashSkipList in db_stress (#5942 ) Summary: Since SeekForPrev (used by Prev) is not supported by HashSkipList when prefix is used, we disable it when stress testing HashSkipList. - Change the default memtablerep to skip list. - Avoid Prev() when memtablerep is HashSkipList and prefix is used. Test Plan (on devserver): ``` $make db_stress $./db_stress -ops_per_thread=10000 -reopen=1 -destroy_db_initially=true -column_families=1 -threads=1 -column_families=1 -memtablerep=prefix_hash $# or simply $./db_stress $./db_stress -memtablerep=prefix_hash ``` Results must print "Verification successful". Pull Request resolved: https://github.com/facebook/rocksdb/pull/5942 Differential Revision: D18017062 Pulled By: riversand963 fbshipit-source-id: af867e59aa9e6f533143c984d7d529febf232fd7	6 years ago
Peter Dillinger	5f8f2fda0e	Refactor / clean up / optimize FullFilterBitsReader (#5941 ) Summary: FullFilterBitsReader, after creating in BloomFilterPolicy, was responsible for decoding metadata bits. This meant that FullFilterBitsReader::MayMatch had some metadata checks in order to implement "always true" or "always false" functionality in the case of inconsistent or trivial metadata. This made for ugly mixing-of-concerns code and probably had some runtime cost. It also didn't really support plugging in alternative filter implementations with extensions to the existing metadata schema. BloomFilterPolicy::GetFilterBitsReader is now (exclusively) responsible for decoding filter metadata bits and constructing appropriate instances deriving from FilterBitsReader. "Always false" and "always true" derived classes allow FullFilterBitsReader not to be concerned with handling of trivial or inconsistent metadata. This also makes for easy expansion to alternative filter implementations in new, alternative derived classes. This change makes calls to FilterBitsReader::MayMatch necessarily virtual because there's now more than one built-in implementation. Compared with the previous implementation's extra 'if' checks in MayMatch, there's no consistent performance difference, measured by (an older revision of) filter_bench (differences here seem to be within noise): Inside queries... - Dry run (407) ns/op: 35.9996 + Dry run (407) ns/op: 35.2034 - Single filter ns/op: 47.5483 + Single filter ns/op: 47.4034 - Batched, prepared ns/op: 43.1559 + Batched, prepared ns/op: 42.2923 ... - Random filter ns/op: 150.697 + Random filter ns/op: 149.403 ---------------------------- Outside queries... 
- Dry run (980) ns/op: 34.6114 + Dry run (980) ns/op: 34.0405 - Single filter ns/op: 56.8326 + Single filter ns/op: 55.8414 - Batched, prepared ns/op: 48.2346 + Batched, prepared ns/op: 47.5667 - Random filter ns/op: 155.377 + Random filter ns/op: 153.942 Average FP rate %: 1.1386 Also, the FullFilterBitsReader ctor was responsible for a surprising amount of CPU in production, due in part to inefficient determination of the CACHE_LINE_SIZE used to construct the filter being read. The overwhelming common case (same as my CACHE_LINE_SIZE) is now substantially optimized, as shown with filter_bench with -new_reader_every=1 (old option - see below) (repeatable result): Inside queries... - Dry run (453) ns/op: 118.799 + Dry run (453) ns/op: 105.869 - Single filter ns/op: 82.5831 + Single filter ns/op: 74.2509 ... - Random filter ns/op: 224.936 + Random filter ns/op: 194.833 ---------------------------- Outside queries... - Dry run (aa1) ns/op: 118.503 + Dry run (aa1) ns/op: 104.925 - Single filter ns/op: 90.3023 + Single filter ns/op: 83.425 ... - Random filter ns/op: 220.455 + Random filter ns/op: 175.7 Average FP rate %: 1.13886 However PR#5936 has/will reclaim most of this cost. After that PR, the optimization of this code path is likely negligible, but nonetheless it's clear we aren't making performance any worse. Also fixed inadequate check of consistency between filter data size and num_lines. (Unit test updated.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5941 Test Plan: previously added unit tests FullBloomTest.CorruptFilters and FullBloomTest.RawSchema Differential Revision: D18018353 Pulled By: pdillinger fbshipit-source-id: 8e04c2b4a7d93223f49a237fd52ef2483929ed9c	6 years ago
Peter Dillinger	fe464bca5c	Fix PlainTableReader not to crash sst_dump (#5940 ) Summary: Plain table SSTs could crash sst_dump because of a bug in PlainTableReader that can leave table_properties_ as null. Even if it was intended not to keep the table properties in some cases, they were leaked on the offending code path. Steps to reproduce: $ db_bench --benchmarks=fillrandom --num=2000000 --use_plain_table --prefix-size=12 $ sst_dump --file=0000xx.sst --show_properties from [] to [] Process /dev/shm/dbbench/000014.sst Sst file format: plain table Raw user collected properties ------------------------------ Segmentation fault (core dumped) Also added missing unit testing of plain table full_scan_mode, and an assertion in NewIterator to check for regression. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5940 Test Plan: new unit test, manual, make check Differential Revision: D18018145 Pulled By: pdillinger fbshipit-source-id: 4310c755e824c4cd6f3f86a3abc20dfa417c5e07	6 years ago
Zhichao Cao	526e3b9763	Enable trace_replay with multi-threads (#5934 ) Summary: In the current trace replay, all queries are serialized and executed by a single thread, which may not closely simulate the query pattern of the original application. This PR implements multi-threaded replay. Users can set the number of threads used to replay the trace. The queries generated from the trace records are scheduled in the thread pool job queue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5934 Test Plan: test with make check and real trace replay. Differential Revision: D17998098 Pulled By: zhichao-cao fbshipit-source-id: 87eecf6f7c17a9dc9d7ab29dd2af74f6f60212c8	6 years ago
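
Note on 526e3b9763: the scheduling idea in that summary (one job per trace record, pushed onto a thread pool job queue) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR. `TraceRecord`, `ReplayPool`, and `ReplayWithPool` are hypothetical names invented for this sketch, and the job body only counts executions where a real replayer would presumably issue the recorded query.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for one trace record; not RocksDB's own type.
struct TraceRecord {
  std::string key;
};

// Minimal fixed-size thread pool with a FIFO job queue (illustrative only).
class ReplayPool {
 public:
  explicit ReplayPool(std::size_t num_threads) {
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Drains the remaining jobs, then joins all workers.
  ~ReplayPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_all();
    for (std::thread& t : workers_) t.join();
  }

  void Schedule(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      jobs_.push(std::move(job));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
        if (jobs_.empty()) return;  // shutdown requested and queue drained
        job = std::move(jobs_.front());
        jobs_.pop();
      }
      job();  // run outside the lock so workers execute concurrently
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> jobs_;
  std::vector<std::thread> workers_;
  bool done_ = false;
};

// Schedules one job per record and returns how many jobs actually ran.
std::size_t ReplayWithPool(const std::vector<TraceRecord>& trace,
                           std::size_t num_threads) {
  std::atomic<std::size_t> executed{0};
  {
    ReplayPool pool(num_threads);
    for (const TraceRecord& rec : trace) {
      pool.Schedule([&executed, &rec] {
        (void)rec.key;  // placeholder: a real replayer would run the query here
        executed.fetch_add(1, std::memory_order_relaxed);
      });
    }
  }  // pool destructor waits for every scheduled job
  return executed.load();
}
```

Calling `ReplayWithPool(trace, 4)` runs the records on four workers. Because jobs complete in no particular order, a replay built this way gives up the strict serialization of the single-threaded path, which appears to be the trade-off the summary describes.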

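Note on 5f8f2fda0e: the refactoring described there moves metadata decoding into a single factory that returns an "always true", an "always false", or a full reader, so the hot `MayMatch` path carries no metadata checks. The sketch below illustrates that shape only. All class names, the one-hash toy filter, and the exact rules for picking a trivial reader are assumptions made for this example, not RocksDB's implementation or metadata schema.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface, loosely modeled on the role of FilterBitsReader.
class BitsReader {
 public:
  virtual ~BitsReader() = default;
  virtual bool MayMatch(const std::string& key) const = 0;
};

// Safe fallback for unreadable metadata: a filter may report false
// positives but must never report a false negative.
class AlwaysTrueReader : public BitsReader {
 public:
  bool MayMatch(const std::string&) const override { return true; }
};

// Used when the filter is known to contain no keys.
class AlwaysFalseReader : public BitsReader {
 public:
  bool MayMatch(const std::string&) const override { return false; }
};

// Toy one-hash filter over a bit vector; not RocksDB's Bloom layout.
class ToyFilterReader : public BitsReader {
 public:
  explicit ToyFilterReader(std::vector<bool> bits) : bits_(std::move(bits)) {}
  bool MayMatch(const std::string& key) const override {
    // No metadata checks here: the factory guarantees bits_ is non-empty.
    return bits_[std::hash<std::string>{}(key) % bits_.size()];
  }

 private:
  std::vector<bool> bits_;
};

// The factory is the only place that inspects metadata.
// Assumed rules for this sketch: an empty filter holds no keys, and a
// non-positive probe count marks the metadata as inconsistent.
std::unique_ptr<BitsReader> MakeReader(std::vector<bool> bits,
                                       int num_probes) {
  if (bits.empty()) return std::make_unique<AlwaysFalseReader>();
  if (num_probes <= 0) return std::make_unique<AlwaysTrueReader>();
  return std::make_unique<ToyFilterReader>(std::move(bits));
}
```

The cost of this layout is that `MayMatch` becomes a virtual call, which matches the commit's remark that the call is now "necessarily virtual" once more than one built-in implementation exists.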

8450 Commits (67e735dbf949b9a0f5c36390ce3fa10cdf03c9e4)