rocksdb

Commit Graph

Author	SHA1	Message	Date
Andrew Kryczka	dd6b7fc520	Return `Status` from `MemTable` mutation functions (#7656 ) Summary: This PR updates `MemTable::Add()`, `MemTable::Update()`, and `MemTable::UpdateCallback()` to return `Status` objects, and adapts the client code in `MemTableInserter`. The goal is to prepare these functions for key-value checksum, where we want to verify key-value integrity while adding to memtable. After this PR, the memtable mutation functions can report a failed integrity check by returning `Status::Corruption`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7656 Reviewed By: riversand963 Differential Revision: D24900497 Pulled By: ajkr fbshipit-source-id: 1a7e80581e3774676f2bbba2f0a0b04890f40009	5 years ago
Peter Dillinger	0baa5055f1	Add Ribbon schema test to bloom_test (#7696 ) Summary: These new unit tests should ensure that we don't accidentally change the interpretation of bits for what I call Standard128Ribbon filter internally, available publicly as NewExperimentalRibbonFilterPolicy. There is very little intuitive reason for the values we check against in these tests; I just plug in the right expected values upon watching the test fail initially. Most (but not all) of the tests are essentially "whitebox" "round-trip." We create a filter from fixed keys, and first compare the checksum of those filter bytes against a saved value. We also run queries against other fixed keys, comparing which return false positives against a saved set. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7696 Test Plan: test addition and refactoring only Reviewed By: jay-zhuang Differential Revision: D25082289 Pulled By: pdillinger fbshipit-source-id: b5ca646fdcb5a1c2ad2085eda4a1fd44c4287f67	5 years ago
Yanqin Jin	1a5fc4f577	Port corruption test to use custom env (#7699 ) Summary: Allow corruption_test to run on custom env loaded via `Env::LoadEnv()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7699 Test Plan: ``` make corruption_test ./corruption_test ``` Also run on in-house custom env. Reviewed By: zhichao-cao Differential Revision: D25135525 Pulled By: riversand963 fbshipit-source-id: 7941e7ce342dc88ec2cd63e90f7674a2f57de6b7	5 years ago
anand76	7c19d43883	Fix initialization order of DBOptions and kHostnameForDbHostId (#7702 ) Summary: Fix initialization order of DBOptions and kHostnameForDbHostId by making the initialization of the latter static rather than dynamic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7702 Reviewed By: ajkr Differential Revision: D25111633 Pulled By: anand1976 fbshipit-source-id: 7afad834a66e40bcd8694a43b40d378695212224	5 years ago
Cheng Chang	5c585e1908	Ship the track WAL in MANIFEST feature (#7689 ) Summary: Updates the option description and HISTORY. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7689 Test Plan: N/A Reviewed By: zhichao-cao Differential Revision: D25056238 Pulled By: cheng-chang fbshipit-source-id: 6af1ef6f8dcf2173cbc0fccadc0e06cefd92bcae	5 years ago
Dylan Wen	a65e905bbb	Fix typos in comments (#7687 ) Summary: Hi there, This PR fixes a few typos in comments in `cache/lru_cache.h`. Thanks Pull Request resolved: https://github.com/facebook/rocksdb/pull/7687 Reviewed By: ajkr Differential Revision: D25064674 Pulled By: jay-zhuang fbshipit-source-id: fe633369d5b82c5aac42d4ee8d551b9d657237d1	5 years ago
Cheng Chang	7169ca9c80	Do not track empty WALs (#7697 ) Summary: An empty WAL won't be backed up by the BackupEngine. So if we track the empty WALs in MANIFEST, then when restoring from a backup, it may report corruption that the empty WAL is missing, which is correct because the WAL is actually in the main DB but not in the backup DB, but missing an empty WAL does not logically break DB consistency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7697 Test Plan: watch existing tests to pass Reviewed By: pdillinger Differential Revision: D25077194 Pulled By: cheng-chang fbshipit-source-id: 01917b57234b92b6063925f2ee9452c5732bdc03	5 years ago
Cheng Chang	8a97f35619	Call out a bug in HISTORY (#7690 ) Summary: It's worth mentioning the corner case bug fixed in PR 7621. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7690 Test Plan: N/A Reviewed By: zhichao-cao Differential Revision: D25056678 Pulled By: cheng-chang fbshipit-source-id: 1ab42ec080f3ffe21f5d97acf65ee0af993112ba	5 years ago
Akanksha Mahajan	6cacb0d3a1	Add cmake-mingw in circle-build (#7144 ) Summary: Add cmake-mignw in circle-build Pull Request resolved: https://github.com/facebook/rocksdb/pull/7144 Test Plan: watch circle cmake-mingw build Reviewed By: jay-zhuang Differential Revision: D25039744 Pulled By: akankshamahajan15 fbshipit-source-id: 92584c9d5ad161b93d5e5a1303aac306e7985108	5 years ago
Cheng Chang	8c93b16f02	Track WAL in MANIFEST: Update logic for computing min_log_number_to_keep in atomic flush (#7660 ) Summary: The logic for computing min_log_number_to_keep in atomic flush was incorrect. For example, when all column families are flushed, the min_log_number_to_keep should be the latest new log. But the incorrect logic calls `PrecomputeMinLogNumberToKeepNon2PC` for each column family, and returns the minimum of them. However, `PrecomputeMinLogNumberToKeepNon2PC(cf)` assumes column families other than `cf` are flushed, but in case all column families are flushed, this assumption is incorrect. Without this fix, the WAL referenced by the computed min_log_number_to_keep may actually contain no unflushed data, so the WAL might have actually been deleted from disk on recovery, then an incorrect error `Corruption: missing WAL` will be reported. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7660 Test Plan: run `make crash_test_with_atomic_flush` on devserver added a unit test in `db_flush_test` Reviewed By: riversand963 Differential Revision: D24906265 Pulled By: cheng-chang fbshipit-source-id: 08deda62e71f67f59e3b7925cdd86dd09bd4f430	5 years ago
Adam Retter	303d283420	RocksJava static lib dependencies should support MacOS 10.12+ (#7683 ) Summary: Expands on https://github.com/facebook/rocksdb/pull/7016 so that when `PORTABLE=1` is set the dependencies for RocksJava static target will also be built with backwards compatibility for MacOS as far back as 10.12 (i.e. 2016). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7683 Reviewed By: ajkr Differential Revision: D25034164 Pulled By: pdillinger fbshipit-source-id: dc9e51828869ed9ec336a8a86683e4d0bfe04f27	5 years ago
Adam Retter	4c336c6912	Fix jemalloc compliation problem on macOS (#7624 ) Summary: Closes https://github.com/facebook/rocksdb/issues/7269 I have only tested this on macOS, let's see what CI makes of it for the other platforms... Pull Request resolved: https://github.com/facebook/rocksdb/pull/7624 Reviewed By: ajkr Differential Revision: D24834305 Pulled By: pdillinger fbshipit-source-id: ba818d8424297ccebd18ed854b044764c2dbab5f	5 years ago
Cheng Chang	699411b2ca	Fuzzing RocksDB (#7685 ) Summary: This is the initial PR to support adding fuzz tests to RocksDB. It includes the necessary build infrastructure, and includes an example fuzzer. There is also a README serving as the tutorial for how to add more tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7685 Test Plan: Manually build and run the fuzz test according to README. Reviewed By: pdillinger Differential Revision: D25013847 Pulled By: cheng-chang fbshipit-source-id: c91e3b337398d7f4d8f769fd5091cd080487b171	5 years ago
Yanqin Jin	84a700819e	Fix the logic of setting read_amp_bytes_per_bit from OPTIONS file (#7680 ) Summary: Instead of using `EncodeFixed32` which always serialize a integer to little endian, we should use the local machine's endianness when populating a native data structure during options parsing. Without this fix, `read_amp_bytes_per_bit` may be populated incorrectly on big-endian machines. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7680 Test Plan: make check Reviewed By: pdillinger Differential Revision: D24999166 Pulled By: riversand963 fbshipit-source-id: dc603cff6e17f8fa32479ce6df93b93082e6b0c4	5 years ago
Yanqin Jin	869f0538dd	Clean up after two test failures in db_basic_test (#7682 ) Summary: In db_basic_test.cc, there are two tests that rely on the underlying system's `LockFile` support to function correctly: DBBasicTest.OpenWhenOpen and DBBasicTest.CheckLock. In both tests, re-opening a db using `DB::Open` is expected to fail because the second open cannot lock the LOCK file. Some distributed file systems, e.g. HDFS do not support the POSIX-style file lock. Therefore, these unit tests will cause assertion failure and the second `Open` will create a db instance. Currently, these db instances are not closed after the assertion failure. Since these db instances are registered with some process-wide, static data structures, e.g. `PeriodicWorkScheduler::Default()`, they can still be accessed after the unit tests. However, the `Env` object created for this db instance is destroyed when the test finishes in `~DBTestBase()`. Consequently, it causes illegal memory access. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7682 Test Plan: Run the following on a distrubited file system: ``` make check ``` Reviewed By: anand1976 Differential Revision: D25004215 Pulled By: riversand963 fbshipit-source-id: f4327d7716c0e72b13bb43737ec9a5d156da4d52	5 years ago
anand76	9627e342c8	Use default FileSystem in GenerateUniqueId (#7672 ) Summary: Use ```FileSystem::Default``` to read ```/proc/sys/kernel/uuid```, so it works for ```Envs``` with remote ```FileSystem``` as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7672 Reviewed By: riversand963 Differential Revision: D24998702 Pulled By: anand1976 fbshipit-source-id: fa95c1d70f0e4ed17561201f047aa055046d06c3	5 years ago
Andrew Kryczka	1c5f13f2a5	Fail early when `merge_operator` not configured (#7667 ) Summary: An application may accidentally write merge operands without properly configuring `merge_operator`. We should alert them as early as possible that there's an API misuse. Previously RocksDB only notified them when a query or background operation needed to merge but couldn't. With this PR, RocksDB notifies them of the problem before applying the merge operand to the memtable (although it may already be in WAL, which seems it'd cause a crash loop until they enable `merge_operator`). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7667 Reviewed By: riversand963 Differential Revision: D24933360 Pulled By: ajkr fbshipit-source-id: 3a4a2ceb0b7aed184113dd03b8efd735a8332f7f	5 years ago
jsteemann	7582c5682b	add ArangoDB to USERS.md, and fix typos in that file (#7675 ) Summary: Add ArangoDB to USERS.md. We are using RocksDB since 2016. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7675 Reviewed By: riversand963 Differential Revision: D24998955 Pulled By: ajkr fbshipit-source-id: 82c656bf56589e52aff8c491bab6fbc19b52cc91	5 years ago
Mammo, Mulugeta	1861de455e	Add arena_block_size flag to db_bench (#7654 ) Summary: db_bench currently does not allow overriding the default `arena_block_size `calculation ([memtable size/8](https://github.com/facebook/rocksdb/blob/master/db/column_family.cc#L216)). For memtables whose size is in gigabytes, the `arena_block_size` defaults to hundreds of megabytes (affecting performance). Exposing this option in db_bench would allow us to test the workloads with various `arena_block_size` values. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7654 Reviewed By: jay-zhuang Differential Revision: D24996812 Pulled By: ajkr fbshipit-source-id: a5e3d2c83d9f89e1bb8382f2e8dd476c79e33bef	5 years ago
Ramkumar Vadivelu	5bd1258381	Update release history to 6.15 (#7673 ) Summary: Update release history to 6.15 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7673 Test Plan: No code change Reviewed By: ajkr Differential Revision: D24971069 Pulled By: ramvadiv fbshipit-source-id: 5cb3f5cbc1b19beb580ea8095acdef72cc092905	5 years ago
Cheng Chang	1aae41786a	Do not track WAL in MANIFEST when fsync is disabled in a test (#7669 ) Summary: If fsync is disabled in a unit test, then do not track WAL in MANIFEST, because on DB recovery, the WAL might be missing because the directory is not fsynced. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7669 Test Plan: Tests with fsync enabled should pass. Reviewed By: riversand963 Differential Revision: D24941431 Pulled By: cheng-chang fbshipit-source-id: ab3ff0f90769795cfb4e4d6dcf084ea5545d1975	5 years ago
Yanqin Jin	9aa1b1dc19	Hack to load OPTIONS file for read_amp_bytes_per_bit (#7659 ) Summary: A temporary hack to work around a bug in 6.10, 6.11, 6.12, 6.13 and 6.14. The bug will write out 8 bytes to OPTIONS file from the starting address of BlockBasedTableOptions.read_amp_bytes_per_bit which is actually a uint32. Consequently, the value of read_amp_bytes_per_bit written in the OPTIONS file is wrong. From 6.15, RocksDB will try to parse the read_amp_bytes_per_bit from OPTIONS file as a uint32. To be able to load OPTIONS file generated by affected releases before the fix, we need to manually parse read_amp_bytes_per_bit with this hack. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7659 Test Plan: Generate a db with current 6.14.fb (head at `b6db05dbb5`). Maybe use db_stress. Checkout this PR, run ``` ~/rocksdb/ldb --db=. --try_load_options --ignore_unknown_options idump --count_only ``` Expect success, and should not see ``` Failed: Invalid argument: Error parsing read_amp_bytes_per_bit:17179869184 ``` Also make check Reviewed By: anand1976 Differential Revision: D24954752 Pulled By: riversand963 fbshipit-source-id: c7b802fc3e52acd050a4fc1cd475016122234394	5 years ago
Akanksha Mahajan	e300ce211d	Update option "allow_data_in_errors" in BuildOptions (#7665 ) Summary: "allow_data_in_errors" is not updated in BuildOptions. So it would assume default value when BuildOptions is called. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7665 Test Plan: make check -j64 Reviewed By: zhichao-cao Differential Revision: D24929100 Pulled By: akankshamahajan15 fbshipit-source-id: dd6225a6c9f13b20027ff1b6de8e79801b57b3f7	5 years ago
Peter Dillinger	60af964372	Experimental (production candidate) SST schema for Ribbon filter (#7658 ) Summary: Added experimental public API for Ribbon filter: NewExperimentalRibbonFilterPolicy(). This experimental API will take a "Bloom equivalent" bits per key, and configure the Ribbon filter for the same FP rate as Bloom would have but ~30% space savings. (Note: optimize_filters_for_memory is not yet implemented for Ribbon filter. That can be added with no effect on schema.) Internally, the Ribbon filter is configured using a "one_in_fp_rate" value, which is 1 over desired FP rate. For example, use 100 for 1% FP rate. I'm expecting this will be used in the future for configuring Bloom-like filters, as I expect people to more commonly hold constant the filter accuracy and change the space vs. time trade-off, rather than hold constant the space (per key) and change the accuracy vs. time trade-off, though we might make that available. ### Benchmarking ``` $ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 34.1341 Number of filters: 1993 Total size (MB): 238.488 Reported total allocated memory (MB): 262.875 Reported internal fragmentation: 10.2255% Bits/key stored: 10.0029 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 18.7508 Random filter net ns/op: 258.246 Average FP rate %: 0.968672 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 130.851 Number of filters: 1993 Total size (MB): 168.166 Reported total allocated memory (MB): 183.211 Reported internal fragmentation: 8.94626% Bits/key stored: 7.05341 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 58.4523 Random filter net ns/op: 363.717 Average FP rate %: 0.952978 ---------------------------- Done. (For more info, run with -legend or -help.) ``` 168.166 / 238.488 = 0.705 -> 29.5% space reduction 130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. lastest Bloom filter (could make that as little as about 2.5x for less space reduction) ### Working around a hashing "flaw" bloom_test discovered a flaw in the simple hashing applied in StandardHasher when num_starts == 1 (num_slots == 128), showing an excessively high FP rate. The problem is that when many entries, on the order of number of hash bits or kCoeffBits, are associated with the same start location, the correlation between the CoeffRow and ResultRow (for efficiency) can lead to a solution that is "universal," or nearly so, for entries mapping to that start location. (Normally, variance in start location breaks the effective association between CoeffRow and ResultRow; the same value for CoeffRow is effectively different if start locations are different.) Without kUseSmash and with num_starts > 1 (thus num_starts ~= num_slots), this flaw should be completely irrelevant. Even with 10M slots, the chances of a single slot having just 16 (or more) entries map to it--not enough to cause an FP problem, which would be local to that slot if it happened--is 1 in millions. This spreadsheet formula shows that: =1/(10000000(1 - POISSON(15, 1, TRUE))) As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is intended for CPU efficiency of filters with many more entries/slots than kCoeffBits, a very reasonable work-around is to disallow num_starts==1 when !kUseSmash, by making the minimum non-zero number of slots 2kCoeffBits. This is the work-around I've applied. This also means that the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not space-efficient for less than a few hundred entries. Because of this, I have made it fall back on constructing a Bloom filter, under existing schema, when that is more space efficient for small filters. (We can change this in the future if we want.) TODO: better unit tests for this case in ribbon_test, and probably update StandardHasher for kUseSmash case so that it can scale nicely to small filters. ### Other related changes * Add Ribbon filter to stress/crash test * Add Ribbon filter to filter_bench as -impl=3 * Add option string support, as in "filter_policy=experimental_ribbon:5.678;" where 5.678 is the Bloom equivalent bits per key. * Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom * Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on binary searching CalculateSpace (inefficient), so that subclasses (especially experimental ones) don't have to provide an efficient implementation inverting CalculateSpace. * Minor refactor FastLocalBloomBitsBuilder for new base class XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder, which allows the latter to fall back on Bloom construction in some extreme cases. * Mostly updated bloom_test for Ribbon filter, though a test like FullBloomTest::Schema is a next TODO to ensure schema stability (in case this becomes production-ready schema as it is). * Add some APIs to ribbon_impl.h for configuring Ribbon filters. Although these are reasonably covered by bloom_test, TODO more unit tests in ribbon_test * Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data for constructing the linear approximations in GetNumSlotsFor95PctSuccess. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658 Test Plan: Some unit tests updated but other testing is left TODO. This is considered experimental but laying down schema compatibility as early as possible in case it proves production-quality. Also tested in stress/crash test. Reviewed By: jay-zhuang Differential Revision: D24899349 Pulled By: pdillinger fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8	5 years ago
Levi Tamasi	bbbb5a280d	Add options for integrated blob GC (#7661 ) Summary: This patch simply adds a couple of options that will enable users to configure garbage collection when using the integrated BlobDB implementation. The actual GC logic will be added in a separate step. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7661 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D24906544 Pulled By: ltamasi fbshipit-source-id: ee0e056a712a4b4475cd90de8b27d969bd61b7e1	5 years ago
Yanqin Jin	76ef894f9f	Add full_history_ts_low_ to FlushJob (#7655 ) Summary: https://github.com/facebook/rocksdb/issues/7556 enables `CompactionIterator` to perform garbage collection during compaction according to a lower bound (user-defined) timestamp `full_history_ts_low_`. This PR adds a data member `full_history_ts_low_` of type `std::string` to `FlushJob`, and `full_history_ts_low_` does not change during flush. `FlushJob` will pass a pointer to this data member to the `CompactionIterator` used during flush. Also refactored flush_job_test.cc to re-use some existing code, which is actually the majority of this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7655 Test Plan: make check Reviewed By: ltamasi Differential Revision: D24933340 Pulled By: riversand963 fbshipit-source-id: 2e584bfd0cf6e5c295ab1af264e68e9d6a12fca3	5 years ago
Levi Tamasi	bb69b4ce7f	Fix InternalStats::DumpCFStats (#7666 ) Summary: https://github.com/facebook/rocksdb/pull/7461 accidentally broke `InternalStats::DumpCFStats` by making `DumpCFFileHistogram` overwrite the output of `DumpCFStatsNoFileHistogram` instead of appending to it, resulting in only the file histogram related information getting logged. The patch fixes this by reverting to appending in `DumpCFFileHistogram`. Fixes https://github.com/facebook/rocksdb/issues/7664 . Pull Request resolved: https://github.com/facebook/rocksdb/pull/7666 Test Plan: Ran `make check` and checked the info log of `db_bench`. Reviewed By: riversand963 Differential Revision: D24929051 Pulled By: ltamasi fbshipit-source-id: 636a3d5ebb5ce23de4f3fe4f03ad3f16cb2858f8	5 years ago
Yanqin Jin	cf9d8e45c0	Add full_history_ts_low_ to CompactionJob (#7657 ) Summary: https://github.com/facebook/rocksdb/issues/7556 enables `CompactionIterator` to perform garbage collection during compaction according to a lower bound (user-defined) timestamp `full_history_ts_low_`. This PR adds a data member `full_history_ts_low_` of type `std::string` to `CompactionJob`, and `full_history_ts_low_` does not change during compaction. `CompactionJob` will pass a pointer to this data member to the `CompactionIterator` used during compaction. Also refactored compaction_job_test.cc to re-use some existing code, which is actually the majority of this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7657 Test Plan: make check Reviewed By: ltamasi Differential Revision: D24913803 Pulled By: riversand963 fbshipit-source-id: 11ad5329ddac365667152e7b3b02f84182c0ca8e	5 years ago
Levi Tamasi	0dc437d65c	Clean up CompactionProxy (#7662 ) Summary: `CompactionProxy` is currently both a concrete class used for actual `Compaction`s and a base class that `FakeCompaction` (which is used in `compaction_iterator_test`) is derived from. This is bad from an OO design standpoint, and also results in `FakeCompaction` containing an (uninitialized and unused) `Compaction*` member. The patch fixes this by making `CompactionProxy` a pure interface and introducing a separate concrete class `RealCompaction` for non-test/non-fake compactions. It also removes an unused parameter from the virtual method `level`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7662 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D24907680 Pulled By: ltamasi fbshipit-source-id: c100ecb1beef4b0ada35e799116c5bda71719ee7	5 years ago
Yanqin Jin	2400cd69e3	Update HISTORY.md for PR6069 (#7663 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7663 Reviewed By: ajkr Differential Revision: D24913081 Pulled By: riversand963 fbshipit-source-id: 704f427812f2b4f92e16d6cbc93be64d730d1cf9	5 years ago
Andrew Kryczka	ec346da98c	Always apply bottommost_compression_opts when enabled (#7633 ) Summary: Previously, even when `bottommost_compression_opts`'s `enabled` flag was set, it only took effect when `bottommost_compression` was also set to something other than `kDisableCompressionOption`. This wasn't documented and, if we kept the old behavior, it'd make things complicated like the migration instructions in https://github.com/facebook/rocksdb/issues/7619. We can simplify the API by making `bottommost_compression_opts` always take effect when its `enabled` flag is set. Fixes https://github.com/facebook/rocksdb/issues/7631. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7633 Reviewed By: ltamasi Differential Revision: D24710358 Pulled By: ajkr fbshipit-source-id: bbbdf9c1b53c63a4239d902cc3f5a11da1874647	5 years ago
mrambacher	c442f6809f	Create a Customizable class to load classes and configurations (#6590 ) Summary: The Customizable class is an extension of the Configurable class and allows instances to be created by a name/ID. Classes that extend customizable can define their Type (e.g. "TableFactory", "Cache") and a method to instantiate them (TableFactory::CreateFromString). Customizable objects can be registered with the ObjectRegistry and created dynamically. Future PRs will make more types of objects extend Customizable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6590 Reviewed By: cheng-chang Differential Revision: D24841553 Pulled By: zhichao-cao fbshipit-source-id: d0c2132bd932e971cbfe2c908ca2e5db30c5e155	5 years ago
Yanqin Jin	8b6b6aeb1a	Refactor with VersionEditHandler (#6581 ) Summary: Added a few classes in the same class hierarchy to remove code duplication and refactor the logic of reading and processing MANIFEST files. New classes are as follows. ``` class VersionEditHandlerBase; class ListColumnFamiliesHandler : VersionEditHandlerBase; class FileChecksumRetriever : VersionEditHandlerBase; class DumpManifestHandler : VersionEditHandler; ``` Classes that already existed before this PR are as follows. ``` class VersionEditHandler : VersionEditHandlerBase; ``` With these classes, refactored functions: `VersionSet::Recover()`, `VersionSet::ListColumnFamilies()`, `VersionSet::DumpManifest()`, `GetFileChecksumFromManifest()`. Test Plan (devserver): ``` make check COMPILE_WITH_ASAN=1 make check ``` These refactored code, especially recovery-related logic, will be tested intensively by all existing unit tests and stress tests. For example, run ``` make crash_test ``` Verified 3 successful runs on devserver. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6581 Reviewed By: ajkr Differential Revision: D20616217 Pulled By: riversand963 fbshipit-source-id: 048c7743aa4be2623ccd0cc3e61c0027e604e78b	5 years ago
Peter Dillinger	c57f914482	Use NPHash64 in more places (#7632 ) Summary: Since the hashes should not be persisted in output_validator nor mock_env. Also updated NPHash64 to use 64-bit seed, and comments. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7632 Test Plan: make check, and new build setting that enables modification to NPHash64, to check for behavior depending on specific values. Added that setting to one of the CircleCI configurations. Reviewed By: jay-zhuang Differential Revision: D24833780 Pulled By: pdillinger fbshipit-source-id: 02a57652ccf1ac105fbca79e77875bb7bf7c071f	5 years ago
Yanqin Jin	bcba372352	Report if unpinnable value encountered during backward iteration (#7618 ) Summary: There is an undocumented behavior about a certain combination of options and operations. - inplace_update_support = true, and - call `SeekForPrev()`, `SeekToLast()`, and/or `Prev()` on unflushed data. We should stop the backward iteration and report an error of `Status::NotSupported`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7618 Test Plan: make check Reviewed By: pdillinger Differential Revision: D24769619 Pulled By: riversand963 fbshipit-source-id: 81d199fa55ed4739ab10e719cc345a992238ccbb	5 years ago
Jay Zhuang	18aee7db7e	Fix a seek issue with prefix extractor and timestamp (#7644 ) Summary: During seek, prefix compare should not include timestamp. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7644 Test Plan: added unittest Reviewed By: riversand963 Differential Revision: D24772066 Pulled By: jay-zhuang fbshipit-source-id: 3982655a8bf8da256a738e8497b73b3d9bdac92e	5 years ago
Huisheng Liu	16d103d35b	fix read_amp_bytes_per_bit field size (#7651 ) Summary: The field in BlockBasedTableOptions is 4 bytes: // Default: 0 (disabled) uint32_t read_amp_bytes_per_bit = 0; Pull Request resolved: https://github.com/facebook/rocksdb/pull/7651 Reviewed By: ltamasi Differential Revision: D24844994 Pulled By: riversand963 fbshipit-source-id: e2695e55532256ef8996dd6939cad06987a80293	5 years ago
Akanksha Mahajan	202605143b	Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643 ) Summary: crash tests donot run in DEBUG_MODE=0 on tmpfs when use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly because direct I/O is not supported on tmpfs and tests exit. Fix: Sanitize direct I/O read options in DEBUG_LEVEL=0 so that crash tests can run in tmpfs. When mmap_reads is set, direct I/O reads options are unset so we can sanitize direct I/O reads options in case of tmpfs as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643 Test Plan: 1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm"; export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0"; make crash_test -j64 2. In DEBUG_LEVEL=1 mode: make crash_test -j64 Reviewed By: jay-zhuang Differential Revision: D24766550 Pulled By: akankshamahajan15 fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e	5 years ago
Yanqin Jin	9f1c84ca47	Fix a bug in compaction iterator with timestamp (#7645 ) Summary: https://github.com/facebook/rocksdb/issues/7556 introduced support for compaction iterator to perform timestamp-aware garbage collection. However, there was a bug. The comparison between `ikey_.user_key` and `current_user_key_` should happen before `key_ = current_key_.SetInternalKey(key_, &ikey_);` (line 336 of compaction_iterator.cc). Otherwise, after this line, `current_key_` is always the same as `ikey_.user_key`. This PR also re-arranged the order of some data members because some of them are state variables of `CompactionIterator` while others are inputs from callers. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7645 Test Plan: make check Reviewed By: ltamasi Differential Revision: D24845028 Pulled By: riversand963 fbshipit-source-id: c7e79914832701462b86867e8463cd463b6c0c25	5 years ago
Cheng Chang	c3911f1a72	Track WAL in MANIFEST: Track deleted WALs in MANIFEST after recovering from the WALs (#7649 ) Summary: After replaying the WALs, the memtables are flushed synchronously to L0 instead of being flushed in background. Currently, we only track WAL obsoletion events in the code path of background flush jobs. This PR tracks these events in RecoverLogFiles. After this change, we can enable `track_and_verify_wal_in_manifest` in `db_stress`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7649 Test Plan: `python tools/db_crashtest.py whitebox` Reviewed By: riversand963 Differential Revision: D24824501 Pulled By: cheng-chang fbshipit-source-id: 207129f7b845c50b333680ce6818a68a2fad54b9	5 years ago
Cheng Chang	5e794b0841	Fix a recovery corner case (#7621 ) Summary: Consider the following sequence of events: 1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST. 2. Syncing MANIFEST failed and db crashed. 3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file. 4. Db crashed again. 5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed. It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5. After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621 Test Plan: 1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery. 2. a new unit test is added in db_basic_test to simulate step 3. Reviewed By: riversand963 Differential Revision: D24668144 Pulled By: cheng-chang fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b	5 years ago
Peter Dillinger	8b8a2e9f05	Ribbon: major re-work of hashing, seeds, and more (#7635 ) Summary: * Fully optimized StandardHasher, in terms of efficiently generating Start, CoeffRow, and ResultRow from a stock hash value, with sufficient independence between them to have no measurably degraded behavior. (Degraded behavior would be an FP rate higher than explainable by 2^-b and, if using a 32-bit stock hash function, expected stock hash collisions.) Details in code comments. * Our standard 64-bit and 32-bit hash functions do not exhibit sufficient independence on sequential seeds (for one Ribbon construction attempt to have independent probability from the next). I have worked around this in the Ribbon code by "pre-mixing" "ordinal seeds," sequentially tried and appropriate for storage in persisted metadata, into "raw seeds," ready for application and appropriate for in-memory storage. This way the pre-mixing step (though fast) is only applied on loading or configuring the structure, not on each query or banding add. * Fix a subtle flaw in which backtracking not clearing ResultRow data could lead to elevated FP rate on keys that were backtracked on and should (for generality) exhibit the same FP rate as novel keys. * Added a basic test for PhsfQuery and construction algorithms (map or "retrieval structure" rather than set or filter), and made a few trivial related fixes. * Better random configuration generation in unit tests * Some other minor cleanup / clarification / etc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7635 Test Plan: unit tests included Reviewed By: jay-zhuang Differential Revision: D24738978 Pulled By: pdillinger fbshipit-source-id: f9d03599d9e2ca3e30e9d3e7d81cd936b56f76f0	5 years ago
Cheng Chang	1e40696dd1	Track WAL in MANIFEST: LogAndApply WAL events to MANIFEST (#7601 ) Summary: When a WAL is synced, an edit is written to MANIFEST. After flushing memtables, the obsoleted WALs are piggybacked to MANIFEST while writing the new L0 files to MANIFEST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7601 Test Plan: `track_and_verify_wals_in_manifest` is enabled by default for all tests extending `DBBasicTest`, and in db_stress_test. Unit test `wal_edit_test`, `version_edit_test`, and `version_set_test` are also updated. Watch all tests to pass. Reviewed By: ltamasi Differential Revision: D24553957 Pulled By: cheng-chang fbshipit-source-id: 66a569ff1bdced38e22900bd240b73113906e040	5 years ago
Cheng Chang	1ce105d0ea	Disable fsync in DBMergeOperatorTest to save test time (#7640 ) Summary: The test often times out in internal test infra. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7640 Test Plan: watch test to pass internally Reviewed By: anand1976 Differential Revision: D24764928 Pulled By: cheng-chang fbshipit-source-id: 587f2afc97f52909837943fd938a86ca94544b2c	5 years ago
Cheng Chang	cdc7ba3a32	DBTablePropertiesTest often times out in internal test infra (#7639 ) Summary: In this test, after flushing memtable, it will read directly from the sst files, so `env_do_fsync` was `true` to ensure that the flushed sst files can be read afterwards. Considering that the test does not last long, the data should be available in os buffer even without fsync, so this PR tries to disable fsync to reduce test time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7639 Test Plan: watch the test to pass in internal infra Reviewed By: anand1976 Differential Revision: D24764689 Pulled By: cheng-chang fbshipit-source-id: ef827611a3eaca04201e4280ae801d6c8e60c138	5 years ago
Cheng Chang	da42eceabc	Skip fsync in txn tests (#7641 ) Summary: The tests often times out in internal infra, skipping fsync should reduce test time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7641 Test Plan: watch existing tests to pass Reviewed By: anand1976 Differential Revision: D24765098 Pulled By: cheng-chang fbshipit-source-id: c62bf8110361aee901918d632cf4772435d05e8d	5 years ago
Cheng Chang	4c2aef04bd	ColumnFamilyTest often times out in internal test infra (#7638 ) Summary: Tries to fix by skipping fsync. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7638 Test Plan: watch the tests to pass Reviewed By: jay-zhuang Differential Revision: D24764355 Pulled By: cheng-chang fbshipit-source-id: 9c21b177709025ca1943066d94da89324ed47655	5 years ago
Cheng Chang	81543369e5	Disable fsync in db_range_del_test (#7637 ) Summary: This test often times out in internal test infra. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7637 Test Plan: watch test to pass Reviewed By: ajkr Differential Revision: D24763939 Pulled By: cheng-chang fbshipit-source-id: 6564ee2ef637e9faf6688d4b6a5d74a72a51c5e8	5 years ago
cheng-chang	1f627210ca	Simplify a test case in Java ReadOnlyTest (#7608 ) Summary: The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608 Test Plan: watch the existing java tests to pass Reviewed By: zhichao-cao Differential Revision: D24611621 Pulled By: cheng-chang fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d	5 years ago
Xie Yanbo	c9c9709a1a	Update clang-format-diff.py (#7609 ) Summary: `llvm-mirror/clang` is archived. Get the `clang-format-diff.py` file from the active source. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7609 Reviewed By: ajkr Differential Revision: D24711608 Pulled By: pdillinger fbshipit-source-id: b115d8765ff23fbb8190290a170de21565daba84	5 years ago

... 2 3 4 5 6 ...

9745 Commits (108e6b6354af1687b2e9b66914a843545fb611bf) All Branches Search

9745 Commits (108e6b6354af1687b2e9b66914a843545fb611bf)

All Branches