rocksdb

Commit Graph

Author	SHA1	Message	Date
Peter Dillinger	60af964372	Experimental (production candidate) SST schema for Ribbon filter (#7658 ) Summary: Added experimental public API for Ribbon filter: NewExperimentalRibbonFilterPolicy(). This experimental API will take a "Bloom equivalent" bits per key, and configure the Ribbon filter for the same FP rate as Bloom would have but ~30% space savings. (Note: optimize_filters_for_memory is not yet implemented for Ribbon filter. That can be added with no effect on schema.) Internally, the Ribbon filter is configured using a "one_in_fp_rate" value, which is 1 over desired FP rate. For example, use 100 for 1% FP rate. I'm expecting this will be used in the future for configuring Bloom-like filters, as I expect people to more commonly hold constant the filter accuracy and change the space vs. time trade-off, rather than hold constant the space (per key) and change the accuracy vs. time trade-off, though we might make that available. ### Benchmarking ``` $ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 34.1341 Number of filters: 1993 Total size (MB): 238.488 Reported total allocated memory (MB): 262.875 Reported internal fragmentation: 10.2255% Bits/key stored: 10.0029 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 18.7508 Random filter net ns/op: 258.246 Average FP rate %: 0.968672 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 130.851 Number of filters: 1993 Total size (MB): 168.166 Reported total allocated memory (MB): 183.211 Reported internal fragmentation: 8.94626% Bits/key stored: 7.05341 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 58.4523 Random filter net ns/op: 363.717 Average FP rate %: 0.952978 ---------------------------- Done. (For more info, run with -legend or -help.) ``` 168.166 / 238.488 = 0.705 -> 29.5% space reduction 130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. lastest Bloom filter (could make that as little as about 2.5x for less space reduction) ### Working around a hashing "flaw" bloom_test discovered a flaw in the simple hashing applied in StandardHasher when num_starts == 1 (num_slots == 128), showing an excessively high FP rate. The problem is that when many entries, on the order of number of hash bits or kCoeffBits, are associated with the same start location, the correlation between the CoeffRow and ResultRow (for efficiency) can lead to a solution that is "universal," or nearly so, for entries mapping to that start location. (Normally, variance in start location breaks the effective association between CoeffRow and ResultRow; the same value for CoeffRow is effectively different if start locations are different.) Without kUseSmash and with num_starts > 1 (thus num_starts ~= num_slots), this flaw should be completely irrelevant. Even with 10M slots, the chances of a single slot having just 16 (or more) entries map to it--not enough to cause an FP problem, which would be local to that slot if it happened--is 1 in millions. This spreadsheet formula shows that: =1/(10000000(1 - POISSON(15, 1, TRUE))) As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is intended for CPU efficiency of filters with many more entries/slots than kCoeffBits, a very reasonable work-around is to disallow num_starts==1 when !kUseSmash, by making the minimum non-zero number of slots 2kCoeffBits. This is the work-around I've applied. This also means that the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not space-efficient for less than a few hundred entries. Because of this, I have made it fall back on constructing a Bloom filter, under existing schema, when that is more space efficient for small filters. (We can change this in the future if we want.) TODO: better unit tests for this case in ribbon_test, and probably update StandardHasher for kUseSmash case so that it can scale nicely to small filters. ### Other related changes * Add Ribbon filter to stress/crash test * Add Ribbon filter to filter_bench as -impl=3 * Add option string support, as in "filter_policy=experimental_ribbon:5.678;" where 5.678 is the Bloom equivalent bits per key. * Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom * Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on binary searching CalculateSpace (inefficient), so that subclasses (especially experimental ones) don't have to provide an efficient implementation inverting CalculateSpace. * Minor refactor FastLocalBloomBitsBuilder for new base class XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder, which allows the latter to fall back on Bloom construction in some extreme cases. * Mostly updated bloom_test for Ribbon filter, though a test like FullBloomTest::Schema is a next TODO to ensure schema stability (in case this becomes production-ready schema as it is). * Add some APIs to ribbon_impl.h for configuring Ribbon filters. Although these are reasonably covered by bloom_test, TODO more unit tests in ribbon_test * Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data for constructing the linear approximations in GetNumSlotsFor95PctSuccess. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658 Test Plan: Some unit tests updated but other testing is left TODO. This is considered experimental but laying down schema compatibility as early as possible in case it proves production-quality. Also tested in stress/crash test. Reviewed By: jay-zhuang Differential Revision: D24899349 Pulled By: pdillinger fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8	5 years ago
Levi Tamasi	bbbb5a280d	Add options for integrated blob GC (#7661 ) Summary: This patch simply adds a couple of options that will enable users to configure garbage collection when using the integrated BlobDB implementation. The actual GC logic will be added in a separate step. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7661 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D24906544 Pulled By: ltamasi fbshipit-source-id: ee0e056a712a4b4475cd90de8b27d969bd61b7e1	5 years ago
Yanqin Jin	76ef894f9f	Add full_history_ts_low_ to FlushJob (#7655 ) Summary: https://github.com/facebook/rocksdb/issues/7556 enables `CompactionIterator` to perform garbage collection during compaction according to a lower bound (user-defined) timestamp `full_history_ts_low_`. This PR adds a data member `full_history_ts_low_` of type `std::string` to `FlushJob`, and `full_history_ts_low_` does not change during flush. `FlushJob` will pass a pointer to this data member to the `CompactionIterator` used during flush. Also refactored flush_job_test.cc to re-use some existing code, which is actually the majority of this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7655 Test Plan: make check Reviewed By: ltamasi Differential Revision: D24933340 Pulled By: riversand963 fbshipit-source-id: 2e584bfd0cf6e5c295ab1af264e68e9d6a12fca3	5 years ago
Levi Tamasi	bb69b4ce7f	Fix InternalStats::DumpCFStats (#7666 ) Summary: https://github.com/facebook/rocksdb/pull/7461 accidentally broke `InternalStats::DumpCFStats` by making `DumpCFFileHistogram` overwrite the output of `DumpCFStatsNoFileHistogram` instead of appending to it, resulting in only the file histogram related information getting logged. The patch fixes this by reverting to appending in `DumpCFFileHistogram`. Fixes https://github.com/facebook/rocksdb/issues/7664 . Pull Request resolved: https://github.com/facebook/rocksdb/pull/7666 Test Plan: Ran `make check` and checked the info log of `db_bench`. Reviewed By: riversand963 Differential Revision: D24929051 Pulled By: ltamasi fbshipit-source-id: 636a3d5ebb5ce23de4f3fe4f03ad3f16cb2858f8	5 years ago
Yanqin Jin	cf9d8e45c0	Add full_history_ts_low_ to CompactionJob (#7657 ) Summary: https://github.com/facebook/rocksdb/issues/7556 enables `CompactionIterator` to perform garbage collection during compaction according to a lower bound (user-defined) timestamp `full_history_ts_low_`. This PR adds a data member `full_history_ts_low_` of type `std::string` to `CompactionJob`, and `full_history_ts_low_` does not change during compaction. `CompactionJob` will pass a pointer to this data member to the `CompactionIterator` used during compaction. Also refactored compaction_job_test.cc to re-use some existing code, which is actually the majority of this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7657 Test Plan: make check Reviewed By: ltamasi Differential Revision: D24913803 Pulled By: riversand963 fbshipit-source-id: 11ad5329ddac365667152e7b3b02f84182c0ca8e	5 years ago
Levi Tamasi	0dc437d65c	Clean up CompactionProxy (#7662 ) Summary: `CompactionProxy` is currently both a concrete class used for actual `Compaction`s and a base class that `FakeCompaction` (which is used in `compaction_iterator_test`) is derived from. This is bad from an OO design standpoint, and also results in `FakeCompaction` containing an (uninitialized and unused) `Compaction*` member. The patch fixes this by making `CompactionProxy` a pure interface and introducing a separate concrete class `RealCompaction` for non-test/non-fake compactions. It also removes an unused parameter from the virtual method `level`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7662 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D24907680 Pulled By: ltamasi fbshipit-source-id: c100ecb1beef4b0ada35e799116c5bda71719ee7	5 years ago
Yanqin Jin	2400cd69e3	Update HISTORY.md for PR6069 (#7663 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7663 Reviewed By: ajkr Differential Revision: D24913081 Pulled By: riversand963 fbshipit-source-id: 704f427812f2b4f92e16d6cbc93be64d730d1cf9	5 years ago
Andrew Kryczka	ec346da98c	Always apply bottommost_compression_opts when enabled (#7633 ) Summary: Previously, even when `bottommost_compression_opts`'s `enabled` flag was set, it only took effect when `bottommost_compression` was also set to something other than `kDisableCompressionOption`. This wasn't documented and, if we kept the old behavior, it'd make things complicated like the migration instructions in https://github.com/facebook/rocksdb/issues/7619. We can simplify the API by making `bottommost_compression_opts` always take effect when its `enabled` flag is set. Fixes https://github.com/facebook/rocksdb/issues/7631. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7633 Reviewed By: ltamasi Differential Revision: D24710358 Pulled By: ajkr fbshipit-source-id: bbbdf9c1b53c63a4239d902cc3f5a11da1874647	5 years ago
mrambacher	c442f6809f	Create a Customizable class to load classes and configurations (#6590 ) Summary: The Customizable class is an extension of the Configurable class and allows instances to be created by a name/ID. Classes that extend customizable can define their Type (e.g. "TableFactory", "Cache") and a method to instantiate them (TableFactory::CreateFromString). Customizable objects can be registered with the ObjectRegistry and created dynamically. Future PRs will make more types of objects extend Customizable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6590 Reviewed By: cheng-chang Differential Revision: D24841553 Pulled By: zhichao-cao fbshipit-source-id: d0c2132bd932e971cbfe2c908ca2e5db30c5e155	5 years ago
Yanqin Jin	8b6b6aeb1a	Refactor with VersionEditHandler (#6581 ) Summary: Added a few classes in the same class hierarchy to remove code duplication and refactor the logic of reading and processing MANIFEST files. New classes are as follows. ``` class VersionEditHandlerBase; class ListColumnFamiliesHandler : VersionEditHandlerBase; class FileChecksumRetriever : VersionEditHandlerBase; class DumpManifestHandler : VersionEditHandler; ``` Classes that already existed before this PR are as follows. ``` class VersionEditHandler : VersionEditHandlerBase; ``` With these classes, refactored functions: `VersionSet::Recover()`, `VersionSet::ListColumnFamilies()`, `VersionSet::DumpManifest()`, `GetFileChecksumFromManifest()`. Test Plan (devserver): ``` make check COMPILE_WITH_ASAN=1 make check ``` These refactored code, especially recovery-related logic, will be tested intensively by all existing unit tests and stress tests. For example, run ``` make crash_test ``` Verified 3 successful runs on devserver. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6581 Reviewed By: ajkr Differential Revision: D20616217 Pulled By: riversand963 fbshipit-source-id: 048c7743aa4be2623ccd0cc3e61c0027e604e78b	5 years ago
Peter Dillinger	c57f914482	Use NPHash64 in more places (#7632 ) Summary: Since the hashes should not be persisted in output_validator nor mock_env. Also updated NPHash64 to use 64-bit seed, and comments. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7632 Test Plan: make check, and new build setting that enables modification to NPHash64, to check for behavior depending on specific values. Added that setting to one of the CircleCI configurations. Reviewed By: jay-zhuang Differential Revision: D24833780 Pulled By: pdillinger fbshipit-source-id: 02a57652ccf1ac105fbca79e77875bb7bf7c071f	5 years ago
Yanqin Jin	bcba372352	Report if unpinnable value encountered during backward iteration (#7618 ) Summary: There is an undocumented behavior about a certain combination of options and operations. - inplace_update_support = true, and - call `SeekForPrev()`, `SeekToLast()`, and/or `Prev()` on unflushed data. We should stop the backward iteration and report an error of `Status::NotSupported`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7618 Test Plan: make check Reviewed By: pdillinger Differential Revision: D24769619 Pulled By: riversand963 fbshipit-source-id: 81d199fa55ed4739ab10e719cc345a992238ccbb	5 years ago
Jay Zhuang	18aee7db7e	Fix a seek issue with prefix extractor and timestamp (#7644 ) Summary: During seek, prefix compare should not include timestamp. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7644 Test Plan: added unittest Reviewed By: riversand963 Differential Revision: D24772066 Pulled By: jay-zhuang fbshipit-source-id: 3982655a8bf8da256a738e8497b73b3d9bdac92e	5 years ago
Huisheng Liu	16d103d35b	fix read_amp_bytes_per_bit field size (#7651 ) Summary: The field in BlockBasedTableOptions is 4 bytes: // Default: 0 (disabled) uint32_t read_amp_bytes_per_bit = 0; Pull Request resolved: https://github.com/facebook/rocksdb/pull/7651 Reviewed By: ltamasi Differential Revision: D24844994 Pulled By: riversand963 fbshipit-source-id: e2695e55532256ef8996dd6939cad06987a80293	5 years ago
Akanksha Mahajan	202605143b	Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643 ) Summary: crash tests donot run in DEBUG_MODE=0 on tmpfs when use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly because direct I/O is not supported on tmpfs and tests exit. Fix: Sanitize direct I/O read options in DEBUG_LEVEL=0 so that crash tests can run in tmpfs. When mmap_reads is set, direct I/O reads options are unset so we can sanitize direct I/O reads options in case of tmpfs as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643 Test Plan: 1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm"; export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0"; make crash_test -j64 2. In DEBUG_LEVEL=1 mode: make crash_test -j64 Reviewed By: jay-zhuang Differential Revision: D24766550 Pulled By: akankshamahajan15 fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e	5 years ago
Yanqin Jin	9f1c84ca47	Fix a bug in compaction iterator with timestamp (#7645 ) Summary: https://github.com/facebook/rocksdb/issues/7556 introduced support for compaction iterator to perform timestamp-aware garbage collection. However, there was a bug. The comparison between `ikey_.user_key` and `current_user_key_` should happen before `key_ = current_key_.SetInternalKey(key_, &ikey_);` (line 336 of compaction_iterator.cc). Otherwise, after this line, `current_key_` is always the same as `ikey_.user_key`. This PR also re-arranged the order of some data members because some of them are state variables of `CompactionIterator` while others are inputs from callers. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7645 Test Plan: make check Reviewed By: ltamasi Differential Revision: D24845028 Pulled By: riversand963 fbshipit-source-id: c7e79914832701462b86867e8463cd463b6c0c25	5 years ago
Cheng Chang	c3911f1a72	Track WAL in MANIFEST: Track deleted WALs in MANIFEST after recovering from the WALs (#7649 ) Summary: After replaying the WALs, the memtables are flushed synchronously to L0 instead of being flushed in background. Currently, we only track WAL obsoletion events in the code path of background flush jobs. This PR tracks these events in RecoverLogFiles. After this change, we can enable `track_and_verify_wal_in_manifest` in `db_stress`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7649 Test Plan: `python tools/db_crashtest.py whitebox` Reviewed By: riversand963 Differential Revision: D24824501 Pulled By: cheng-chang fbshipit-source-id: 207129f7b845c50b333680ce6818a68a2fad54b9	5 years ago
Cheng Chang	5e794b0841	Fix a recovery corner case (#7621 ) Summary: Consider the following sequence of events: 1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST. 2. Syncing MANIFEST failed and db crashed. 3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file. 4. Db crashed again. 5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed. It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5. After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621 Test Plan: 1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery. 2. a new unit test is added in db_basic_test to simulate step 3. Reviewed By: riversand963 Differential Revision: D24668144 Pulled By: cheng-chang fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b	5 years ago
Peter Dillinger	8b8a2e9f05	Ribbon: major re-work of hashing, seeds, and more (#7635 ) Summary: * Fully optimized StandardHasher, in terms of efficiently generating Start, CoeffRow, and ResultRow from a stock hash value, with sufficient independence between them to have no measurably degraded behavior. (Degraded behavior would be an FP rate higher than explainable by 2^-b and, if using a 32-bit stock hash function, expected stock hash collisions.) Details in code comments. * Our standard 64-bit and 32-bit hash functions do not exhibit sufficient independence on sequential seeds (for one Ribbon construction attempt to have independent probability from the next). I have worked around this in the Ribbon code by "pre-mixing" "ordinal seeds," sequentially tried and appropriate for storage in persisted metadata, into "raw seeds," ready for application and appropriate for in-memory storage. This way the pre-mixing step (though fast) is only applied on loading or configuring the structure, not on each query or banding add. * Fix a subtle flaw in which backtracking not clearing ResultRow data could lead to elevated FP rate on keys that were backtracked on and should (for generality) exhibit the same FP rate as novel keys. * Added a basic test for PhsfQuery and construction algorithms (map or "retrieval structure" rather than set or filter), and made a few trivial related fixes. * Better random configuration generation in unit tests * Some other minor cleanup / clarification / etc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7635 Test Plan: unit tests included Reviewed By: jay-zhuang Differential Revision: D24738978 Pulled By: pdillinger fbshipit-source-id: f9d03599d9e2ca3e30e9d3e7d81cd936b56f76f0	5 years ago
Cheng Chang	1e40696dd1	Track WAL in MANIFEST: LogAndApply WAL events to MANIFEST (#7601 ) Summary: When a WAL is synced, an edit is written to MANIFEST. After flushing memtables, the obsoleted WALs are piggybacked to MANIFEST while writing the new L0 files to MANIFEST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7601 Test Plan: `track_and_verify_wals_in_manifest` is enabled by default for all tests extending `DBBasicTest`, and in db_stress_test. Unit test `wal_edit_test`, `version_edit_test`, and `version_set_test` are also updated. Watch all tests to pass. Reviewed By: ltamasi Differential Revision: D24553957 Pulled By: cheng-chang fbshipit-source-id: 66a569ff1bdced38e22900bd240b73113906e040	5 years ago
Cheng Chang	1ce105d0ea	Disable fsync in DBMergeOperatorTest to save test time (#7640 ) Summary: The test often times out in internal test infra. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7640 Test Plan: watch test to pass internally Reviewed By: anand1976 Differential Revision: D24764928 Pulled By: cheng-chang fbshipit-source-id: 587f2afc97f52909837943fd938a86ca94544b2c	5 years ago
Cheng Chang	cdc7ba3a32	DBTablePropertiesTest often times out in internal test infra (#7639 ) Summary: In this test, after flushing memtable, it will read directly from the sst files, so `env_do_fsync` was `true` to ensure that the flushed sst files can be read afterwards. Considering that the test does not last long, the data should be available in os buffer even without fsync, so this PR tries to disable fsync to reduce test time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7639 Test Plan: watch the test to pass in internal infra Reviewed By: anand1976 Differential Revision: D24764689 Pulled By: cheng-chang fbshipit-source-id: ef827611a3eaca04201e4280ae801d6c8e60c138	5 years ago
Cheng Chang	da42eceabc	Skip fsync in txn tests (#7641 ) Summary: The tests often times out in internal infra, skipping fsync should reduce test time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7641 Test Plan: watch existing tests to pass Reviewed By: anand1976 Differential Revision: D24765098 Pulled By: cheng-chang fbshipit-source-id: c62bf8110361aee901918d632cf4772435d05e8d	5 years ago
Cheng Chang	4c2aef04bd	ColumnFamilyTest often times out in internal test infra (#7638 ) Summary: Tries to fix by skipping fsync. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7638 Test Plan: watch the tests to pass Reviewed By: jay-zhuang Differential Revision: D24764355 Pulled By: cheng-chang fbshipit-source-id: 9c21b177709025ca1943066d94da89324ed47655	5 years ago
Cheng Chang	81543369e5	Disable fsync in db_range_del_test (#7637 ) Summary: This test often times out in internal test infra. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7637 Test Plan: watch test to pass Reviewed By: ajkr Differential Revision: D24763939 Pulled By: cheng-chang fbshipit-source-id: 6564ee2ef637e9faf6688d4b6a5d74a72a51c5e8	5 years ago
cheng-chang	1f627210ca	Simplify a test case in Java ReadOnlyTest (#7608 ) Summary: The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608 Test Plan: watch the existing java tests to pass Reviewed By: zhichao-cao Differential Revision: D24611621 Pulled By: cheng-chang fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d	5 years ago
Xie Yanbo	c9c9709a1a	Update clang-format-diff.py (#7609 ) Summary: `llvm-mirror/clang` is archived. Get the `clang-format-diff.py` file from the active source. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7609 Reviewed By: ajkr Differential Revision: D24711608 Pulled By: pdillinger fbshipit-source-id: b115d8765ff23fbb8190290a170de21565daba84	5 years ago
Yanqin Jin	b6d8e36741	Compute NeedCompact() after table builder Finish() (#7627 ) Summary: In `BuildTable()`, we call `builder->Finish()` before evaluating `builder->NeedCompact()`. However, we call `builder->NeedCompact()` before `builder->Finish()` in compaction job. This can be wrong because the table properties collectors may rely on the success of `Finish()` to provide correct result for `NeedCompact()`. Test plan (on devserver): make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/7627 Reviewed By: ajkr Differential Revision: D24728741 Pulled By: riversand963 fbshipit-source-id: 5a0dce244e14eb1106c4f87021e6bebca82b486e	5 years ago
Yanqin Jin	fde0cd7ced	Add API to verify whole sst file checksum (#7578 ) Summary: Existing API `VerifyChecksum()` allows application to verify sst files' block checksums. Since whole file, user-specified checksum is tracked in MANIFEST, we can expose a new API to verify sst files' file checksums. ``` // Compute table file checksums if applicable and compare with MANIFEST. // Returns OK if no file has mismatching whole-file checksum. Status DB::VerifyFileChecksums(const ReadOptions& /read_options/); ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7578 Test Plan: make check Reviewed By: pdillinger Differential Revision: D24436783 Pulled By: riversand963 fbshipit-source-id: 52b51519b842f2b3c4e3351998a97c86cbec85b3	5 years ago
Akanksha Mahajan	06a92fcf5c	Add "max_write_buffer_size_to_maintain" to crash test (#7634 ) Summary: Add "max_write_buffer_size_to_maintain" to crash test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7634 Test Plan: make crash_test -j64 Reviewed By: zhichao-cao Differential Revision: D24710401 Pulled By: akankshamahajan15 fbshipit-source-id: 89e0412aaa56b2ef5a75603971b82f4b0b494ab7	5 years ago
Peter Dillinger	746909ceda	Ribbon: InterleavedSolutionStorage (#7598 ) Summary: The core algorithms for InterleavedSolutionStorage and the implementation SerializableInterleavedSolution make Ribbon fast for filter queries. Example output from new unit test: Simple outside query, hot, incl hashing, ns/key: 117.796 Interleaved outside query, hot, incl hashing, ns/key: 42.2655 Bloom outside query, hot, incl hashing, ns/key: 24.0071 Also includes misc cleanup of previous Ribbon code and comments. Some TODOs and FIXMEs remain for futher work / investigation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7598 Test Plan: unit tests included (integration work and tests coming later) Reviewed By: jay-zhuang Differential Revision: D24559209 Pulled By: pdillinger fbshipit-source-id: fea483cd354ba782aea3e806f2bc96e183d59441	5 years ago
Yanqin Jin	0b94468bba	Avoid skipping a test in db_wal_test (#7628 ) Summary: Recent test report shows that some tests have been skipped. For DBWALTest that inherits from DBTestBase, the following will always be true, since `env_` is an instance of `SpecialEnv`, not `Env::Default()`. Thus the test will always be skipped. ``` if (options.env != Env::Default()) { ROCKSDB_GTEST_SKIP("Test requires default environment"); return; } ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7628 Test Plan: ./db_wal_test --gtest_filter=DBWALTest.TruncateLastLogAfterRecoverWithoutFlush MEM_ENV=1 ./db_wal_test --gtest_filter=DBWALTest.TruncateLastLogAfterRecoverWithoutFlush make check Reviewed By: jay-zhuang Differential Revision: D24693006 Pulled By: riversand963 fbshipit-source-id: 7f2a772492a0f11bff17bbf5e9f493e9e9a1c125	5 years ago
Jay Zhuang	881e0dcc09	Fix MultiGet unable to query timestamp data issue (#7589 ) Summary: The filter query key should not contain timestamp. The timestamp is stripped for Get(), but not MultiGet(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7589 Reviewed By: riversand963 Differential Revision: D24494661 Pulled By: jay-zhuang fbshipit-source-id: fc5ff40f9d683a89a760c6ff0ab3aed05a70c317	5 years ago
Yanqin Jin	c992eb118b	Avoid skipping a test in db_test2 (#7629 ) Summary: Test report shows that this test has been skipped recently due to a condition that will never meet. `env_` is not equal to `Env::Default()` for DBTest2 that inherits from DBTestBase. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7629 Test Plan: make check ./db_test2 --gtest_filter=DBTest2.PinnableSliceAndMmapReads Reviewed By: jay-zhuang Differential Revision: D24693317 Pulled By: riversand963 fbshipit-source-id: b1bbd5c1e05a6fa57c1de0d74462b69e3c2d5215	5 years ago
Andrew Kryczka	1adbceb581	Expand effect of dictionary settings in `ColumnFamilyOptions::compression_opts` (#7619 ) Summary: In dictionary compression's initial implementation, in order to save CPU overhead, we only enabled it for bottom level under the assumption that the vast majority of data is stored there. At that time, there was no such thing as `ColumnFamilyOptions::bottommost_compression_opts`, so we just hardcoded disabling dictionary compression in flush and compactions to non-bottommost level. Now, we have users who generate all their files through flush and are considering using dictionary compression. To support such a use case, this PR expands the scope of `ColumnFamilyOptions::compression_opts` to additionally include flushed files and files generated by compaction to a non-bottommost level. Users can still get the old behavior by moving their dictionary settings to `ColumnFamilyOptions::bottommost_compression_opts` and explicitly enabling both that and `ColumnFamilyOptions::bottommost_compression`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7619 Reviewed By: ltamasi Differential Revision: D24665610 Pulled By: ajkr fbshipit-source-id: 656b90bce1033fe21c71e09af931ef5bde3e464c	5 years ago
Andrew Kryczka	a388c8cc6b	Add recent fixes to HISTORY.md (#7617 ) Summary: The recently reverted behavior changes were released to at least one place internally, so we should mention the reverts in release notes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7617 Reviewed By: akankshamahajan15 Differential Revision: D24654343 Pulled By: ajkr fbshipit-source-id: eb64b2797d8508cd95a2dc2698122c1be29ce817	5 years ago
mrambacher	30beecef8c	Return NotFound from TableFactory configuration errors during options loading (#7615 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7615 Reviewed By: riversand963 Differential Revision: D24637054 Pulled By: ajkr fbshipit-source-id: 7da20d44289eaa2387af4edf8c3c48057425cc1c	5 years ago
Akanksha Mahajan	6773901f76	Add 6.14 branch to check_format_compatible.sh (#7613 ) Summary: Add 6.14 to check_format_compatible.sh Pull Request resolved: https://github.com/facebook/rocksdb/pull/7613 Test Plan: ./tools/check_format_compatible.sh Reviewed By: riversand963 Differential Revision: D24628535 Pulled By: akankshamahajan15 fbshipit-source-id: a8bf1d5505a1fcc8a5bedc5ff4fdf33a22c3f2e6	5 years ago
mrambacher	7eb2824e3f	Revert LoadLatestOptions handling of ignore_unknown_options if versions differ (#7612 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7612 Reviewed By: zhichao-cao Differential Revision: D24627054 Pulled By: riversand963 fbshipit-source-id: 451b4da742e3e84c7442bc7cc4959d39089b89d0	5 years ago
Yanqin Jin	394210f280	Remove unused includes (#7604 ) Summary: This is a PR generated semi-automatically by an internal tool to remove unused includes and `using` statements. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604 Test Plan: make check Reviewed By: ajkr Differential Revision: D24579392 Pulled By: riversand963 fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd	5 years ago
Jermy Li	99a0305bb8	java: correct method name RocksDB.GetColumnFamilyMetaData() (#7606 ) Summary: update GetColumnFamilyMetaData() to getColumnFamilyMetaData() Pull Request resolved: https://github.com/facebook/rocksdb/pull/7606 Reviewed By: zhichao-cao Differential Revision: D24610298 Pulled By: cheng-chang fbshipit-source-id: d24f9b65478da1456f50747637dc95688af874de	5 years ago
Zhichao Cao	ea347d80df	Updated GenerateOneFileChecksum to use requested_checksum_func_name (#7586 ) Summary: CreateFileChecksumGenerator may uses requested_checksum_func_name in generator context to decide which generator will be used. GenerateOneFileChecksum has not being updated to use it, which will always get the generator when the name is empty. Fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7586 Test Plan: make check Reviewed By: riversand963 Differential Revision: D24491989 Pulled By: zhichao-cao fbshipit-source-id: d9fdfdd431240f0a9a2e781ddbd48a7d6c609aad	5 years ago
jsteemann	2404f8b9ec	slightly improve jemalloc allocator API header (#7592 ) Summary: Fix a few typos and avoid a potential nullptr dereference. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7592 Reviewed By: zhichao-cao Differential Revision: D24582111 Pulled By: riversand963 fbshipit-source-id: 51e9260e8cad1fcdedd310c889f0faeec6efd937	5 years ago
vdimir	248d10fb96	Fix typo in arena.cc (#7593 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7593 Reviewed By: zhichao-cao Differential Revision: D24576218 Pulled By: riversand963 fbshipit-source-id: a3d77191362ca696ae9df643f97f4ab5b7ecff12	5 years ago
darionyaphet	793e9b7f5b	Remove duplicate close (#7594 ) Summary: Because `Close()` have called in `Destroy()` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7594 Reviewed By: zhichao-cao Differential Revision: D24576407 Pulled By: riversand963 fbshipit-source-id: eba70d73375fd47dd78ca64c6a1fab3628448276	5 years ago
Ramkumar Vadivelu	9a690a74e1	In ParseInternalKey(), include corrupt key info in Status (#7515 ) Summary: Fixes Issue https://github.com/facebook/rocksdb/issues/7497 When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()` Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later. Tests: - make check - some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test - some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test Example of new status returns: - Key too small - `Corrupted Key: Internal Key too small. Size=5` - Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3` - Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515 Reviewed By: ajkr Differential Revision: D24240264 Pulled By: ramvadiv fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7	5 years ago
Andrew Kryczka	6c2c0635c9	Require only one `Logger::Logv()` implementation (#7605 ) Summary: A user who extended `Logger` recently pointed out it is unusual to require they implement the two-argument `Logv()` overload when they've already implemented the three-argument `Logv()` overload. I agree with that and think we can fix it by only calling the two-argument overload from the default implementation of the three-argument overload. Then when the three-argument overload is overridden, RocksDB would not rely on the two-argument overload. Only `Logger::LogHeader()` needed adjustment to achieve this. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7605 Reviewed By: riversand963 Differential Revision: D24584749 Pulled By: ajkr fbshipit-source-id: 9aabe040ac761c4c0dbebc4be046967403ecaf21	5 years ago
Peter Dillinger	0e2e67562f	Give instructions instead of broken 2to3 for clang-format-diff.py (#7603 ) Summary: My previous change to use lib2to3 to migrate clang-format-diff.py for Python 2 only works if there's nothing to reformat. Instead, give instructions to download to REPO_ROOT. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7603 Test Plan: Try the instructions on a fresh CentOS 8 devserver Reviewed By: riversand963 Differential Revision: D24569608 Pulled By: pdillinger fbshipit-source-id: 1410ba163e016b226e883dec93fae3df9ed0eab2	5 years ago
mrambacher	f35f7f2704	Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566 ) Summary: This PR does a few things: 1. The MockFileSystem class was split out from the MockEnv. This change would theoretically allow a MockFileSystem to be used by other Environments as well (if we created a means of constructing one). The MockFileSystem implements a FileSystem in its entirety and does not rely on any Wrapper implementation. 2. Make the RocksDB test suite work when MOCK_ENV=1 and ENCRYPTED_ENV=1 are set. To accomplish this, a few things were needed: - The tests that tried to use the "wrong" environment (Env::Default() instead of env_) were updated - The MockFileSystem was changed to support the features it was missing or mishandled (such as recursively deleting files in a directory or supporting renaming of a directory). 3. Updated the test framework to have a ROCKSDB_GTEST_SKIP macro. This can be used to flag tests that are skipped. Currently, this defaults to doing nothing (marks the test as SUCCESS) but will mark the tests as SKIPPED when RocksDB is upgraded to a version of gtest that supports this (gtest-1.10). I have run a full "make check" with MEM_ENV, ENCRYPTED_ENV, both, and neither under both MacOS and RedHat. A few tests were disabled/skipped for the MEM/ENCRYPTED cases. The error_handler_fs_test fails/hangs for MEM_ENV (presumably a timing problem) and I will introduce another PR/issue to track that problem. (I will also push a change to disable those tests soon). There is one more test in DBTest2 that also fails which I need to investigate or skip before this PR is merged. Theoretically, this PR should also allow the test suite to run against an Env loaded from the registry, though I do not have one to try it with currently. Finally, once this is accepted, it would be nice if there was a CircleCI job to run these tests on a checkin so this effort does not become stale. I do not know how to do that, so if someone could write that job, it would be appreciated :) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7566 Reviewed By: zhichao-cao Differential Revision: D24408980 Pulled By: jay-zhuang fbshipit-source-id: 911b1554a4d0da06fd51feca0c090a4abdcb4a5f	5 years ago
Yanqin Jin	6134ce6444	Perform post-flush updates of memtable list in a callback (#6069 ) Summary: Currently, the following interleaving of events can lead to SuperVersion containing both immutable memtables as well as the resulting L0. This can cause Get to return incorrect result if there are merge operands. This may also affect other operations such as single deletes. ``` time main_thr bg_flush_thr bg_compact_thr compact_thr set_opts_thr 0 \| WriteManifest:0 1 \| issue compact 2 \| wait 3 \| Merge(counter) 4 \| issue flush 5 \| wait 6 \| WriteManifest:1 7 \| wake up 8 \| write manifest 9 \| wake up 10 \| Get(counter) 11 \| remove imm V ``` The reason behind is that: one bg flush thread's installing new `Version` can be batched and performed by another thread that is the "leader" MANIFEST writer. This bg thread removes the memtables from current super version only after `LogAndApply` returns. After the leader MANIFEST writer signals (releasing mutex) this bg flush thread, it is possible that another thread sees this cf with both memtables (whose data have been flushed to the newest L0) and the L0 before this bg flush thread removes the memtables. To address this issue, each bg flush thread can pass a callback function to `LogAndApply`. The callback is responsible for removing the memtables. Therefore, the leader MANIFEST writer can call this callback and remove the memtables before releasing the mutex. Test plan (devserver) ``` $make merge_test $./merge_test --gtest_filter=MergeTest.MergeWithCompactionAndFlush $make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6069 Reviewed By: cheng-chang Differential Revision: D18790894 Pulled By: riversand963 fbshipit-source-id: e41bd600c0448b4f4b2deb3f7677f95e3076b4ed	5 years ago

... 31 32 33 34 35 ...

11172 Commits (6358e1b967050b712f499dd4bac3474012ef4802) All Branches Search

11172 Commits (6358e1b967050b712f499dd4bac3474012ef4802)

All Branches