Summary:
This PR does the following:
-> Creates a WinFileSystem class. This class is the Windows equivalent of the PosixFileSystem and will be used on Windows systems.
-> Introduces a CustomEnv class. A CustomEnv is an Env that takes a FileSystem as constructor argument. I believe there will only ever be two implementations of this class (PosixEnv and WinEnv). There is still a CustomEnvWrapper class that takes an Env and a FileSystem and wraps the Env calls with the input Env but uses the FileSystem for the FileSystem calls
-> Eliminates the public uses of the LegacyFileSystemWrapper.
With this change in place, there are effectively the following patterns of Env:
- "Base Env classes" (PosixEnv, WinEnv). These classes implement the core Env functions (e.g. Threads) and have a hard-coded input FileSystem. These classes inherit from CompositeEnv, implement the core Env functions (threads) and delegate the FileSystem-like calls to the input file system.
- Wrapped Composite Env classes (MemEnv). These classes take in an Env and a FileSystem. The core env functions are re-directed to the wrapped env. The file system calls are redirected to the input file system
- Legacy Wrapped Env classes. These classes take in an Env input (but no FileSystem). The core env functions are re-directed to the wrapped env. A "Legacy File System" is created using this env and the file system calls directed to the env itself.
With these changes in place, the PosixEnv becomes a singleton -- there is only ever one created. Any other use of the PosixEnv is via another wrapped env. This cleans up some of the issues with the env construction and destruction.
Additionally, there were places in the code that required had an Env when they required a FileSystem. Many of these places would wrap the Env with a LegacyFileSystemWrapper instead of using the env->GetFileSystem(). These places were changed, thereby removing layers of additional redirection (LegacyFileSystem --> Env --> Env::FileSystem).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7703
Reviewed By: zhichao-cao
Differential Revision: D25762190
Pulled By: anand1976
fbshipit-source-id: 1a088e97fc916f28ac69c149cd1dcad0ab31704b
Summary:
Prior to this PR it prints the raw bytes which can include non-printable
characters. This PR adds the option to print in hex instead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7820
Test Plan:
try it out
```
$ ./ldb file_checksum_dump --hex --db=/tmp/rocksdbtest-9383//db_basic_test_12281129388755189514/
16, FileChecksumCrc32c, 0xC789D948
```
Reviewed By: jay-zhuang
Differential Revision: D25738072
Pulled By: ajkr
fbshipit-source-id: 8cf2856877971756c0495cfa63a9a1281c414dc7
Summary:
The multireadrandom benchmark, when run for a specific number of reads (--reads argument), should base the duration on the actual number of keys read rather than number of batches.
Tests:
Run db_bench multireadrandom benchmark
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7817
Reviewed By: zhichao-cao
Differential Revision: D25717230
Pulled By: anand1976
fbshipit-source-id: 13f4d8162268cf9a34918655e60302d0aba3864b
Summary:
Added "no-elide-constructors to the ASSERT_STATUS_CHECK builds. This flag gives more errors/warnings for some of the Status checks where an inner class checks a Status and later returns it. In this case, without the elide check on, the returned status may not have been checked in the caller, thereby bypassing the checked code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7798
Reviewed By: jay-zhuang
Differential Revision: D25680451
Pulled By: pdillinger
fbshipit-source-id: c3f14ed9e2a13f0a8c54d839d5fb4d1fc1e93917
Summary:
Right now tools/regression_test.sh always builds RocksDB with PORTABLE=1. There isn't a reason for that. Remove it. Users can always specify PORTABLE through envirionement variable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7806
Test Plan: Run tools/regression_test.sh and see it still builds.
Reviewed By: ajkr
Differential Revision: D25687911
fbshipit-source-id: 1c0b03e5df890babc8b7d8af48b48774d9a4600c
Summary:
This disables Linux/amd64 builds in Travis for PRs, and adds a
gcc-10+c++20 build in CircleCI, which should fill out sufficient coverage
vs. what we had in Travis
Fixed a use of std::is_pod, which is deprecated in c++20
Fixed ++ on a volatile in db_repl_stress.cc, with bigger refactoring.
Although ++ on this volatile was probably ok with one thread writer and
one thread reader, the code was still overly complex. There was a
deadcode check for error
`if (replThread.no_read < dataPump.no_records)` which can be proven
never to happen based on the structure of the code. It infinite loops
instead for the case intended to be checked. I just simplified the code
for what should be the same checking power.
Also most configurations seem to be using make parallelism = 2 * vcores,
so fixing / using that.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7791
Test Plan:
CI
and `while ./db_repl_stress; do echo again; done` for a while
Reviewed By: siying
Differential Revision: D25669834
Pulled By: pdillinger
fbshipit-source-id: b2c688053d0b1d52c989903449d3cd27a04130d6
Summary:
So that we can more easily get aggregate live table data such
as total filter, index, and data sizes.
Also adds ldb support for getting properties
Also fixed some missing/inaccurate related comments in db.h
For example:
$ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties
rocksdb.aggregated-table-properties.data_size: 102871
rocksdb.aggregated-table-properties.filter_size: 0
rocksdb.aggregated-table-properties.index_partitions: 0
rocksdb.aggregated-table-properties.index_size: 2232
rocksdb.aggregated-table-properties.num_data_blocks: 100
rocksdb.aggregated-table-properties.num_deletions: 0
rocksdb.aggregated-table-properties.num_entries: 15000
rocksdb.aggregated-table-properties.num_merge_operands: 0
rocksdb.aggregated-table-properties.num_range_deletions: 0
rocksdb.aggregated-table-properties.raw_key_size: 288890
rocksdb.aggregated-table-properties.raw_value_size: 198890
rocksdb.aggregated-table-properties.top_level_index_size: 0
$ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties-at-level1
rocksdb.aggregated-table-properties-at-level1.data_size: 80909
rocksdb.aggregated-table-properties-at-level1.filter_size: 0
rocksdb.aggregated-table-properties-at-level1.index_partitions: 0
rocksdb.aggregated-table-properties-at-level1.index_size: 1787
rocksdb.aggregated-table-properties-at-level1.num_data_blocks: 81
rocksdb.aggregated-table-properties-at-level1.num_deletions: 0
rocksdb.aggregated-table-properties-at-level1.num_entries: 12466
rocksdb.aggregated-table-properties-at-level1.num_merge_operands: 0
rocksdb.aggregated-table-properties-at-level1.num_range_deletions: 0
rocksdb.aggregated-table-properties-at-level1.raw_key_size: 238210
rocksdb.aggregated-table-properties-at-level1.raw_value_size: 163414
rocksdb.aggregated-table-properties-at-level1.top_level_index_size: 0
$
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7779
Test Plan: Added a test to ldb_test.py
Reviewed By: jay-zhuang
Differential Revision: D25653103
Pulled By: pdillinger
fbshipit-source-id: 2905469a08a64dd6b5510cbd7be2e64d3234d6d3
Summary:
Primarily this change refactors the optimize_filters_for_memory
code for Bloom filters, based on malloc_usable_size, to also work for
Ribbon filters.
This change also replaces the somewhat slow but general
BuiltinFilterBitsBuilder::ApproximateNumEntries with
implementation-specific versions for Ribbon (new) and Legacy Bloom
(based on a recently deleted version). The reason is to emphasize
speed in ApproximateNumEntries rather than 100% accuracy.
Justification: ApproximateNumEntries (formerly CalculateNumEntry) is
only used by RocksDB for range-partitioned filters, called each time we
start to construct one. (In theory, it should be possible to reuse the
estimate, but the abstractions provided by FilterPolicy don't really
make that workable.) But this is only used as a heuristic estimate for
hitting a desired partitioned filter size because of alignment to data
blocks, which have various numbers of unique keys or prefixes. The two
factors lead us to prioritize reasonable speed over 100% accuracy.
optimize_filters_for_memory adds extra complication, because precisely
calculating num_entries for some allowed number of bytes depends on state
with optimize_filters_for_memory enabled. And the allocator-agnostic
implementation of optimize_filters_for_memory, using malloc_usable_size,
means we would have to actually allocate memory, many times, just to
precisely determine how many entries (keys) could be added and stay below
some size budget, for the current state. (In a draft, I got this
working, and then realized the balance of speed vs. accuracy was all
wrong.)
So related to that, I have made CalculateSpace, an internal-only API
only used for testing, non-authoritative also if
optimize_filters_for_memory is enabled. This simplifies some code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7774
Test Plan:
unit test updated, and for FilterSize test, range of tested
values is greatly expanded (still super fast)
Also tested `db_bench -benchmarks=fillrandom,stats -bloom_bits=10 -num=1000000 -partition_index_and_filters -format_version=5 [-optimize_filters_for_memory] [-use_ribbon_filter]` with temporary debug output of generated filter sizes.
Bloom+optimize_filters_for_memory:
1 Filter size: 197 (224 in memory)
134 Filter size: 3525 (3584 in memory)
107 Filter size: 4037 (4096 in memory)
Total on disk: 904,506
Total in memory: 918,752
Ribbon+optimize_filters_for_memory:
1 Filter size: 3061 (3072 in memory)
110 Filter size: 3573 (3584 in memory)
58 Filter size: 4085 (4096 in memory)
Total on disk: 633,021 (-30.0%)
Total in memory: 634,880 (-30.9%)
Bloom (no offm):
1 Filter size: 261 (320 in memory)
1 Filter size: 3333 (3584 in memory)
240 Filter size: 3717 (4096 in memory)
Total on disk: 895,674 (-1% on disk vs. +offm; known tolerable overhead of offm)
Total in memory: 986,944 (+7.4% vs. +offm)
Ribbon (no offm):
1 Filter size: 2949 (3072 in memory)
1 Filter size: 3381 (3584 in memory)
167 Filter size: 3701 (4096 in memory)
Total on disk: 624,397 (-30.3% vs. Bloom)
Total in memory: 690,688 (-30.0% vs. Bloom)
Note that optimize_filters_for_memory is even more effective for Ribbon filter than for cache-local Bloom, because it can close the unused memory gap even tighter than Bloom filter, because of 16 byte increments for Ribbon vs. 64 byte increments for Bloom.
Reviewed By: jay-zhuang
Differential Revision: D25592970
Pulled By: pdillinger
fbshipit-source-id: 606fdaa025bb790d7e9c21601e8ea86e10541912
Summary:
Inject the random write error to stress test, it requires set reopen=0 and disable_wal=true.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653
Test Plan: pass db_stress and python3 db_crashtest.py blackbox
Reviewed By: ajkr
Differential Revision: D25354132
Pulled By: zhichao-cao
fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669
Summary:
db_bench currently does not allow overriding the default `arena_block_size `calculation ([memtable size/8](https://github.com/facebook/rocksdb/blob/master/db/column_family.cc#L216)). For memtables whose size is in gigabytes, the `arena_block_size` defaults to hundreds of megabytes (affecting performance).
Exposing this option in db_bench would allow us to test the workloads with various `arena_block_size` values.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7654
Reviewed By: jay-zhuang
Differential Revision: D24996812
Pulled By: ajkr
fbshipit-source-id: a5e3d2c83d9f89e1bb8382f2e8dd476c79e33bef
Summary:
Added experimental public API for Ribbon filter:
NewExperimentalRibbonFilterPolicy(). This experimental API will
take a "Bloom equivalent" bits per key, and configure the Ribbon
filter for the same FP rate as Bloom would have but ~30% space
savings. (Note: optimize_filters_for_memory is not yet implemented
for Ribbon filter. That can be added with no effect on schema.)
Internally, the Ribbon filter is configured using a "one_in_fp_rate"
value, which is 1 over desired FP rate. For example, use 100 for 1%
FP rate. I'm expecting this will be used in the future for configuring
Bloom-like filters, as I expect people to more commonly hold constant
the filter accuracy and change the space vs. time trade-off, rather than
hold constant the space (per key) and change the accuracy vs. time
trade-off, though we might make that available.
### Benchmarking
```
$ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 34.1341
Number of filters: 1993
Total size (MB): 238.488
Reported total allocated memory (MB): 262.875
Reported internal fragmentation: 10.2255%
Bits/key stored: 10.0029
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 18.7508
Random filter net ns/op: 258.246
Average FP rate %: 0.968672
----------------------------
Done. (For more info, run with -legend or -help.)
$ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 130.851
Number of filters: 1993
Total size (MB): 168.166
Reported total allocated memory (MB): 183.211
Reported internal fragmentation: 8.94626%
Bits/key stored: 7.05341
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 58.4523
Random filter net ns/op: 363.717
Average FP rate %: 0.952978
----------------------------
Done. (For more info, run with -legend or -help.)
```
168.166 / 238.488 = 0.705 -> 29.5% space reduction
130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. lastest Bloom filter (could make that as little as about 2.5x for less space reduction)
### Working around a hashing "flaw"
bloom_test discovered a flaw in the simple hashing applied in
StandardHasher when num_starts == 1 (num_slots == 128), showing an
excessively high FP rate. The problem is that when many entries, on the
order of number of hash bits or kCoeffBits, are associated with the same
start location, the correlation between the CoeffRow and ResultRow (for
efficiency) can lead to a solution that is "universal," or nearly so, for
entries mapping to that start location. (Normally, variance in start
location breaks the effective association between CoeffRow and
ResultRow; the same value for CoeffRow is effectively different if start
locations are different.) Without kUseSmash and with num_starts > 1 (thus
num_starts ~= num_slots), this flaw should be completely irrelevant. Even
with 10M slots, the chances of a single slot having just 16 (or more)
entries map to it--not enough to cause an FP problem, which would be local
to that slot if it happened--is 1 in millions. This spreadsheet formula
shows that: =1/(10000000*(1 - POISSON(15, 1, TRUE)))
As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is
intended for CPU efficiency of filters with many more entries/slots than
kCoeffBits, a very reasonable work-around is to disallow num_starts==1
when !kUseSmash, by making the minimum non-zero number of slots
2*kCoeffBits. This is the work-around I've applied. This also means that
the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not
space-efficient for less than a few hundred entries. Because of this, I
have made it fall back on constructing a Bloom filter, under existing
schema, when that is more space efficient for small filters. (We can
change this in the future if we want.)
TODO: better unit tests for this case in ribbon_test, and probably
update StandardHasher for kUseSmash case so that it can scale nicely to
small filters.
### Other related changes
* Add Ribbon filter to stress/crash test
* Add Ribbon filter to filter_bench as -impl=3
* Add option string support, as in "filter_policy=experimental_ribbon:5.678;"
where 5.678 is the Bloom equivalent bits per key.
* Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom
* Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on
binary searching CalculateSpace (inefficient), so that subclasses
(especially experimental ones) don't have to provide an efficient
implementation inverting CalculateSpace.
* Minor refactor FastLocalBloomBitsBuilder for new base class
XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder,
which allows the latter to fall back on Bloom construction in some
extreme cases.
* Mostly updated bloom_test for Ribbon filter, though a test like
FullBloomTest::Schema is a next TODO to ensure schema stability
(in case this becomes production-ready schema as it is).
* Add some APIs to ribbon_impl.h for configuring Ribbon filters.
Although these are reasonably covered by bloom_test, TODO more unit
tests in ribbon_test
* Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data
for constructing the linear approximations in GetNumSlotsFor95PctSuccess.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658
Test Plan:
Some unit tests updated but other testing is left TODO. This
is considered experimental but laying down schema compatibility as early
as possible in case it proves production-quality. Also tested in
stress/crash test.
Reviewed By: jay-zhuang
Differential Revision: D24899349
Pulled By: pdillinger
fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8
Summary:
crash tests donot run in DEBUG_MODE=0 on tmpfs when
use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly because
direct I/O is not supported on tmpfs and tests exit.
Fix: Sanitize direct I/O read options in DEBUG_LEVEL=0 so that crash
tests can run in tmpfs. When mmap_reads is set, direct I/O reads options are
unset so we can sanitize direct I/O reads options in case of tmpfs as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643
Test Plan:
1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm";
export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0";
make crash_test -j64
2. In DEBUG_LEVEL=1 mode: make crash_test -j64
Reviewed By: jay-zhuang
Differential Revision: D24766550
Pulled By: akankshamahajan15
fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e
Summary:
This is a PR generated **semi-automatically** by an internal tool to remove unused includes and `using` statements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604
Test Plan: make check
Reviewed By: ajkr
Differential Revision: D24579392
Pulled By: riversand963
fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd
Summary:
Fixes Issue https://github.com/facebook/rocksdb/issues/7497
When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()`
Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later.
Tests:
- make check
- some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test
- some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test
Example of new status returns:
- Key too small - `Corrupted Key: Internal Key too small. Size=5`
- Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3`
- Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515
Reviewed By: ajkr
Differential Revision: D24240264
Pulled By: ramvadiv
fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7
Summary:
As suggested by pdillinger ,The name of kLogFile is misleading, in some tests, kLogFile is defined as info log. Replace it with kWalFile and move it to public, which will be used in https://github.com/facebook/rocksdb/issues/7523
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7580
Test Plan: make check
Reviewed By: riversand963
Differential Revision: D24485420
Pulled By: zhichao-cao
fbshipit-source-id: 955e3dacc1021bb590fde93b0a568ffe9ad80799
Summary:
If crash test fails, don't delete the `expected_values_file` for later
debug. More details: https://github.com/facebook/rocksdb/issues/7530
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7534
Test Plan: local host
Reviewed By: ajkr
Differential Revision: D24239655
Pulled By: jay-zhuang
fbshipit-source-id: 3566f91a30aae1e27d2f51d910cddd08edb7d4cf
Summary:
The old flag-based APIs (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`) were insufficient for our needs. For example, it was impossible to pin only unpartitioned meta-blocks, which could prevent block cache contention when turning on dictionary compression or during a migration to partitioned indexes/filters. It was also impossible to pin all meta-blocks in memory while having predictable memory usage via block cache. If we had continued adding flags to address these scenarios, they would have had significant overlap causing confusion. Instead, this PR deprecates the flags and starts a new API with non-overlapping options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7520
Test Plan:
- new unit test
- added new options to stress/crash test and ran for a while: `$ python tools/db_crashtest.py blackbox --simple --max_key=1000000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0`
Reviewed By: pdillinger
Differential Revision: D24200034
Pulled By: ajkr
fbshipit-source-id: 3fa7cfc71e7960f7a867511dd6ae5834dd73b13e
Summary:
Cover paranoid_file_checks in crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7489
Test Plan: Run crash tests for hours and didn't see any failure.
Reviewed By: ajkr
Differential Revision: D24063868
fbshipit-source-id: 7b48b110e66ce78ae5d0c99a9f32af86edd34c1e
Summary:
It's important to make sure no false positive is reported when options.paranoid_file_checks is used. Add it to stress test and a place holder in crash test. It is disabled in crash test as there appears to be a bug causing false positive.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7473
Test Plan: Run crash test
Reviewed By: ajkr
Differential Revision: D24026939
fbshipit-source-id: 89102acb45cf041776775ce44a4eef4b0f3a380c
Summary:
The patch introduces a helper method in `util/compression.h` called `UncompressData`
that dispatches calls to the correct uncompression method based on type, and changes
`UncompressBlockContentsForCompressionType` and `Benchmark::Uncompress` in
`db_bench` so they are implemented in terms of the new method. This eliminates
some code duplication. (`Benchmark::Compress` is also updated to use the previously
introduced `CompressData` helper.)
In addition, the patch brings the implementation of `Snappy_Uncompress` into sync with
the other uncompression methods by making the method compute the buffer size and allocate
the buffer itself. Finally, the patch eliminates some potentially risky back-and-forth conversions
between various unsigned and signed integer types by exposing the size of the allocated buffer
as a `size_t` instead of an `int`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7434
Test Plan:
`make check`
`./db_bench -benchmarks=compress,uncompress --compression_type ...`
Reviewed By: riversand963
Differential Revision: D23900011
Pulled By: ltamasi
fbshipit-source-id: b25df63ceec4639889be94acb22eb53e530c54e0
Summary:
Implement a parsing tool io_tracer_parser that takes IO trace file (binary file) with command line argument --io_trace_file and output file with --output_file and dumps the IO trace records in outputfile in human readable form.
Also added unit test cases that generates IO trace records and calls io_tracer_parse to parse those records.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7333
Test Plan:
make check -j64,
Add unit test cases.
Reviewed By: anand1976
Differential Revision: D23772360
Pulled By: akankshamahajan15
fbshipit-source-id: 9c20519c189362e6663352d08863326f3e496271
Summary:
Make "unreleased" section for HISTORY.md with things misplaced
into 6.12 and 6.13
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7401
Test Plan: see how it goes, and `git diff origin/6.13.fb HISTORY.md`
Reviewed By: jay-zhuang
Differential Revision: D23759740
Pulled By: pdillinger
fbshipit-source-id: fc441916c7ff2bbb8d5384137653b340d4c47674
Summary:
Update db_bench so that we can run it with user-defined timestamp.
Currently, only 64-bit timestamp is supported, while others are disabled by assertion.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7389
Test Plan: ./db_bench -benchmarks=fillseq,fillrandom,readrandom,readsequential,....., -user_timestamp_size=8
Reviewed By: ltamasi
Differential Revision: D23720830
Pulled By: riversand963
fbshipit-source-id: 486eacbb82de9a5441e79a61bfa9beef6581608a
Summary:
This PR merges the functionality of making the ColumnFamilyOptions, TableFactory, and DBOptions into Configurable into a single PR, resolving any merge conflicts
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5753
Reviewed By: ajkr
Differential Revision: D23385030
Pulled By: zhichao-cao
fbshipit-source-id: 8b977a7731556230b9b8c5a081b98e49ee4f160a
Summary:
1. Failed to compile because of use of FileSystem* instead of Env* to some methods;
2. Failed to compile with addition of ConfigOptions to some methods
3. Failed to run successfully because the database and/or db_bench would change some of the options, invalidating the comparison
4. Failed to run successfully if Snappy was not available.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7344
Reviewed By: siying
Differential Revision: D23501093
Pulled By: jay-zhuang
fbshipit-source-id: 81fd947e95fff9db8a4c5ff419d69d4c36bef23f
Summary:
Also enables a pull request to trigger all the Travis
configurations by writing FULL_CI in the commit message. (See what I did
there?)
First issue
make: *** No rule to make target 'jl/util/crc32c_ppc_asm.o', needed by 'rocksdbjava'. Stop.
Second issue
tools/db_bench_tool.cc:5514:38: error: ‘gen_exp.rocksdb::Benchmark::GenerateTwoTermExpKeys::keyrange_size_’ may be used uninitialized in this function
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7359
Test Plan: CI
Reviewed By: zhichao-cao
Differential Revision: D23582132
Pulled By: pdillinger
fbshipit-source-id: 06d794673fd522ba11cf6398385387e6bd97ef89
Summary:
gcc-4.8 returns error when using the constructor. Not sure if it's a compiler bug/limitation or code issue:
```
table/block_based/block_based_table_reader.cc:3183:67: error: use of deleted function ‘rocksdb::WritableFileStringStreamAdapter::WritableFileStringStreamAdapter(rocksdb::WritableFileStringStreamAdapter&&)’
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7358
Reviewed By: pdillinger
Differential Revision: D23577651
Pulled By: jay-zhuang
fbshipit-source-id: b0197e3d3538da61a6f3866410d88d2047fb9695
Summary:
(1) Skip check on specific key if restoring an old backup
(small minority of cases) because it can fail in those cases. (2) Remove
an old assertion about number of column families and number of keys
passed in, which is broken by atomic flush (cf_consistency) test. Like
other code (for better or worse) assume a single key and iterate over
column families. (3) Apply mock_direct_io to NewSequentialFile so that
db_stress backup works on /dev/shm.
Also add more context to output in case of backup/restore db_stress
failure.
Also a minor fix to BackupEngine to report first failure status in
creating new backup, and drop another clue about the potential
source of a "Backup failed" status.
Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)"
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357
Test Plan:
Using backup_one_in=10000,
"USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes
"USE_CLANG=1 make blackbox_crash_test" for 30+ minutes
And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb
Reviewed By: riversand963
Differential Revision: D23567244
Pulled By: pdillinger
fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7315
Test Plan:
`ASSERT_STATUS_CHECKED=1 make sst_dump_test && ./sst_dump_test`
And manually run `./sst_dump --file=*.sst` before and after the change.
Reviewed By: pdillinger
Differential Revision: D23361669
Pulled By: jay-zhuang
fbshipit-source-id: 5bf51a2a90ee35c8c679e5f604732ec2aef5949a
Summary:
This change has the crash test randomly select from a few file
checksum implementations, or nullptr, for DB file_checksum_gen_factory.
For compatibility across runs on same DB, each non-null factory can
understand all the other functions, but the default changes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7343
Test Plan:
'make blackbox_crash_test' for a while, including with some
debug output to ensure code is being exercised.
Reviewed By: zhichao-cao
Differential Revision: D23494580
Pulled By: pdillinger
fbshipit-source-id: 73bbc7ca32c1adaf619134c0c830f12894880b8a
Summary:
Although added to db_stress, testing of backup/restore
was never integrated into the crash test, originally concerned about
performance. I've enabled it now and to address the peformance concern,
testing backup/restore is always skipped once the db exceeds a certain
size threshold, default 100MB. This should provide sufficient
opportunity for testing BackupEngine without bogging down everything
else with heavier and heavier operations.
Also fixed backup/restore in db_stress by making sure PurgeOldBackups
can remove manifest files, which are normally kept around for db_stress.
Added more coverage of backup options, and up to three backups being
saved in one backup directory (in some cases).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7348
Test Plan:
ran 'make blackbox_crash_test' for a while, with heightened
probabilitly of taking backups (1/10k). Also confirmed with some debug
output that the code is being covered, TestBackupRestore only takes
a few seconds to complete when triggered, and even at 1/10k and ~50MB
database, there's <,~ 1 thread testing backups at any time.
Reviewed By: ajkr
Differential Revision: D23510835
Pulled By: pdillinger
fbshipit-source-id: b6b8735591808141f81f10773ac31634cf03b6c0
Summary:
This is adapted from https://github.com/facebook/rocksdb/issues/6678 but takes a different approach, avoiding opening a read-write DB and avoiding the `DeleteFile()` API.
First, this PR refactors how options variables are initialized in `ldb` so it can be reused in a subcommand that doesn't open a DB:
- Separated remaining option initialization logic out of `OpenDB()`. The new `PrepareOptions()` function initializes the full options state.
- Fixed an old TODO about applying the subcommand CF option overrides to the proper `ColumnFamilyOptions` object.
Second, this PR adds the `ldb unsafe_remove_sst_file` subcommand. It uses the `VersionSet`-level APIs to remove the file with the specified number.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7335
Test Plan: played with interactive python and this file removal command. Verified openability/correct results in case of multiple column families, multiple levels, etc.
Reviewed By: pdillinger
Differential Revision: D23454575
Pulled By: ajkr
fbshipit-source-id: 039b7a8cbfc42fd123dcb25821eef51d61148afe
Summary:
In block-based table builder, the cut-over from buffered to unbuffered
mode involves sampling the buffered blocks and generating a dictionary.
There was a bug where `SstFileWriter` passed zero as the `target_file_size`
causing the cutover to happen immediately, so there were no samples
available for generating the dictionary.
This PR changes the meaning of `target_file_size == 0` to mean buffer
the whole file before cutting over. It also adds dictionary compression
support to `sst_dump --command=recompress` for easy evaluation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7323
Reviewed By: cheng-chang
Differential Revision: D23412158
Pulled By: ajkr
fbshipit-source-id: 3b232050e70ef3c2ee85a4b5f6fadb139c569873
Summary:
Delete database instances to make sure there are no loose threads
running before exit(). This fixes segfaults seen when running
workloads through CompositeEnvs with custom file systems.
For further background on the issues arising when using CompositeEnvs, see the discussion in:
https://github.com/facebook/rocksdb/pull/6878
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7327
Reviewed By: cheng-chang
Differential Revision: D23433244
Pulled By: ajkr
fbshipit-source-id: 4e19cf2067e3fe68c2a3fe1823f24b4091336bbe
Summary:
This pull request adds the parameter --fs_uri to db_bench and db_stress, creating a composite env combining the default env with a specified registered rocksdb file system.
This makes it easier to develop and test new RocksDB FileSystems.
The pull request also registers the posix file system for testing purposes.
Examples:
```
$./db_bench --fs_uri=posix:// --benchmarks=fillseq
$./db_stress --fs_uri=zenfs://nullb1
```
zenfs is a RocksDB FileSystem I'm developing to add support for zoned block devices, and in that case the zoned block device is specified in the uri (a zoned null block device in the above example).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6878
Reviewed By: siying
Differential Revision: D23023063
Pulled By: ajkr
fbshipit-source-id: 8b3fe7193ce45e683043b021779b7a4d547af247
Summary:
As part of the IOTracing project, this PR
1. Caches "FileSystemPtr" object(wrapper class that returns file system pointer based on tracing enabled) instead of "FileSystem" pointer.
2. FileSystemPtr object is created using FileSystem pointer and IOTracer
pointer.
3. IOTracer shared_ptr is created in DBImpl and it is passed to different classes through constructor.
4. When tracing is enabled through DB::StartIOTrace, FileSystemPtr
returns FileSystemTracingWrapper pointer for tracing purpose and when
it is disabled underlying FileSystem pointer is returned.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7180
Test Plan:
make check -j64
COMPILE_WITH_TSAN=1 make check -j64
Reviewed By: anand1976
Differential Revision: D22987117
Pulled By: akankshamahajan15
fbshipit-source-id: 6073617e4c2d5bc363914f3a1f55ae3b0a58fbf1
Summary:
The mechanism to mark files for compaction is most commonly used in
delete-triggered compaction. This PR adds an option to exercise the
marking mechanism on random files created by db_stress. This PR also
enables that option in db_crashtest.py on its db_stress runs at random.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7231
Test Plan:
- ran some minified crash tests; verified they succeed and we see `"compaction_reason": "FilesMarkedForCompaction"` regularly in the logs.
```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --duration=600 --interval=30 --max_key=10000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py whitebox --duration=600 --interval=30 --max_key=1000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 --random_kill_odd=8887
```
Reviewed By: anand1976
Differential Revision: D23025156
Pulled By: ajkr
fbshipit-source-id: a404c467ebc12afa94dae35956ea9b372f592a96
Summary:
Adds compaction statistics (total bytes read and written) for compactions that occur for delete-triggered, periodic, and TTL compaction reasons.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7165
Test Plan:
TTL and periodic can be checked by runnning db_bench with the options activated:
/db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -periodic_compaction_seconds=1
./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -fifo_compaction_ttl=1
Setting the time to one second causes non-zero bytes read/written for those compaction reasons. Disabling them or setting them to times longer than the test run length causes the stats to return to zero as expected.
Delete-triggered compaction counting is tested in DBTablePropertiesTest.DeletionTriggeredCompactionMarking
Reviewed By: ajkr
Differential Revision: D22693050
Pulled By: akabcenell
fbshipit-source-id: d15cef4d94576f703015c8942d5f0d492f69401d