Summary:
compaction_pri = kMinOverlappingRatio usually provides much better write amplification than the default.
https://github.com/facebook/rocksdb/pull/4907 fixes one shortcome of this option. Make it default.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4911
Differential Revision: D13789262
Pulled By: siying
fbshipit-source-id: d90acf8c4dede44f00d183ca4c7a210259378269
Summary:
…ression
For `CompressionType` we have options `compression` and `bottommost_compression`. Thus, to make the compression options consitent with the compression type when bottommost_compression is enabled, we add the bottommost_compression_opts
Closes https://github.com/facebook/rocksdb/pull/3985
Reviewed By: riversand963
Differential Revision: D8385911
Pulled By: zhichao-cao
fbshipit-source-id: 07bc533dd61bcf1cef5927d8d62901c13d38d5fc
Summary:
RocksDB should always be able to parse an option file generated using the same or lower version. Unknown option should only happen if it is from a higher version. Change the behavior of RocksDBOptionsParser::Parse()'s behavior with ignore_unknown_options=true so that unknown option from a lower or the same version will never be skipped.
Closes https://github.com/facebook/rocksdb/pull/3527
Differential Revision: D7048851
Pulled By: siying
fbshipit-source-id: e261caea12f6515611a4a29f39acf2b619df2361
Summary:
I started adding gflags support for cmake on linux and got frustrated that I'd need to duplicate the build_detect_platform logic, which determines namespace based on attempting compilation. We can do it differently -- use the GFLAGS_NAMESPACE macro if available, and if not, that indicates it's an old gflags version without configurable namespace so we can simply hardcode "google".
Closes https://github.com/facebook/rocksdb/pull/3212
Differential Revision: D6456973
Pulled By: ajkr
fbshipit-source-id: 3e6d5bde3ca00d4496a120a7caf4687399f5d656
Summary:
Problem: Option string accepts only cache_size as parameter for block_cache which is specified as "block_cache=1M".
It doesn't accept other parameters like num_shards etc.
Changes :
1) ParseBlockBasedTableOption in block_based_table_factory is edited to accept cache options in the format "block_cache=<cache_size>:<num_shard_bits>:<strict_capacity_limit>:<high_pri_pool_ratio>".
Options other than cache_size are optional to maintain backward compatibility. The changes are valid for block_cache_compressed as well.
For example, "block_cache=1M:6:true:0.5", "block_cache=1M:6:true", "block_cache=1M:6" and "block_cache=1M" are all valid option strings.
2) Corresponding unit tests are added.
Closes https://github.com/facebook/rocksdb/pull/3108
Differential Revision: D6420997
Pulled By: sagar0
fbshipit-source-id: cdea8b785688d2802907974af27225ccc1c0cd43
Summary:
Currently, RocksDB does not allow reopening a preexisting DB with no merge operator defined, with a merge operator defined. This means that if a DB ever want to add a merge operator, there's no way to do so currently.
Fix this by adding a new verification type `kByNameAllowFromNull` which will allow old values to be nullptr, and new values to be non-nullptr.
Closes https://github.com/facebook/rocksdb/pull/2958
Differential Revision: D5961131
Pulled By: lth
fbshipit-source-id: 06179bebd0d90db3d43690b5eb7345e2d5bab1eb
Summary:
test the `DBOptions(const Options&)` and `ColumnFamilyOptions(const Options&)` constructors. Actually this'll work better once we refactor `RandomInitDBOptions` / `RandomInitCFOptions` to use the authoritative sources of struct members: `db_options_type_info` / `cf_options_type_info` (internal task T21804189 for this).
Closes https://github.com/facebook/rocksdb/pull/2873
Differential Revision: D5817141
Pulled By: ajkr
fbshipit-source-id: 8567c20feced9d1751fdf1f4383e2af30f7e3591
Summary:
Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available.
Some nontrivial changes:
1. Add Comparator::GetRootComparator() to get around the internal comparator hack
2. Add the two experiemental functions to DB
3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string
4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric.
Closes https://github.com/facebook/rocksdb/pull/2645
Differential Revision: D5502723
Pulled By: siying
fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c
Summary:
With c7004840d2, it's safe to open a DB with different prefix extractor. So it's safe to skip prefix extractor check.
Closes https://github.com/facebook/rocksdb/pull/2474
Differential Revision: D5294700
Pulled By: siying
fbshipit-source-id: eeb500da795eecb29b8c9c56a14cfd4afda12ecc
Summary:
Added a flag, `ignore_unknown_options`, to skip unknown options when loading an options file (using `LoadLatestOptions`/`LoadOptionsFromFile`) or while verifying options (using `CheckOptionsCompatibility`). This will help in downgrading the db to an older version.
Also added `--ignore_unknown_options` flag to ldb
**Example Use case:**
In MyRocks, if copying from newer version to older version, it is often impossible to start because of new RocksDB options that don't exist in older version, even though data format is compatible.
MyRocks uses these load and verify functions in [ha_rocksdb.cc::check_rocksdb_options_compatibility](e004fd9f41/storage/rocksdb/ha_rocksdb.cc (L3348-L3401)).
**Test Plan:**
Updated the unit tests.
`make check`
ldb:
$ ./ldb --db=/tmp/test_db --create_if_missing put a1 b1
OK
Now edit /tmp/test_db/<OPTIONS-file> and add an unknown option.
Try loading the options now, and it fails:
$ ./ldb --db=/tmp/test_db --try_load_options get a1
Failed: Invalid argument: Unrecognized option DBOptions:: abcd
Passes with the new --ignore_unknown_options flag
$ ./ldb --db=/tmp/test_db --try_load_options --ignore_unknown_options get a1
b1
Closes https://github.com/facebook/rocksdb/pull/2423
Differential Revision: D5212091
Pulled By: sagar0
fbshipit-source-id: 2ec17636feb47dc0351b53a77e5f15ef7cbf2ca7
Summary:
- `max_background_flushes` and `max_background_compactions` are still supported for backwards compatibility
- `base_background_compactions` is completely deprecated. Now we just throttle to one background compaction when there's no pressure.
- `max_background_jobs` is added to automatically partition the concurrent background jobs into flushes vs compactions. Currently it's very simple as we just allocate one-fourth of the jobs to flushes, and the remaining can be used for compactions.
- The test cases that set `base_background_compactions > 1` needed to be updated. I just grab the pressure token such that the desired number of compactions can be scheduled.
Closes https://github.com/facebook/rocksdb/pull/2205
Differential Revision: D4937461
Pulled By: ajkr
fbshipit-source-id: df52cbbd497e13bbc9a60560a5ac2a2526b3f1f9
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
Closes https://github.com/facebook/rocksdb/pull/2117
Differential Revision: D4860912
Pulled By: lightmark
fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
Summary:
This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper.
Check the sub-commits for easier reviewing.
Closes https://github.com/facebook/rocksdb/pull/2094
Differential Revision: D4837730
Pulled By: sagar0
fbshipit-source-id: 344278a
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090
Differential Revision: D4833681
Pulled By: siying
fbshipit-source-id: 2fd8bef
Summary:
This option is needed to be enabled for Direct IO
and I cannot think of a reason where we need to disable it
remove it and default it to true
Closes https://github.com/facebook/rocksdb/pull/1944
Differential Revision: D4641088
Pulled By: IslamAbdelRahman
fbshipit-source-id: d7085b9
Summary:
…action
The two options, min_partial_merge_operands and verify_checksums_in_compaction, are not seldom used. Remove them to reduce the total number of options. Also remove them from Java and C interface.
Closes https://github.com/facebook/rocksdb/pull/1902
Differential Revision: D4601219
Pulled By: siying
fbshipit-source-id: aad4cb2
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859
Differential Revision: D4541292
Pulled By: sagar0
fbshipit-source-id: 5b3a6ca
Summary:
Change the default of delayed slowdown value to 16MB/s and further increase the L0 stop condition to 36 files.
Closes https://github.com/facebook/rocksdb/pull/1821
Differential Revision: D4489229
Pulled By: siying
fbshipit-source-id: 1003981
Summary:
Having -ve value for max_write_buffer_number does not make sense and cause us to do a left shift on a -ve value number
Closes https://github.com/facebook/rocksdb/pull/1579
Differential Revision: D4240798
Pulled By: IslamAbdelRahman
fbshipit-source-id: bd6267e
Summary: Add mutable options info into `OptionsTypeInfo` and use it to parse mutable options map. Also support `max_bytes_for_level_multiplier_additional` in option file.
Test Plan: unit test
Reviewers: yhchiang, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D63843
Summary:
ZSTD 1.0.0 is coming. We can finally add a support of ZSTD without worrying about compatibility.
Still keep ZSTDNotFinal for compatibility reason.
Test Plan: Run all tests. Run db_bench with ZSTD version with RocksDB built with ZSTD 1.0 and older.
Reviewers: andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: cyan, igor, IslamAbdelRahman, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D63141
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.
Test Plan: Add two new unit tests. Run all existing tests, including jtest.
Reviewers: yhchiang, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59829
Summary:
This diff improves the documentation for GetOptionsFromMap APIs and
fixes a bug in GetOptionsFromMap functions in convenience.h
where new_options will still be changed when the function
call is not successful.
Test Plan: options_test
Reviewers: IslamAbdelRahman, kradhakrishnan, andrewkr, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D61731
Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too.
Test Plan: Run all existing tests.
Reviewers: IslamAbdelRahman, yhchiang, andrewkr
Reviewed By: andrewkr
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60513
Summary: filter_deltes is not a frequently used feature. Remove it.
Test Plan: Run all test suites.
Reviewers: igor, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59427
Summary:
memtable_prefix_bloom_probes is not a critical option. Remove it to reduce number of options.
It's easier for users to make mistakes with memtable_prefix_bloom_bits, turn it to memtable_prefix_bloom_bits_ratio
Test Plan: Run all existing tests
Reviewers: yhchiang, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59199
Summary: AFIK, options builder is not used by anyone. Remove it.
Test Plan: Run all existing tests.
Reviewers: IslamAbdelRahman, andrewkr, igor
Reviewed By: igor
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59319
Summary:
Expose a simple function to convert CompressionType to it's corresponding option string
This is for a diff @yoshinorim is working on for MyRocks
Test Plan: unittest
Reviewers: yhchiang, andrewkr, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D58215
Summary:
Add a new option that can be used to set a specific compression algorithm for bottommost level.
This option will only affect levels larger than base level.
I have also updated CompactionJobInfo to include the compression algorithm used in compaction
Test Plan:
added new unittest
existing unittests
Reviewers: andrewkr, yhchiang, sdong
Reviewed By: sdong
Subscribers: lightmark, andrewkr, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D57669
Summary:
For backwards compatibility with older option strings, the parser needs
to treat this argument as optional.
Test Plan:
Updated unit test to cover case where compression_opts is present but
max_dict_bytes is omitted.
Reviewers: MarkCallaghan, sdong, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D57759
Summary: We changed default options of max_open_files and max_file_opening_threads but didn't revert it in OptimizeForSmallDb().
Test Plan: Add a unit test
Reviewers: igor, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57675
Summary:
Changing several option defaults:
options.max_open_files changes from 5000 to -1
options.base_background_compactions changes from max_background_compactions to 1
options.wal_recovery_mode changes from kTolerateCorruptedTailRecords to kTolerateCorruptedTailRecords
options.compaction_pri changes from kByCompensatedSize to kByCompensatedSize
Test Plan: Write unit tests to see OldDefaults() works as expected.
Reviewers: IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: MarkCallaghan, yiwu, kradhakrishnan, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56427
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
Comparable with Snappy on comp ratio.
Implemented using Windows API, does not require external package.
Avaiable since Windows 8 and server 2012.
Use -DXPRESS=1 with CMake to enable.
Summary: It is useful to print out IO stats in flush jobs too. Extend options.compaction_measure_io_stats to flush jobs and raname it.
Test Plan: Try db_bench and see the stats are printed out.
Reviewers: yhchiang
Reviewed By: yhchiang
Subscribers: kradhakrishnan, yiwu, IslamAbdelRahman, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56769
Summary: Test DBOptionsAllFieldsSettable sometimes fails under valgrind. Move option settable tests to a separate test file and disable it in valgrind..
Test Plan: Run valgrind test and make sure the test doesn't run.
Reviewers: andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: kradhakrishnan, yiwu, yhchiang, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56529
Summary: Cache shard bit 4 is sometimes too small and 6 is a more common value picked by users. Make that default. It shouldn't hurt much to change options.max_file_opening_threads default to be 16, which will reduce the worst case DB open time.
Test Plan: Run all existing tests.
Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, igor, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D55047
Summary:
Adapted a stderr logger from the option tests. Moved it to a separate
header so we can reuse it, e.g., from ldb subcommands for faster debugging. This
is especially useful to make errors/warnings more visible when running
"ldb repair", which involves potential data loss.
Test Plan:
ran options_test and "ldb repair"
$ ./ldb repair --db=./tmp/
[WARN] **** Repaired rocksdb ./tmp/; recovered 1 files; 588bytes. Some data may have been lost. ****
OK
Reviewers: IslamAbdelRahman, yhchiang, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D56151
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
Test Plan:
'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
I didn't run the Java tests, I don't have Java set up on my devserver.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56133
Summary: Change some RocksDB default options to make it more friendly to server workloads.
Test Plan: Run all existing tests
Reviewers: yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: sumeet, muthu, benj, MarkCallaghan, igor, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55941
Summary: Something changed and the special charactor seems to be conflict with an exisitng value. Change it to unblock the build.
Test Plan: Run the test and make sure it passes
Reviewers: kradhakrishnan
Reviewed By: kradhakrishnan
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55845
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
When the table reader is destroyed, it releases the pinned blocks (if there were any). This has to happen before the cache is destroyed, so I had to introduce a TableReader::Close(), to guarantee the order of destruction.
Test Plan:
Added two unit tests for this. Existing unit tests run fine (default is pin_l0_filter_and_index_blocks_in_cache=false).
DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32
Mac: OK.
Linux: with D55287 patched in it's OK.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D54801