Summary: Change some RocksDB default options to make it more friendly to server workloads.
Test Plan: Run all existing tests
Reviewers: yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: sumeet, muthu, benj, MarkCallaghan, igor, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55941
Summary:
Introude SstFileManager::SetMaxAllowedSpaceUsage() that can be used to limit the maximum space usage allowed for RocksDB.
When this limit is exceeded WriteImpl() will fail and return Status::Aborted()
Test Plan: unit testing
Reviewers: yhchiang, anthony, andrewkr, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D53763
Summary: Add a new option to BlockBasedTableOptions that will allow us to change the restart interval for the index block
Test Plan: unit tests
Reviewers: yhchiang, anthony, andrewkr, sdong
Reviewed By: sdong
Subscribers: march, dhruba
Differential Revision: https://reviews.facebook.net/D53721
Summary:
Concurrent memtable adds were incorrectly computing
the last sequence number for a write batch group when the
write batches were not solitary. This is the cause of
https://github.com/facebook/mysql-5.6/issues/155
Test Plan:
1. unit tests
2. new unit test
3. parallel db_bench stress tests with batch size of 10 and asserts enabled
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: IslamAbdelRahman, MarkCallaghan, dhruba
Differential Revision: https://reviews.facebook.net/D53595
Summary:
Add a new class SstFileTracker that will be notified whenever a DB add/delete/move and sst file, it will also replace DeleteScheduler
SstFileTracker can be used later to abort writes when we exceed a specific size
Test Plan: unit tests
Reviewers: rven, anthony, yhchiang, sdong
Reviewed By: sdong
Subscribers: igor, lovro, march, dhruba
Differential Revision: https://reviews.facebook.net/D50469
Summary: Break down DBTest.Randomized to multiple gtest tests based on config type
Test Plan: Run the test and all tests. Make sure configurations are correctly set
Reviewers: yhchiang, IslamAbdelRahman, rven, kradhakrishnan, andrewkr, anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53247
Summary: Skip list now cannot estimate memory across allocators
consistently and hence triggers flush at different time. This breaks certain
unit tests.
The fix is to adopt key count instead of size for flush.
Test Plan: Ran test on dev box and mac (where it used to fail)
Reviewers: sdong
CC: leveldb@
Task ID: #9273334
Blame Rev:
Summary: DBTest.SuggestCompactRangeTest fails for the case when jemalloc is disabled, including ASAN and valgrind builds. It is caused by the improvement of skip list, which allocates different size of nodes for a new records. Fix it by using a special mem table that triggers a flush by number of entries. In that way the behavior will be consistent for all allocators.
Test Plan: Run the test with both of DISABLE_JEMALLOC=1 and 0
Reviewers: anthony, rven, yhchiang, kradhakrishnan, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D51423
* conversion from 'size_t' to 'type', by add static_cast
Tested:
* by build solution on Windows, Linux locally,
* run tests
* build CI system successful
Summary: SpecialEnv::time_elapse_only_sleep_ is not initialized, which might cause some test failures. Fix it.
Test Plan: Run some unit tests. Since tests already broken. Might want to commit it sooner.
Reviewers: IslamAbdelRahman, yhchiang, anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D50937
Summary:
This patch allows rocksdb to persist options into a file on
DB::Open, SetOptions, and Create / Drop ColumnFamily.
Options files are created under the same directory as the rocksdb
instance.
In addition, this patch also adds a fail_if_missing_options_file in DBOptions
that makes any function call return non-ok status when it is not able to
persist options properly.
// If true, then DB::Open / CreateColumnFamily / DropColumnFamily
// / SetOptions will fail if options file is not detected or properly
// persisted.
//
// DEFAULT: false
bool fail_if_missing_options_file;
Options file names are formatted as OPTIONS-<number>, and RocksDB
will always keep the latest two options files.
Test Plan:
Add options_file_test.
options_test
column_family_test
Reviewers: igor, IslamAbdelRahman, sdong, anthony
Reviewed By: anthony
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48285
Summary:
Using a TLS random instance for skiplist makes it smaller
(useful for hash_skiplist_rep) and prepares skiplist for concurrent
adds. This diff also modifies the branching factor math to avoid an
unnecessary division.
This diff has the effect of changing the sequence of skip list node
height choices made by tests, so it has the potential to cause unit
test failures for tests that implicitly rely on the exact structure
of the skip list. Tests that try to exactly trigger a compaction are
likely suspects for this problem (these tests have always been brittle to
changes in the skiplist details). I've minimizes this risk by reseeding
the main thread's Random at the beginning of each test, increasing the
universal compaction size_ratio limit from 101% to 105% for some tests,
and verifying that the tests pass many times.
Test Plan: for i in `seq 0 9`; do make check; done
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D50439
Summary:
This diff exclude alot of tests in db_test that are not compiling / failing under ROCKSD_LITE
Test Plan:
OPT=-DROCKSDB_LITE make check -j64
make check -j64
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48771
Summary: db_test_util is used in multiple test files but it dont compile under ROCKSDB_LITE
Test Plan:
make check
make static_lib
OPT=-DROCKSDB_LITE make db_wal_test
Reviewers: igor, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48579
Summary:
Separate a new class InternalIterator from class Iterator, when the look-up is done internally, which also means they operate on key with sequence ID and type.
This change will enable potential future optimizations but for now InternalIterator's functions are still the same as Iterator's.
At the same time, separate the cleanup function to a separate class and let both of InternalIterator and Iterator inherit from it.
Test Plan: Run all existing tests.
Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
Reviewed By: rven
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D48549
Summary:
Long time ago we add InternalDumpCommand to ldb_tool https://reviews.facebook.net/D11517
This command is using TEST_NewInternalIterator although it's not a test. This patch move TEST_NewInternalIterator outside of db_impl_debug.cc
Test Plan:
make check
make static_lib
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48561
Summary:
As part of tech debt week, we are cleaning up dependencies.
This diff moves db_test_util.[h,cc] from util to db directory.
Test Plan: make check
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D48543
Summary:
Handle SST files with both ".sst" and ".ldb" suffix.
This enables user to migrate from leveldb to rocksdb.
Test Plan:
Added unit test with DB operating on SSTs with names schema.
See db/dc_test.cc:SSTsWithLdbSuffixHandling for details
Reviewers: yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D48003
Summary:
PlainTableReader now only allows mmap-mode. Add the support to non-mmap mode for more flexibility.
Refactor the codes to move all logic of reading data to PlainTableKeyDecoder, and consolidate the calls to Read() call and ReadVarint32() call. Implement the calls for both of mmap and non-mmap case seperately. For non-mmap mode, make copy of keys in several places when we need to move the buffer after reading the keys.
Test Plan: Add the mode of non-mmap case in plain_table_db_test. Run it in valgrind mode too.
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D47187
Summary:
This patch fixes#7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).
In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.
Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.
Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
this patch supported this so it could be resurrected if needed)
Test Plan: make all check
Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
Reviewed By: igor
Subscribers: maykov, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43179
Summary:
In some cases, equality comparisons can be done more efficiently than three-way
comparisons. There are quite a few places in the code where we only care about
equality. This patch adds an Equal() method that defaults to using the
Compare() method.
Test Plan: make clean all check
Reviewers: rven, anthony, yhchiang, igor, sdong
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46233
Summary:
Now that the approach to parallelizing L0-L1 level-based
compactions by breaking the compaction job into subcompactions is
being extended to apply to universal compactions as well, the unit
tests need to account for this and run the universal compaction
tests with subcompactions both enabled and disabled.
Test Plan: make all && make check
Reviewers: sdong, igor, noetzli, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D45657
Summary: malloc_usable_size() gets a better estimation of memory usage. It is already used to calculate block cache memory usage. Use it in arena too.
Test Plan: Run all unit tests
Reviewers: anthony, kradhakrishnan, rven, IslamAbdelRahman, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D43317
Summary:
ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.
We add ReadaheadRandomAccessFile layer only when file is read during compactions.
D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.
Test Plan: make check
Reviewers: MarkCallaghan, sdong
Reviewed By: sdong
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45123
Summary:
See task #7983654. The example was triggering an assert in compaction job
because the compaction was not marked as manual. With this patch,
CompactionPicker::FormCompaction() marks compactions as manual. This patch
also fixes a couple of typos, adds optimistic_transaction_example to
.gitignore and librocksdb as a dependency for examples. Adding librocksdb as
a dependency makes sure that the examples are built with the latest changes
in librocksdb.
Test Plan: make clean && cd examples && make all && ./compact_files_example
Reviewers: rven, sdong, anthony, igor, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45117
Summary:
Up until this point we had DbOptions.num_subcompactions, but
it is semantically more correct to call this max_subcompactions since
we will schedule *up to* DbOptions.max_subcompactions smaller compactions
at a time during a compaction job.
I also added a --subcompactions option to db_bench
Test Plan: make all make check
Reviewers: sdong, igor, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D45069
Summary: Currently compaction inputs share the same file descriptor and table reader as other foreground threads. It makes fadvise works less predictable. Add options.new_table_reader_for_compaction_inputs to enforce to create a new file descriptor and new table reader for it.
Test Plan: Add the option.
Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang
Reviewed By: igor
Subscribers: igor, MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D43311
Summary:
Updated DBTest DBCompactionTest and CompactionJobStatsTest
to run compaction-related tests once with subcompactions enabled and
once disabled using the TEST_P test type in the Google Test suite.
Test Plan: ./db_test ./db_compaction-test ./compaction_job_stats_test
Reviewers: sdong, igor, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43443
Summary:
UpdateAccumulatedStats() is used to optimize compaction decision
esp. when the number of deletion entries are high, but this function
can slowdown DBOpen esp. in disk environment.
This patch adds DBOptions::skip_sats_update_on_db_open, which skips
UpdateAccumulatedStats() in DB::Open() time when it's set to true.
Test Plan: Add DBCompactionTest.SkipStatsUpdateTest
Reviewers: igor, anthony, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: tnovak, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42843
Summary:
As of now compactions involving files from Level 0 and Level 1 are single
threaded because the files in L0, although sorted, are not range partitioned like
the other levels. This means that during L0-L1 compaction each file from L1
needs to be merged with potentially all the files from L0.
This attempt to parallelize the L0-L1 compaction assigns a thread and a
corresponding iterator to each L1 file that then considers only the key range
found in that L1 file and only the L0 files that have those keys (and only the
specific portion of those L0 files in which those keys are found). In this way
the overlap is minimized and potentially eliminated between different iterators
focusing on the same files.
The first step is to restructure the compaction logic to break L0-L1 compactions
into multiple, smaller, sequential compactions. Eventually each of these smaller
jobs will be run simultaneously. Areas to pay extra attention to are
# Correct aggregation of compaction job statistics across multiple threads
# Proper opening/closing of output files (make sure each thread's is unique)
# Keys that span multiple L1 files
# Skewed distributions of keys within L0 files
Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test
Reviewers: igor, noetzli, anthony, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42699
Summary: Directly using TMPDIR can cause problems when running tests using parallel option. Fix them.
Test Plan: Run all tests in parallel
Reviewers: kradhakrishnan, yhchiang, IslamAbdelRahman, anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42807
Summary:
When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.
We have a special case that when you set max_background_flushes to 0, we
schedule the flush to execute on the compaction thread.
Test Plan: make check (there might be some unit tests that depend on this behavior)
Reviewers: IslamAbdelRahman, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41931
Summary:
Trvial move in universal compaction was failing when trying to move files from levels other than 0.
This was because the DeleteFile while trivially moving, was only deleting files of level 0 which caused duplication of same file in different levels.
This is fixed by passing the right level as argument in the call of DeleteFile while doing trivial move.
Test Plan: ./db_test ran successfully with the new test cases.
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42135
Summary: Make TransactionLogIterator related tests from db_test.cc to db_log_iter_test.cc
Test Plan:
db_test
db_log_iter_test
Reviewers: sdong, IslamAbdelRahman, igor, anthony
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42045
Summary: This option is guarding the feature implemented 2 and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn this feature off. I found no reference in fbcode.
Test Plan: none
Reviewers: sdong, yhchiang, anthony, dhruba
Reviewed By: dhruba
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42063
Summary:
Move reusable part of db_test.cc to util/db_test_util.h.
This makes it more possible to partition db_test.cc into
multiple smaller test files.
Also, fixed many old lint errors in db_test.
Test Plan: db_test
Reviewers: igor, anthony, IslamAbdelRahman, sdong, kradhakrishnan
Reviewed By: sdong, kradhakrishnan
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41973