Summary:
Add kSstFileTier to ReadTier, which allows Get and MultiGet to
read only directly from SST files and skip mem-tables.
kSstFileTier = 0x2 // data in SST files.
// Note that this ReadTier currently only supports
// Get and MultiGet and does not support iterators.
Test Plan: add new test in db_test.
Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: igor, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53511
Summary: MyRocks wants to be able to un-lock a key that was just locked by GetForUpdate(). To do this safely, I am now keeping track of the number of reads(for update) and writes for each key in a transaction. UndoGetForUpdate() will only unlock a key if it hasn't been written and the read count reaches 0.
Test Plan: more unit tests
Reviewers: igor, rven, yhchiang, spetrunia, sdong
Reviewed By: spetrunia, sdong
Subscribers: spetrunia, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47043
Summary: Default crash test uses prefix hash memtable, which is not compatible to concurrent memtable. Allow prefix test run with skip list and use skip list memtable when concurrent insert is used.
Test Plan: Run "python -u tools/db_crashtest.py whitebox" and watch sometimes skip list is used.
Reviewers: anthony, yhchiang, kradhakrishnan, andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53907
Summary: Previous commit introduces a test that is not supported in LITE. Fix it.
Test Plan: Build the test with ROCKSDB_LITE.
Reviewers: kradhakrishnan, IslamAbdelRahman, anthony, yhchiang, andrewkr
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53901
Summary: If users turn on concurrent insert but the memtable doesn't support it, they might see unexcepted crash. Fix it by explicitly fail.
Test Plan:
Run different setting of stress_test and make sure it fails correctly.
Will add a unit test too.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, andrewkr, ngbronson
Reviewed By: ngbronson
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53895
Summary: This set of changes is part of the work to introduce benchmark for universal style compaction in RocksDB. It's conceptually separate from the compaction work, so sending it out as a separate diff to get it out of the way.
Test Plan:
- Run `./tools/run_flash_bench.sh`.
- Look at the contents of `report.txt` and `report2.txt` to make sure that data is reported and attributed correctly.
- During `db_bench` execution time make sure that the correct flags are passed to `--disable_wal` depending on the benchmark being executed.
Reviewers: MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53865
Summary: Time to cut branch for release 4.5. Change the versions.
Test Plan: Not needed
Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, andrewkr, anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53883
Summary:
copy from task 8196669:
1) Optimistic transactions do not support batching writes from different threads.
2) Pessimistic transactions do not support batching writes if an expiration time is set.
In these 2 cases, we currently do not do any write batching in DBImpl::WriteImpl() because there is a WriteCallback that could decide at the last minute to abort the write. But we could support batching write operations with callbacks if we make sure to process the callbacks correctly.
To do this, we would first need to modify write_thread.cc to stop preventing writes with callbacks from being batched together. Then we would need to change DBImpl::WriteImpl() to call all WriteCallback's in a batch, only write the batches that succeed, and correctly set the state of each batch's WriteThread::Writer.
Test Plan: Added test WriteWithCallbackTest to write_callback_test.cc which creates multiple client threads and verifies that writes are batched and executed properly.
Reviewers: hermanlee4, anthony, ngbronson
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D52863
Summary: Add a new option to BlockBasedTableOptions that will allow us to change the restart interval for the index block
Test Plan: unit tests
Reviewers: yhchiang, anthony, andrewkr, sdong
Reviewed By: sdong
Subscribers: march, dhruba
Differential Revision: https://reviews.facebook.net/D53721
Summary: Add an option of --allow_concurrent_memtable_write in stress test and cover it in crash test
Test Plan: Run crash test and make sure three combinations of the two options show up randomly.
Reviewers: IslamAbdelRahman, yhchiang, andrewkr, anthony, kradhakrishnan
Reviewed By: kradhakrishnan
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53811
Summary:
Remove obolete references to files in src.mk
Fix incorrect path for reference in source.mk
Test Plan: Ran build to ensure changes do not break anything.
Reviewers: leveldb, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D53733
Summary:
When making environment specific changes, it is better to run all CI
tests. This diff provides a mechanism to do that
Format is:
ROCKSDB_CHECK_ALL=1 arc diff
Test Plan: Submit request for diff
Reviewers: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53631
Summary: unit_481 is misspelt. Fixing it.
Test Plan: Running make commit_prereq
Reviewers: leveldb
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53757
Summary:
InlineSkipList::InsertConcurrently should invalidate the
sequential-insertion cache prev_[] for all inserts of multi-level nodes,
not just those that increase the height of the skip list. The invariant
for prev_ is that prev_[i] (i > 0) is supposed to be the predecessor of
prev_[0] at level i. Before this diff InsertConcurrently could violate
this constraint when inserting a multi-level node after prev_[i] but
before prev_[0].
This diff also reenables kConcurrentSkipList as db_test's
MultiThreaded/MultiThreadedDBTest.MultiThreaded/29.
Test Plan:
1. unit tests
2. temporarily hack kConcurrentSkipList timing so that it is fast but has a 1.5% failure rate on my dev box (1ms stagger on thread launch, 1s test duration, failure rate baseline over 1000 runs)
3. observe 1000 passes post-fix
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, dhruba
Differential Revision: https://reviews.facebook.net/D53751
Summary:
Some of the tests aren't considered to be critical when it comes to getting key benchmarking data for RocksDB. Therefore we'll introduce an environment variable `SKIP_LOW_PRI_TESTS` which enables skipping those test cases. By default all the tests will be run. If you want to optimize the test-case execution then do the following:
`
$ export SKIP_LOW_PRI_TESTS=1
$ ./tools/run_flash_bench.sh
`
Test Plan: Verified that when `SKIP_LOW_PRI_TESTS` is not set then `benchmark.sh` is called for all the scenarios and when `SKIP_LOW_PRI_TESTS` is set to `1` then `benchmark.sh` is called only for the test-cases which are critical.
Reviewers: MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53739
Summary:
Before this diff, there were duplicated constants to refer to properties (user-
facing API had strings and InternalStats had an enum). I noticed these were
inconsistent in terms of which constants are provided, names of constants, and
documentation of constants. Overall it seemed annoying/error-prone to maintain
these duplicated constants.
So, this diff gets rid of InternalStats's constants and replaces them with a map
keyed on the user-facing constant. The value in that map contains a function
pointer to get the property value, so we don't need to do string matching while
holding db->mutex_. This approach has a side benefit of making many small
handler functions rather than a giant switch-statement.
Test Plan: db_properties_test passes, running "make commit-prereq -j32"
Reviewers: sdong, yhchiang, kradhakrishnan, IslamAbdelRahman, rven, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53253
Summary:
Doing inline checking of transaction expiration instead of
using a callback.
Test Plan: To be added
Reviewers: anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53673
Summary: Disable test that is intermittently failing
Test Plan: unit tests
Reviewers: igor, andrewkr, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D53715
Summary:
db/c_test.c uses the functions in db/c.cc. If we have tags generated
for one but not the other, it's easy to make mistakes like updating a function
signature and missing a call site.
Test Plan:
$ make tags
in vim:
:cscope find s rocksdb_options_set_compression_options
...
3 325 db/c_test.c <<main>>
rocksdb_options_set_compression_options(options, -14, -1, 0);
Reviewers: sdong, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53685
Summary:
Concurrent memtable adds were incorrectly computing
the last sequence number for a write batch group when the
write batches were not solitary. This is the cause of
https://github.com/facebook/mysql-5.6/issues/155
Test Plan:
1. unit tests
2. new unit test
3. parallel db_bench stress tests with batch size of 10 and asserts enabled
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: IslamAbdelRahman, MarkCallaghan, dhruba
Differential Revision: https://reviews.facebook.net/D53595
Summary: Travis CI fails most of the times because of timing out. To unblock it, disable LITE tests in Travis CI.
Test Plan: Will see.
Reviewers: anthony, yhchiang, kradhakrishnan, IslamAbdelRahman, andrewkr
Reviewed By: andrewkr
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53679
Summary:
Add unit tests:
(1) insert entries of 8MB key and 3GB value to DB
(2) insert entry of 3GB key and 3GB value into write batch and make sure we can read it.
(3) insert 3 billions of key-value pairs into write batch and make sure we can read it.
Disable them because not all platform can run it.
Test Plan: Run the tests
Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, andrewkr, anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53619
Summary:
Following up on D53493, we can still enable the filter-skipping
optimization for last file in L0. It's correct to assume the key will be present
in the last L0 file when we're hit-optimized and L0 is deepest.
The FilePicker encapsulates the state for traversing each level's files, so I
needed to make it expose whether the returned file is last in its level.
Test Plan:
verified below test fails before this patch and passes afterwards.
The change to how the test memtable is populated is needed so file 1 has keys
(0, 30, 60), file 2 has keys (10, 40, 70), etc.
$ ./db_universal_compaction_test --gtest_filter=UniversalCompactionNumLevels/DBTestUniversalCompaction.OptimizeFiltersForHits/*
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53583
This change impacts only non-buffered I/O on Windows.
Currently, there is a buffer per RandomAccessFile
instance that is protected by a lock. The reason we
maintain the buffer is non-buffered I/O requires an aligned
buffer to work.
XPerf traces demonstrate that we accumulate a considerable
wait time while waiting for that lock.
This change enables to set random access buffer size to zero
which would indicate a per request allocation.
We are expecting that allocation expense would be much less than
I/O costs plus wait time due to the fact that the memory heap
would tend to re-use page aligned allocations especially with the
use of Jemalloc.
This change does not affect buffer use as a read_ahead_buffer for
compaction purposes.
Summary:
Fixed the asan error on column_family_test caused by not disabling
SyncPoint.
Test Plan: column_family_test
Reviewers: anthony, rven, kradhakrishnan, sdong, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53505
Summary:
If options.base_background_compactions is given, we try to schedule number of compactions not existing this number, only when L0 files increase to certain number, or pending compaction bytes more than certain threshold, we schedule compactions based on options.max_background_compactions.
The watermarks are calculated based on slowdown thresholds.
Test Plan:
Add new test cases in column_family_test.
Adding more unit tests.
Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, rven, anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D53409
Summary: Now slowing down for the last mem table takes priority against some stopping conditions. This is logically confusing. Fix it.
Test Plan: Run all existing tests.
Reviewers: yhchiang, IslamAbdelRahman, kradhakrishnan, andrewkr, anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53529
Summary:
Add a new class SstFileTracker that will be notified whenever a DB add/delete/move and sst file, it will also replace DeleteScheduler
SstFileTracker can be used later to abort writes when we exceed a specific size
Test Plan: unit tests
Reviewers: rven, anthony, yhchiang, sdong
Reviewed By: sdong
Subscribers: igor, lovro, march, dhruba
Differential Revision: https://reviews.facebook.net/D50469
Summary: It's a regression bug caused by e089db40f9. With the change, if options.optimize_filters_for_hits=true and there are only L0 files (like single level universal compaction), we skip all the files in L0, which is more than necessary. Fix it by always trying to query bloom filter for files in level 0.
Test Plan: Add a unit test for it.
Reviewers: anthony, rven, yhchiang, IslamAbdelRahman, kradhakrishnan, andrewkr
Reviewed By: andrewkr
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53493