Summary:
This diff has two fixes.
1. Fix the bug where compaction does not fail when RocksDB can't create a new file.
2. When NewWritableFiles() fails in OpenCompactionOutputFiles(), previously such fail-to-created file will be still be included as a compaction output. This patch also fixes this bug.
3. Allow VersionEdit::EncodeTo() to return Status and add basic check.
Test Plan:
./version_edit_test
export ROCKSDB_TESTS=FileCreationRandomFailure
./db_test
Reviewers: ljin, sdong, nkg-, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D25581
Summary: RocksDB already depends on C++11, so we might as well all the goodness that C++11 provides. This means that we don't need AtomicPointer anymore. The less things in port/, the easier it will be to port to other platforms.
Test Plan: make check + careful visual review verifying that NoBarried got memory_order_relaxed, while Acquire/Release methods got memory_order_acquire and memory_order_release
Reviewers: rven, yhchiang, ljin, sdong
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D27543
Summary: as title
Test Plan: unit test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25347
Summary: as title
Test Plan: n/a
Reviewers: yhchiang, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24963
Summary:
This is not a critical options. Making it dynamic so that we can remove
more reference to cfd->options()
Test Plan: unit test
Reviewers: yhchiang, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24957
Summary:
Fix the following Mac compile error.
db/db_test.cc:8686:52: error: C++11 forbids default arguments for lambda expressions [-Werror,-Wlambda-extensions]
auto gen_l0_kb = [this](int start, int size, int stride = 1) {
^ ~
Test Plan:
db_test
Summary: as title
Test Plan: make release
Reviewers: igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25029
Summary: as title
Test Plan:
unit test
I am only able to build the test case for hard_rate_limit.
soft_rate_limit is essentially the same thing as hard_rate_limit
Reviewers: igor, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24759
Summary: Add more tests as well
Test Plan: unit test
Reviewers: igor, sdong, yhchiang
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24747
Summary: as title
Test Plan: unit test
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D24729
Summary:
Fixed the following compile error on Mac.
db/db_test.cc:8618:52: error: C++11 forbids default arguments for lambda expressions [-Werror,-Wlambda-extensions]
auto gen_l0_kb = [this](int start, int size, int stride = 1) {
^ ~
1 error generated.
Test Plan:
db_test
Summary:
The test only covers changing write_buffer_size. Other changable
parameters such bloom bits/probes are not obvious how to test.
Suggestions are welcome
Test Plan: db_test
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24429
Summary:
Fixed signed-unsigned comparison warning in db_test.cc
db/db_test.cc:8606:3: note: in instantiation of function template specialization 'rocksdb::test::Tester::IsEq<int, unsigned long>' requested here
ASSERT_EQ(2, metadata.size());
^
Test Plan:
make db_test
Summary:
make compaction related options changeable. Most of changes are tedious,
following the same convention: grabs MutableCFOptions at the beginning
of compaction under mutex, then pass it throughout the job and register
it in SuperVersion at the end.
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23349
Fix for:
[db/db_test.cc:6141]: (performance) Function parameter
'key' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Summary:
Add the MultiGet API to allow prefetching.
With file size of 1.5G, I configured it to have 0.9 hash ratio that can
fill With 115M keys and result in 2 hash functions, the lookup QPS is
~4.9M/s vs. 3M/s for Get().
It is tricky to set the parameters right. Since files size is determined
by power-of-two factor, that means # of keys is fixed in each file. With
big file size (thus smaller # of files), we will have more chance to
waste lot of space in the last file - lower space utilization as a
result. Using smaller file size can improve the situation, but that
harms lookup speed.
Test Plan: db_bench
Reviewers: yhchiang, sdong, igor
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23673
Summary:
Add a CompactedDBImpl that will enabled when calling OpenForReadOnly()
and the DB only has one level (>0) of files. As a performan comparison,
CuckooTable performs 2.1M/s with CompactedDBImpl vs. 1.78M/s with
ReadOnlyDBImpl.
Test Plan: db_bench
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23553
Summary: Those were introduced with 2fb1fea30f because the flushing behavior changed when max_background_flushes is > 0.
Test Plan: make check
Reviewers: ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23577
Summary: We currently don't test mmap reads as part of db_test. Piggyback it on kWalDir test config.
Test Plan: make check
Reviewers: ljin, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23337
Summary:
1. wrap a filter policy like what fbcode/multifeed/rocksdb/MultifeedRocksDbKey.h
to ensure that rocksdb works fine after filterpolicy interface change
Test Plan: 1. valgrind ./bloom_test
Reviewers: ljin, igor, yhchiang, dhruba, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23229
Summary: Get valgrind to stop complaining about uninitialized value
Test Plan: valgrind not complaining anymore
Reviewers: sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23289
Summary: The test makes sure that we don't call flush too often. For that, it's ok to check if we have less than 10 table files. Otherwise, the test is flaky because it's hard to estimate number of entries in the memtable before it gets flushed (any ideas?)
Test Plan: Still works, but hopefully less flaky.
Reviewers: ljin, sdong, yhchiang
Reviewed by: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23241
Summary:
When memtable is full it calls the registered callback. That callback then registers column family as needing the flush. Every write checks if there are some column families that need to be flushed. This completely eliminates the need for MakeRoomForWrite() function and simplifies our Write code-path.
There is some complexity with the concurrency when the column family is dropped. I made it a bit less complex by dropping the column family from the write thread in https://reviews.facebook.net/D22965. Let me know if you want to discuss this.
Test Plan: make check works. I'll also run db_stress with creating and dropping column families for a while.
Reviewers: yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23067
Summary:
See t5106397.
Also, few more changes:
1. in unit tests, the assumption is that writes will be dropped when there is no space left on device. I changed the wording around it.
2. InvalidArgument() errors are only when user-provided arguments are invalid. When the file is corrupted, we need to return Status::Corruption
Test Plan: make check
Reviewers: sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23145
Summary:
In column family's SanitizeOptions() [1], we make sure that min_write_buffer_number_to_merge is normal value. However, this test depended on the fact that setting min_write_buffer_number_to_merge to be bigger than max_write_buffer_number will cause a deadlock. I'm not sure how it worked before.
This diff fixes it by scheduling sleeping background task, which will actually block any attempts of flushing.
[1] https://github.com/facebook/rocksdb/blob/master/db/column_family.cc#L104
Test Plan: the test works now
Reviewers: yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23103
Summary: Avoid creating unnecessary sst files while db opening
Test Plan: make all check
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: zagfox, yhchiang, ljin, leveldb
Differential Revision: https://reviews.facebook.net/D20661
Summary: ...
Test Plan: Can't repro the test failure, but let's see what jenkins says
Reviewers: zagfox, sdong, ljin
Reviewed By: sdong, ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23061
Summary:
When we have multiple column families, users can issue Flush() on every column families to make sure everything is flushes, even if some of them might be empty. By skipping the waiting for empty cases, it can be greatly speed up.
Still wait for people's comments before writing unit tests for it.
Test Plan: Will write a unit test to make sure it is correct.
Reviewers: ljin, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D22953
Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file.
2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter.
3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type.
4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.
5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc
Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increase for about 30% from 2230002 to 2991411.
Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter = 0
./auto_sanity_test.sh
Reviewers: igor, yhchiang, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20979
Summary: If we drop column family only from (single) write thread, we can be sure that nobody will drop the column family while we're writing (and our mutex is released). This greatly simplifies my patch that's getting rid of MakeRoomForWrite().
Test Plan: make check, but also running stress test
Reviewers: ljin, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22965
Summary: fixed memory leak in unit test DBIteratorBoundTest
Test Plan: ran valgrind test on my unit test
Reviewers: sdong
Differential Revision: https://reviews.facebook.net/D22911
Summary:
Simply code by removing code path which does not use Arena
from NewInternalIterator
Test Plan:
make all check
make valgrind_check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22395
Summary: In DBImpl::Recover method, while loading memtables, also check if memtables are empty. Use this in DBImplReadonly to determine whether to lookup memtable or not.
Test Plan:
db_test
make check all
Reviewers: sdong, yhchiang, ljin, igor
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22281
Summary:
I will move compression related options in a separate diff since this
diff is already pretty lengthy.
I guess I will also need to change JNI accordingly :(
Test Plan: make all check
Reviewers: yhchiang, igor, sdong
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21915
Summary:
Currently, PlainTable must use mmap_reads. When PlainTable is used but
allow_mmap_reads is not set, rocksdb will fail in flush.
This diff improve Options sanitization and add MmapReadRequired() to
TableFactory.
Test Plan:
export ROCKSDB_TESTS=PlainTableOptionsSanitizeTest
make db_test -j32
./db_test
Reviewers: sdong, ljin
Reviewed By: ljin
Subscribers: you, leveldb
Differential Revision: https://reviews.facebook.net/D21939
Summary: A previous change triggered a change by mistake: DestroyDB() will keep info logs under DB directory. Revert the unintended change.
Test Plan: Add a unit test case to verify it.
Reviewers: ljin, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22209