Summary:
Get() now doesn't make use of bloom filter if it is prefix based. Add the check.
Didn't touch block based bloom filter. I can't fully reason whether it is correct to do that. But it's straight-forward to for full bloom filter.
Test Plan:
make all check
Add a test case in DBTest
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D31941
Summary:
This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions:
1) Zlib -- encode decompressed size in compressed block header
2) BZip2 -- encode decompressed size in compressed block header
3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes.
It does not affect format for snappy.
If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case.
Test Plan:
Added a new test in db_test.
I will also run db_bench and verify VSIZE when block_cache == 1GB
Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31461
Summary:
In this diff I add another parameter to BlockBasedTableOptions that will let users specify block based table's format. This will greatly simplify block based table's format changes in the future.
First format change that this will support is encoding decompressed size in Zlib and BZip2 blocks. This diff is blocking https://reviews.facebook.net/D31311.
Test Plan: Added a unit tests. More tests to come as part of https://reviews.facebook.net/D31311.
Reviewers: dhruba, MarkCallaghan, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31383
Summary: We keep checksum functions in util/, there is no reason for compression to be in port/
Test Plan: compiles
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31281
Summary:
Introduces a new class for managing write buffer memory across column
families. We supplement ColumnFamilyOptions::write_buffer_size with
ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
instance that enforces memory limits before flushing out to disk.
Test Plan: Added SharedWriteBuffer unit test to db_test.cc
Reviewers: sdong, rven, ljin, igor
Reviewed By: igor
Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
Differential Revision: https://reviews.facebook.net/D22581
Summary: Block plain_table_index.cc in ROCKSDB_LITE
Test Plan:
make clean
make OPT=-DROCKSDB_LITE shared_lib -j32
make clean
make shared_lib -j32
Reviewers: ljin, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29535
Summary:
In some environment such as android, the c++ library does not have
std::to_string. This path adds rocksdb::ToString(), which wraps std::to_string
when std::to_string is not available, and implements std::to_string
in the other case.
Test Plan:
make dbg -j32
./db_test
make clean
make dbg OPT=-DOS_ANDROID -j32
./db_test
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29181
Summary: Created a CompatibleOptions object that can be used as a LevelDB Options object and then converted to a RocksDB Options object using the ConvertOptions() method.
Test Plan: Unit test included in diff.
Reviewers: ljin
Reviewed By: ljin
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28893
Summary:
The very last reference happens in DBImpl::GetOptions()
I built with both DBImpl::GetOptions() and ColumnFamilyData::options() commented out
Test Plan: make all check
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29073
Summary:
This is just a simple test that passes two files though a compaction. It shows the framework so that people can continue building new compaction *unit* tests.
In the future we might want to move some Compaction* tests from DBTest here. For example, CompactBetweenSnapshot seems a good candidate.
Hopefully this test can be simpler when we mock out VersionSet.
Test Plan: this is a test
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28449
Summary:
Make db_stress built for ROCKSDB_LITE.
The test doesn't pass tough. It seg fault quickly. But I took a look and it doesn't seem to be related to lite version. Likely to be a bug inside RocksDB.
Test Plan: make db_stress
Reviewers: yhchiang, rven, ljin, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D28797
Summary: So iOS size_t is 32-bit, so we need to static_cast<size_t> any uint64_t :(
Test Plan: TARGET_OS=IOS make static_lib
Reviewers: dhruba, ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28743
Summary:
We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
Test Plan: compiles
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: yhchiang
Subscribers: bobbaldwin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28689
Summary: Previously I made `make check` work with -Wshadow, but there are some tools that are not compiled using `make check`.
Test Plan: make all
Reviewers: yhchiang, rven, ljin, sdong
Reviewed By: ljin, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28497
Summary: It turns out that -Wshadow has different rules for gcc than clang. Previous commit fixed clang. This commits fixes the rest of the warnings for gcc.
Test Plan: compiles
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28131
Summary:
...and fix all the errors :)
Jim suggested turning on -Wshadow because it helped him fix number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
Test Plan: compiles
Reviewers: yhchiang, rven, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27711
Summary:
Apply InfoLogLevel to the logs in table/block_based_table_reader.cc
Also, add missing checks for the returned status in BlockBasedTable::Open
Test Plan: make
Reviewers: sdong, ljin, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D28005
Summary: Apply InfoLogLevel to the logs in table/meta_blocks.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27903
Summary: Apply InfoLogLevel to the logs in table/plain_table_index.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27909
Summary: Apply InfoLogLevel to the logs in table/block_based_table_builder.cc
Test Plan: make
Reviewers: igor, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27921
Summary:
This diff replaces BlockBasedTable in flush_job_test with TableMock, making it depend on less things and making it closer to an unit test than integration test.
It also introduces a framework to compile mock classes -- Any file named *mock.cc will not be compiled into the build. It will only get compiled into the tests. What way we can mock out most other classes, Version, VersionSet, DBImpl, etc.
Test Plan: flush_job_test
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27681
Summary:
Make inplace_update_support and inplace_update_num_locks dynamic.
inplace_callback becomes immutable
We are almost free of references to cfd->options() in db_impl
Test Plan: unit test
Reviewers: igor, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25293
Summary:
This fixes the case that filter policy is missing in SST file, but we
open the table with filter policy on and cache_index_and_filter_blocks =
false. The current behavior is that we will try to load it every time on
Get() but fail.
Test Plan: unit test
Reviewers: yhchiang, igor, rven, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25455
Summary:
make compaction related options changeable. Most of changes are tedious,
following the same convention: grabs MutableCFOptions at the beginning
of compaction under mutex, then pass it throughout the job and register
it in SuperVersion at the end.
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23349
Fix for:
[table/cuckoo_table_reader.cc:196]: (performance) Function
parameter 'target' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Use %zu instead of %zd since size_t and uint32_t are unsigned.
Fix for:
[table/plain_table_factory.cc:55]: (warning) %zd in format string (no. 1)
requires 'ssize_t' but the argument type is 'size_t {aka unsigned long}'.
[table/plain_table_factory.cc:58]: (warning) %zd in format string (no. 1)
requires 'ssize_t' but the argument type is 'size_t {aka unsigned long}'.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Fix for:
[table/table_test.cc:1218]: (performance) Function parameter
'prefix' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Fix for:
[table/cuckoo_table_reader.cc:198]: (performance) Function
parameter 'file_data' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
[table/bloom_block.h:29]: (performance) Function parameter
'keys_hashes' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Summary:
Intead of passing callback function pointer and its arg on Table::Get()
interface, passing GetContext. This makes the interface cleaner and
possible better perf. Also adding a fast pass for SaveValue()
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24057
Summary:
when I changed std::vector<std::string, std::string> to std::string to
store key/value pairs in builder, I missed the handling for kDeletion
type. As a result, value_size_ can be wrong if the first add key is for
deletion.
The is captured by ./cuckoo_table_db_test
Test Plan:
./cuckoo_table_db_test
./cuckoo_table_reader_test
./cuckoo_table_builder_test
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24045
Summary:
cuckoo table iterator creation is quite expensive since it needs to load
all data and sort them. After compaction, RocksDB creates a new iterator
of the new file to make sure it is in good state. That makes the DB
creation quite slow. Delay the iterator db sort to the seek time to
speed it up.
Test Plan: db_bench
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23775
Summary:
builder currently buffers all key value pairs as a vector of
pair<string, string>. That is too much due to std::string
overhead. It wasn't able to fit 1B key/values (12bytes total) in 100GB
of ram. Switch to use a plain string to store the key/value sequence and
use only 12GB of ram as a result.
Test Plan: db_bench
Reviewers: igor, sdong, yhchiang
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23763
Summary:
When creating a new iterator, instead of storing mapping from key to
bucket id for sorting, store only bucket id and read key from mmap file
based on the id. This reduces from 20 bytes per entry to only 4 bytes.
Test Plan: db_bench
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23757
Summary:
Using module to calculate hash makes lookup ~8% slower. But it has its
benefit: file size is more predictable, more space enffient
Test Plan: db_bench
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23691
Summary:
Add a CompactedDBImpl that will enabled when calling OpenForReadOnly()
and the DB only has one level (>0) of files. As a performan comparison,
CuckooTable performs 2.1M/s with CompactedDBImpl vs. 1.78M/s with
ReadOnlyDBImpl.
Test Plan: db_bench
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23553
Summary:
It was commented out in D22545 by accident. Keep the option in
ImmutableOptions for now. I can make it dynamic in
https://reviews.facebook.net/D23349
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23865
Summary:
MurmurHash becomes expensive when we do millions Get() a second in one
thread. Add this option to allow the first hash function to use identity
function as hash function. It results in QPS increase from 3.7M/s to
~4.3M/s. I did not observe improvement for end to end RocksDB
performance. This may be caused by other bottlenecks that I will address
in a separate diff.
Test Plan:
```
[ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=0
==== Test CuckooReaderTest.WhenKeyExists
==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator
==== Test CuckooReaderTest.CheckIterator
==== Test CuckooReaderTest.CheckIteratorUint64
==== Test CuckooReaderTest.WhenKeyNotFound
==== Test CuckooReaderTest.TestReadPerformance
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.272us (3.7 Mqps) with batch size of 0, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.138us (7.2 Mqps) with batch size of 10, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.142us (7.1 Mqps) with batch size of 25, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.142us (7.0 Mqps) with batch size of 50, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.144us (6.9 Mqps) with batch size of 100, # of found keys 125829120
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.201us (5.0 Mqps) with batch size of 0, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.121us (8.3 Mqps) with batch size of 10, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.123us (8.1 Mqps) with batch size of 25, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.121us (8.3 Mqps) with batch size of 50, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.112us (8.9 Mqps) with batch size of 100, # of found keys 104857600
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.251us (4.0 Mqps) with batch size of 0, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.107us (9.4 Mqps) with batch size of 10, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.099us (10.1 Mqps) with batch size of 25, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.100us (10.0 Mqps) with batch size of 50, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.116us (8.6 Mqps) with batch size of 100, # of found keys 83886080
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.189us (5.3 Mqps) with batch size of 0, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.095us (10.5 Mqps) with batch size of 10, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.096us (10.4 Mqps) with batch size of 25, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.098us (10.2 Mqps) with batch size of 50, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.105us (9.5 Mqps) with batch size of 100, # of found keys 73400320
[ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=1
==== Test CuckooReaderTest.WhenKeyExists
==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator
==== Test CuckooReaderTest.CheckIterator
==== Test CuckooReaderTest.CheckIteratorUint64
==== Test CuckooReaderTest.WhenKeyNotFound
==== Test CuckooReaderTest.TestReadPerformance
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.230us (4.3 Mqps) with batch size of 0, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.086us (11.7 Mqps) with batch size of 10, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.088us (11.3 Mqps) with batch size of 25, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.083us (12.1 Mqps) with batch size of 50, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.083us (12.1 Mqps) with batch size of 100, # of found keys 125829120
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.159us (6.3 Mqps) with batch size of 0, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.078us (12.8 Mqps) with batch size of 10, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.080us (12.6 Mqps) with batch size of 25, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.080us (12.5 Mqps) with batch size of 50, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.082us (12.2 Mqps) with batch size of 100, # of found keys 104857600
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.154us (6.5 Mqps) with batch size of 0, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.077us (13.0 Mqps) with batch size of 10, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.077us (12.9 Mqps) with batch size of 25, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.078us (12.8 Mqps) with batch size of 50, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.079us (12.6 Mqps) with batch size of 100, # of found keys 83886080
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.218us (4.6 Mqps) with batch size of 0, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.083us (12.0 Mqps) with batch size of 10, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.085us (11.7 Mqps) with batch size of 25, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.086us (11.6 Mqps) with batch size of 50, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.078us (12.8 Mqps) with batch size of 100, # of found keys 73400320
```
Reviewers: sdong, igor, yhchiang
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23451