Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.
In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.
Test Plan: New unit tests and pass tests suites including valgrind.
Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo
Reviewed By: ikabiljo
Subscribers: yoshinorim, ikabiljo, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31437
Summary:
Summary:
Added a new option to ColumnFamllyOptions - optimize_filters_for_hits. This option can be used in the case where most
accesses to the store are key hits and we dont need to optimize performance for key misses.
This is useful when you have a very large database and most of your lookups succeed. The option allows the store to
not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of
space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not
for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level.
This option is only provided for BlockBasedTable. We skip the filters when we are compacting
Test Plan:
1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level)
2. Added another unit test to db_test which specifically tests that filters are being skipped
Reviewers: rven, igor, sdong
Reviewed By: sdong
Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33717
Summary:
This patch will update the Makefile and source code so that we can build RocksDB successfully on FreeBSD 10 and 11 (64-bit and 32-bit)
I have also encountered some problems when running tests on FreeBSD, I will try to fix them individually in different diffs
Notes:
- FreeBSD uses clang as it's default compiler (http://lists.freebsd.org/pipermail/freebsd-current/2012-September/036480.html)
- GNU C++ compiler have C++ 11 problems on FreeBSD (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193528)
- make is not gmake on FreeBSD (http://www.khmere.com/freebsd_book/html/ch01.html)
Test Plan:
Using VMWare Fusion Create 4 VM machines (FreeBSD 11 64-bit, FreeBSD 11 32-bit, FreeBSD 10 64-bit, FreeBSD 10 32-bit)
- pkg install git gmake gflags archivers/snappy
- git clone https://github.com/facebook/rocksdb.git
- apply this patch
- setenv CXX c++
- setenv CPATH /usr/local/include/
- setenv LIBRARY_PATH /usr/local/lib/
- gmake db_bench
- make sure compilation is successful and db_bench is running
- gmake all
- make sure compilation is successful
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33891
Summary:
The LoadDependency function does not take a lock when it runs
and it could be modifying data structures while other threads are
accessing it.
Test Plan: Run TSAN.
Reviewers: igor, sdong
Reviewed By: igor, sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34095
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Some errors are uninitialized use errors.
```
...
CC db/log_test.o
util/ldb_cmd.cc:394:16: error: base class 'rocksdb::LDBCommand' is uninitialized when used here to access 'rocksdb::LDBCommand::BuildCmdLineOptions' [-Werror,-Wuninitialized]
BuildCmdLineOptions({ARG_FROM, ARG_TO, ARG_HEX, ARG_KEY_HEX,
^
...
```
```lang=c++
CompactorCommand::CompactorCommand(const vector<string>& params,
const map<string, string>& options, const vector<string>& flags) :
LDBCommand(options, flags, false,
BuildCmdLineOptions({ARG_FROM, ARG_TO, ARG_HEX, ARG_KEY_HEX,
ARG_VALUE_HEX, ARG_TTL})),
null_from_(true), null_to_(true) {
. . .
}
```
For the fourth parameter of the base constructor (`LDBCommand`) we call `BuildCmdLineOptions`, which is a private non-static method of `LDBCommand` base class.
This diff adds missing `static` keyword for `LDBCommand::BuildCmdLineOptions` method.
Test Plan:
Build with trunk clang and make sure all tests are passing.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make check
``
Reviewers: meyering, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34083
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.
Prerequisites: bear and clang 3.5 build with extra tools
```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```
Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```
Reviewers: igor, kradhakrishnan, rven, meyering, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34077
Summary:
BlockBasedTable pre-fetches the filter and index blocks on Open call.
This is an optimistic optimization targeted for runtime scenario. The
optimization is unnecessary for sst_dump_tool
- Added a provision to disable pre-fetching of index and filter blocks
in BlockBasedTable
- Disabled pre-fetching for the sst_dump tool
Stack for reference :
#01 0x00000000005ed944 in snappy::InternalUncompress<snappy::SnappyArrayWriter> () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:148
#02 0x00000000005edeee in snappy::RawUncompress () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:947
#03 0x00000000004e0b4d in rocksdb::UncompressBlockContents () from /data/users/paultuckfield/rocksdb/./util/compression.h:69
#04 0x00000000004e145c in rocksdb::ReadBlockContents () from /data/users/paultuckfield/rocksdb/table/format.cc:334
#05 0x00000000004ca424 in rocksdb::(anonymous namespace)::ReadBlockFromFile () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:70
#06 0x00000000004cccad in rocksdb::BlockBasedTable::CreateIndexReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:173
#07 0x00000000004d17e5 in rocksdb::BlockBasedTable::Open () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:553
#08 0x00000000004c8184 in rocksdb::BlockBasedTableFactory::NewTableReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_factory.cc:51
#09 0x0000000000598463 in rocksdb::SstFileReader::NewTableReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:69
#10 0x00000000005986c2 in rocksdb::SstFileReader::SstFileReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:26
#11 0x0000000000599047 in rocksdb::SSTDumpTool::Run () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:332
#12 0x0000000000409b06 in main () from /data/users/paultuckfield/rocksdb/tools/sst_dump.cc:12
Test Plan:
- Added a unit test to trigger the code.
- Also did some manual verification.
- Passed all unit tests
task #6296048
Reviewers: igor, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34041
Summary:
The unit test was supposed to check that the old file and the new file contains
the header message.
Test Plan: Run the unit test.
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33705
Summary: as above
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33753
Summary:
During running AutoRollLoggerTest on FreeBSD we have found out that AutoRollLogger is not flushing correctly (test fails on FreeBSD)
This diff add Flush to AutoRollLogger to fix this problem
Test Plan:
[My machine] make all check ( all tests pass)
[FreeBSD VM] running AutoRollLoggerTest ( all tests pass )
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33633
Summary:
In mongo, we currently have a single column family and I'd like to support setting rocksdb::Options from string. This diff provides an option to GetOptionsFromString()
There's one more problem. Currently GetColumnFamilyOptionsFromString() overwrites block_based options. In mongo I set default values for block_cache and some other values of BlockBasedTableOptions and I don't want them reset to default with GetOptionsFromString().
Test Plan: added unit test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33729
Summary:
Prior to this change, "make check" would always waste a lot of
time relinking 60+ binaries. With this change, it does that
only when the generated file, util/build_version.cc, changes,
and that happens only when the date changes or when the
current git SHA changes.
This change makes some other improvements: before, there was no
rule to build a deleted util/build_version.cc. If it was somehow
removed, any attempt to link a program would fail.
There is no longer any need for the separate file,
build_tools/build_detect_version. Its functionality is
now in the Makefile.
* Makefile (DEPFILES): Don't filter-out util/build_version.cc.
No need, and besides, removing that dependency was wrong.
(date, git_sha, gen_build_version): New helper variables.
(util/build_version.cc): New rule, to create this file
and update it only if it would contain new information.
* build_tools/build_detect_platform: Remove file.
* db/db_impl.cc: Now, print only date (not the time).
* util/build_version.h (rocksdb_build_compile_time): Remove
declaration. No longer used.
Test Plan:
- Run "make check" twice, and note that the second time no linking is performed.
- Remove util/build_version.cc and ensure that any "make"
command regenerates it before doing anything else.
- Run this: strings librocksdb.a|grep _build_.
That prints output including the following:
rocksdb_build_git_date:2015-02-19
rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33591
Summary:
This is a diff for managed iterator. A managed iterator
is a wrapper around an iterator which saves the options for that
iterator as well as the current key/value so that the underlying iterator
and its associated memory can be released when it is aged out
automatically or on the request of the user. Will provide the automatic release as a follow-up diff.
Test Plan: Managed* tests in db_test and XF tests for managed iterator
Reviewers: igor, yhchiang, anthony, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31401
Summary:
scan-build complains with division by zero warning in a test. Added an assertion to prevent this.
scan-build report: http://home.fburl.com/~sugak/latest6/report-c61be9.html#EndPath
Test Plan:
Make sure scan-build does not report 'Division by zero' and all tests are passing.
```lang=bash
% make analyze
% make check
```
Reviewers: igor, meyering
Reviewed By: meyering
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33495
Summary:
Add thread_status_util_debug.cc back as InstrumentedMutex related tests
are using it to produce wait that can be reflected in the counter.
Test Plan:
./perf_context_test
export ROCKSDB_TESTS=MutexWaitStats
./db_test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33525
Summary: Allow GetThreadList to reflect flush activity.
Test Plan:
Developed ThreadStatusFlush test and updated ThreadStatusMultiCompaction test.
./db_test ./thread_list_test
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32871
Summary:
In the existing implementation of `ASSERT*`, test termination happens in `~Tester`, which is called when instance of `Tester` goes out of scope. This is the cause of many scan-build bugs.
This diff changes `ASSERT*` to terminate the test immediately. Also added one suppression in `util/signal_test.cc`
scan-build bugs
before: http://home.fburl.com/~sugak/latest/index.html
after: http://home.fburl.com/~sugak/latest2/index.html
Test Plan:
Modify some test to fail an assertion and make sure that `ASSERT*` terminated the test.
Run `make analyze` and make sure no 'Called C++ object pointer is null' and 'Dereference of null pointer' bugs reported.
Run tests and make sure no failing tests:
```lang=bash
% make check
% USE_CLANG=1 make check
```
Reviewers: meyering, lgalanis, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33381
Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex
Test Plan: Add a test and run it.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D33177
Summary:
This change adds LogHeader provision to the logger. For the rolling logger
implementation, the headers are copied over to the new log file every time
there is a log roll over.
Test Plan: Added a unit test to test the rolling log case.
Reviewers: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32817
Summary:
This diff basically reverts D30249 and also adds a unit test that was failing before this patch.
I have no idea how I didn't catch this terrible bug when writing a diff, sorry about that :(
I think we should redesign our system of keeping track of and deleting files. This is already a second bug in this critical piece of code. I'll think of few ideas.
BTW this diff is also a regression when running lots of column families. I plan to revisit this separately.
Test Plan: added a unit test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33045
Summary:
Add a counter for collecting the wait time on db mutex.
Also add MutexWrapper and CondVarWrapper for measuring wait time.
Test Plan:
./db_test
export ROCKSDB_TESTS=MutexWaitStats
./db_test
verify stats output using db_bench
make clean
make release
./db_bench --statistics=1 --benchmarks=fillseq,readwhilewriting --num=10000 --threads=10
Sample output:
rocksdb.db.mutex.wait.micros COUNT : 7546866
Reviewers: MarkCallaghan, rven, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32787
Summary: A bug in MockEnv causes fault_injestion_test to fail. I don't know why it doesn't fail every time but it doesn't seem to be right.
Test Plan:
Run fault_injestion_test
Also run db_test with MEM_ENV=1 until the first failure.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32877
Summary:
Add ThreadStatus::GetOperationName() and ThreadStatus::GetStateName(),
two utility functions that help interpreting ThreadStatus.
Test Plan: ./thread_list_test
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32793
Summary: Increasing parallelism of flushes will help bulk load throughput.
Test Plan: Compile it.
Reviewers: MarkCallaghan, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32685
Summary: I broke it with 2fd8f750ab
Test Plan: make unity
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32577
Summary:
./util/xfunc.h:31:1: error: class 'Options' was previously declared as a struct [-Werror,-Wmismatched-tags]
class Options;
^
Test Plan:
make dbg -j32
Summary:
This Diff provides the implementation of the cross functional
test infrastructure. This provides the ability to test a single feature
with every existing regression test in order to identify issues with
interoperability between features.
Test Plan:
Reference implementation of inplace update support cross
functional test. Able to find interoperability issues with inplace
support and ran all of db_test. Will add separate diff for those changes.
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32247
Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key.
Test Plan:
Add a test case for
db_test
options_test
and a new test
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D31887
Summary: This was a feature request by osquery. See task t5617758
Test Plan: compiles and memenv_test runs
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32115
Summary: This bug fails DBTest.CheckLock
Test Plan: DBTest.CheckLock now passes with MEM_ENV=1.
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32451
Summary:
1) need to do acquire load when read the first entry in the bucket.
2) Make num_entries atomic
Test Plan: Ran DBTest.MultiThreaded with TSAN
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32361
Summary: Add a new test case in fault_injection_test, which covers parallel compactions and multiple levels. Use MockEnv to run the new test case to speed it up. Improve MockEnv to avoid DestoryDB(), previously failed when deleting lock files.
Test Plan: Run ./fault_injection_test, including valgrind
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32415
Summary: TSAN complained that these are non-atomic reads and writes from different threads.
Test Plan: TSAN no longer complains
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32409
Summary: We don't have plans to work on this in the short term. If we ever resurrect the project, we can find the code in the history. No need for it to linger around
Test Plan: no test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32349
Summary:
ThreadSanitizer complains data race of super version and version's destructor with Get(). This patch will fix those warning.
The warning is likely from ColumnFamilyData::ReturnThreadLocalSuperVersion(). With relaxed consistency of CAS, reading the data of the super version can technically happen after swapping it in, enabling the background thread to clean it up.
Test Plan: make all check
Reviewers: rven, igor, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32265
Summary:
Make options_test runnable on ROCKSDB_LITE by blocking
those tests that require non-ROCKSDB_LITE feature.
Test Plan:
make options_test OPT=-DROCKSDB_LITE -j32
./options_test
make clean
make options_test -j32
./options_test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32025
Summary:
This diff enables to configure prefix_extractor
string parameter as a CF option.
Test Plan: make all check, ./options_test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31653
Summary:
This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions:
1) Zlib -- encode decompressed size in compressed block header
2) BZip2 -- encode decompressed size in compressed block header
3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes.
It does not affect format for snappy.
If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case.
Test Plan:
Added a new test in db_test.
I will also run db_bench and verify VSIZE when block_cache == 1GB
Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31461
Summary: We keep checksum functions in util/, there is no reason for compression to be in port/
Test Plan: compiles
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31281
Summary:
Since https://reviews.facebook.net/D16119, we ignore partial tailing writes. Because of that, we no longer need skip_log_error_on_recovery.
The documentation says "Skip log corruption error on recovery (If client is ok with losing most recent changes)", while the option actually ignores any corruption of the WAL (not only just the most recent changes). This is very dangerous and can lead to DB inconsistencies. This was originally set up to ignore partial tailing writes, which we now do automatically (after D16119). I have digged up old task t2416297 which confirms my findings.
Test Plan: There was actually no tests that verified correct behavior of skip_log_error_on_recovery.
Reviewers: yhchiang, rven, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30603