Summary:
Improve documentation for RocksEnv and its C++ resource. Specifically,
the result of RocksEnv::Default() is a singleton, and the ownership
of its c++ resource belongs to rocksdb c++. As a result, calling
dispose() of the return value of RocksDB::Default() will be no-op.
Test Plan: no code change.
Reviewers: haobo, ankgup87
Reviewed By: ankgup87
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18945
Summary: Provide an convenience option to create column families if they are missing from the DB. Task #4460490
Test Plan: added unit test. also, stress test for some time
Reviewers: sdong, haobo, dhruba, ljin, yhchiang
Reviewed By: yhchiang
Subscribers: yhchiang, leveldb
Differential Revision: https://reviews.facebook.net/D18951
Summary: We have a perf regression of Write() even with one column family. Make fast path for single column family to avoid the perf regression. See task #4455480
Test Plan: make check
Reviewers: sdong, ljin
Reviewed By: sdong, ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18963
Summary: Now sst_dump fails in block based tables if binary search index is used, as it requires a prefix extractor. Add it.
Test Plan: Run it against such a file to make sure it fixes the problem.
Reviewers: yhchiang, kailiu
Reviewed By: kailiu
Subscribers: ljin, igor, dhruba, haobo, leveldb
Differential Revision: https://reviews.facebook.net/D18927
Summary: In universal compaction, MaxFileSizeForLevel is ULLONG_MAX. We've been preallocation files to UULONG_MAX size all these time :)
Test Plan: make check
Reviewers: dhruba, haobo, ljin, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18915
Summary:
388d2054c7
added extra line to db_bench output, breaking regression tests. This diff makes it more robust and fixes the issue
Test Plan: ran it
Reviewers: ljin, sdong
Reviewed By: sdong
Subscribers: sdong, leveldb
Differential Revision: https://reviews.facebook.net/D18897
Summary:
In this patch, try to allocate the whole iterator tree starting from DBIter from an arena
1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it.
2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator.
3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it.
Limitations:
(1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc
(2) Two level iterator itself is allocated in arena, but not iterators inside it.
Test Plan: make all check
Reviewers: ljin, haobo
Reviewed By: haobo
Subscribers: leveldb, dhruba, yhchiang, igor
Differential Revision: https://reviews.facebook.net/D18513
Summary:
This patch changes meaning of options.bloom_locality: 0 means disable cache line optimization and any positive number means use CACHE_LINE_SIZE as block size (the previous behavior is the block size will be CACHE_LINE_SIZE*options.bloom_locality). By doing it, the divide operations inside a block can be replaced by a shift.
Performance is improved:
https://reviews.facebook.net/P471
Also, improve the basic algorithm in two ways:
(1) make sure num of blocks is an odd number
(2) rotate bytes after every probe in locality mode. Since the divider is 2^n, unless doing it, we are never able to use all the bits.
Improvements of false positive: https://reviews.facebook.net/P459
Test Plan: make all check
Reviewers: ljin, haobo
Reviewed By: haobo
Subscribers: dhruba, yhchiang, igor, leveldb
Differential Revision: https://reviews.facebook.net/D18843
Summary:
At the end of BackgroundCallCompaction(), we call SignalAll(), even though we don't need to. If compaction hasn't done anything and there's another compaction running, there is no need to signal on the condition variable. Doing so creates a tight feedback loop which results in log files like:
wait for memtable flush
compaction nothing to do
wait for memtable flush
compaction nothing to do
This change eliminates that
Test Plan:
make check
Also:
icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG | wc -l
7435
icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG | wc -l
372
First version is before the change, second version is after the change.
Reviewers: dhruba, ljin, haobo, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18855
Summary:
We've seen some production issues where column family is detected as stale, although there is only one column family in the system. This is a quick fix that:
1) doesn't flush stale column families if there's only one of them
2) Use 4 as a coefficient instead of 2 for determening when a column family is stale. This will make flushing less aggressive, while still keep a nice dynamic flushing of very stale CFs.
Test Plan: make check
Reviewers: dhruba, haobo, ljin, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18861
Summary: Add an option to indicates the variation of background threads. If the flag is not 0, for every 100 milliseconds, adjust thread pool size to a value within the range.
Test Plan: run db_stress
Reviewers: haobo, dhruba, igor, ljin
Reviewed By: ljin
Subscribers: leveldb, nkg-
Differential Revision: https://reviews.facebook.net/D18759
Summary:
Forward iterator puts everything together in a flat structure instead of
a hierarchy of nested iterators. this should simplify the code and
provide better performance. It also enables more optimization since all
information are accessiable in one place.
Init evaluation shows about 6% improvement
Test Plan: db_test and db_bench
Reviewers: dhruba, igor, tnovak, sdong, haobo
Reviewed By: haobo
Subscribers: sdong, leveldb
Differential Revision: https://reviews.facebook.net/D18795
Summary: One more option to allow iterator refreshing when using normal iterator
Test Plan: ran db_bench
Reviewers: haobo, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18849
Summary: 220132b65e correctly fixed the issue of thread ID printing when terminating a thread. Nothing wrong with it. This diff prints the ID in the same way as in PosixLogger::logv() so that users can be more easily to correlates them.
Test Plan: run env_test and make sure it prints correctly.
Reviewers: igor, haobo, ljin, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D18819
Summary:
1. Move disOwnNativeHandle() function from RocksDB to RocksObject
to allow other RocksObject to use disOwnNativeHandle() when its
ownership of native handle has been transferred.
2. RocksObject now has an abstract implementation of dispose(),
which does the following two things. First, it checks whether
both isOwningNativeHandle() and isInitialized() return true.
If so, it will call the protected abstract function dispose0(),
which all the subclasses of RocksObject should implement. Second,
it sets nativeHandle_ = 0. This redesign ensure all subclasses
of RocksObject have the same dispose behavior.
3. All subclasses of RocksObject now should implement dispose0()
instead of dispose(), and dispose0() will be called only when
isInitialized() returns true.
Test Plan:
make rocksdbjava
make jtest
Reviewers: dhruba, sdong, ankgup87, rsumbaly, swapnilghike, zzbennett, haobo
Reviewed By: haobo
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18801
Summary:
Originally, the boolean arguments in Java DB benchmark only takes
true and false as valid input. This diff allows boolean arguments
to be set using 0 or 1.
Test Plan:
make rocksdbjava
make jdb_bench
java/jdb_bench.sh --db=/tmp/path-not-exist --use_existing_db=1
java/jdb_bench.sh --db=/tmp/path-not-exist --use_existing_db=0
java/jdb_bench.sh --db=/tmp/path-not-exist --use_existing_db=1
Reviewers: haobo, dhruba, sdong, ankgup87, rsumbaly, swapnilghike, zzbennett
Reviewed By: haobo
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18687
Summary:
Introducing new compaction style -- FIFO.
FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values.
FIFO compaction style is suited for storing high-frequency event logs.
Test Plan: Added a unit test
Reviewers: dhruba, haobo, sdong
Reviewed By: dhruba
Subscribers: alberts, leveldb
Differential Revision: https://reviews.facebook.net/D18765
Summary: Per request from @nkg-, temporarily print thread ID when a thread terminates. It is a temp solution as we try to minimized stderr messages.
Test Plan: env_test
Reviewers: haobo, igor, dhruba
Reviewed By: igor
CC: nkg-, leveldb
Differential Revision: https://reviews.facebook.net/D18753
Summary:
Add a feature to decrease the number of threads in thread pool.
Also instantly schedule more threads if number of threads is increased.
Here is the way it is implemented: each background thread needs its thread ID. After decreasing number of threads, all threads are woken up. The thread with the largest thread ID will terminate. If there are more threads to terminate, the thread will wake up all threads again.
Another change is made so that when number of threads is increased, more threads are created and all previous excessive threads are woken up to do the work.
Test Plan: Add a unit test.
Reviewers: haobo, dhruba
Reviewed By: haobo
CC: yhchiang, igor, nkg-, leveldb
Differential Revision: https://reviews.facebook.net/D18675
Summary:
This diff adds rapidjson (https://code.google.com/p/rapidjson/) to RocksDB repository. First step to adding JSON is to add a JSON parser :)
I'm not sure if rapidjson is the right choice. I only considered folly as an alternative. Using folly JSON parser has serious downsides. I tried extracting folly::json from the folly library. However, it depends on folly::dynamic, which basically depends on everything else. I would prefer to avoid adding complete folly library as RocksDB dependency. Folly has a lot of its own dependencies (https://github.com/facebook/folly) and also looks like open source world has some trouble installing it (https://groups.google.com/forum/#!forum/facebook-folly -- 60% of the posts are compile errors). We can discuss this if you think otherwise.
RapidJSON does not have any dependencies whatsoever. No boost, no STL. We don't need to compile it since it's all header files. The performance results are also impressive: https://code.google.com/p/rapidjson/wiki/Performance. However, I'll make sure to write code in a way that we can easily switch JSON implementations going forward. Quora thread has some alternatives: http://www.quora.com/What-is-the-best-C-JSON-library
Test Plan: none
Reviewers: dhruba, haobo, yhchiang, sdong, jamesgpearce
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18729
Summary:
Materialize the hash index to avoid the soaring cpu/flash usage
when initializing the database.
Test Plan: existing unit tests passed
Reviewers: sdong, haobo
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18339