Summary:
Easy thing goes first. This patch moves arena to internal dir; based
on which, the coming patch will deal with memtable_rep.
Test Plan: make check
Reviewers: haobo, sdong, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15615
Summary:
Add option '--show_properties' to sst_dump tool to allow displaying
property block of the specified files.
Test Plan:
Run sst_dump with the following arguments, which covers cases affected by
this diff:
1. with only --file
2. with both --file and --show_properties
3. with --file, --show_properties, and --from
Reviewers: kailiu, xjin
Differential Revision: https://reviews.facebook.net/D15453
Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies.
Test Plan: make all check
Reviewers: kailiu, haobo, igor, dhruba
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15411
Summary:
It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group.
This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.
Test Plan:
Compiled
Ran ./db_bench ./db_stress ./db_repl_stress
Reviewers: kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15405
Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review.
Test Plan: make check
Reviewers: haobo, kailiu, sdong, dhruba, tnovak
Reviewed By: tnovak
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15099
Summary:
Let's get rid of TransformRep and it's children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around.
This diff is mostly just deleting references to obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to new release.
I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to GetTransform() function for SanitizeOptions.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14397
Summary:
These bugs were caught by ASAN crash test.
1. The first one, in table/filter_block.cc is very nasty. We first reference entries_ and store the reference to Slice prev. Then, we call entries_.append(), which can change the reference. The Slice prev now points to junk.
2. The second one is a bug in a test, so it's not very serious. Once we set read_opts.prefix, we never clear it, so some other function might still reference it.
Test Plan: asan crash test now runs more than 5 mins. Before, it failed immediately. I will run the full one, but the full one takes quite some time (5 hours)
Reviewers: dhruba, haobo, kailiu
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14223
Summary: This diff invoves some more complicated issues in the posix environment.
Test Plan: works under mac os. will need to verify dev box.
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14061
Summary: This diff leverage the existing block cache and extend it to cache index/filter block.
Test Plan:
Added new tests in db_test and table_test
The correctness is checked by:
1. make check
2. make valgrind_check
Performance is test by:
1. 10 times of build_tools/regression_build_test.sh on two versions of rocksdb before/after the code change. Test results suggests no significant difference between them. For the two key operatons `overwrite` and `readrandom`, the average iops are both 20k and ~260k, with very small variance).
2. db_stress.
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb, haobo, xjin
Differential Revision: https://reviews.facebook.net/D13167
Summary:
Archive cleaning will still happen every WAL_ttl seconds
but archived logs will be deleted only if archive size
is greater then a WAL_size_limit value.
Empty archived logs will be deleted evety WAL_ttl.
Test Plan:
1. Unit tests pass.
2. Benchmark.
Reviewers: emayanke, dhruba, haobo, sdong, kailiu, igor
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13869
Summary:
Rocksdb can now support a uncompressed block cache, or a compressed
block cache or both. Lookups first look for a block in the
uncompressed cache, if it is not found only then it is looked up
in the compressed cache. If it is found in the compressed cache,
then it is uncompressed and inserted into the uncompressed cache.
It is possible that the same block resides in the compressed cache
as well as the uncompressed cache at the same time. Both caches
have their own individual LRU policy.
Test Plan: Unit test case attached.
Reviewers: kailiu, sdong, haobo, leveldb
Reviewed By: haobo
CC: xjin, haobo
Differential Revision: https://reviews.facebook.net/D12675
Summary: Added an option --count_delim=<char> which takes the given character as delimiter ('.' by default) and reports count of each row type found in the db
Test Plan:
1. Created test in file (for DBDumperCommand) rocksdb/tools/ldb_test.py which puts various key value pair in db and checks the output using dump --count_delim ,--count_delim="." and --count_delim=",".
2. Created test in file (for InternalDumperCommand) rocksdb/tools/ldb_test.py which puts various key value pair in db and checks the output using dump --count_delim ,--count_delim="." and --count_delim=",".
3. Manually created a database with several keys of several type and verified by running the command
./ldb db=<path> dump --count_delim="<char>"
./ldb db=<path> idump --count_delim="<char>"
Reviewers: vamsi, dhruba, emayanke, kailiu
Reviewed By: vamsi
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13815
Summary: Don't define if already defined.
Test Plan: Running make release in parallel, will not commit if it fails.
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13833
Summary:
This patch is to address @haobo's comments on D13521:
1. rename Table to be TableReader and make its factory function to be GetTableReader
2. move the compression type selection logic out of TableBuilder but to compaction logic
3. more accurate comments
4. Move stat name constants into BlockBasedTable implementation.
5. remove some uncleaned codes in simple_table_db_test
Test Plan: pass test suites.
Reviewers: haobo, dhruba, kailiu
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13785
Summary: This patch makes Table and TableBuilder a abstract class and make all the implementation of the current table into BlockedBasedTable and BlockedBasedTable Builder.
Test Plan: Make db_test.cc to work with block based table. Add a new test simple_table_db_test.cc where a different simple table format is implemented.
Reviewers: dhruba, haobo, kailiu, emayanke, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13521
Summary: assert(Overlap) significantly slows down the benchmark. Ignore assertions when executing blob_store_bench.
Test Plan: Ran the benchmark
Reviewers: dhruba, haobo, kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13737
Summary: Apparently C++ doesn't like it if you copy around its atomic<> variables. When running a benchmark for a longer time, benchmark used to stall. Changed WorkerThread in config to WorkerThread*. It works now.
Test Plan: Ran benchmark
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13731
Summary: Converted db_stress, db_repl_stress and db_bench to use gflags
Test Plan: I tested by printing out all the flags from old and new versions. Tried defaults, + various combinations with "interesting flags". Also, tested by running db_crashtest.py and db_crashtest2.py.
Reviewers: emayanke, dhruba, haobo, kailiu, sdong
Reviewed By: emayanke
CC: leveldb, xjin
Differential Revision: https://reviews.facebook.net/D13581
Summary:
Finally, arc diff works again! This has been sitting in my repo for a while.
I would like some comments on my BlobStore benchmark. We don't have to check this in.
Also, I don't do any fsync in the BlobStore, so this is all extremely fast. I'm not sure what durability guarantees we need from the BlobStore.
Test Plan: Nope
Reviewers: dhruba, haobo, kailiu, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13527
Summary: When I debug the unit test failures when enabling background flush thread, I feel the function names can be made clearer for people to understand. Also, if the names are fixed, in many places, some tests' bugs are obvious (and some of those tests are failing). This patch is to clean it up for future maintenance.
Test Plan: Run test suites.
Reviewers: haobo, dhruba, xjin
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13431
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.
Test Plan: compile rocksdb
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13287
Summary: Will use iterators to verify keys in the db for half of its keys and Gets for the other half.
Test Plan: ./db_stress --max_key=1000 --ops_per_thread=100
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13227
Summary: Using an iterator instead of the Get method, each thread goes through a portion of the database and verifies values by comparing to the shared state.
Test Plan:
./db_stress --db=/tmp/tmppp --max_key=10000 --ops_per_thread=10000
To test some basic cases, the following lines can be added (each set in turn) to the verifyDb method with the following expected results:
// Should abort with "Unexpected value found"
shared.Delete(start);
// Should abort with "Value not found"
WriteOptions write_opts;
db_->Delete(write_opts, Key(start));
// Should succeed
WriteOptions write_opts;
shared.Delete(start);
db_->Delete(write_opts, Key(start));
// Should abort with "Value not found"
WriteOptions write_opts;
db_->Delete(write_opts, Key(start + (end-start)/2));
// Should abort with "Value not found"
db_->Delete(write_opts, Key(end-1));
// Should abort with "Unexpected value"
shared.Delete(end-1);
// Should abort with "Unexpected value"
shared.Delete(start + (end-start)/2);
// Should abort with "Value not found"
db_->Delete(write_opts, Key(start));
shared.Delete(start);
db_->Delete(write_opts, Key(end-1));
db_->Delete(write_opts, Key(end-2));
To test the out of range abort, change the key in the for loop to Key(i+1), so that the key defined by the index i is now outside of the supposed range of the database.
Reviewers: emayanke
Reviewed By: emayanke
CC: dhruba, xjin
Differential Revision: https://reviews.facebook.net/D13071
Summary: Adding in the iterpercent flag to tests.
Test Plan: make crash_test
Reviewers: emayanke
Reviewed By: emayanke
Differential Revision: https://reviews.facebook.net/D13035
Summary:
Added MultiIterate() which does a seek and some Next/Prev
calls. Iterator status is checked only, no data integrity check
Test Plan:
make db_stress
./db_stress --iterpercent=<nonzero value> --readpercent=, etc.
Reviewers: emayanke, dhruba, xjin
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12915
Summary:
Added a new field called max_size_amplification_ratio in the
CompactionOptionsUniversal structure. This determines the maximum
percentage overhead of space amplification.
The size amplification is defined to be the ratio between the size of
the oldest file to the sum of the sizes of all other files. If the
size amplification exceeds the specified value, then min_merge_width
and max_merge_width are ignored and a full compaction of all files is done.
A value of 10 means that the size a database that stores 100 bytes
of user data could occupy 110 bytes of physical storage.
Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added.
Reviewers: haobo, emayanke, xjin
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12825
Summary: Replace include/leveldb with include/rocksdb.
Test Plan:
make clean; make check
make clean; make release
Differential Revision: https://reviews.facebook.net/D12489
Summary:
This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep.
UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.
VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.
PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.
I also added one API change. I added a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like that could be useful. In particular, for the vectorrep, it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is because the use of the ArenaAllocator complicates things (I can elaborate on this if needed).
Test Plan:
make -j32 check
./db_stress --memtablerep=vector
./db_stress --memtablerep=unsorted
./db_stress --memtablerep=prefixhash --prefix_size=10
Reviewers: dhruba, haobo, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12117
Summary:
Most code change in this diff is code cleanup/rewrite. The logic changes include:
(1) add universal compaction to db_crashtest2.py
(2) randomly set --test_batches_snapshots to be 0 or 1 in db_crashtest2.py. Old codes always use 1.
(3) use different tmp directory as db directory in different runs. I saw some intermittent errors in my local tests. Use of different tmp directory seems to be able to solve the issue.
Test Plan: Have run "make crashtest" for multiple times. Also run "make all check"
Reviewers: emayanke, dhruba, haobo
Reviewed By: emayanke
Differential Revision: https://reviews.facebook.net/D12369
Test Plan:
- make all check;
- make release;
- make stringappend_test; ./stringappend_test
Reviewers: haobo, emayanke
Reviewed By: haobo
CC: leveldb, kailiu
Differential Revision: https://reviews.facebook.net/D12381
Summary: Also expanded class LogFile to have startSequene and FileSize and exposed it publicly
Test Plan: make all check
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12087
Summary:
Updated db_bench and utilities/merge_operators.h to allow for dynamic benchmarking
of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs
a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag
so that the tester can easily benchmark different merge operators. Currently supports
the PutOperator and UInt64Add operator. Support for stringappend or list append may come later.
Test Plan:
1. make db_bench
2. Test the PutOperator (simulating Put) as follows:
./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put
--threads=2
3. Test the UInt64AddOperator (simulating numeric addition) similarly:
./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom
--merge_operator=uint64add --threads=2
Reviewers: haobo, dhruba, zshao, MarkCallaghan
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11535
Summary: I changed the db_stress configs, but forgot to update the scripts using the old configs.
Test Plan: 'make blackbox_crash_test' and 'make whitebox_crash_test' start running normally now (I haven't run them til the end, though).
Reviewers: vamsi
Reviewed By: vamsi
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D12303
Summary:
Minor fix to current codes, including: coding style, output format,
comments. No major logic change. There are only 2 real changes, please see my inline comments.
Test Plan: make all check
Reviewers: haobo, dhruba, emayanke
Differential Revision: https://reviews.facebook.net/D12297
Summary: The ability to dump internal keys associated with certain user keys, directly from sst files, is very useful for diagnosis. Will incorporate it directly into ldb later.
Test Plan: run it
Reviewers: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12075
Summary: rocksdb replicaiton will need this when writing value+TS from master to slave 'as is'
Test Plan: make
Reviewers: dhruba, vamsi, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11919
Summary: Removed KeyMayExistImpl because KeyMayExist demanded Get like semantics now. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test
Test Plan: make all check;db_stress for 1 hour
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11853
Summary: filter_deletes options introduced in db_stress makes it drop Deletes on key if KeyMayExist(key) returns false on the key. code change was simple and tested so not wasting reviewer's time.
Test Plan: maek crash_test; python tools/db_crashtest[1|2].py
CC: dhruba, vamsi
Differential Revision: https://reviews.facebook.net/D11769
Summary:
Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
Added code to skip getting Table from disk if not already present in table_cache.
Some renaming of variables.
Introduced KeyMayExistImpl which allows checking since specified sequence number in GetImpl useful to check partially written writebatch.
Changed KeyMayExist to not be pure virtual and provided a default implementation.
Expanded unit-tests in db_test to check appropriately.
Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
Test Plan: db_stress;make check
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb, xjin
Differential Revision: https://reviews.facebook.net/D11745
Summary:
Wrote a new function in db_impl.c-CheckKeyMayExist that calls Get but with a new parameter turned on which makes Get return false only if bloom filters can guarantee that key is not in database. Delete calls this function and if the option- deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped saving:
1. Put of delete type
2. Space in the db,and
3. Compaction time
Test Plan:
make all check;
will run db_stress and db_bench and enhance unit-test once the basic design gets approved
Reviewers: dhruba, haobo, vamsi
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11607
Summary: db_stress should alos print complete statistics like db_bench. Needed this when I wanted to measure number of delete-IOs dropped due to CheckKeyMayExist to be introduced to rocksdb codebase later- to make deltes in rocksdb faster
Test Plan: make db_stress;./db_stress --max_key=100 --ops_per_thread=1000 --statistics=1
Reviewers: sheki, dhruba, vamsi, haobo
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D11655
Summary:
There is a new option called hybrid_mode which, when switched on,
causes HBase style compactions. Files from L0 are
compacted back into L0. This meat of this compaction algorithm
is in PickCompactionHybrid().
All files reside in L0. That means all files have overlapping
keys. Each file has a time-bound, i.e. each file contains a
range of keys that were inserted around the same time. The
start-seqno and the end-seqno refers to the timeframe when
these keys were inserted. Files that have contiguous seqno
are compacted together into a larger file. All files are
ordered from most recent to the oldest.
The current compaction algorithm starts to look for
candidate files starting from the most recent file. It continues to
add more files to the same compaction run as long as the
sum of the files chosen till now is smaller than the next
candidate file size. This logic needs to be debated
and validated.
The above logic should reduce write amplification to a
large extent... will publish numbers shortly.
Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
Differential Revision: https://reviews.facebook.net/D11289