Summary:
In InternalGet, BlockReader returns an Iterator which is legitimately freed at the end of the 'else' scope. BUT there is a break statement in between and must be freed there too!
The best solution would be to move to unique_ptr and let it handle. Changed it to a unique_ptr.
Test Plan: valgrind ./db_test;make all check
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12681
Summary: PutValues calls Flush in ttl_test which clears memtables. KeyMayExist called after that will not be able to read those key-values
Test Plan: make all check OPT=-g
Reviewers:leveldb
Summary:
The way counters/statistics are implemented in rocksdb demands that enum Tickers and TickerNameMap follow the same order, otherwise statistics exposed from fbcode/rocks get out-of-sync. 2 counters for prefix had violated this order and when I built counters for fbcode/mcrocksdb, statistics for sequence number were appearing out-of-sync.
The other change is to record sequence-number using setTickerCount only and not recordTick. This is because of difference in statistics as understood by rocks/utils which uses ServiceData::statistics function and rocksdb statistics. In rocksdb there is just 1 counter for a countername. But in ServiceData there are 4 independent buckets for every countername-Count, Sum, Average and Rate. SetTickerCount and RecordTick update the same variable in rocksdb but different buckets in ServiceData. Therefore, I had to choose one consistent function from RecordTick or SetTickerCount for sequence number in rocksdb. I chose SetTickerCount because the statistics object in options passed during rocksdb-open is user-dependent and SetTickerCount makes sense there.
There will be a corresponding diff to mcorcksdb in fbcode shortly.
Test Plan: make all check; check ticker value using fprintfs
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12669
Summary: db->DeleteFile calls ParseFileName to check name that was returned for sst file. Now, sst filename is returned using TableFileName which uses MakeFileName. This puts a / at the front of the name and ParseFileName doesn't like that. Changed ParseFileName to tolerate /s at the beginning. The test delet_file_test used to pass earlier because this behaviour of MakeFileName had been changed a while back to not return a / during which delete_file_test was checked in. But MakeFileName had to be reverted to add / at the front because GetLiveFiles used at many places outside rocksdb used the previous behaviour of MakeFileName.
Test Plan: make;./delete_filetest;make all check
Reviewers: dhruba, haobo, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12663
Summary: value needed to be filtered of timestamp
Test Plan: ./ttl_test
Reviewers: dhruba, haobo, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12657
Summary: WouldBlock was an internediate statue but was changed to Incomplete
Test Plan: visual
Reviewers: dhruba
Differential Revision: https://reviews.facebook.net/D12651
Summary: // won't hurt but a missing / hurts sometimes
Test Plan: make all check; ./db_repl_stress
Reviewers: vamsi
Reviewed By: vamsi
CC: dhruba
Differential Revision: https://reviews.facebook.net/D12621
Summary:
The DeleteFile API was removing files inside the db-lock. This
is now changed to remove files outside the db-lock.
The GetLiveFilesMetadata() returns the smallest and largest
seqnuence number of each file as well.
Test Plan: deletefile_test
Reviewers: emayanke, haobo
Reviewed By: haobo
CC: leveldb
Maniphest Tasks: T63
Differential Revision: https://reviews.facebook.net/D12567
Summary: Let TransformRepFactory own the passed in transform. Also make it better encapsulated.
Test Plan: make valgrind_check;
Reviewers: dhruba, emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12591
Summary:
If ReadOptions.non_blocking_io is set to true, then KeyMayExists
and Iterators will return data that is cached in RAM.
If the Iterator needs to do IO from storage to serve the data,
then the Iterator.status() will return Status::IsRetry().
Test Plan:
Enhanced unit test DBTest.KeyMayExist to detect if there were are IOs
issues from storage. Added DBTest.NonBlockingIteration to verify
nonblocking Iterations.
Reviewers: emayanke, haobo
Reviewed By: haobo
CC: leveldb
Maniphest Tasks: T63
Differential Revision: https://reviews.facebook.net/D12531
Summary:
As title. This is possible as tickers are atomic now.
db_bench on high qps in-memory muti-thread random get workload, showed ~5% throughput improvement.
Test Plan: make check; db_bench; db_stress
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12555
Summary: In KeyMayExist.db_test we do a Flush which causes sst file to be written and added as open file in TableCache, but block cache for the file is not populated. So value_found should have been false where it was true and KeyMayExist.db_test should not have passed earlier. But it passed because BlockReader in table/table.cc takes 2 default arguments at the end called for_compaction and no_io. Although I passed no_io=true from InternalGet to BlockReader, but it understood for_compaction=true and defaulted no_io to false. This is a bug and although will be removed by Dhruba's new patch to incorporate no_io in readoptions, I'm submitting this patch to fix this bug independently of that patch.
Test Plan: make all check
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12537
Summary: initialiszer list is fasteri/preferable because it can straightaway call the constructor for this object, otherwise it will be created first and then again initialized. Although gain may not be much in this case because files_ is just a pointer and not a complex object, this is recommended practice.
Test Plan: make all check
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12519
Summary: There is a memory leak because TransformRepFactory does not delete its SliceTransform pointer. This patch adds a delete to the destructor.
Test Plan:
make check
make valgrind_check
Reviewers: dhruba, emayanke, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12513
Summary: Fix code so that the filter_block layer only assumes keys are internal when prefix_extractor is set.
Test Plan: ./filter_block_test
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12501
Summary: Replace include/leveldb with include/rocksdb.
Test Plan:
make clean; make check
make clean; make release
Differential Revision: https://reviews.facebook.net/D12489
Summary:
This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep.
UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.
VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.
PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.
I also added one API change. I added a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like that could be useful. In particular, for the vectorrep, it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is because the use of the ArenaAllocator complicates things (I can elaborate on this if needed).
Test Plan:
make -j32 check
./db_stress --memtablerep=vector
./db_stress --memtablerep=unsorted
./db_stress --memtablerep=prefixhash --prefix_size=10
Reviewers: dhruba, haobo, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12117
Summary:
Jenkin reports errors that:
* Linking error on some machines. The error message shows it cannot find some gcov related symbols.
* lcov error due to the version issues.
Test Plan:
run make in different platforms
Reviewers:
CC:
Task ID: #
Blame Rev:
Summary: If use_prefix_filters is set and read_range>1, then the random seeks will set a the prefix filter to be the prefix of the key which was randomly selected as the target. Still need to add statistics (perhaps in a separate diff).
Test Plan: ./db_bench --benchmarks=fillseq,prefixscanrandom --num=10000000 --statistics=1 --use_prefix_blooms=1 --use_prefix_api=1 --bloom_bits=10
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb, haobo
Differential Revision: https://reviews.facebook.net/D12273
Summary: An api to query the level, key ranges, size etc for each SST file and an api to delete a specific file from the db and all associated state in the bookkeeping datastructures.
Notes: Editing the manifest version does not release the obsolete files right away. However deleting the file directly will mess up the iterator. We may need a more aggressive/timely file deletion api.
I have used std::unique_ptr - will switch to boost:: since this is external. thoughts?
Unit test is fragile right now as it expects the compaction at certain levels.
Test Plan: unittest
Reviewers: dhruba, vamsi, emayanke
CC: zshao, leveldb, haobo
Task ID: #
Blame Rev:
Summary: Previously, RocksDB's build scripts used relative pathnames like ./build_detect_platform. This can cause problems if the user uses CDPATH. Also, it just doesn't seem right to me.
Test Plan:
make clean
make -j32 check
Reviewers: MarkCallaghan, dhruba, kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12459
Summary:
Sometimes you don't need to iterate through the whole WriteBatch. This diff makes the Handler member functions return a bool that indicates whether to abort or not. If they return true, the iteration stops.
One thing I just thought of is that this will break backwards-compability. Maybe it would be better to add a virtual member function WriteBatch::Handler::ShouldAbort() that returns false by default. Comments requested.
I still have to add a new unit test for the abort code, but let's finalize the API first.
Test Plan: make -j32 check
Reviewers: dhruba, haobo, vamsi, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12339
Summary: Was going through the iterator related code, did some cleanup along the way. Basically replaced array with vector and adopted range based loop where applicable.
Test Plan: make check; make valgrind_check
Reviewers: dhruba, emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12435
Summary:
Most code change in this diff is code cleanup/rewrite. The logic changes include:
(1) add universal compaction to db_crashtest2.py
(2) randomly set --test_batches_snapshots to be 0 or 1 in db_crashtest2.py. Old codes always use 1.
(3) use different tmp directory as db directory in different runs. I saw some intermittent errors in my local tests. Use of different tmp directory seems to be able to solve the issue.
Test Plan: Have run "make crashtest" for multiple times. Also run "make all check"
Reviewers: emayanke, dhruba, haobo
Reviewed By: emayanke
Differential Revision: https://reviews.facebook.net/D12369
Test Plan:
- make all check;
- make release;
- make stringappend_test; ./stringappend_test
Reviewers: haobo, emayanke
Reviewed By: haobo
CC: leveldb, kailiu
Differential Revision: https://reviews.facebook.net/D12381
Summary: Also expanded class LogFile to have startSequene and FileSize and exposed it publicly
Test Plan: make all check
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12087
Summary:
-Added null checks and revisions to DBIter::MergeValuesNewToOld()
-Added DBIter test to stringappend_test
-Major fix with Merge and TTL
More plans for fixes later.
Test Plan:
-make clean; make stringappend_test -j 32; ./stringappend_test
-make all check;
Reviewers: haobo, emayanke, vamsi, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12315
Summary:
The flags is accidentally introduced, and resulted in problems in 3rd party release.
Test Plan:
run make in all platforms (when doing the 3rd party release)
Summary:
* Added LIBNAME to enable configurable library name.
* remove/check fPIC in linux platform from build_detect_platform
Test Plan: make
Reviewers: emayanke
Differential Revision: https://reviews.facebook.net/D12321
Summary: statistic for sequence number is needed by wormhole. setTickerCount is demanded for this statistic. I can't simply recordTick(max_sequence) when db recovers because the statistic iobject is owned by client and may/may not be reset during reopen. Eg. statistic is reset in mcrocksdb whereas it is not in db_stress. Therefore it is best to go with setTickerCount
Test Plan: ./db_stress ... --statistics=1 and observed expected sequence number
Reviewers: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12327
Summary:
In release, "found variable assigned but not used anywhere". Changed it to work with
assert. Someone accept this :).
Test Plan: make release -j 32
Reviewers: haobo, dhruba, emayanke
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12309
Summary:
Updated db_bench and utilities/merge_operators.h to allow for dynamic benchmarking
of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs
a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag
so that the tester can easily benchmark different merge operators. Currently supports
the PutOperator and UInt64Add operator. Support for stringappend or list append may come later.
Test Plan:
1. make db_bench
2. Test the PutOperator (simulating Put) as follows:
./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put
--threads=2
3. Test the UInt64AddOperator (simulating numeric addition) similarly:
./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom
--merge_operator=uint64add --threads=2
Reviewers: haobo, dhruba, zshao, MarkCallaghan
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11535
Summary:
Previously I changed the line `source ./fbcode.gcc471.sh` to `source fbcode.gcc471.sh`. It works in my devbox but failed in some jenkin servers. I revert the previous code to make sure it works well under all circumstances.
Test Plan:
Test in the jenkin server as well as dev box.
Reviewers:
CC:
Task ID: #
Blame Rev:
Summary: As Aaron suggested, there are quite some problems with our Makefile and scripts. So in this diff I did some cleanup for them and revise some part of the scripts/makefile to help people better understand some mysterious parts.
Test Plan:
Ran make in several modes;
Ran the updated scripts.
Reviewers: dhruba, emayanke, akushner
Differential Revision: https://reviews.facebook.net/D12285
Summary: I changed the db_stress configs, but forgot to update the scripts using the old configs.
Test Plan: 'make blackbox_crash_test' and 'make whitebox_crash_test' start running normally now (I haven't run them til the end, though).
Reviewers: vamsi
Reviewed By: vamsi
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D12303
Summary:
Minor fix to current codes, including: coding style, output format,
comments. No major logic change. There are only 2 real changes, please see my inline comments.
Test Plan: make all check
Reviewers: haobo, dhruba, emayanke
Differential Revision: https://reviews.facebook.net/D12297
Summary:
This patch adds the ability for the user to add sequences of arbitrary data (blobs) to write batches. These blobs are saved to the log along with everything else in the write batch. You can add multiple blobs per WriteBatch and the ordering of blobs, puts, merges, and deletes are preserved.
Blobs are not saves to SST files. RocksDB ignores blobs in every way except for writing them to the log.
Before committing this patch, I need to add some test code. But I'm submitting it now so people can comment on the API.
Test Plan: make -j32 check
Reviewers: dhruba, haobo, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12195
Summary:
As title. No locking/atomic is needed due to thread local. There is also no need to modify the existing client interface, in order to expose related counters.
perf_context_test shows a simple example of retrieving the number of user key comparison done for each put and get call. More counters could be added later.
Sample output
./perf_context_test 1000000
==== Test PerfContextTest.KeyComparisonCount
Inserting 1000000 key/value pairs
...
total user key comparison get: 43446523
total user key comparison put: 8017877
max user key comparison get: 88939
avg user key comparison get:43
Basically, the current skiplist does well on average, but could perform poorly in extreme cases.
Test Plan: run perf_context_test <total number of entries to put/get>
Reviewers: dhruba
Differential Revision: https://reviews.facebook.net/D12225
Summary: added options to Dump() I missed in D12027. I also ran a script to look for other missing options and found a couple which I added. Should we also print anything for "PrepareForBulkLoad", "memtable_factory", and "statistics"? Or should we leave those alone since it's not easy to print useful info for those?
Test Plan: run anything and look at LOG file to make sure these are printed now.
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12219
Summary:
With Merge returning bool, it can keep failing silently(eg. While faling to fetch timestamp in TTL). We need to detect this through a rocksdb counter which can get bumped whenever Merge returns false. This will also be super-useful for the mcrocksdb-counter service where Merge may fail.
Added a counter NUMBER_MERGE_FAILURES and appropriately updated db/merge_helper.cc
I felt that it would be better to directly add counter-bumping in Merge as a default function of MergeOperator class but user should not be aware of this, so this approach seems better to me.
Test Plan: make all check
Reviewers: dnicholas, haobo, dhruba, vamsi
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12129
Summary: Similar to v2 (db and table code understands prefixes), but use ReadOptions as in v3. Also, make the CreateFilter code faster and cleaner.
Test Plan: make db_test; export LEVELDB_TESTS=PrefixScan; ./db_test
Reviewers: dhruba
Reviewed By: dhruba
CC: haobo, emayanke
Differential Revision: https://reviews.facebook.net/D12027