Summary:
Developing a capability for storing values on external backing file(s).
This is just a highly unoptimized first pass - supports:
1) Allocating some portion of external file to be used to store value
2) Freeing the range, enabling it to be reused by other values
As next steps, I plan to:
1) Create some kind of stress testing. Once I can measure stuff, I can focus on optimizing.
2) Optimize locking.
3) Optimize freelist data structure. Currently we have O(n) for both freeing and allocation.
4) Figure out how to do recovery.
Test Plan: Created a unit test.
Reviewers: dhruba, haobo, kailiu
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13389
Summary:
Enable background flush thread in this patch and fix unit tests with:
(1) After background flush, schedule a background compaction if condition satisfied;
(2) Fix a bug that if universal compaction is enabled and number of levels are set to be 0, compaction will not be automatically triggered
(3) Fix unit tests to wait for compaction to finish instead of flush, before checking the compaction results.
Test Plan: pass all unit tests
Reviewers: haobo, xjin, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13461
Summary:
tmpfs might not support fallocate(). Fix unit test so that this
does not cause a unit test to fail.
Test Plan: ./env_test
Reviewers: emayanke, igor, kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13455
Summary:
This is needed to make existing dbs be able to open and also because BytewiseComparator was not changed since leveldb.
The inverted order in the error message caused confusion prebiously
Test Plan: make; open existing db
Reviewers: leveldb, dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D13449
Summary:
* Logstore requests a valid change of reutrning an empty iterator and not an error in case of no log files.
* Changed the code to return the writebatch containing the sequence number requested from GetupdatesSince even if it lies in the middle. Earlier we used to return the next writebatch,. This also allows me oto guarantee that no files played upon by the iterator are redundant. I mean the starting log file has at least a sequence number >= the sequence number requested form GetupdatesSince.
* Cleaned up redundant logic in Iterator::Next and made a new function SeekToStartSequence for greater readability and maintainibilty.
* Modified a test in db_test accordingly
Please check the logic carefully and suggest improvements. I have a separate patch out for more improvements like restricting reader to read till written sequences.
Test Plan:
* transaction log iterator tests in db_test,
* db_repl_stress.
* rocks_log_iterator_test in fbcode/wormhole/rocksdb/test - 2 tests thriving on hacks till now can get simplified
* testing on the shadow setup for sigma with replication
Reviewers: dhruba, haobo, kailiu, sdong
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13437
Summary:
So far we only have key/value pairs as well as bloom filter stored in the
sst file. It will be great if we are able to store more metadata about
this table itself, for example, the entry size, bloom filter name, etc.
This diff is the first step of this effort. It allows table to keep the
basic statistics mentioned in http://fburl.com/14995441, as well as
allowing writing user-collected stats to stats block.
After this diff, we will figure out the interface of how to allow user to collect their interested statistics.
Test Plan:
1. Added several unit tests.
2. Ran `make check` to ensure it doesn't break other tests.
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13419
Summary: When I debug the unit test failures when enabling background flush thread, I feel the function names can be made clearer for people to understand. Also, if the names are fixed, in many places, some tests' bugs are obvious (and some of those tests are failing). This patch is to clean it up for future maintenance.
Test Plan: Run test suites.
Reviewers: haobo, dhruba, xjin
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13431
Summary:
With this patch, when LRUCache.Insert() is called and the cache is full, it will first try to free up entries whose reference counter is 1 (would become 0 after remo\
ving from the cache). We do it in two passes, in the first pass, we only try to release those unreferenced entries. If we cannot free enough space after traversing t\
he first remove_scan_cnt_ entries, we start from the beginning again and remove those entries being used.
Test Plan: add two unit tests to cover the codes
Reviewers: dhruba, haobo, emayanke
Reviewed By: emayanke
CC: leveldb, emayanke, xjin
Differential Revision: https://reviews.facebook.net/D13377
Summary:
This careless error was causing ASSERT_OK(DestroyDB) to fail in db_test.
Basically .. was being returned as a child of db/archive and ParseFileName returned false on that,
but 'type' was set to LogFile from earlier and not reset. The return of ParseFileName was not being checked to delete the log file or not.
Test Plan: make all check
Reviewers: dhruba, haobo, xjin, kailiu, nkg-
Reviewed By: nkg-
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13413
Summary:
As title. Fix an lint error:
Lint: CppLint Error
Single-argument constructor 'Value(int v)' may inadvertently be used as a type conversion constructor. Prefix the function with the 'explicit' keyword to avoid this, or add an /* implicit */ comment to suppress this warning.
Test Plan: N/A
Reviewers: emayanke, haobo, dhruba
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13401
Summary: virtual NewRandomRWFile is not implemented on EnvHdfs, causing build failure.
Test Plan: make clean; make all check
Reviewers: dhruba, haobo, kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13383
Summary: I have implemented basic simple use case that I need for External Value Store I'm working on. There is a potential for making this prettier by refactoring/combining WritableFile and RandomAccessFile, avoiding some copypasta. However, I decided to implement just the basic functionality, so I can continue working on the other diff.
Test Plan: Added a unittest
Reviewers: dhruba, haobo, kailiu
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13365
Summary:
The previous release 2.4 had a mapping to alias the older
namespace to rocksdb. This mapping is not needed in the new
release.
Test Plan:
make check
make release
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13359
Summary: In some cases, you might not want to store the data log (write ahead log) files in the same dir as the sst files. An example use case is leaf, which stores sst files in tmpfs. And would like to save the log files in a separate dir (disk) to save memory.
Test Plan: make all. Ran db_test test. A few test failing. P2785018. If you guys don't see an obvious problem with the code, maybe somebody from the rocksdb team could help me debug the issue here. Running this on leaf worked well. I could see logs stored on disk, and deleted appropriately after compactions. Obviously this is only one set of options. The unit tests cover different options. Seems like I'm missing some edge cases.
Reviewers: dhruba, haobo, leveldb
CC: xinyaohu, sumeet
Differential Revision: https://reviews.facebook.net/D13239
Summary: While working on D13239, I noticed that the same options are not used for opening and destroying at db. So adding that. Also added asserts for successful DestroyDB calls.
Test Plan: Ran unit tests. Atleast 1 unit test is failing. They failures are a result of some past logic change. I'm not really planning to fix those. But I would like to check this in. And hopefully the respective unit test owners can fix the broken tests
Reviewers: leveldb, haobo
CC: xinyaohu, sumeet, dhruba
Differential Revision: https://reviews.facebook.net/D13329
Summary:
In talbe.cc, when reading the metablock, it uses BytewiseComparator();
However in table_builder.cc, we use r->options.comparator. After tracing
the creation of r->options.comparator, I found this comparator is an
InternalKeyComparator, which wraps the user defined comparator(details
can be found in DBImpl::SanitizeOptions().
I encountered this problem when adding metadata about "bloom filter"
before. With different comparator, we may fail to do the binary sort.
Current code works well since there is only one entry in meta block.
Test Plan:
make all check
I've also tested this change in https://reviews.facebook.net/D8283 before.
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13335
Summary: Split Unref into two parts -> cheap and expensive. Try to call expensive Unref outside of critical section to decrease lock contention.
Test Plan: unittests
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb, kailiu
Differential Revision: https://reviews.facebook.net/D13299
Summary:
Previous patch introduced a unit test failure in
DBTest.NumImmutableMemTable because of change in property names.
Test Plan:
Reviewers:
CC:
Task ID: #
Blame Rev:
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.
Test Plan: compile rocksdb
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13287
Summary: This is useful to keep track of refreshes in transaction log iterator
Test Plan: make; db_stress --statistics=1 shows it
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13281
Summary:
As explained in comments in GetLiveFiles in db.h, this option will cause flush to be skipped in GetLiveFiles because some use-cases use GetSortedWalFiles after GetLiveFiles to generate more complete snapshots.
Using GetSortedWalFiles after GetLiveFiles allows us to not Flush in GetLiveFiles first because wals have everything.
Note: file deletions will be disabled before calling GLF or GSWF so live logs will not move to archive logs or get delted.
Note: Manifest file is truncated to a proper value in GLF, so it will always reply from the proper wal files on a restart
Test Plan: make
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13257
Summary: as title
Test Plan: make check
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13263
Summary: As title. This is just a quick hack and not ready for commit. fails a lot of unit test. I will test/debug it directly in ViewState shadow .
Test Plan: Try it in shadow test.
Reviewers: dhruba, xjin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12933
Summary:
We saw SIGSEGV when set options.num_levels=1 in universal compaction
style. Dug into this issue for a while, and finally found the root cause (thank Haobo for discussion).
Test Plan: Add new unit test. It throws SIGSEGV without this change. Also run "make all check".
Reviewers: haobo, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13251
Summary: Will use iterators to verify keys in the db for half of its keys and Gets for the other half.
Test Plan: ./db_stress --max_key=1000 --ops_per_thread=100
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13227
Summary: as title. unit test not polished. this is for a quick live test
Test Plan: live
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13221
Summary:
I am planning to add more to statistics classes but found current way of using enum is very verbose and unnecessarily increase the
difficulity of adding new statistics.
In this diff I removed the code that explicitly specifies the value of each enum entry. This will help us easily add new statistic
items more conveniently without manually adding the value of other enum entries by one.
Test Plan: make; make check;
Reviewers: haobo, dhruba, xjin, emayanke, vamsi
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13197
Summary: Using an iterator instead of the Get method, each thread goes through a portion of the database and verifies values by comparing to the shared state.
Test Plan:
./db_stress --db=/tmp/tmppp --max_key=10000 --ops_per_thread=10000
To test some basic cases, the following lines can be added (each set in turn) to the verifyDb method with the following expected results:
// Should abort with "Unexpected value found"
shared.Delete(start);
// Should abort with "Value not found"
WriteOptions write_opts;
db_->Delete(write_opts, Key(start));
// Should succeed
WriteOptions write_opts;
shared.Delete(start);
db_->Delete(write_opts, Key(start));
// Should abort with "Value not found"
WriteOptions write_opts;
db_->Delete(write_opts, Key(start + (end-start)/2));
// Should abort with "Value not found"
db_->Delete(write_opts, Key(end-1));
// Should abort with "Unexpected value"
shared.Delete(end-1);
// Should abort with "Unexpected value"
shared.Delete(start + (end-start)/2);
// Should abort with "Value not found"
db_->Delete(write_opts, Key(start));
shared.Delete(start);
db_->Delete(write_opts, Key(end-1));
db_->Delete(write_opts, Key(end-2));
To test the out of range abort, change the key in the for loop to Key(i+1), so that the key defined by the index i is now outside of the supposed range of the database.
Reviewers: emayanke
Reviewed By: emayanke
CC: dhruba, xjin
Differential Revision: https://reviews.facebook.net/D13071
Summary: as title
Test Plan: make check
Reviewers: dhruba, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13179
Summary: A previous diff moved these outside of lock protected area. Moved back in now. Also moved tmp_batch_ update outside of lock protected area, as only the single write thread can access it.
Test Plan: make check
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13137
Summary: The original optimization missed updating links other than the lowest level.
Test Plan: make check; perf_context_test
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb, adsharma
Differential Revision: https://reviews.facebook.net/D13119
Summary:
The constructor for Vector memtable has a parameter called 'count'
that specifies the capacity of the vector to be reserved at allocation
time. It was incorrectly used to initialize the size of the vector.
Test Plan: Enhanced db_test.
Reviewers: haobo, xjin, emayanke
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13083
Summary:
Added a new api to the Environment that allows clearing out not-needed
pages from the OS cache. This will be helpful when the compressed
block cache replaces the OS cache.
Test Plan: EnvPosixTest.InvalidateCache
Reviewers: haobo
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13041
Summary: Adding in the iterpercent flag to tests.
Test Plan: make crash_test
Reviewers: emayanke
Reviewed By: emayanke
Differential Revision: https://reviews.facebook.net/D13035
Summary:
There is a use-case where we want to insert data into rocksdb as
fast as possible. Vector rep is used for this purpose.
The background flush thread needs to flush the vectorrep to
storage. It acquires the dblock then sorts the vector, releases
the dblock and then writes the sorted vector to storage. This is
suboptimal because the lock is held during the sort, which
prevents new writes for occuring.
This patch moves the sorting of the vector rep to outside the
db mutex. Performance is now as fastas the underlying storage
system. If you are doing buffered writes to rocksdb files, then
you can observe throughput upwards of 200 MB/sec writes.
This is an early draft and not yet ready to be reviewed.
Test Plan:
make check
Task ID: #
Blame Rev:
Reviewers: haobo
Reviewed By: haobo
CC: leveldb, haobo
Differential Revision: https://reviews.facebook.net/D12987
Summary:
Added MultiIterate() which does a seek and some Next/Prev
calls. Iterator status is checked only, no data integrity check
Test Plan:
make db_stress
./db_stress --iterpercent=<nonzero value> --readpercent=, etc.
Reviewers: emayanke, dhruba, xjin
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12915
Summary: as title
Test Plan: make db_test; ./db_test
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13005