Summary:
Task 4580155. Some conditions in DBImpl::MakeRoomForWrite can be cached in
ColumnFamilyData, because theirs value can be changed only during compaction,
adding new memtable and/or add recalculation of compaction score.
These conditions are:
cfd->imm()->size() == cfd->options()->max_write_buffer_number - 1
cfd->current()->NumLevelFiles(0) >= cfd->options()->level0_stop_writes_trigger
cfd->options()->soft_rate_limit > 0.0 &&
(score = cfd->current()->MaxCompactionScore()) > cfd->options()->soft_rate_limit
cfd->options()->hard_rate_limit > 1.0 &&
(score = cfd->current()->MaxCompactionScore()) > cfd->options()->hard_rate_limit
P.S.
As it's my first diff, Siying suggested to add everybody as a reviewers
for this diff. Sorry, if I forgot someone or add someone by mistake.
Test Plan: make all check
Reviewers: haobo, xjin, dhruba, yhchiang, zagfox, ljin, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19311
Summary: Include max_write_buffer_number >= 2 to SanitizeOptions.
Test Plan: make all check
Reviewers: haobo, sdong, igor, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19077
Summary:
We've seen some production issues where column family is detected as stale, although there is only one column family in the system. This is a quick fix that:
1) doesn't flush stale column families if there's only one of them
2) Use 4 as a coefficient instead of 2 for determening when a column family is stale. This will make flushing less aggressive, while still keep a nice dynamic flushing of very stale CFs.
Test Plan: make check
Reviewers: dhruba, haobo, ljin, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18861
Summary:
Introducing new compaction style -- FIFO.
FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values.
FIFO compaction style is suited for storing high-frequency event logs.
Test Plan: Added a unit test
Reviewers: dhruba, haobo, sdong
Reviewed By: dhruba
Subscribers: alberts, leveldb
Differential Revision: https://reviews.facebook.net/D18765
Summary:
This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options.
Here's description of task #4296714:
I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB.
For example, this table has 3155 entries and 1391828 deleted keys.
The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for.
Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API.
In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe.
Test Plan:
Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one).
Also, all other tests
Reviewers: sdong, dhruba, haobo, kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18579
Summary:
replace the super version acquisision in tailing itrator with thread
local
Test Plan: will post results
Reviewers: igor, haobo, sdong, yhchiang, dhruba
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17757
Summary:
Similar to GetImp(), use SuperVersion from thread local instead of acquriing mutex.
I don't expect this change will make a dent on NewIterator() performance
because the bottleneck seems to be on the rest part of the API
Test Plan:
make asan_check
will post perf numbers
Reviewers: haobo, igor, sdong, dhruba, yhchiang
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17643
Summary:
I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup.
Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch.
This diff also fixes race condition when dropping column family.
Test Plan: Ran db_stress with variety of parameters
Reviewers: dhruba, haobo
Differential Revision: https://reviews.facebook.net/D16833
Summary:
Column family IDs should be unique, even if column family is dropped. To achieve this, we save max column family in manifest.
Note that the diff is still not ready. I'm only using differential to move the patch to my Mac machine.
Test Plan: added a test to column_family_test
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16581
Summary: Added a function DeleteSuperVersion that can be called in DBImpl destructor before PurgingObsoleteFiles. That way, PurgeObsoleteFiles will be able to delete all files held by alive super versions.
Test Plan: column_family_test with valgrind
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16545
Summary:
I though I might get away with as little changes to LogAndApply() as possible. It turns out this is not the case.
This diff introduces different behavior of LogAndApply() for three cases:
1. column family add
2. column family drop
3. no-column family manipulation
(1) and (2) don't support group commit yet.
There were a lot of problems with old version od LogAndApply, detected by db_stress. The biggest was non-atomicity of manifest writes and metadata changes (i.e. if column family add is in manifest, it also has to be in in-memory data structure).
Test Plan: db_stress
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16491
Summary:
* Add ColumnFamilyHandle::GetID() function. Client needs to know column family's ID to be able to construct WriteBatch
* Handle WriteBatch::Handler failure gracefully. Since WriteBatch is not a very smart function (it takes raw CF id), client can add data to WriteBatch for column family that doesn't exist. In that case, we need to gracefully return failure status from DB::Write(). To do that, I added a return Status to WriteBatch functions PutCF, DeleteCF and MergeCF.
Test Plan: Added test to column_family_test
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16323
Summary:
The change to the public behavior:
* When opening a DB or creating new column family client gets a ColumnFamilyHandle.
* As long as column family handle is alive, client can do whatever he wants with it, even drop it
* Dropped column family can still be read from (using the column family handle)
* Added a new call CloseColumnFamily(). Client has to close all column families that he has opened before deleting the DB
* As soon as column family is closed, any calls to DB using that column family handle will fail (also any outstanding calls)
Internally:
* Ref-counting ColumnFamilyData
* New thread-safety for ColumnFamilySet
* Dropped column families are now completely dropped and their memory cleaned-up
Test Plan: added some tests to column_family_test
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16101
Summary: When we open a DB, we should dump only DBOptions and then when we create a new column family, we dump ColumnFamilyOptions for each one.
Test Plan: make check, confirm contents of the LOG
Reviewers: dhruba, haobo, sdong, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16011
Summary: Revised thread-safety guarantees and implemented a way to spinlock the object.
Test Plan: make check
Reviewers: dhruba, haobo, sdong, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15975
Summary:
WriteBatch can have multiple column families in one batch. Every column family has different options. So we have to add a way for write batch to get options for an arbitrary column family.
This required a bit more acrobatics since lots of interfaces had to be changed.
Test Plan: make check
Reviewers: dhruba, haobo, sdong, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15957
Summary: Replaced most of occurrences of Options with more specific DBOptions. This brings us very close to supporting different configuration options for each column family.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15933
Summary:
Lots of code expects Options on construction/function call. My original idea was to split Options argument into ColumnFamilyOptions and DBOptions (the latter only if needed). However, this will require huge code changes very deep in the stack.
The better idea is to have ColumnFamilyData hold both ColumnFamilyOptions and Options. ColumnFamilyData::Options would be constructed from DBOptions (same for each column family) and ColumnFamilyOptions (different for each column family)
Now when we construct a class or call any method that requires Options, we can just push him ColumnFamilyData::Options and be sure that it's using column-family-specific settings.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15927
Summary:
Adapting table cache to column families is interesting. We want table cache to be global LRU, so if some column families are use not as often as others, we want them to be evicted from cache. However, current TableCache object also constructs tables on its own. If table is not found in the cache, TableCache automatically creates new table. We want each column family to be able to specify different table factory.
To solve the problem, we still have a single LRU, but we provide the LRUCache object to TableCache on construction. We have one TableCache per column family, but the underyling cache is shared by all TableCache objects.
This allows us to have a global LRU, but still be able to support different table factories for different column families. Also, in the future it will also be able to support different directories for different column families.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15915
Summary: InternalStats is a messy thing, keeping both DB data and column family data. However, it's better off living in ColumnFamilyData than in DBImpl. For now, at least.
Test Plan: make check
Reviewers: dhruba, kailiu, haobo, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15879
Summary:
There are three SanitizeOption-s now : one for DBOptions, one for ColumnFamilyOptions and one for Options (which just calls the other two)
I have also reshuffled some options -- table_cache options and info_log should live in DBOptions, for example.
Test Plan: make check doesn't complain
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15873
Summary: Support for different column families in Iterator and MultiGet code path.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15849
Summary: Compaction picker and internal key comparator are different for each column family (not global), so they should live in ColumnFamilyData
Test Plan: make check
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15801
Summary:
Sometimes we iterate through column families, and unlock the mutex in the body of the iteration. While mutex is unlocked, some column family might be created or dropped. We need to be able to continue iterating through column families even though our current column family got dropped.
This diff implements circular linked lists that connect all column families. It then uses the link list to enable iterating through linked lists. Even if the column family is dropped, its next_ pointer still can be used to advance to another alive column family.
Test Plan: make check
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15603
Summary: Making room for write will be the hardest part of the column family implementation. For now, I just iterate through all column families and run MakeRoomForWrite() for every one.
Test Plan: make check does not complain
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15597
Summary: ColumnFamilyData grew a lot, there's much more data that it holds now. It makes more sense to encapsulate it better by making it a class.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15579
Summary: This one is big. It adds ability to write to and read from different column families (see the unit test). It also supports recovery of different column families from log, which was the hardest part to reason about. We need to make sure to never delete the log file which has unflushed data from any column family. To support that, I added another concept, which is versions_->MinLogNumber()
Test Plan: Added a unit test in column_family_test
Reviewers: dhruba, haobo, sdong, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15537
Summary: All memtables and immutable memtables are moved from DBImpl to ColumnFamilyData. For now, they are all referenced from default column family in DBImpl. It shouldn't be hard to get them from custom column family.
Test Plan: make check
Reviewers: dhruba, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15459
Summary:
I created a separate class ColumnFamilySet to keep track of column families. Before we did this in VersionSet and I believe this approach is cleaner.
Let me know if you have any comments. I will commit tomorrow.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15357