rocksdb

Commit Graph

Author	SHA1	Message	Date
Levi Tamasi	06c3b85b9a	Disallow using the base DB's storage directory as blob_dir in BlobDB (#6810 ) Summary: https://github.com/facebook/rocksdb/pull/6807 extends the logic that identifies and purges obsolete files to blob files handled by RocksDB itself. In order to prevent that from interfering with the current BlobDB code, we need to make sure that `BlobDBOptions::blob_dir` is different from the storage directories used by the base DB. (Note: this is true by default.) The patch adds a check that explicitly disallows this configuration and returns `Status::NotSupported` from `BlobDB::Open` in such cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6810 Test Plan: Tested using the BlobDB mode of `db_bench`. Reviewed By: riversand963 Differential Revision: D21412676 Pulled By: ltamasi fbshipit-source-id: 6630cc7481e48c8bf55d59423b25f14d52ffe681	5 years ago
sdong	fdf882ded2	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 ) Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e	5 years ago
Levi Tamasi	1dd7873e08	Remove earlier partial BlobDB GC implementation (#6278 ) Summary: In addition to removing the earlier partially implemented garbage collection logic from the BlobDB codebase, the patch also removes the test cases (as well as the related sync points, as appropriate) that were only relevant for the old implementation, and reworks the remaining ones so they use the new GC logic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6278 Test Plan: `make check` Differential Revision: D19335226 Pulled By: ltamasi fbshipit-source-id: 0cc1794bc9892feda1426ed5522a318f3cb1b692	5 years ago
Levi Tamasi	7a7ca8eb5b	BlobDB: only compare CF IDs when checking whether an API call is for the default CF (#6226 ) Summary: BlobDB currently only supports using the default column family. The earlier code enforces this by comparing the `ColumnFamilyHandle` passed to the `Get`/`Put`/etc. call with the handle returned by `DefaultColumnFamily` (which, at the end of the day, comes from `DBImpl::default_cf_handle_`). Since other `ColumnFamilyHandle`s can also point to the default column family, this can reject legitimate requests as well. (As an example, with the earlier code, the handle returned by `BlobDB::Open` cannot actually be used in API calls.) The patch fixes this by comparing only the IDs of the column family handles instead of the pointers themselves. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6226 Test Plan: `make check` Differential Revision: D19187461 Pulled By: ltamasi fbshipit-source-id: 54ce2e12ebb1f07e6d1e70e3b1e0213dfa94bda2	5 years ago
Levi Tamasi	583c6953d8	Move out valid blobs from the oldest blob files during compaction (#6121 ) Summary: The patch adds logic that relocates live blobs from the oldest N non-TTL blob files as they are encountered during compaction (assuming the BlobDB configuration option `enable_garbage_collection` is `true`), where N is defined as the number of immutable non-TTL blob files multiplied by the value of a new BlobDB configuration option called `garbage_collection_cutoff`. (The default value of this parameter is 0.25, that is, by default the valid blobs residing in the oldest 25% of immutable non-TTL blob files are relocated.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6121 Test Plan: Added unit test and tested using the BlobDB mode of `db_bench`. Differential Revision: D18785357 Pulled By: ltamasi fbshipit-source-id: 8c21c512a18fba777ec28765c88682bb1a5e694e	5 years ago
Levi Tamasi	8e7aa62813	BlobDB: Maintain mapping between blob files and SSTs (#6020 ) Summary: The patch adds logic to BlobDB to maintain the mapping between blob files and SSTs for which the blob file in question is the oldest blob file referenced by the SST file. The mapping is initialized during database open based on the information retrieved using `GetLiveFilesMetaData`, and updated after flushes/compactions based on the information received through the `EventListener` interface (or, in the case of manual compactions issued through the `CompactFiles` API, the `CompactionJobInfo` object). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6020 Test Plan: Added a unit test; also tested using the BlobDB mode of `db_bench`. Differential Revision: D18410508 Pulled By: ltamasi fbshipit-source-id: dd9e778af781cfdb0d7056298c54ba9cebdd54a5	5 years ago
anand76	fefd4b98c5	Introduce a new MultiGet batching implementation (#5011 ) Summary: This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching. Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to - 1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch() 2. Bloom filter cachelines can be prefetched, hiding the cache miss latency The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress. Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32). Batch Sizes 1 \| 2 \| 4 \| 8 \| 16 \| 32 Random pattern (Stride length 0) 4.158 \| 4.109 \| 4.026 \| 4.05 \| 4.1 \| 4.074 - Get 4.438 \| 4.302 \| 4.165 \| 4.122 \| 4.096 \| 4.075 - MultiGet (no batching) 4.461 \| 4.256 \| 4.277 \| 4.11 \| 4.182 \| 4.14 - MultiGet (w/ batching) Good locality (Stride length 16) 4.048 \| 3.659 \| 3.248 \| 2.99 \| 2.84 \| 2.753 4.429 \| 3.728 \| 3.406 \| 3.053 \| 2.911 \| 2.781 4.452 \| 3.45 \| 2.833 \| 2.451 \| 2.233 \| 2.135 Good locality (Stride length 256) 4.066 \| 3.786 \| 3.581 \| 3.447 \| 3.415 \| 3.232 4.406 \| 4.005 \| 3.644 \| 3.49 \| 3.381 \| 3.268 4.393 \| 3.649 \| 3.186 \| 2.882 \| 2.676 \| 2.62 Medium locality (Stride length 4096) 4.012 \| 3.922 \| 3.768 \| 3.61 \| 3.582 \| 3.555 4.364 \| 4.057 \| 3.791 \| 3.65 \| 3.57 \| 3.465 4.479 \| 3.758 \| 3.316 \| 3.077 \| 2.959 \| 2.891 dbbench command used (on a DB with 4 levels, 12 million keys)- TEST_TMPDIR=/dev/shm numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011 Differential Revision: D14348703 Pulled By: anand1976 fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b	6 years ago
Levi Tamasi	79b6ab43ce	BlobDB: Remove GC interval option (#5044 ) Summary: Remove BlobDBOptions.garbage_collection_interval_secs for now, since garbage collection is not yet implemented in BlobDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5044 Differential Revision: D14354046 Pulled By: ltamasi fbshipit-source-id: 2b966b6d1e088ba9462f3ea73e115013562fbc04	6 years ago
Sagar Vemuri	3cfc7515fc	Remove an unused option (#4888 ) Summary: Remove `garbage_collection_deletion_size_threshold` as it is not used anywhere. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4888 Differential Revision: D13685982 Pulled By: sagar0 fbshipit-source-id: e08d3017b9a0c8fa99bc332b595ee4ed9db70c87	6 years ago
Sagar Vemuri	55e03b67df	Correct the comment about inlined blob option (#4887 ) Summary: - Corrected a comment asserting that the values "smaller" than a min_blob_size will be inlined in the base db. - Also fixed the type of ttl_range_secs while dumping blobdb options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4887 Differential Revision: D13680163 Pulled By: sagar0 fbshipit-source-id: 306c8cf2daa52210ffc334a6924ef44ffdedf887	6 years ago
Yi Wu	c970358574	BlobDB: Can return expiration together with Get() (#4227 ) Summary: Add API to allow fetching expiration of a key with `Get()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4227 Differential Revision: D9169897 Pulled By: yiwu-arbug fbshipit-source-id: 2a6f216c493dc75731ddcef1daa689b517fab31b	6 years ago
Yi Wu	140f256da2	BlobDB: Cleanup TTLExtractor interface (#4229 ) Summary: Cleanup TTLExtractor interface. The original purpose of it is to allow our users keep using existing `Write()` interface but allow it to accept TTL via `TTLExtractor`. However the interface is confusing. Will replace it with something like `WriteWithTTL(batch, ttl)` in the future. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4229 Differential Revision: D9174390 Pulled By: yiwu-arbug fbshipit-source-id: 68201703d784408b851336ab4dd9b84188245b2d	6 years ago
Yi Wu	6d454d7376	BlobDB: is_fifo=true also evict non-TTL blob files (#4049 ) Summary: Previously with is_fifo=true we only evict TTL file. Changing it to also evict non-TTL files from oldest to newest, after exhausted TTL files. Closes https://github.com/facebook/rocksdb/pull/4049 Differential Revision: D8604597 Pulled By: yiwu-arbug fbshipit-source-id: bc4209ee27c1528ce4b72833e6f1e1bff80082c1	6 years ago
Yi Wu	b864bc9b5b	Blob DB: Improve FIFO eviction Summary: Improving blob db FIFO eviction with the following changes, * Change blob_dir_size to max_db_size. Take into account SST file size when computing DB size. * FIFO now only take into account live sst files and live blob files. It is normal for disk usage to go over max_db_size because there are obsolete sst files and blob files pending deletion. * FIFO eviction now also evict TTL blob files that's still open. It doesn't evict non-TTL blob files. * If FIFO is triggered, it will pass an expiration and the current sequence number to compaction filter. Compaction filter will then filter inlined keys to evict those with an earlier expiration and smaller sequence number. So call LSM FIFO. * Compaction filter also filter those blob indexes where corresponding blob file is gone. * Add an event listener to listen compaction/flush event and update sst file size. * Implement DB::Close() to make sure base db, as well as event listener and compaction filter, destruct before blob db. * More blob db statistics around FIFO. * Fix some locking issue when accessing a blob file. Closes https://github.com/facebook/rocksdb/pull/3556 Differential Revision: D7139328 Pulled By: yiwu-arbug fbshipit-source-id: ea5edb07b33dfceacb2682f4789bea61de28bbfa	7 years ago
Yi Wu	1209b6db5c	Blob DB: remove existing garbage collection implementation Summary: Red diff to remove existing implementation of garbage collection. The current approach is reference counting kind of approach and require a lot of effort to get the size counter right on compaction and deletion. I'm going to go with a simple mark-sweep kind of approach and will send another PR for that. CompactionEventListener was added solely for blob db and it adds complexity and overhead to compaction iterator. Removing it as well. Closes https://github.com/facebook/rocksdb/pull/3551 Differential Revision: D7130190 Pulled By: yiwu-arbug fbshipit-source-id: c3a375ad2639a3f6ed179df6eda602372cc5b8df	7 years ago
Yi Wu	e62a763752	Blob DB: miscellaneous changes Summary: * Expose garbage collection related options * Minor logging and counter name update * Remove unused constants. Closes https://github.com/facebook/rocksdb/pull/3451 Differential Revision: D6867077 Pulled By: yiwu-arbug fbshipit-source-id: 6c3272a9c9d78b125a0bd6b2e56d00d087cdd6c8	7 years ago
Yi Wu	48cf8da2bb	BlobDB: update blob_db_options.bytes_per_sync behavior Summary: Previously, if blob_db_options.bytes_per_sync, there is a background job to call fsync() for every bytes_per_sync bytes written to a blob file. With the change we simply pass bytes_per_sync as env_options_ to blob files so that sync_file_range() will be used instead. Closes https://github.com/facebook/rocksdb/pull/3297 Differential Revision: D6606994 Pulled By: yiwu-arbug fbshipit-source-id: 452424be52e32ba92f5ea603b564e9b88929af47	7 years ago
Yi Wu	250a51a3f9	BlobDB: refactor DB open logic Summary: Refactor BlobDB open logic. List of changes: Major: * On reopen, mark blob files found as immutable, do not use them for writing new keys. * Not to scan the whole file to find file footer. Instead just seek to the end of the file and try to read footer. Minor: * Move most of the real logic from blob_db.cc to blob_db_impl.cc. * Not to hold shared_ptr of event listeners in global maps in blob_db.cc * Some changes to BlobFile interface. * Improve logging and error handling. Closes https://github.com/facebook/rocksdb/pull/3246 Differential Revision: D6526147 Pulled By: yiwu-arbug fbshipit-source-id: 9dc4cdd63359a2f9b696af817086949da8d06952	7 years ago
Yi Wu	f662f8f0b6	Blob DB: option to enable garbage collection Summary: Add an option to enable/disable auto garbage collection, where we keep counting how many keys have been evicted by either deletion or compaction and decide whether to garbage collect a blob file. Default disable auto garbage collection for now since the whole logic is not fully tested and we plan to make major change to it. Closes https://github.com/facebook/rocksdb/pull/3117 Differential Revision: D6224756 Pulled By: yiwu-arbug fbshipit-source-id: cdf53bdccec96a4580a2b3a342110ad9e8864dfe	7 years ago
Yi Wu	f6082d1944	Blob DB: cleanup unused options Summary: * cleanup num_concurrent_simple_blobs. We don't do concurrent writes (by taking write_mutex_) so it doesn't make sense to have multiple non TTL files open. We can revisit later when we want to improve writes. * cleanup eviction callback. we don't have plan to use it now. * rename s/open_simple_blob_files_/open_non_ttl_file_/ and s/open_blob_files_/open_ttl_files_/ to avoid confusion. Closes https://github.com/facebook/rocksdb/pull/3088 Differential Revision: D6182598 Pulled By: yiwu-arbug fbshipit-source-id: 99e6f5e01fa66d31309cdb06ce48502464bac6ad	7 years ago
Yi Wu	5a2a6483dc	Blob DB: Inline small values in base DB Summary: Adding the `min_blob_size` option to allow storing small values in base db (in LSM tree) together with the key. The goal is to improve performance for small values, while taking advantage of blob db's low write amplification for large values. Also adding expiration timestamp to blob index. It will be useful to evict stale blob indexes in base db by adding a compaction filter. I'll work on the compaction filter in future patches. See blob_index.h for the new blob index format. There are 4 cases when writing a new key: * small value w/o TTL: put in base db as normal value (i.e. ValueType::kTypeValue) * small value w/ TTL: put (type, expiration, value) to base db. * large value w/o TTL: write value to blob log and put (type, file, offset, size, compression) to base db. * large value w/TTL: write value to blob log and put (type, expiration, file, offset, size, compression) to base db. Closes https://github.com/facebook/rocksdb/pull/3066 Differential Revision: D6142115 Pulled By: yiwu-arbug fbshipit-source-id: 9526e76e19f0839310a3f5f2a43772a4ad182cd0	7 years ago
Yi Wu	dcd36a6aee	Make it explicit blob db doesn't support CF Summary: Blob db doesn't currently support column families. Return NotSupported status explicitly. Closes https://github.com/facebook/rocksdb/pull/2825 Differential Revision: D5757438 Pulled By: yiwu-arbug fbshipit-source-id: 44de9408fd032c98e8ae337d4db4ed37169bd9fa	7 years ago
Yi Wu	92afe830f9	Update all blob db TTL and timestamps to uint64_t Summary: The current blob db implementation use mix of int32_t, uint32_t and uint64_t for TTL and expiration. Update all timestamps to uint64_t for consistency. Closes https://github.com/facebook/rocksdb/pull/2683 Differential Revision: D5557103 Pulled By: yiwu-arbug fbshipit-source-id: e4eab2691629a755e614e8cf1eed9c3a681d0c42	7 years ago
Yi Wu	1900771bd2	Dump Blob DB options to info log Summary: * Dump blob db options to info log * Remove BlobDBOptionsImpl to disallow dynamic cast BlobDBOptions into BlobDBOptionsImpl. Move options there to be constants or into BlobDBOptions. The dynamic cast is broken after #2645 * Change some of the default options * Remove blob_db_options.min_blob_size, which is unimplemented. Will implement it soon. Closes https://github.com/facebook/rocksdb/pull/2671 Differential Revision: D5529912 Pulled By: yiwu-arbug fbshipit-source-id: dcd58ca981db5bcc7f123b65a0d6f6ae0dc703c7	7 years ago
Yi Wu	aaf42fe775	Move blob_db/ttl_extractor.h into blob_db/blob_db.h Summary: Move blob_db/ttl_extractor.h into blob_db/blob_db.h Also exclude TTLExtractor from LITE build. Closes https://github.com/facebook/rocksdb/pull/2665 Differential Revision: D5520009 Pulled By: yiwu-arbug fbshipit-source-id: 4813dcc272c7cc4bf2cdac285256d9a17d78c7b7	7 years ago
Yi Wu	6083bc79f8	Blob DB TTL extractor Summary: Introducing blob_db::TTLExtractor to replace extract_ttl_fn. The TTL extractor can be use to extract TTL from keys insert with Put or WriteBatch. Change over existing extract_ttl_fn are: * If value is changed, it will be return via std::string* (rather than Slice). With Slice the new value has to be part of the existing value. With std::string* the limitation is removed. * It can optionally return TTL or expiration. Other changes in this PR: * replace `std::chrono::system_clock` with `Env::NowMicros` so that I can mock time in tests. * add several TTL tests. * other minor naming change. Closes https://github.com/facebook/rocksdb/pull/2659 Differential Revision: D5512627 Pulled By: yiwu-arbug fbshipit-source-id: 0dfcb00d74d060b8534c6130c808e4d5d0a54440	7 years ago
Sagar Vemuri	72502cf227	Revert "comment out unused parameters" Summary: This reverts the previous commit `1d7048c598`, which broke the build. Did a `git revert 1d7048c`. Closes https://github.com/facebook/rocksdb/pull/2627 Differential Revision: D5476473 Pulled By: sagar0 fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06	7 years ago
Victor Gao	1d7048c598	comment out unused parameters Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually. Reviewed By: igorsugak Differential Revision: D5454343 fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2	7 years ago
Siying Dong	3c327ac2d0	Change RocksDB License Summary: Closes https://github.com/facebook/rocksdb/pull/2589 Differential Revision: D5431502 Pulled By: siying fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75	7 years ago
foolenough	21b17d7686	Fix BlobDB::Get which only get out the value offset Summary: Blob db use StackableDB::get which only get out the value offset, but not the value. Fix by making BlobDB::Get override the designated getter. Closes https://github.com/facebook/rocksdb/pull/2553 Differential Revision: D5396823 Pulled By: yiwu-arbug fbshipit-source-id: 5a7d1cf77ee44490f836a6537225955382296878	7 years ago
Yi Wu	7a380deff7	Update blob_db_test Summary: I'm trying to improve unit test of blob db. I'm rewriting blob db test. In this patch: * Rewrite tests of basic put/write/delete operations. * Add disable_background_tasks to BlobDBOptionsImpl to allow me not running any background job for basic unit tests. * Move DestroyBlobDB out from BlobDBImpl to be a standalone function. * Remove all garbage collection related tests. Will rewrite them in following patch. * Disabled compression test since it is failing. Will fix in a followup patch. Closes https://github.com/facebook/rocksdb/pull/2446 Differential Revision: D5243306 Pulled By: yiwu-arbug fbshipit-source-id: 157c71ad3b699307cb88baa3830e9b6e74f8e939	8 years ago
Anirban Rahut	d85ff4953c	Blob storage pr Summary: The final pull request for Blob Storage. Closes https://github.com/facebook/rocksdb/pull/2269 Differential Revision: D5033189 Pulled By: yiwu-arbug fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06	8 years ago
Siying Dong	d616ebea23	Add GPLv2 as an alternative license. Summary: Closes https://github.com/facebook/rocksdb/pull/2226 Differential Revision: D4967547 Pulled By: siying fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4	8 years ago
sdong	8b79422b52	[Proof-Of-Concept] RocksDB Blob Storage with a blob log file. Summary: This is a proof of concept of a RocksDB blob log file. The actual value of the Put() is appended to a blob log using normal data block format, and the handle of the block is written as the value of the key in RocksDB. The prototype only supports Put() and Get(). It doesn't support DB restart, garbage collection, Write() call, iterator, snapshots, etc. Test Plan: Add unit tests. Reviewers: arahut Reviewed By: arahut Subscribers: kradhakrishnan, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D61485	8 years ago

34 Commits (b3585a11b4d9745038dcd022dfdb0cd54b8d0d53)