rocksdb

Commit Graph

Author	SHA1	Message	Date
sdong	1d725ca51d	Deprecate BlockBasedTableOptions.hash_index_allow_collision=false. Summary: Deprecate this one option and delete code and tests that are now superfluous. Test Plan: all tests pass Reviewers: igor, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: msalib, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D55317	9 years ago
Islam AbdelRahman	d719b095dc	Introduce PinnedIteratorsManager (Reduce PinData() overhead / Refactor PinData) Summary: While trying to reuse PinData() / ReleasePinnedData() .. to optimize away some memcpys I realized that there is a significant overhead for using PinData() / ReleasePinnedData if they were called many times. This diff refactor the pinning logic by introducing PinnedIteratorsManager a centralized component that will be created once and will be notified whenever we need to Pin an Iterator. This implementation have much less overhead than the original implementation Test Plan: make check -j64 COMPILE_WITH_ASAN=1 make check -j64 Reviewers: yhchiang, sdong, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D56493	9 years ago
Baraa Hamodi	21e95811d1	Updated all copyright headers to the new format.	9 years ago
Islam AbdelRahman	aececc209e	Introduce ReadOptions::pin_data (support zero copy for keys) Summary: This patch update the Iterator API to introduce new functions that allow users to keep the Slices returned by key() valid as long as the Iterator is not deleted ReadOptions::pin_data : If true keep loaded blocks in memory as long as the iterator is not deleted Iterator::IsKeyPinned() : If true, this mean that the Slice returned by key() is valid as long as the iterator is not deleted Also add a new option BlockBasedTableOptions::use_delta_encoding to allow users to disable delta_encoding if needed. Benchmark results (using https://phabricator.fb.com/P20083553) ``` // $ du -h /home/tec/local/normal.4K.Snappy/db10077 // 6.1G /home/tec/local/normal.4K.Snappy/db10077 // $ du -h /home/tec/local/zero.8K.LZ4/db10077 // 6.4G /home/tec/local/zero.8K.LZ4/db10077 // Benchmarks for shard db10077 // _build/opt/rocks/benchmark/rocks_copy_benchmark \ // --normal_db_path="/home/tec/local/normal.4K.Snappy/db10077" \ // --zero_db_path="/home/tec/local/zero.8K.LZ4/db10077" // First run // ============================================================================ // rocks/benchmark/RocksCopyBenchmark.cpp relative time/iter iters/s // ============================================================================ // BM_StringCopy 1.73s 576.97m // BM_StringPiece 103.74% 1.67s 598.55m // ============================================================================ // Match rate : 1000000 / 1000000 // Second run // ============================================================================ // rocks/benchmark/RocksCopyBenchmark.cpp relative time/iter iters/s // ============================================================================ // BM_StringCopy 611.99ms 1.63 // BM_StringPiece 203.76% 300.35ms 3.33 // ============================================================================ // Match rate : 1000000 / 1000000 ``` Test Plan: Unit tests Reviewers: sdong, igor, anthony, yhchiang, rven Reviewed By: rven Subscribers: dhruba, lovro, adsharma Differential Revision: https://reviews.facebook.net/D48999	9 years ago
sdong	35ad531be3	Seperate InternalIterator from Iterator Summary: Separate a new class InternalIterator from class Iterator, when the look-up is done internally, which also means they operate on key with sequence ID and type. This change will enable potential future optimizations but for now InternalIterator's functions are still the same as Iterator's. At the same time, separate the cleanup function to a separate class and let both of InternalIterator and Iterator inherit from it. Test Plan: Run all existing tests. Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D48549	9 years ago
Igor Canadi	0a019d74a0	Use malloc_usable_size() for accounting block cache size Summary: Currently, when we insert something into block cache, we say that the block cache capacity decreased by the size of the block. However, size of the block might be less than the actual memory used by this object. For example, 4.5KB block will actually use 8KB of memory. So even if we configure block cache to 10GB, our actually memory usage of block cache will be 20GB! This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected. This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory. I'm using non-portable function and I couldn't find info on portability on Google. However, it seems to work on Linux, which will cover majority of our use-cases. Test Plan: 1. fill up mongo instance with 80GB of data 2. restart mongo with block cache size configured to 10GB 3. do a table scan in mongo 4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40635	10 years ago
Igor Canadi	767777c2bd	Turn on -Wshorten-64-to-32 and fix all the errors Summary: We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details. This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables. Test Plan: compiles Reviewers: ljin, rven, yhchiang, sdong Reviewed By: yhchiang Subscribers: bobbaldwin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28689	10 years ago
Igor Canadi	ff76895614	Remove some unnecessary constructors Summary: This is continuing the work done by `27b22f13a3` It's just cleaning up some unnecessary constructors. The most important change is removing Block::Block(const BlockContents& contents) constructor. It was only used from the unit test. Test Plan: compiles Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23547	10 years ago
Igor Canadi	54cada92b1	Run make format on PR #249	10 years ago
Torrie Fischer	fb6456b00d	Replace naked calls to operator new and delete (Fixes #222 ) This replaces a mishmash of pointers in the Block and BlockContents classes with std::unique_ptr. It also changes the semantics of BlockContents to be limited to use as a constructor parameter for Block objects, as it owns any block buffers handed to it.	10 years ago
Lei Jin	23861857c4	ReadOptions.total_order_seek to allow total order seek for block-based table when hash index is enabled Summary: as title Test Plan: table_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22239	10 years ago
sdong	1242bfcad7	Add DB property "rocksdb.estimate-table-readers-mem" Summary: Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache. Refactor the property codes to allow getting property from a version, with DB mutex not acquired. Test Plan: Add several checks of this new property in existing codes for various cases. Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, igor, leveldb Differential Revision: https://reviews.facebook.net/D20733	10 years ago
Feng Zhu	8f09d53fd1	remove malloc when create data and index iterator in Get Summary: Define Block::Iter to be an independent class to be used by block_based_table_reader When creating data and index iterator, update an existing iterator rather than new one Thus malloc and free could be reduced Benchmark, Base: commit `76286ee67e` commands: --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=2621440 --use_hash_search=1 --block_size=1024 --block_restart_interval=1 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 malloc: 3.30% -> 1.42% free: 3.59%->1.61% Test Plan: make all check run db_stress valgrind ./db_test ./table_test Reviewers: ljin, yhchiang, dhruba, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20655	10 years ago
Haobo Xu	0f0076ed5a	[RocksDB] Reduce memory footprint of the blockbased table hash index. Summary: Currently, the in-memory hash index of blockbased table uses a precise hash map to track the prefix to block range mapping. In some use cases, especially when prefix itself is big, the memory overhead becomes a problem. This diff introduces a fixed hash bucket array that does not store the prefix and allows prefix collision, which is similar to the plaintable hash index, in order to reduce the memory consumption. Just a quick draft, still testing and refining. Test Plan: unit test and shadow testing Reviewers: dhruba, kailiu, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19047	11 years ago
Kai Liu	75b59d5146	Enable hash index for block-based table Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index. Test Plan: Wrote several new unit tests. Reviewers: sdong, haobo, dhruba Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16539	11 years ago
kailiu	444cafc28c	Fix inconsistent code format Summary: Found some function follows camel style. When naming funciton, we have two styles: Trivially expose internal data in readonly mode: `all_lower_case()` Regular function: `CapitalizeFirstLetter()` I renames these functions. Test Plan: make -j32 Reviewers: haobo, sdong, dhruba, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16383	11 years ago
Dhruba Borthakur	b4ad5e89ae	Implement a compressed block cache. Summary: Rocksdb can now support a uncompressed block cache, or a compressed block cache or both. Lookups first look for a block in the uncompressed cache, if it is not found only then it is looked up in the compressed cache. If it is found in the compressed cache, then it is uncompressed and inserted into the uncompressed cache. It is possible that the same block resides in the compressed cache as well as the uncompressed cache at the same time. Both caches have their own individual LRU policy. Test Plan: Unit test case attached. Reviewers: kailiu, sdong, haobo, leveldb Reviewed By: haobo CC: xjin, haobo Differential Revision: https://reviews.facebook.net/D12675	11 years ago
Dhruba Borthakur	9cd221094c	Add appropriate LICENSE and Copyright message. Summary: Add appropriate LICENSE and Copyright message. Test Plan: make check Reviewers: CC: Task ID: # Blame Rev:	11 years ago
Dhruba Borthakur	4463b11cad	Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Summary: Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Test Plan: make check Reviewers: emayanke, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13311	11 years ago
Dhruba Borthakur	a143ef9b38	Change namespace from leveldb to rocksdb Summary: Change namespace from leveldb to rocksdb. This allows a single application to link in open-source leveldb code as well as rocksdb code into the same process. Test Plan: compile rocksdb Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13287	11 years ago
Dhruba Borthakur	1186192ed1	Replace include/leveldb with include/rocksdb. Summary: Replace include/leveldb with include/rocksdb. Test Plan: make clean; make check make clean; make release Differential Revision: https://reviews.facebook.net/D12489	11 years ago
Haobo Xu	49fbd5531b	[RocksDB] Refactor table.cc to reduce code duplication and improve readability. Summary: In table.cc, the code section that reads in BlockContent and then put it into a Block, appears at least 4 times. This is too much duplication. BlockReader is much shorter after the change and reads way better. D10077 attempted that for index block read. This is a complete cleanup. Test Plan: make check; ./db_stress Reviewers: dhruba, sheki Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10527	12 years ago
Sanjay Ghemawat	85584d497e	Added bloom filter support. In particular, we add a new FilterPolicy class. An instance of this class can be supplied in Options when opening a database. If supplied, the instance is used to generate summaries of keys (e.g., a bloom filter) which are placed in sstables. These summaries are consulted by DB::Get() so we can avoid reading sstable blocks that are guaranteed to not contain the key we are looking for. This change provides one implementation of FilterPolicy based on bloom filters. Other changes: - Updated version number to 1.4. - Some build tweaks. - C binding for CompactRange. - A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom. - Minor .gitignore update.	13 years ago
Sanjay Ghemawat	9013f13b15	use mmap on 64-bit machines to speed-up reads; small build fixes	13 years ago
Hans Wennborg	36a5f8ed7f	A number of fixes: - Replace raw slice comparison with a call to user comparator. Added test for custom comparators. - Fix end of namespace comments. - Fixed bug in picking inputs for a level-0 compaction. When finding overlapping files, the covered range may expand as files are added to the input set. We now correctly expand the range when this happens instead of continuing to use the old range. For example, suppose L0 contains files with the following ranges: F1: a .. d F2: c .. g F3: f .. j and the initial compaction target is F3. We used to search for range f..j which yielded {F2,F3}. However we now expand the range as soon as another file is added. In this case, when F2 is added, we expand the range to c..j and restart the search. That picks up file F1 as well. This change fixes a bug related to deleted keys showing up incorrectly after a compaction as described in Issue 44. (Sync with upstream @25072954)	13 years ago
dgrogan@chromium.org	69c6d38342	reverting disastrous MOE commit, returning to r21 git-svn-id: https://leveldb.googlecode.com/svn/trunk@23 62dab493-f737-651d-591e-8d6aee1b9529	14 years ago
dgrogan@chromium.org	b743906eea	Revision created by MOE tool push_codebase.	14 years ago
dgrogan@chromium.org	b409afe968	chmod a-x git-svn-id: https://leveldb.googlecode.com/svn/trunk@21 62dab493-f737-651d-591e-8d6aee1b9529	14 years ago
dgrogan@chromium.org	f779e7a5d8	@20602303 . Default file permission is now 755. git-svn-id: https://leveldb.googlecode.com/svn/trunk@20 62dab493-f737-651d-591e-8d6aee1b9529	14 years ago
jorlow@chromium.org	4671a695fc	Move include files into a leveldb subdir. git-svn-id: https://leveldb.googlecode.com/svn/trunk@18 62dab493-f737-651d-591e-8d6aee1b9529	14 years ago
jorlow@chromium.org	f67e15e50f	Initial checkin. git-svn-id: https://leveldb.googlecode.com/svn/trunk@2 62dab493-f737-651d-591e-8d6aee1b9529	14 years ago

31 Commits (a791a2cf2d3d48f8c52315a59159ec42339b8fdc)