rocksdb

Commit Graph

Author	SHA1	Message	Date
Manuel Ung	b9846370e9	WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078 ) Summary: This adds support for recovering WriteUnprepared transactions through the following changes: - The information in `RecoveredTransaction` is extended so that it can reference multiple batches. - `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not. - `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway. Commit/Rollback of live transactions is still unimplemented and will come later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078 Differential Revision: D8703382 Pulled By: lth fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a	7 years ago
Yanqin Jin	db7ae0a485	Fix a map lookup that may throw exception. (#4098 ) Summary: `std::map::at(key)` throws std::out_of_range if key does not exist. Current code does not handle this. Although this case is unlikely, I feel it's safe to use `std::map::find`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4098 Differential Revision: D8753865 Pulled By: riversand963 fbshipit-source-id: 9a9ba43badb0fb5e0d24cd87903931fd12f3f8ec	7 years ago
Yanqin Jin	d4d9fe8e57	Fix a bug caused by not copying the block trailer. (#4096 ) Summary: This was caught by crash test, and the following is a simple way to reproduce it and verify the fix. One way to trigger this code path is to use the following configuration: - Compress SST file - Enable direct IO and prefetch buffer - Do NOT use compressed block cache Closes https://github.com/facebook/rocksdb/pull/4096 Differential Revision: D8742009 Pulled By: riversand963 fbshipit-source-id: f13381078bbb0dce92f60bd313a78ab602bcacd2	7 years ago
Huachao Huang	35b83327a7	compaction: fix max_subcompactions option for CompactRange (#4082 ) Summary: The max_subcompactions option was introduced in https://github.com/facebook/rocksdb/pull/3775. Closes https://github.com/facebook/rocksdb/pull/4082 Differential Revision: D8743258 Pulled By: ajkr fbshipit-source-id: d60ee75769dfc19ab6f8754e4ff3a267848f1ed9	7 years ago
Yanqin Jin	39218a72a4	Increase the size of LRU cache. (#4090 ) Summary: Increase the size of each shard so that the number of cache hit/miss match expectation. Otherwise FilterBlockInBlockCache test will fail. Closes https://github.com/facebook/rocksdb/pull/4090 Differential Revision: D8736158 Pulled By: riversand963 fbshipit-source-id: 5cdbc06b02390389fd5b72a6d251d88949ad3d91	7 years ago
Siying Dong	17027aeffc	Change default value of `bytes_max_delete_chunk` to 0 in NewSstFileManager() (#4092 ) Summary: Now by default, with NewSstFileManager, checkpoints may be corrupted. Disable this feature to avoid this issue. Closes https://github.com/facebook/rocksdb/pull/4092 Differential Revision: D8729856 Pulled By: siying fbshipit-source-id: 914c321d6eaf52d8c5981171322d85dd29088307	7 years ago
Adam Retter	0d234dfce4	Remove unused arg which causes compilation failure (#4080 ) Summary: It seems that compilation has been made stricter about unused args. Closes https://github.com/facebook/rocksdb/pull/4080 Differential Revision: D8712049 Pulled By: sagar0 fbshipit-source-id: 984af1982638af3568aac1a167f565f4741badee	7 years ago
Andrey Zagrebin	e099c2dd55	check if data size exceeds java array vm limit when it is copied in jni (#3850 ) Summary: to address issue #3849 Closes https://github.com/facebook/rocksdb/pull/3850 Differential Revision: D8695487 Pulled By: sagar0 fbshipit-source-id: 04baeb2127663934ed1321fe6d9a9ec23c86e16b	7 years ago
Daniel Black	36fa49ceb5	transaction_test: -Wunused-variable with clang-7 (#4074 ) Summary: clang version 7.0.0- (trunk) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/bin clang++-7 -DROCKSDB_USE_RTTI -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wno-unused-parameter -Werror -I. -I./include -std=c++11 -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DGFLAGS=google -DZLIB -DBZIP2 -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -Wshorten-64-to-32 -march=native -DHAVE_SSE42 -DROCKSDB_SUPPORT_THREAD_LOCAL -isystem ./third-party/gtest-1.7.0/fused-src -DTRAVIS -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -c utilities/transactions/transaction_test.cc -o utilities/transactions/transaction_test.o utilities/transactions/transaction_test.cc:2282:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ utilities/transactions/transaction_test.cc:2822:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ utilities/transactions/transaction_test.cc:2928:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ utilities/transactions/transaction_test.cc:3109:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ utilities/transactions/transaction_test.cc:4364:22: error: unused variable 'txn_options' [-Werror,-Wunused-variable] TransactionOptions txn_options; ^ Closes https://github.com/facebook/rocksdb/pull/4074 Differential Revision: D8698051 Pulled By: ajkr fbshipit-source-id: 6255618eefdd189962fbea1b02cf1eb5ae501274	7 years ago
Maysam Yabandeh	2462763b2e	Fix mis-spoken assert on prefetch_filter and prefetch_index (#4077 ) Summary: We can have prefetch_index without prefetch_filter but not the other way around. The assert statement is fixed. Closes https://github.com/facebook/rocksdb/pull/4077 Differential Revision: D8694472 Pulled By: maysamyabandeh fbshipit-source-id: ccd2804d9d9cdafb1c3e65062c7bc38603e69004	7 years ago
Maysam Yabandeh	29ffbb8a50	Charging block cache more accurately (#4073 ) Summary: Currently the block cache is charged only by the size of the raw data block and excludes the overhead of the c++ objects that contain the raw data block. The patch improves the accuracy of the charge by including the c++ object overhead into it. Closes https://github.com/facebook/rocksdb/pull/4073 Differential Revision: D8686552 Pulled By: maysamyabandeh fbshipit-source-id: 8472f7fc163c0644533bc6942e20cdd5725f520f	7 years ago
Zhongyi Xie	b3efb1cbe0	fix clang analyzer warnings (#4072 ) Summary: clang analyze is giving the following warnings: > db/compaction_job.cc:1178:16: warning: Called C++ object pointer is null } else if (meta->smallest.size() > 0) { ^~~~~~~~~~~~~~~~~~~~~ db/compaction_job.cc:1201:33: warning: Access to field 'marked_for_compaction' results in a dereference of a null pointer (loaded from variable 'meta') meta->marked_for_compaction = sub_compact->builder->NeedCompact(); ~~~~ db/version_set.cc:2770:26: warning: Called C++ object pointer is null uint32_t cf_id = last_writer->cfd->GetID(); ^~~~~~~~~~~~~~~~~~~~~~~~~ Closes https://github.com/facebook/rocksdb/pull/4072 Differential Revision: D8685852 Pulled By: miasantreble fbshipit-source-id: b0e2fd9dfc1cbba2317723e09886384b9b1c9085	7 years ago
Manuel Ung	8ad63a4b86	WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069 ) Summary: This adds a new WAL marker of type kTypeBeginUnprepareXID. Also, DBImpl now contains a field called batch_per_txn (meaning one WriteBatch per transaction, or possibly multiple WriteBatches). This would also indicate that this DB is using WriteUnprepared policy. Recovery code would be able to make use of this extra field on DBImpl in a separate diff. For now, it is just used to determine whether the WAL is compatible or not. Closes https://github.com/facebook/rocksdb/pull/4069 Differential Revision: D8675099 Pulled By: lth fbshipit-source-id: ca27cae1738e46d65f2bb92860fc759deb874749	7 years ago
Andrew Kryczka	25403c2265	Prefetch cache lines for filter lookup (#4068 ) Summary: Since the filter data is unaligned, even though we ensure all probes are within a span of `cache_line_size` bytes, those bytes can span two cache lines. In that case I doubt hardware prefetching does a great job considering we don't necessarily access those two cache lines in order. This guess seems correct since adding explicit prefetch instructions reduced filter lookup overhead by 19.4%. Closes https://github.com/facebook/rocksdb/pull/4068 Differential Revision: D8674189 Pulled By: ajkr fbshipit-source-id: 747427d9a17900151c17820488e3f7efe06b1871	7 years ago
Anand Ananthabhotla	52d4c9b7f6	Allow DB resume after background errors (#3997 ) Summary: Currently, if RocksDB encounters errors during a write operation (user requested or BG operations), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of 3 parts - 1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not 2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place 3. Provide an API for the user to clear the error and resume the DB instance This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors. Closes https://github.com/facebook/rocksdb/pull/3997 Differential Revision: D8653831 Pulled By: anand1976 fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd	7 years ago
Yanqin Jin	26d67e357e	Support group commits of version edits (#3944 ) Summary: This PR supports the group commit of multiple version edit entries corresponding to different column families. Column family drop/creation still cannot be grouped. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752). Closes https://github.com/facebook/rocksdb/pull/3944 Differential Revision: D8432536 Pulled By: riversand963 fbshipit-source-id: 8f11bd05193b6c0d9272d82e44b676abfac113cb	7 years ago
Maysam Yabandeh	0a5b5d88b2	Remove ReadOnly part of PinnableSliceAndMmapReads from Lite (#4070 ) Summary: Lite does not support readonly DBs. Closes https://github.com/facebook/rocksdb/pull/4070 Differential Revision: D8677858 Pulled By: maysamyabandeh fbshipit-source-id: 536887d2363ee2f5d8e1ea9f1a511e643a1707fa	7 years ago
Taewook Oh	b557499eee	Suppress leak warning for clang(LLVM) asan (#4066 ) Summary: Instead of __SANITIZE_ADDRESS__ macro, LLVM uses __has_feature(address_sanitizer) to check if ASAN is enabled for the build. I tested it with MySQL sanitizer build that uses RocksDB as a submodule. Closes https://github.com/facebook/rocksdb/pull/4066 Reviewed By: riversand963 Differential Revision: D8668941 Pulled By: taewookoh fbshipit-source-id: af4d1da180c1470d257a228f431eebc61490bc36	7 years ago
Yanqin Jin	7f850b889d	Remove 'ALIGNAS' from StatisticsImpl. (#4061 ) Summary: Remove over-alignment on `StatisticsImpl` whose benefit is vague and causes UBSAN check to fail due to `std::make_shared` not respecting the over-alignment requirement. Test plan ``` $ make clean && COMPILE_WITH_UBSAN=1 OPT=-g make -j16 ubsan_check ``` Closes https://github.com/facebook/rocksdb/pull/4061 Differential Revision: D8656506 Pulled By: riversand963 fbshipit-source-id: db355ae9c7bdd2c9e9c5e63cabba13d8d82cc5f9	7 years ago
Zhongyi Xie	14f409c0f1	PrefixMayMatch: remove unnecessary check for prefix_extractor_ (#4067 ) Summary: with https://github.com/facebook/rocksdb/pull/3601 and https://github.com/facebook/rocksdb/pull/3899, `prefix_extractor_` is not really being used in block based filter and full filter's version of `PrefixMayMatch` because now `prefix_extractor` is passed as an argument. Also it is now possible that prefix_extractor_ may be initialized to nullptr when a non-standard prefix_extractor is used and also for ROCKSDB_LITE. Removing these checks should not break any existing tests. Closes https://github.com/facebook/rocksdb/pull/4067 Differential Revision: D8669002 Pulled By: miasantreble fbshipit-source-id: 0e701ba912b8a26734fadb72d15bb1b266b6176a	7 years ago
Zhichao Cao	1f6efabe23	Add bottommost_compression_opts for bottommost_compression (#3985 ) Summary: For `CompressionType` we have options `compression` and `bottommost_compression`. Thus, to make the compression options consistent with the compression type when bottommost_compression is enabled, we add the bottommost_compression_opts. Closes https://github.com/facebook/rocksdb/pull/3985 Reviewed By: riversand963 Differential Revision: D8385911 Pulled By: zhichao-cao fbshipit-source-id: 07bc533dd61bcf1cef5927d8d62901c13d38d5fc	7 years ago
Maysam Yabandeh	235ab9dd32	Pin mmap files in ReadOnlyDB (#4053 ) Summary: https://github.com/facebook/rocksdb/pull/3881 fixed a bug where PinnableSlice pin mmap files which could be deleted with background compaction. This is however a non-issue for ReadOnlyDB when there is no compaction running and max_open_files is -1. This patch reenables the pinning feature for that case. Closes https://github.com/facebook/rocksdb/pull/4053 Differential Revision: D8662546 Pulled By: maysamyabandeh fbshipit-source-id: 402962602eb0f644e17822748332999c3af029fd	7 years ago
Maximilian Alexander	e8f9d7f0d4	Added PingCaps Rust RocksDB and ObjectiveRocks (#4065 ) Summary: 1. I added PingCap's more up-to-date Rust Binding of RocksDB 2. I also added ObjectiveRocks which is a very nice binding for _both_ Swift and Objective-C Closes https://github.com/facebook/rocksdb/pull/4065 Differential Revision: D8670340 Pulled By: siying fbshipit-source-id: 3db28bf3a464c3e050df52cc92b19248b7f43944	7 years ago
chouxi	818c84e116	Store timestamp in deadlock detection (#4060 ) Summary: - Summary Add timestamp into the DeadlockInfo to store the timestamp when deadlock detected on the rocksdb side. - Testplan: `make check -j64` Closes https://github.com/facebook/rocksdb/pull/4060 Differential Revision: D8655380 Pulled By: chouxi fbshipit-source-id: f58e1aa5e09eb1d1eed0a181d4e2304aaf01efe8	7 years ago
Daniel Black	e5ae1bb465	Remove bogus gcc-8.1 warning (#3870 ) Summary: Various rearrangements of the cch maths failed or replacing = '\0' with memset failed to convince the compiler it was nul terminated. So took the perverse option of changing strncpy to strcpy. Return null if memory couldn't be allocated. util/status.cc: In static member function ‘static const char* rocksdb::Status::CopyState(const char)’: util/status.cc:28:15: error: ‘char strncpy(char, const char, size_t)’ output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation] std::strncpy(result, state, cch - 1); ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ util/status.cc:19:18: note: length computed here std::strlen(state) + 1; // +1 for the null terminator ~~~~~~~~~~~^~~~~~~ cc1plus: all warnings being treated as errors make: *** [Makefile:645: shared-objects/util/status.o] Error 1 closes #2705 Closes https://github.com/facebook/rocksdb/pull/3870 Differential Revision: D8594114 Pulled By: anand1976 fbshipit-source-id: ab20f3a456a711e4d29144ebe630e4fe3c99ec25	7 years ago
Manuel Ung	a16e00b7b9	WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955 ) Summary: This is implemented by extending ReadCallback with another function `MaxUnpreparedSequenceNumber` which returns the largest visible sequence number for the current transaction, if there is uncommitted data written to DB. Otherwise, it returns zero, indicating no uncommitted data. These are the places where reads had to be modified. - Get and Seek/Next was just updated to seek to max(snapshot_seq, MaxUnpreparedSequenceNumber()) instead, and iterate until a key was visible. - Prev did not need updates since it did not use the Seek to sequence number optimization. Assuming that locks were held when writing unprepared keys, and ValidateSnapshot runs, there should only be committed keys and unprepared keys of the current transaction, all of which are visible. Prev will simply iterate to get the last visible key. - Reseeking to skip keys optimization was also disabled for write unprepared, since it's possible to hit the max_skip condition even while reseeking. There needs to be some way to resolve infinite looping in this case. Closes https://github.com/facebook/rocksdb/pull/3955 Differential Revision: D8286688 Pulled By: lth fbshipit-source-id: 25e42f47fdeb5f7accea0f4fd350ef35198caafe	7 years ago
Nikhil Benesch	17339dc2f3	Add table property tracking number of range deletions (#4016 ) Summary: Add a new table property, rocksdb.num.range-deletions, which tracks the number of range deletions in a block-based table. Range deletions are no longer counted in rocksdb.num.entries; as discovered in PR #3778, there are various code paths that implicitly assume that rocksdb.num.entries counts only true keys, not range deletions. /cc ajkr nvanbenschoten Closes https://github.com/facebook/rocksdb/pull/4016 Differential Revision: D8527575 Pulled By: ajkr fbshipit-source-id: 92e7edbe78fda53756a558013c9fb496e7764fd7	7 years ago
Zhongyi Xie	408205a36b	use user_key and iterate_upper_bound to determine compatibility of bloom filters (#3899 ) Summary: Previously in https://github.com/facebook/rocksdb/pull/3601 bloom filter will only be checked if `prefix_extractor` in the mutable_cf_options matches the one found in the SST file. This PR relaxes the requirement by checking if all keys in the range [user_key, iterate_upper_bound) all share the same prefix after transforming using the BF in the SST file. If so, the bloom filter is considered compatible and will continue to be looked at. Closes https://github.com/facebook/rocksdb/pull/3899 Differential Revision: D8157459 Pulled By: miasantreble fbshipit-source-id: 18d17cba56a1005162f8d5db7a27aba277089c41	7 years ago
Bas van Schaik	967aa8157a	Create lgtm.yml for LGTM.com C/C++ analysis (#4058 ) Summary: As discussed with thatsafunnyname [here](https://discuss.lgtm.com/t/c-c-lang-missing-for-facebook-rocksdb/1079): this configuration enables C/C++ analysis for RocksDB on LGTM.com. The initial commit will contain a build command (simple `make`) that previously resulted in a build error. The build log will then be available on LGTM.com for you to investigate (if you like). I'll immediately add a second commit to this PR to correct the build command to `make static_lib`, which worked when I tested it earlier today. If you like you can also enable automatic code review in pull requests. This will alert you to any new code issues before they actually get merged into `master`. Here's an example of how that works for the AMPHTML project: https://github.com/ampproject/amphtml/pull/13060. You can enable it yourself here: https://lgtm.com/projects/g/facebook/rocksdb/ci/. I'll also add a badge to your README.md in a separate commit — feel free to remove that from this PR if you don't like it. (Full disclosure: I'm part of the LGTM.com team 🙂. Ping samlanning) Closes https://github.com/facebook/rocksdb/pull/4058 Differential Revision: D8648410 Pulled By: ajkr fbshipit-source-id: 98d55fc19cff1b07268ac8425b63e764806065aa	7 years ago
Peter (Stig) Edwards	2694b6dc26	Remove unused imports, from python scripts. (#4057 ) Summary: Also remove redefined variable. As reported on https://lgtm.com/projects/g/facebook/rocksdb/ Closes https://github.com/facebook/rocksdb/pull/4057 Differential Revision: D8648342 Pulled By: ajkr fbshipit-source-id: afd2ba84d1364d316010179edd44777e64ca9183	7 years ago
Andrew Kryczka	a8e503e545	Fix universal compaction scheduling conflict with CompactFiles (#4055 ) Summary: Universal size-amp-triggered compaction was pulling the final sorted run into the compaction without checking whether any of its files are already being compacted. When all compactions are automatic, it is safe since it verifies the second-last sorted run is not already being compacted, which implies the last sorted run is also not being compacted (in automatic compaction multiple sorted runs are always compacted together). But with manual compaction, files in the last sorted run can be compacted independently, so the last sorted run also must be checked. We were seeing the below assertion failure in `db_stress`. Also the test case included in this PR repros the failure. ``` db_universal_compaction_test: db/compaction.cc:312: void rocksdb::Compaction::MarkFilesBeingCompacted(bool): Assertion `mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted' failed. Aborted (core dumped) ``` Closes https://github.com/facebook/rocksdb/pull/4055 Differential Revision: D8630094 Pulled By: ajkr fbshipit-source-id: ac3b30a874678b76e113d4f6c42c1260411b08f8	7 years ago
Daniel Black	346d1069c3	Align StatisticsImpl / StatisticsData (#4036 ) Summary: Pinned the alignment of StatisticsData to the cacheline size rather than just extending its size (which could go over two cache lines)if unaligned in allocation. Avoid compile errors in the process as per individual commit messages. strengthen static_assert to CACHELINE rather than the highest common multiple. Closes https://github.com/facebook/rocksdb/pull/4036 Differential Revision: D8582844 Pulled By: yiwu-arbug fbshipit-source-id: 363c37029f28e6093e06c60b987bca9aa204bc71	7 years ago
Yi Wu	6d454d7376	BlobDB: is_fifo=true also evict non-TTL blob files (#4049 ) Summary: Previously with is_fifo=true we only evict TTL file. Changing it to also evict non-TTL files from oldest to newest, after exhausted TTL files. Closes https://github.com/facebook/rocksdb/pull/4049 Differential Revision: D8604597 Pulled By: yiwu-arbug fbshipit-source-id: bc4209ee27c1528ce4b72833e6f1e1bff80082c1	7 years ago
Sagar Vemuri	189f0c27aa	Make BlockBasedTableIterator compaction-aware (#4048 ) Summary: Pass in `for_compaction` to `BlockBasedTableIterator` via `BlockBasedTableReader::NewIterator`. In `7103559f49`, `for_compaction` was set in `BlockBasedTable::Rep` via `BlockBasedTable::SetupForCompaction`. In hindsight it was not the right decision; it also caused TSAN to complain. Closes https://github.com/facebook/rocksdb/pull/4048 Differential Revision: D8601056 Pulled By: sagar0 fbshipit-source-id: 30127e898c15c38c1080d57710b8c5a6d64a0ab3	7 years ago
Yi Wu	a71e467381	Blob DB: enable readahead for garbage collection (#3648 ) Summary: Enable readahead for blob DB garbage collection, which should improve GC performance a little bit. Closes https://github.com/facebook/rocksdb/pull/3648 Differential Revision: D7383791 Pulled By: yiwu-arbug fbshipit-source-id: 642b3327f7105eca85986d3fb2d8f960a3d83cf1	7 years ago
Yanqin Jin	2729dd72ad	Reclaim memory allocated to backup_engine. Summary: Closes https://github.com/facebook/rocksdb/pull/4045 Differential Revision: D8595609 Pulled By: riversand963 fbshipit-source-id: 5ba5954d804b82b0e7264b2e18e1da4c94103b53	7 years ago
Maysam Yabandeh	80ade9ad83	Pin top-level index on partitioned index/filter blocks (#4037 ) Summary: Top-level index in partitioned index/filter blocks are small and could be pinned in memory. So far we use that by setting cache_index_and_filter_blocks to false. This however makes it difficult to keep account of the total memory usage. This patch introduces pin_top_level_index_and_filter which in combination with cache_index_and_filter_blocks=true keeps the top-level index in cache and yet pins it to avoid cache misses and also cache lookup overhead. Closes https://github.com/facebook/rocksdb/pull/4037 Differential Revision: D8596218 Pulled By: maysamyabandeh fbshipit-source-id: 3a5f7f9ca6b4b525b03ff6bd82354881ae974ad2	7 years ago
Yi Wu	c726f7fda8	Fix dangling checkpoint pointer in db_stress (#4042 ) Summary: Fix db_stress failed to delete checkpoint pointer. It's caught by asan_crash test. Closes https://github.com/facebook/rocksdb/pull/4042 Differential Revision: D8592604 Pulled By: yiwu-arbug fbshipit-source-id: 7b2d67d5e3dfb05f71c33fcf320482303e97d3ef	7 years ago
Adam Retter	64c85d0d97	Set DEBUG_LEVEL=0 for RocksJava Mac Release (#4040 ) Summary: Closes https://github.com/facebook/rocksdb/issues/2717 Closes https://github.com/facebook/rocksdb/pull/4040 Differential Revision: D8592058 Pulled By: sagar0 fbshipit-source-id: d01099a1067aa32659abb0b4bed641d919a3927e	7 years ago
Zhongyi Xie	795e663df0	option for timing measurement of non-blocking ops during compaction (#4029 ) Summary: For example calling CompactionFilter is always timed and gives the user no way to disable. This PR will disable the timer if `Statistics::stats_level_` (which is part of DBOptions) is `kExceptDetailedTimers` Closes https://github.com/facebook/rocksdb/pull/4029 Differential Revision: D8583670 Pulled By: miasantreble fbshipit-source-id: 913be9fe433ae0c06e88193b59d41920a532307f	7 years ago
Andrew Kryczka	0a5b16c7c5	Cleanup staging directory at start of checkpoint (#4035 ) Summary: - Attempt to clean the checkpoint staging directory before starting a checkpoint. It was already cleaned up at the end of checkpoint. But it wasn't cleaned up in the edge case where the process crashed while staging checkpoint files. - Attempt to clean the checkpoint directory before calling `Checkpoint::Create` in `db_stress`. This handles the case where checkpoint directory was created by a previous `db_stress` run but the process crashed before cleaning it up. - Use `DestroyDB` for cleaning checkpoint directory since a checkpoint is a DB. Closes https://github.com/facebook/rocksdb/pull/4035 Reviewed By: yiwu-arbug Differential Revision: D8580223 Pulled By: ajkr fbshipit-source-id: 28c667400e249fad0fdedc664b349031b7b61599	7 years ago
Sagar Vemuri	645e57c22d	Assert for Direct IO at the beginning in PositionedRead (#3891 ) Summary: Moved the direct-IO assertion to the top in `PosixSequentialFile::PositionedRead`, as it doesn't make sense to check for sector alignments before checking for direct IO. Closes https://github.com/facebook/rocksdb/pull/3891 Differential Revision: D8267972 Pulled By: sagar0 fbshipit-source-id: 0ecf77c0fb5c35747a4ddbc15e278918c0849af7	7 years ago
Yi Wu	58c221440c	Update TARGETS file (#4028 ) Summary: -Wshorten-64-to-32 is an invalid flag in fbcode. Changing it to -Wnarrowing. Closes https://github.com/facebook/rocksdb/pull/4028 Differential Revision: D8553694 Pulled By: yiwu-arbug fbshipit-source-id: 1523cbcb4c76cf1d2b10a4d28b5f58c78e6cb876	7 years ago
Yanqin Jin	397495964b	Fix a warning (treated as error) caused by type mismatch. Summary: Closes https://github.com/facebook/rocksdb/pull/4032 Differential Revision: D8573061 Pulled By: riversand963 fbshipit-source-id: 112324dcb35956d6b3ec891073f4f21493933c8b	7 years ago
Sagar Vemuri	7103559f49	Improve direct IO range scan performance with readahead (#3884 ) Summary: This PR extends the improvements in #3282 to also work when using Direct IO. We see 4.5X performance improvement in seekrandom benchmark doing long range scans, when using direct reads, on flash. Description: This change improves the performance of iterators doing long range scans (e.g. big/full index or table scans in MyRocks) by using readahead and prefetching additional data on each disk IO, and storing in a local buffer. This prefetching is automatically enabled on noticing more than 2 IOs for the same table file during iteration. The readahead size starts with 8KB and is exponentially increased on each additional sequential IO, up to a max of 256 KB. This helps in cutting down the number of IOs needed to complete the range scan. Implementation Details: - Used `FilePrefetchBuffer` as the underlying buffer to store the readahead data. `FilePrefetchBuffer` can now take file_reader, readahead_size and max_readahead_size as input to the constructor, and automatically do readahead. - `FilePrefetchBuffer::TryReadFromCache` can now call `FilePrefetchBuffer::Prefetch` if readahead is enabled. - `AlignedBuffer` (which is the underlying store for `FilePrefetchBuffer`) now takes a few additional args in `AlignedBuffer::AllocateNewBuffer` to allow copying data from the old buffer. - Made sure not to re-read partial chunks of data that were already available in the buffer, from device again. - Fixed a couple of cases where `AlignedBuffer::cursize_` was not being properly kept up-to-date. Constraints: - Similar to #3282, this gets currently enabled only when ReadOptions.readahead_size = 0 (which is the default value). - Since the prefetched data is stored in a temporary buffer allocated on heap, this could increase the memory usage if you have many iterators doing long range scans simultaneously. - Enabled only for user reads, and disabled for compactions. Compaction reads are controlled by the options `use_direct_io_for_flush_and_compaction` and `compaction_readahead_size`, and the current feature takes precautions not to mess with them. Benchmarks: I used the same benchmark as used in #3282. Data fill: ``` TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes ``` Do a long range scan: Seekrandom with large number of nexts ``` TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -use_direct_reads -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram ``` ``` Before: seekrandom : 37939.906 micros/op 26 ops/sec; 29.2 MB/s (1636 of 1999 found) With this change: seekrandom : 8527.720 micros/op 117 ops/sec; 129.7 MB/s (6530 of 7999 found) ``` ~4.5X perf improvement. Taken on an average of 3 runs. Closes https://github.com/facebook/rocksdb/pull/3884 Differential Revision: D8082143 Pulled By: sagar0 fbshipit-source-id: 4d7a8561cbac03478663713df4d31ad2620253bb	7 years ago
Yanqin Jin	524c6e6b72	Add file name info to SequentialFileReader. (#4026 ) Summary: We potentially need this information for tracing, profiling and diagnosis. Closes https://github.com/facebook/rocksdb/pull/4026 Differential Revision: D8555214 Pulled By: riversand963 fbshipit-source-id: 4263e06c00b6d5410b46aa46eb4e358ff2161dd2	7 years ago
Andrew Kryczka	14cee194d6	Support file ingestion in stress test (#4018 ) Summary: Once per `ingest_external_file_one_in` operations, uses SstFileWriter to create a file containing `ingest_external_file_width` consecutive keys. The file is named containing the thread ID to avoid clashes. The file is then added to the DB using `IngestExternalFile`. We can't enable it by default in crash test because `nooverwritepercent` and `test_batches_snapshot` both must be zero for the DB's whole lifetime. Perhaps we should setup a separate test with that config as range deletion also requires it. Closes https://github.com/facebook/rocksdb/pull/4018 Differential Revision: D8507698 Pulled By: ajkr fbshipit-source-id: 1437ea26fd989349a9ce8b94117241c65e40f10f	7 years ago
Dmitri Smirnov	61d69d450d	Hide jemalloc aligned allocation functions into .cc (#4025 ) Summary: so they could be overridden Closes https://github.com/facebook/rocksdb/pull/4025 Differential Revision: D8526287 Pulled By: siying fbshipit-source-id: 9537b299dc907b4d1eeaf77a8784b13cb058280d	7 years ago
Maysam Yabandeh	28a9d8910b	Fix the bug with duplicate prefix in partition filters (#4024 ) Summary: https://github.com/facebook/rocksdb/pull/3764 introduced an optimization feature to skip duplicate prefix entries in full bloom filters. Unfortunately it also introduces a bug in partitioned full filters, where the duplicate prefix should still be inserted if it is in a new partition. The patch fixes the bug by resetting the duplicate detection logic each time a partition is cut. This bug could result in false negatives, which means that DB could skip an existing key. Closes https://github.com/facebook/rocksdb/pull/4024 Differential Revision: D8518866 Pulled By: maysamyabandeh fbshipit-source-id: 044f4d988e606a330ecafd8c79daceb68b8796bf	7 years ago
Siying Dong	92ee3350e0	BlockBasedTableIterator to keep BlockIter after out of upper bound (#4004 ) Summary: `b555ed30a4` makes the BlockBasedTableIterator to be invalidated if the current position is over the upper bound. However, this can bring performance regression to the case of multiple Seek()s hitting the same data block but all out of upper bound. For example, if an SST file has a data block containing following keys : {a, z} The user sets the upper bound to be "x", and it executed following queries: Seek("b") Seek("c") Seek("d") Before the upper bound optimization, these queries always come to this same current data block of the iterator, but now inside each Seek() the data block is read from the block cache but is returned again. To prevent this regression case, we keep the current data block iterator if it is out of the upper bound. Closes https://github.com/facebook/rocksdb/pull/4004 Differential Revision: D8463192 Pulled By: siying fbshipit-source-id: 8710628b30acde7063a097c3184d6c4333a8ef81	7 years ago
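
The map-lookup fix listed above (db7ae0a485, #4098) rests on standard C++ behavior: `std::map::at` throws `std::out_of_range` when the key is missing, while `std::map::find` returns `end()`. The pattern might look roughly like the sketch below. `LookupValue` and its parameters are hypothetical names chosen for illustration; this is not the code from the RocksDB change.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not RocksDB code): looks up `key` in `table`.
// Returns true and stores the mapped value in `*value` when the key exists.
// Returns false and leaves `*value` untouched when the key is missing,
// where std::map::at would instead throw std::out_of_range.
bool LookupValue(const std::map<std::string, int>& table,
                 const std::string& key, int* value) {
  assert(value != nullptr);
  auto it = table.find(key);
  if (it == table.end()) {
    return false;  // missing key is reported to the caller, not thrown
  }
  *value = it->second;
  return true;
}
```

A caller checks the returned bool before reading `*value`, so the unlikely missing-key case becomes an ordinary branch rather than an unhandled exception.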

7450 Commits (d6f2ecf49c28fee225477d39e2a1535a87919afe)