rocksdb

Commit Graph

Author	SHA1	Message	Date
Peter Dillinger	27f3af5966	Fix serious FSDirectory use-after-Close bug (missing fsync) (#10460 ) Summary: TL;DR: due to a recent change, if you drop a column family, often that DB will no longer fsync after writing new SST files to remaining or new column families, which could lead to data loss on power loss. More bug detail: The intent of https://github.com/facebook/rocksdb/issues/10049 was to Close FSDirectory objects at DB::Close time rather than waiting for DB object destruction. Unfortunately, it also closes shared FSDirectory objects on DropColumnFamily (& destroy remaining handles), which can lead to use-after-Close on FSDirectory shared with remaining column families. Those "uses" are only Fsyncs (or redundant Closes). In the default Posix filesystem, an Fsync on a closed FSDirectory is a quiet no-op. Consequently (under most configurations), if you drop a column family, that DB will no longer fsync after writing new SST files to column families sharing the same directory (true under most configurations). More fix detail: Basically, this removes unnecessary Close ops on destroying ColumnFamilyData. We let `shared_ptr` take care of calling the destructor at the right time. If the intent was to require Close be called before destroying FSDirectory, that was not made clear by the author of FileSystem and was not at all enforced by https://github.com/facebook/rocksdb/issues/10049, which could have added `assert(fd_ == -1)` to `~PosixDirectory()` but did not. To keep this fix simple, we relax the unit test for https://github.com/facebook/rocksdb/issues/10049 to allow timely destruction of FSDirectory to suffice as Close (in CountedFileSystem). Added a TODO to revisit that. Also in this PR: * Added a TODO to share FSDirectory instances between DB and its column families. (Already shared among column families.) * Made DB::Close attempt to close all its open FSDirectory objects even if there is a failure in closing one. Also code clean-up around this logic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10460 Test Plan: add an assert to check for use-after-Close. With that existing tests can detect the misuse. With fix, tests pass (except noted relaxing of unit test for https://github.com/facebook/rocksdb/issues/10049) Reviewed By: ajkr Differential Revision: D38357922 Pulled By: pdillinger fbshipit-source-id: d42079cadbedf0a969f03389bf586b3b4e1f9137	2 years ago
Jay Zhuang	fcccc412d7	Remove Travis CI (#10407 ) Summary: Travis CI is depreciated and haven't been maintained for some time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10407 Reviewed By: ajkr Differential Revision: D38078382 Pulled By: jay-zhuang fbshipit-source-id: f42057f2f41f722bdce56bf195f67a94835191fb	2 years ago
Akanksha Mahajan	f6c4d7a576	Fix hang in MultiRead with O_DIRECT and io_uring (#10368 ) Summary: Fix bug in O_DIRECT and io_uring when its EOF and bytes_read = 0 because of wrong check, it got added into incomplete list and gets stuck in an infinite loop as it will always return bytes_read = 0. The bug was introduced by PR https://github.com/facebook/rocksdb/pull/10197 and that PR is not released yet in any release branch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10368 Test Plan: Added new unit test Reviewed By: siying Differential Revision: D37885184 Pulled By: akankshamahajan15 fbshipit-source-id: 35b36a44b696d29b2f6f25301aa1b19547b4e03b	2 years ago
Akanksha Mahajan	2acbf386a3	Provide support for direct_reads with async_io (#10197 ) Summary: Provide support for use_direct_reads with async_io. TestPlan: - Updated unit tests - db_bench: Results in https://github.com/facebook/rocksdb/pull/10197#issuecomment-1159239420 - db_stress ``` export CRASH_TEST_EXT_ARGS=" --async_io=1 --use_direct_reads=1" make crash_test -j ``` - Ran db_bench on previous RocksDB version before any async_io implementation (as there have many changes in different PRs in this area) https://github.com/facebook/rocksdb/pull/10197#issuecomment-1160781563. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10197 Reviewed By: anand1976 Differential Revision: D37255646 Pulled By: akankshamahajan15 fbshipit-source-id: fec61ae15bf4d625f79dea56e4f86e0e307ba920	2 years ago
sdong	69a32eecab	Use madvise() for mmaped file advise (#10170 ) Summary: A recent PR https://github.com/facebook/rocksdb/pull/10142 enabled fadvise for mmaped file. However, we were told that it might not take effective and madvise() should be used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10170 Test Plan: Run existing tests Run a benchmark using mmap with advise random and see I/O size is indeed small. Reviewed By: anand1976 Differential Revision: D37158582 fbshipit-source-id: 8b3a74f0e89d2e16aac78ee4124c05841d4135c3	2 years ago
sdong	80afa77660	Run fadvise with mmap file (#10142 ) Summary: Right now with mmap file, we don't run fadvise following users' requests. There is no reason for that so this diff does that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10142 Test Plan: A simple readrandom against files with page cache dropped shows latency improvement from 7.8 us to 2.8: ./db_bench -use_existing_db --benchmarks=readrandom --num=100 Reviewed By: anand1976 Differential Revision: D37074975 fbshipit-source-id: ccc72bcac1b5fd634eb8fa2b6a5d9afe332e0bf6	2 years ago
zczhu	e88d8935ae	Add comments/permit unchecked error to close_db_dir pull requests (#10093 ) Summary: In [close_db_dir](https://github.com/facebook/rocksdb/pull/10049) pull request, some merging conflicts occurred (some comments and one line `s.PermitUncheckedError()` are missing). This pull request aims to put them back. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10093 Reviewed By: ajkr Differential Revision: D36884117 Pulled By: littlepig2013 fbshipit-source-id: 8c0e2a8793fc52804067c511843bd1ff4912c1c3	2 years ago
Zichen Zhu	65893ad959	Explicitly closing all directory file descriptors (#10049 ) Summary: Currently, the DB directory file descriptor is left open until the deconstruction process (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (line 512, 513, 514, 515 in ldb_cmd.cc) to leak the ``db_'' object, build `ldb` tool and run ``` strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing ``` There is one directory file descriptor that is not closed in the strace log. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049 Test Plan: Add a new unit test DBBasicTest.DBCloseAllDirectoryFDs: Open a database with different WAL directory and three different data directories, and all directory file descriptors should be closed after calling Close(). Explicitly call Close() after a directory file descriptor is not used so that the counter of directory open and close should be equivalent. Reviewed By: ajkr, hx235 Differential Revision: D36722135 Pulled By: littlepig2013 fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff	2 years ago
sdong	736a7b5433	Remove own ToString() (#9955 ) Summary: ToString() is created as some platform doesn't support std::to_string(). However, we've already used std::to_string() by mistake for 16 months (in db/db_info_dumper.cc). This commit just remove ToString(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9955 Test Plan: Watch CI tests Reviewed By: riversand963 Differential Revision: D36176799 fbshipit-source-id: bdb6dcd0e3a3ab96a1ac810f5d0188f684064471	3 years ago
Akanksha Mahajan	36bc3da97f	Fix segfault in FilePrefetchBuffer with async_io enabled (#9777 ) Summary: If FilePrefetchBuffer object is destroyed and then later Poll() calls callback on object which has been destroyed, it gives segfault on accessing destroyed object. It was caught after adding unit tests that tests Posix implementation of ReadAsync and Poll APIs. This PR also updates and fixes existing IOURing tests which were not running locally because RocksDbIOUringEnable function wasn't defined and IOUring was disabled for those tests Pull Request resolved: https://github.com/facebook/rocksdb/pull/9777 Test Plan: Added new unit test Reviewed By: anand1976 Differential Revision: D35254002 Pulled By: akankshamahajan15 fbshipit-source-id: 68e80054ffb14ae25c255920ebc6548ca5f130a1	3 years ago
Akanksha Mahajan	8465cccde2	Posix API support for Async Read and Poll APIs (#9578 ) Summary: Provide support for Async Read and Poll in Posix file system using IOUring. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9578 Test Plan: In progress Reviewed By: anand1976 Differential Revision: D34690256 Pulled By: akankshamahajan15 fbshipit-source-id: 291cbd1380a3cb904b726c34c0560d1b2ce44a2e	3 years ago
Peter Dillinger	e7ac7363b4	Add to HISTORY and minor loose ends from #9294 , #9254 (#9386 ) Summary: Loose ends relate to mmap on 32-bit systems. (Testing is more complicated when the feature was completely disabled on 32-bit.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/9386 Test Plan: CI Reviewed By: ajkr Differential Revision: D33590715 Pulled By: pdillinger fbshipit-source-id: f2637036a538a552200adee65b6765fce8cae27b	3 years ago
Yanqin Jin	1a8e9f0e07	Use fcntl(F_FULLFSYNC) on OS X (#9356 ) Summary: Closing https://github.com/facebook/rocksdb/issues/5954 fsync/fdatasync on Linux: ``` (fsync/fdatasync) includes writing through or flushing a disk cache if present. ``` However, on OS X and iOS: ``` (fsync) will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence. ``` Solution is to use `fcntl(F_FULLFSYNC)` on OS X so that we get the same persistence guarantee. According to OSX man page, ``` The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. ``` This suggests that it will be no faster than `fsync` on Linux, since Linux, according to its man page, ``` writing through or flushing a disk cache if present ``` It means Linux may not flush all data from disk cache. This is similar to bug reports/fixes in: - golang: https://github.com/golang/go/issues/26650 - leveldb: `296de8d5b8`. Not sure if we should fallback to fsync since we break persistence contract. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9356 Reviewed By: jay-zhuang Differential Revision: D33417416 Pulled By: riversand963 fbshipit-source-id: 475548ff9c5eaccde325e0f6842694271cbc8cb7	3 years ago
Adam Retter	c879910102	Fix fstatfs call for compilation on 32 bit systems (#9251 ) Summary: On some 32-bit systems, BTRFS_SUPER_MAGIC is unsigned while __fsword_t is signed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9251 Reviewed By: ajkr Differential Revision: D32961651 Pulled By: pdillinger fbshipit-source-id: 78e85fc1336f304a21e4d5961e60957c90daed63	3 years ago
Jay Zhuang	29102641dd	Skip directory fsync for filesystem btrfs (#8903 ) Summary: Directory fsync might be expensive on btrfs and it may not be needed. Here are 4 directory fsync cases: 1. creating a new file: dir-fsync is not needed on btrfs, as long as the new file itself is synced. 2. renaming a file: dir-fsync is not needed if the renamed file is synced. So an API `FsyncAfterFileRename(filename, ...)` is provided to sync the file on btrfs. By default, it just calls dir-fsync. 3. deleting files: dir-fsync is forced by set `IOOptions.force_dir_fsync = true` 4. renaming multiple files (like backup and checkpoint): dir-fsync is forced, the same as above. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8903 Test Plan: run tests on btrfs and non btrfs Reviewed By: ajkr Differential Revision: D30885059 Pulled By: jay-zhuang fbshipit-source-id: dd2730b31580b0bcaedffc318a762d7dbf25de4a	3 years ago
anand76	7743f033b1	More robust checking of IO uring completion data (#8894 ) Summary: Potential bugs in the IO uring implementation can cause bad data to be returned in the completion queue. Add some checks in the PosixRandomAccessFile::MultiRead completion handling code to catch such errors and fail the entire MultiRead. Also log some diagnostic messages and stack trace. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8894 Reviewed By: siying, pdillinger Differential Revision: D30826982 Pulled By: anand1976 fbshipit-source-id: af91815ac760e095d6cc0466cf8bd5c10167fd15	3 years ago
sdong	60e5af83c1	Handle return code by io_uring_submit_and_wait() and io_uring_wait_cqe() (#8311 ) Summary: Right now return codes by io_uring_submit_and_wait() and io_uring_wait_cqe() are not handled. It is not the good practice. Although these two functions are not supposed to return non-0 values in normal exeuction, people suspect that they might return non-0 value when an interruption happens, and the code might cause hanging. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8311 Test Plan: Make sure at least normal test cases still pass. Reviewed By: anand1976 Differential Revision: D28500828 fbshipit-source-id: 8a76cea9cafbd041102e0b6a8eef9d0bfed7c211	4 years ago
sdong	e19908cba6	Refactor kill point (#8241 ) Summary: Refactor kill point to one single class, rather than several extern variables. The intention was to drop unflushed data before killing to simulate some job, and I tried to a pointer to fault ingestion fs to the killing class, but it ended up with harder than I thought. Perhaps we'll need to do this in another way. But I thought the refactoring itself is good so I send it out. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8241 Test Plan: make release and run crash test for a while. Reviewed By: anand1976 Differential Revision: D28078486 fbshipit-source-id: f9182c1455f52e6851c13f88a21bade63bcec45f	4 years ago
anand76	7d7f14480e	Always truncate the latest WAL file on DB Open (#8122 ) Summary: Currently, we only truncate the latest alive WAL files when the DB is opened. If the latest WAL file is empty or was flushed during Open, its not truncated since the file will be deleted later on in the Open path. However, before deletion, a new WAL file is created, and if the process crash loops between the new WAL file creation and deletion of the old WAL file, the preallocated space will keep accumulating and eventually use up all disk space. To prevent this, always truncate the latest WAL file, even if its empty or the data was flushed. Tests: Add unit tests to db_wal_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/8122 Reviewed By: riversand963 Differential Revision: D27366132 Pulled By: anand1976 fbshipit-source-id: f923cc03ef033ccb32b140d36c6a63a8152f0e8e	4 years ago
Jay Zhuang	45c65d6dcf	Use thread-safe `strerror_r()` to get error message (#8087 ) Summary: `strerror()` is not thread-safe, using `strerror_r()` instead. The API could be different on the different platforms, used the code from `0deef031cb/folly/String.cpp (L457)` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8087 Reviewed By: mrambacher Differential Revision: D27267151 Pulled By: jay-zhuang fbshipit-source-id: 4b8856d1ec069d5f239b764750682c56e5be9ddb	4 years ago
Yanqin Jin	394210f280	Remove unused includes (#7604 ) Summary: This is a PR generated semi-automatically by an internal tool to remove unused includes and `using` statements. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604 Test Plan: make check Reviewed By: ajkr Differential Revision: D24579392 Pulled By: riversand963 fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd	4 years ago
mrambacher	d9d190742c	Make env_test work with ASSERT_STATUS_CHECKED (#7176 ) Summary: Make (most of) the env_test pass when ASSERT_STATUS_CHECKED is enabled. One test that opens a database is currently disabled in this mode, as there are many errors that need revisited for DB tests and status checks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7176 Reviewed By: cheng-chang Differential Revision: D22799278 Pulled By: ajkr fbshipit-source-id: 16d8a02eaeecd6df1060249b6a5811292801f2ed	4 years ago
Peter Dillinger	88b4210701	Remove racially charged terms "whitelist" and "blacklist" (#7008 ) Summary: We don't need them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008 Test Plan: "make check" and ensure "make crash_test" starts Reviewed By: ajkr Differential Revision: D22143838 Pulled By: pdillinger fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e	4 years ago
Akanksha Mahajan	a1523efcdf	Status check enforcement for io_posix_test and options_settable_test (#6857 ) Summary: Added status check enforcement for io_posix_test and options_settable_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/6857 Test Plan: ASSERT_STATUS_CHECKED=1 make -j48 check Reviewed By: ajkr Differential Revision: D21647904 Pulled By: akankshamahajan15 fbshipit-source-id: b7f2321eb6c141a88cd5e1270ecb7d58f00341af	5 years ago
Cheng Chang	ada700b906	Re-read the whole request in direct IO mode when IO uring returns partial result (#6853 ) Summary: If both direct IO and IO uring are enabled, when IO uring returns partial result, we'll try to read the remaining part of the request, but the starting address/offset of the remaining part might not be aligned to the block size, in direct IO mode, the unaligned offset causes bug. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6853 Test Plan: run make check with both direct IO and IO uring enabled, this is covered by one of the continuous tests. Reviewed By: anand1976 Differential Revision: D21603023 Pulled By: cheng-chang fbshipit-source-id: 942f6a11ff21e1892af6c4464e02bab4c707787c	5 years ago
Cheng Chang	1758f76f2d	Fix unused variable of r in release mode (#6750 ) Summary: In release mode, asserts are not compiled, so `r` is not used, causing compiler warnings. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6750 Test Plan: make check under release mode Reviewed By: anand1976 Differential Revision: D21220365 Pulled By: cheng-chang fbshipit-source-id: fd4afa9843d54af68c4da8660ec61549803e1167	5 years ago
Cheng Chang	51bdfae010	Check alignment of MultiRead requests in direct IO mode (#6739 ) Summary: Add assertions to check direct IO's alignment requirements in MultiRead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6739 Test Plan: make check Reviewed By: siying Differential Revision: D21143825 Pulled By: cheng-chang fbshipit-source-id: 26f1623b062a1851080771128feac0669a61f5e9	5 years ago
phantomape	cb671ea1ca	env: Add clearerr() before repeating an interrupted file read (#6609 ) Summary: This change updates PosixSequentialFile::Read to call clearerr() before fread()ing again after an EINTR is returned on a previous fread. The original fix is from `bd8f1ebb91`. Fixing https://github.com/facebook/rocksdb/issues/6509 Signed-off-by: phantomape <cxucheng@outlook.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/6609 Reviewed By: zhichao-cao Differential Revision: D20731482 Pulled By: riversand963 fbshipit-source-id: 7f1f3a1449077d5560f45c465a78d08633740ba0	5 years ago
Cheng Chang	2d9efc9ab2	Cache result of GetLogicalBufferSize in Linux (#6457 ) Summary: In Linux, when reopening DB with many SST files, profiling shows that 100% system cpu time spent for a couple of seconds for `GetLogicalBufferSize`. This slows down MyRocks' recovery time when site is down. This PR introduces two new APIs: 1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` lets `DB` tell the env when it starts or stops using its database directories . The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes. 2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes. Other modifications: 1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms. 2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`. 3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457 Test Plan: 1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`. 2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`. `make check` Differential Revision: D20131243 Pulled By: cheng-chang fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04	5 years ago
sdong	942eaba091	Handle io_uring partial results (#6441 ) Summary: The logic that handles io_uring partial results was wrong. Fix the logic by putting it into a queue and continue reading. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6441 Test Plan: Make sure this patch fixes the application test case where the bug was discovered; in env_test, add a unit test that simulates partial results and make sure the results are still correct. Differential Revision: D20018616 fbshipit-source-id: 5398a7e34d74c26d52aa69dfd604e93e95d99c62	5 years ago
sdong	fdf882ded2	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 ) Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e	5 years ago
anand76	afa2420c2b	Introduce a new storage specific Env API (#5761 ) Summary: The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc. This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO. The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before. This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection. The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761 Differential Revision: D18868376 Pulled By: anand1976 fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f	5 years ago
sdong	d1ae2c3faf	Fix an asan warning caused by the recent io_uring change (#6135 ) Summary: ASAN reports: internal_repo_rocksdb/repo:db_test - MultiThreaded/MultiThreadedDBTest.MultiThreaded/43: fatal ==2692739==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6130000500ca at pc 0x0000006be780 bp 0x7efef85ccd20 sp 0x7efef85cc4d0 [CONTEXT] === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN === [CONTEXT] READ of size 331 at 0x6130000500ca thread T195 [CONTEXT] #0 db_test_bin+0x6be77f __interceptor_strlen.part.35 [CONTEXT] https://github.com/facebook/rocksdb/issues/1 internal_repo_rocksdb/repo/include/rocksdb/slice.h:55 rocksdb::Slice::Slice(char const) [CONTEXT] https://github.com/facebook/rocksdb/issues/2 internal_repo_rocksdb/repo/env/io_posix.cc:522 rocksdb::PosixRandomAccessFile::MultiRead(rocksdb::ReadRequest, unsigned long) I looked at env/io_posix.cc:522 but don't see a reason why the line needs to be there at all, because it is not used before overwritten. So it must be a line that is put there as a bug. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6135 Test Plan: Rerun the same test which passes after the fix. Run all the tests and make sure they all pass. Differential Revision: D18880251 fbshipit-source-id: 3b84ac6a05b67b529c4202e0ceb4c047460f44f2	5 years ago
sdong	e3a82bb934	PosixRandomAccessFile::MultiRead() to use I/O uring if supported (#5881 ) Summary: Right now, PosixRandomAccessFile::MultiRead() executes read requests in parallel. In this PR, it leverages I/O Uring library to run it in parallel, even when page cache is enabled. This function will fall back if the kernel version doesn't support it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5881 Test Plan: Run the unit test on a kernel version supporting it and make sure all tests pass, and run a unit test on kernel version supporting it and see it pass. Before merging, will also run stress test and see it passes. Differential Revision: D17742266 fbshipit-source-id: e05699c925ac04fdb42379456a4e23e4ebcb803a	5 years ago
Richard He	cfc20019d1	Fixed FALLOC_FL_KEEP_SIZE undefined (#5614 ) Summary: Fix `error: ‘FALLOC_FL_KEEP_SIZE’` undeclared error in `io_posix.cc` during Vagrant build in CentOS as per issue https://github.com/facebook/rocksdb/issues/5599 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5614 Differential Revision: D17217960 fbshipit-source-id: ef736c51b16833107fd9ccc7917ed1def2a8d02c	5 years ago
sheng qiu	c762efc4a9	fix compile error: ‘FALLOC_FL_KEEP_SIZE’ undeclared (#5708 ) Summary: add "linux/falloc.h" in env/io_posix.cc to fix compile error: ‘FALLOC_FL_KEEP_SIZE’ undeclared Signed-off-by: sheng qiu <herbert1984106@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5708 Differential Revision: D16832922 fbshipit-source-id: 30e787c4a1b5a9724a8acfd68962ff5ec5f27d3e	5 years ago
Yi Wu	849a8c0ae0	fix sign compare warnings (#5651 ) Summary: Fix -Wsign-compare warnings for gcc9. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5651 Test Plan: Tested with ubuntu19.10+gcc9 Differential Revision: D16567428 fbshipit-source-id: 730b2704d42ba0c4e4ea946a3199bbb34be4c25c	5 years ago
ggaurav28	60d8b19836	Implemented a file logger that uses WritableFileWriter (#5491 ) Summary: Current PosixLogger performs IO operations using posix calls. Thus the current implementation will not work for non-posix env. Created a new logger class EnvLogger that uses env specific WritableFileWriter for IO operations. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5491 Test Plan: make check Differential Revision: D15909002 Pulled By: ggaurav28 fbshipit-source-id: 13a8105176e8e42db0c59798d48cb6a0dbccc965	5 years ago
Andrew Kryczka	2c9df9f9e5	Dynamic test whether sync_file_range returns ENOSYS (#5416 ) Summary: `sync_file_range` returns `ENOSYS` on Windows Subsystem for Linux even when using a supposedly supported filesystem like ext4. To handle this case we can do a dynamic check that a no-op `sync_file_range` invocation, which is accomplished by passing zero for the `flags` argument, succeeds. Also I rearranged the function and comments to hopefully make it more easily understandable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5416 Differential Revision: D15807061 fbshipit-source-id: d31d94e1f228b7850ea500e6199f8b5daf8cfbd3	5 years ago
Siying Dong	000b9ec217	Move some logging related files to logging/ (#5387 ) Summary: Many logging related source files are under util/. It will be more structured if they are together. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387 Differential Revision: D15579036 Pulled By: siying fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f	6 years ago
Siying Dong	8843129ece	Move some memory related files from util/ to memory/ (#5382 ) Summary: Move arena, allocator, and memory tools under util to a separate memory/ directory. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382 Differential Revision: D15564655 Pulled By: siying fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5	6 years ago
Siying Dong	e9e0101ca4	Move test related files under util/ to test_util/ (#5377 ) Summary: There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo ve them to a new directory test_util/, so that util/ is cleaner. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377 Differential Revision: D15551366 Pulled By: siying fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c	6 years ago
Raphael Bost	468ca61105	Break large file writes into 1GB chunks (#5213 ) Summary: This is a workaround for the issue described in #5169. It has been tested on a database with very large values, but not dedicated test has been added to the code base. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5213 Differential Revision: D15243116 Pulled By: siying fbshipit-source-id: e0c226a6cd71a60924dcd7ce7af74abcb4054484	6 years ago
Andrew Kryczka	8272a6de57	Optionally wait on bytes_per_sync to smooth I/O (#5183 ) Summary: The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyways. My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes. Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE \| SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it. There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically). The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183 Differential Revision: D14953553 Pulled By: siying fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9	6 years ago
Zhongyi Xie	3c5eed5ebe	remove incorrect assert in `GetUniqueIdFromFile` (#5102 ) Summary: User report has shown that sometimes `BlockBasedTable::SetupCacheKeyPrefix` would assert when trying to generate an id from the file. The actual cause seems to be hardware related but we might be better off without the incorrect assertion See T42178927 for more information Pull Request resolved: https://github.com/facebook/rocksdb/pull/5102 Differential Revision: D14604677 Pulled By: miasantreble fbshipit-source-id: fcb09207ebdc4fa66e941afbc0523d84797e7ad7	6 years ago
Rashmi Sharma	a4396f9218	Make it easier for users to load options from option file and set shared block cache. (#5063 ) Summary: [RocksDB] Make it easier for users to load options from option file and set shared block cache. Right now, it requires several dynamic casting for users to set the shared block cache to their option struct cast from the option file. If people don't do that, every CF of every DB will generate its own 8MB block cache. It's not a usable setting. So we are dragging every user who loads options from the file into such a mess. Instead, we should allow them to pass their cache object to LoadLatestOptions() and LoadOptionsFromFile(), so that those loaded option structs will have the shared block cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5063 Differential Revision: D14518584 Pulled By: rashmishrm fbshipit-source-id: c91430ff9425a0e67d76fc67931d755f491ca5aa	6 years ago
Andrew Kryczka	186b3afaa8	Use `fallocate` even if hole-punching unsupported (#5023 ) Summary: The compiler flag `-DROCKSDB_FALLOCATE_PRESENT` was only set when `fallocate`, `FALLOC_FL_KEEP_SIZE`, and `FALLOC_FL_PUNCH_HOLE` were all present. However, the last of the three is not really necessary for the primary `fallocate` use case; furthermore, it was introduced only in later Linux kernel versions (2.6.38+). This PR changes the flag `-DROCKSDB_FALLOCATE_PRESENT` to only require `fallocate` and `FALLOC_FL_KEEP_SIZE` to be present. There is a separate check for `FALLOC_FL_PUNCH_HOLE` only in the place where it is used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5023 Differential Revision: D14248487 Pulled By: siying fbshipit-source-id: a10ed0b902fa755988e957bd2dcec9081ec0502e	6 years ago
Young Tack Jin	4091597c67	fix for nvme device path (#4866 ) Summary: nvme device path doesn't have "block" as like "nvme/nvme0/nvme0n1" or "nvme/nvme0/nvme0n1/nvme0n1p1". the last directory such as "nvme0n1p1" should be removed if nvme drive is partitioned. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4866 Differential Revision: D13627824 Pulled By: riversand963 fbshipit-source-id: 09ab968f349f3dbb890beea20193f1359b17d317	6 years ago
Sagar Vemuri	645e57c22d	Assert for Direct IO at the beginning in PositionedRead (#3891 ) Summary: Moved the direct-IO assertion to the top in `PosixSequentialFile::PositionedRead`, as it doesn't make sense to check for sector alignments before checking for direct IO. Closes https://github.com/facebook/rocksdb/pull/3891 Differential Revision: D8267972 Pulled By: sagar0 fbshipit-source-id: 0ecf77c0fb5c35747a4ddbc15e278918c0849af7	6 years ago
奏之章	6e08916eb3	Fix Fadvise on closed file when reads use mmap Summary: ```PosixMmapReadableFile::fd_``` is closed after created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called. Closes https://github.com/facebook/rocksdb/pull/2764 Differential Revision: D8152515 Pulled By: ajkr fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65	7 years ago

1 2

79 Commits (27f3af596609f3b86af5080cc870c869e836e7f2)