rocksdb

Commit Graph

Author	SHA1	Message	Date
Akanksha Mahajan	fc51b7f33a	Fix clang error implicit conversion loses integer precision (#10323 ) Summary: Fix error: implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'uint32_t' (aka 'unsigned int') [-Werror,-Wshorten-64-to-32] Pull Request resolved: https://github.com/facebook/rocksdb/pull/10323 Test Plan: USE_CLANG=1 make -j32 Reviewed By: gitbw95 Differential Revision: D37688250 Pulled By: akankshamahajan15 fbshipit-source-id: 443873e41279ee8bdbe8452818549792047532fb	2 years ago
Akanksha Mahajan	2acbf386a3	Provide support for direct_reads with async_io (#10197 ) Summary: Provide support for use_direct_reads with async_io. TestPlan: - Updated unit tests - db_bench: Results in https://github.com/facebook/rocksdb/pull/10197#issuecomment-1159239420 - db_stress ``` export CRASH_TEST_EXT_ARGS=" --async_io=1 --use_direct_reads=1" make crash_test -j ``` - Ran db_bench on previous RocksDB version before any async_io implementation (as there have many changes in different PRs in this area) https://github.com/facebook/rocksdb/pull/10197#issuecomment-1160781563. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10197 Reviewed By: anand1976 Differential Revision: D37255646 Pulled By: akankshamahajan15 fbshipit-source-id: fec61ae15bf4d625f79dea56e4f86e0e307ba920	2 years ago
Akanksha Mahajan	c0e0f30667	Implement AbortIO using io_uring (#10125 ) Summary: Implement AbortIO in posix using io_uring to cancel any pending read requests submitted. Its cancelled using io_uring_prep_cancel which sets the IORING_OP_ASYNC_CANCEL flag. To cancel a request, the sqe must have ->addr set to the user_data of the request it wishes to cancel. If the request is cancelled successfully, the original request is completed with -ECANCELED and the cancel request is completed with a result of 0. If the request was already running, the original may or may not complete in error. The cancel request will complete with -EALREADY for that case. And finally, if the request to cancel wasn't found, the cancel request is completed with -ENOENT. Reference: https://kernel.dk/io_uring-whatsnew.pdf, https://lore.kernel.org/io-uring/d9a8d76d23690842f666c326631ecc2d85b6c1bc.1615566409.git.asml.silence@gmail.com/ Pull Request resolved: https://github.com/facebook/rocksdb/pull/10125 Test Plan: Existing Posix tests. Reviewed By: anand1976 Differential Revision: D36946970 Pulled By: akankshamahajan15 fbshipit-source-id: 3bc1f1521b3151d01a348fc6431eb3fc85db3a14	2 years ago
sdong	80afa77660	Run fadvise with mmap file (#10142 ) Summary: Right now with mmap file, we don't run fadvise following users' requests. There is no reason for that so this diff does that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10142 Test Plan: A simple readrandom against files with page cache dropped shows latency improvement from 7.8 us to 2.8: ./db_bench -use_existing_db --benchmarks=readrandom --num=100 Reviewed By: anand1976 Differential Revision: D37074975 fbshipit-source-id: ccc72bcac1b5fd634eb8fa2b6a5d9afe332e0bf6	2 years ago
Zichen Zhu	65893ad959	Explicitly closing all directory file descriptors (#10049 ) Summary: Currently, the DB directory file descriptor is left open until the deconstruction process (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (line 512, 513, 514, 515 in ldb_cmd.cc) to leak the ``db_'' object, build `ldb` tool and run ``` strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing ``` There is one directory file descriptor that is not closed in the strace log. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049 Test Plan: Add a new unit test DBBasicTest.DBCloseAllDirectoryFDs: Open a database with different WAL directory and three different data directories, and all directory file descriptors should be closed after calling Close(). Explicitly call Close() after a directory file descriptor is not used so that the counter of directory open and close should be equivalent. Reviewed By: ajkr, hx235 Differential Revision: D36722135 Pulled By: littlepig2013 fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff	2 years ago
Changyu Bi	8515bd50c9	Support read rate-limiting in SequentialFileReader (#9973 ) Summary: Added rate limiter and read rate-limiting support to SequentialFileReader. I've updated call sites to SequentialFileReader::Read with appropriate IO priority (or left a TODO and specified IO_TOTAL for now). The PR is separated into four commits: the first one added the rate-limiting support, but with some fixes in the unit test since the number of request bytes from rate limiter in SequentialFileReader are not accurate (there is overcharge at EOF). The second commit fixed this by allowing SequentialFileReader to check file size and determine how many bytes are left in the file to read. The third commit added benchmark related code. The fourth commit moved the logic of using file size to avoid overcharging the rate limiter into backup engine (the main user of SequentialFileReader). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9973 Test Plan: - `make check`, backup_engine_test covers usage of SequentialFileReader with rate limiter. - Run db_bench to check if rate limiting is throttling as expected: Verified that reads and writes are together throttled at 2MB/s, and at 0.2MB chunks that are 100ms apart. - Set up: `./db_bench --benchmarks=fillrandom -db=/dev/shm/test_rocksdb` - Benchmark: ``` strace -ttfe read,write ./db_bench --benchmarks=backup -db=/dev/shm/test_rocksdb --backup_rate_limit=2097152 --use_existing_db strace -ttfe read,write ./db_bench --benchmarks=restore -db=/dev/shm/test_rocksdb --restore_rate_limit=2097152 --use_existing_db ``` - db bench on backup and restore to ensure no performance regression. - backup (avg over 50 runs): pre-change: 1.90443e+06 micros/op; post-change: 1.8993e+06 micros/op (improve by 0.2%) - restore (avg over 50 runs): pre-change: 1.79105e+06 micros/op; post-change: 1.78192e+06 micros/op (improve by 0.5%) ``` # Set up ./db_bench --benchmarks=fillrandom -db=/tmp/test_rocksdb -num=10000000 # benchmark TEST_TMPDIR=/tmp/test_rocksdb NUM_RUN=50 for ((j=0;j<$NUM_RUN;j++)) do ./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=backup -use_existing_db \| egrep 'backup' # Restore #./db_bench -db=$TEST_TMPDIR -num=10000000 -benchmarks=restore -use_existing_db done > rate_limit.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' rate_limit.txt >> rate_limit_2.txt ``` Reviewed By: hx235 Differential Revision: D36327418 Pulled By: cbi42 fbshipit-source-id: e75d4307cff815945482df5ba630c1e88d064691	2 years ago
Akanksha Mahajan	36bc3da97f	Fix segfault in FilePrefetchBuffer with async_io enabled (#9777 ) Summary: If FilePrefetchBuffer object is destroyed and then later Poll() calls callback on object which has been destroyed, it gives segfault on accessing destroyed object. It was caught after adding unit tests that tests Posix implementation of ReadAsync and Poll APIs. This PR also updates and fixes existing IOURing tests which were not running locally because RocksDbIOUringEnable function wasn't defined and IOUring was disabled for those tests Pull Request resolved: https://github.com/facebook/rocksdb/pull/9777 Test Plan: Added new unit test Reviewed By: anand1976 Differential Revision: D35254002 Pulled By: akankshamahajan15 fbshipit-source-id: 68e80054ffb14ae25c255920ebc6548ca5f130a1	3 years ago
Jay Zhuang	efd767d14a	Fix build for io_uring (#9690 ) Summary: Minor fix for build failure: ``` ./env/io_posix.h:68:33: error: unused parameter 'len' [-Werror=unused-parameter] 68 \| size_t len, size_t iov_len, bool async_read, \| ~~~~~~~^~~ ``` Only happens for release build with io_uring. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9690 Test Plan: build pass with io_uring Reviewed By: akankshamahajan15 Differential Revision: D34846347 Pulled By: jay-zhuang fbshipit-source-id: 2d7afb585097262d7722ef1beac486fc8ef28419	3 years ago
Akanksha Mahajan	8465cccde2	Posix API support for Async Read and Poll APIs (#9578 ) Summary: Provide support for Async Read and Poll in Posix file system using IOUring. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9578 Test Plan: In progress Reviewed By: anand1976 Differential Revision: D34690256 Pulled By: akankshamahajan15 fbshipit-source-id: 291cbd1380a3cb904b726c34c0560d1b2ce44a2e	3 years ago
Jay Zhuang	29102641dd	Skip directory fsync for filesystem btrfs (#8903 ) Summary: Directory fsync might be expensive on btrfs and it may not be needed. Here are 4 directory fsync cases: 1. creating a new file: dir-fsync is not needed on btrfs, as long as the new file itself is synced. 2. renaming a file: dir-fsync is not needed if the renamed file is synced. So an API `FsyncAfterFileRename(filename, ...)` is provided to sync the file on btrfs. By default, it just calls dir-fsync. 3. deleting files: dir-fsync is forced by set `IOOptions.force_dir_fsync = true` 4. renaming multiple files (like backup and checkpoint): dir-fsync is forced, the same as above. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8903 Test Plan: run tests on btrfs and non btrfs Reviewed By: ajkr Differential Revision: D30885059 Pulled By: jay-zhuang fbshipit-source-id: dd2730b31580b0bcaedffc318a762d7dbf25de4a	3 years ago
Zhichao Cao	0ce9b3a22d	Add AppendWithVerify and PositionedAppendWithVerify to Env and FileSystem (#7419 ) Summary: Add new AppendWithVerify and PositionedAppendWithVerify APIs to Env and FileSystem to bring the data verification information (data checksum information) from upper layer (e.g., WritableFileWriter) to the storage layer. This PR only include the API definition, no functional codes are added to unblock other developers which depend on these APIs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7419 Test Plan: make -j32 Reviewed By: pdillinger Differential Revision: D23883196 Pulled By: zhichao-cao fbshipit-source-id: 94676c26bc56144cc32e3661f84f21eccd790411	4 years ago
Cheng Chang	2d9efc9ab2	Cache result of GetLogicalBufferSize in Linux (#6457 ) Summary: In Linux, when reopening DB with many SST files, profiling shows that 100% system cpu time spent for a couple of seconds for `GetLogicalBufferSize`. This slows down MyRocks' recovery time when site is down. This PR introduces two new APIs: 1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` lets `DB` tell the env when it starts or stops using its database directories . The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes. 2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes. Other modifications: 1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms. 2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`. 3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457 Test Plan: 1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`. 2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`. `make check` Differential Revision: D20131243 Pulled By: cheng-chang fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04	5 years ago
Michael R. Crusoe	051696bf98	fix some spelling typos (#6464 ) Summary: Found from Debian's "Lintian" program Pull Request resolved: https://github.com/facebook/rocksdb/pull/6464 Differential Revision: D20162862 Pulled By: zhichao-cao fbshipit-source-id: 06941ee2437b038b2b8045becbe9d2c6fbff3e12	5 years ago
sdong	fdf882ded2	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 ) Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e	5 years ago
anand76	afa2420c2b	Introduce a new storage specific Env API (#5761 ) Summary: The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc. This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO. The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before. This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection. The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761 Differential Revision: D18868376 Pulled By: anand1976 fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f	5 years ago
sdong	7d79b32618	Break db_stress_tool.cc to a list of source files (#6134 ) Summary: db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files. Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134 Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1 Differential Revision: D18876064 fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d	5 years ago
sdong	e3a82bb934	PosixRandomAccessFile::MultiRead() to use I/O uring if supported (#5881 ) Summary: Right now, PosixRandomAccessFile::MultiRead() executes read requests in parallel. In this PR, it leverages I/O Uring library to run it in parallel, even when page cache is enabled. This function will fall back if the kernel version doesn't support it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5881 Test Plan: Run the unit test on a kernel version supporting it and make sure all tests pass, and run a unit test on kernel version supporting it and see it pass. Before merging, will also run stress test and see it passes. Differential Revision: D17742266 fbshipit-source-id: e05699c925ac04fdb42379456a4e23e4ebcb803a	5 years ago
Andrew Kryczka	8272a6de57	Optionally wait on bytes_per_sync to smooth I/O (#5183 ) Summary: The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyways. My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes. Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE \| SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it. There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically). The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183 Differential Revision: D14953553 Pulled By: siying fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9	6 years ago
Yanqin Jin	9358178edc	Support for single-primary, multi-secondary instances (#4899 ) Summary: This PR allows RocksDB to run in single-primary, multi-secondary process mode. The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary. Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`. This PR has several components: 1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary. 2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue. 3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`. 3.1 Tail the primary's MANIFEST during recovery. 3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`. 3.3 Tailing WAL will be in a future PR. 4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899 Differential Revision: D14510945 Pulled By: riversand963 fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886	6 years ago
Andrew Kryczka	46152d53bf	Second attempt at db_stress crash-recovery verification Summary: - Original commit: `a4fb1f8c04` - Revert commit (we reverted as a quick fix to get crash tests passing): `6afe22db2e` This PR includes the contents of the original commit plus two bug fixes, which are: - In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs. - Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported. Closes https://github.com/facebook/rocksdb/pull/3793 Differential Revision: D7811671 Pulled By: ajkr fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30	7 years ago
Andrew Kryczka	6afe22db2e	revert db_stress crash-recovery verification Summary: crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures. Closes https://github.com/facebook/rocksdb/pull/3786 Differential Revision: D7794516 Pulled By: ajkr fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33	7 years ago
Andrew Kryczka	a4fb1f8c04	Add crash-recovery correctness check to db_stress Summary: Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values. In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`. For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class. `db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup. Closes https://github.com/facebook/rocksdb/pull/3629 Differential Revision: D7463144 Pulled By: ajkr fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b	7 years ago
Andrew Kryczka	5d68243e61	Comment out unused variables Summary: Submitting on behalf of another employee. Closes https://github.com/facebook/rocksdb/pull/3557 Differential Revision: D7146025 Pulled By: ajkr fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e	7 years ago
Igor Sugak	aba3409740	Back out "[codemod] - comment out unused parameters" Reviewed By: igorsugak fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37	7 years ago
David Lai	f4a030ce81	- comment out unused parameters Reviewed By: everiq, igorsugak Differential Revision: D7046710 fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461	7 years ago
Shaohua Li	eefd75a228	Stream Summary: Add a simple policy for NVMe write time life hint Closes https://github.com/facebook/rocksdb/pull/3095 Differential Revision: D6298030 Pulled By: shligit fbshipit-source-id: 9a72a42e32e92193af11599eb71f0cf77448e24d	7 years ago
Sagar Vemuri	72502cf227	Revert "comment out unused parameters" Summary: This reverts the previous commit `1d7048c598`, which broke the build. Did a `git revert 1d7048c`. Closes https://github.com/facebook/rocksdb/pull/2627 Differential Revision: D5476473 Pulled By: sagar0 fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06	7 years ago
Victor Gao	1d7048c598	comment out unused parameters Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually. Reviewed By: igorsugak Differential Revision: D5454343 fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2	7 years ago
Siying Dong	3c327ac2d0	Change RocksDB License Summary: Closes https://github.com/facebook/rocksdb/pull/2589 Differential Revision: D5431502 Pulled By: siying fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75	7 years ago
Siying Dong	857e9960be	Improve the error message for I/O related errors. Summary: Force people to write something other than file name while returning status for IOError. Closes https://github.com/facebook/rocksdb/pull/2493 Differential Revision: D5321309 Pulled By: siying fbshipit-source-id: 38bcf6c19e80831cd3e300a047e975cbb131d822	7 years ago
Siying Dong	d616ebea23	Add GPLv2 as an alternative license. Summary: Closes https://github.com/facebook/rocksdb/pull/2226 Differential Revision: D4967547 Pulled By: siying fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4	8 years ago
Aaron Gao	6616e4d629	add prefetch to PosixRandomAccessFile in buffered io Summary: Every time after a compaction/flush finish, we issue user reads to put the table into block cache which includes a couple of IO that read footer, index blocks, meta block, etc. So we implement Prefetch here to reduce IO. Closes https://github.com/facebook/rocksdb/pull/2196 Differential Revision: D4931782 Pulled By: lightmark fbshipit-source-id: 5a13d58dcab209964352322217193bbf7ff78149	8 years ago
Tomas Kolda	04d58970cb	AIX and Solaris Sparc Support Summary: Replacement of #2147 The change was squashed due to a lot of conflicts. Closes https://github.com/facebook/rocksdb/pull/2194 Differential Revision: D4929799 Pulled By: siying fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581	8 years ago
Siying Dong	d2dce5611a	Move some files under util/ to separate dirs Summary: Move some files under util/ to new directories env/, monitoring/ options/ and cache/ Closes https://github.com/facebook/rocksdb/pull/2090 Differential Revision: D4833681 Pulled By: siying fbshipit-source-id: 2fd8bef	8 years ago
Aaron Gao	1ef5f50e84	detect logical sector size Summary: querying logical sector size from the device instead of hardcoding it for linux platform. Closes https://github.com/facebook/rocksdb/pull/1875 Differential Revision: D4591946 Pulled By: ajkr fbshipit-source-id: 4e9805c	8 years ago
Aaron Gao	286a36db7f	posix writablefile truncate Summary: we occasionally missing this call so the file size will be wrong Closes https://github.com/facebook/rocksdb/pull/1894 Differential Revision: D4598446 Pulled By: lightmark fbshipit-source-id: 42b6ef5	8 years ago
Marcin Dlugajczyk	a618a16f44	New subcode for IOError to detect the ESTALE errno Summary: I'd like to propose a patch to expose a new IOError type with subcode kStaleFile to allow to detect when ESTALE error is returned. This allows the rocksdb consumers to handle this error separately from other IOErrors. I've also added a missing string representation for the kDeadlock subcode, I believe calling ToString() on Status object with that subcode would result in an out of band access in the msgs array, Please let me know if you have any questions or would like me to make any changes to this pull request. Closes https://github.com/facebook/rocksdb/pull/1748 Differential Revision: D4387675 Pulled By: IslamAbdelRahman fbshipit-source-id: 67feb13	8 years ago
Siying Dong	f289d9f4ac	Fix OSX build break after the fallocate change Summary: The recent update about fallocate failed OSX build. Fix it. Closes https://github.com/facebook/rocksdb/pull/1830 Differential Revision: D4500235 Pulled By: siying fbshipit-source-id: a5f2b40	8 years ago
Siying Dong	0513e21f9b	RangeSync() should work with ROCKSDB_FALLOCATE_PRESENT not set Summary: Closes https://github.com/facebook/rocksdb/pull/1824 Differential Revision: D4493862 Pulled By: siying fbshipit-source-id: c168446	8 years ago
Aaron Gao	3e6899d116	change UseDirectIO() to use_direct_io() Summary: also change variable name `direct_io_` to `use_direct_io_` in WritableFile to make it consistent with read path. Closes https://github.com/facebook/rocksdb/pull/1770 Differential Revision: D4416435 Pulled By: lightmark fbshipit-source-id: 4143c53	8 years ago
Aaron Gao	dc2584eea0	direct reads refactor Summary: direct IO reads refactoring remove unnecessary classes and unified interfaces tested with db_bench need more change for options and ON/OFF for different files. Since disabled is default, it should be fine now Closes https://github.com/facebook/rocksdb/pull/1636 Differential Revision: D4307189 Pulled By: lightmark fbshipit-source-id: 6991e22	8 years ago
Aaron Gao	972f96b3fb	direct io write support Summary: rocksdb direct io support ``` [gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000 Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 5.0 Date: Wed Nov 23 13:17:43 2016 CPU: 40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz CPUCache: 25600 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Write rate: 0 bytes/second Compression: Snappy Memtablerep: skip_list Perf Level: 1 WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags DB path: [/tmp/rocksdbtest-112628/dbbench] fillseq : 4.393 micros/op 227639 ops/sec; 25.2 MB/s [gzh@dev11575.prn2 ~/roc Closes https://github.com/facebook/rocksdb/pull/1564 Differential Revision: D4241093 Pulled By: lightmark fbshipit-source-id: 98c29e3	8 years ago
Maysam Yabandeh	9d60151b04	Implement PositionedAppend for PosixWritableFile Summary: This patch clarifies the contract of PositionedAppend with some unit tests and also implements it for PosixWritableFile. (Tasks: 14524071) Closes https://github.com/facebook/rocksdb/pull/1514 Differential Revision: D4204907 Pulled By: maysamyabandeh fbshipit-source-id: 06eabd2	8 years ago
Islam AbdelRahman	ba65c816bb	Support POSIX RandomRWFile Summary: Add Env::RandomRWFile in env.h and implement it for POSIX RandomRWFile is a file that allow us to read from / write to random offsets in the file I will implement it for other Envs later after finishing the whole task for AddFile() Test Plan: unit tests Reviewers: andrewkr, kradhakrishnan, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D62433	8 years ago
Edouard A	66a91e2607	Add NoSpace subcode to IOError (#1320 ) Add a sub code to distinguish "out of space" errors from regular I/O errors	8 years ago
krad	f89caa127b	Direct IO capability for RocksDB Summary: This patch adds direct IO capability to RocksDB Env. The direct IO capability is required for persistent cache since NVM is best accessed as 4K direct IO. SSDs can leverage direct IO for reading. Direct IO requires the offset and size be sector size aligned, and memory to be kernel page aligned. Since neither RocksDB/Persistent read cache data layout is aligned to sector size, the code can accommodate reading unaligned IO size (or unaligned memory) at the cost of an alloc/copy. The write code path expects the size and memory to be aligned. Test Plan: Run RocksDB unit tests Reviewers: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57393	9 years ago
krad	bb98ca3c80	Implement GetUniqueId for Mac Summary: Persistent read cache relies on the accuracy of the GetUniqueIdFromFile to generate a unique key for a given block of data. Currently we don't have an implementation for Mac. This patch adds an implementation. Test Plan: Run unit tests Reviewers: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D58413	9 years ago
Baraa Hamodi	21e95811d1	Updated all copyright headers to the new format.	9 years ago
Dmitri Smirnov	5270b33bd3	Make use of portable `uint64_t` type to make possible file access in 64-bit. Currently, a signed off_t type is being used for the following interfaces for both offset and the length in bytes: * `Allocate` * `RangeSync` On Linux `off_t` is automatically either 32 or 64-bit depending on the platform. On Windows it is always a 32-bit signed long which limits file access and in particular space pre-allocation to effectively 2 Gb. Proposal is to replace off_t with uint64_t as a portable type always access files with 64-bit interfaces. May need to modify posix code but lack resources to test it.	9 years ago
Dmitri Smirnov	5421c9728b	Make use of portable `uint64_t` type to make possible file access in 64-bit. Currently, a signed off_t type is being used for the following interfaces for both offset and the length in bytes: * `Allocate` * `RangeSync` On Linux `off_t` is automatically either 32 or 64-bit depending on the platform. On Windows it is always a 32-bit signed long which limits file access and in particular space pre-allocation to effectively 2 Gb. Proposal is to replace off_t with uint64_t as a portable type always access files with 64-bit interfaces. May need to modify posix code but lack resources to test it.	9 years ago

34 Commits (27f3af596609f3b86af5080cc870c869e836e7f2)