Summary:
Provide support for Async Read and Poll in Posix file system using IOUring.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9578
Test Plan: In progress
Reviewed By: anand1976
Differential Revision: D34690256
Pulled By: akankshamahajan15
fbshipit-source-id: 291cbd1380a3cb904b726c34c0560d1b2ce44a2e
Summary:
Update the signature of Poll and ReadAsync APIs in filesystem.
Instead of unique_ptr, void** will be passed as io_handle and the delete function.
io_handle and delete function should be provided by underlying
FileSystem and its lifetime will be maintained by RocksDB. io_handle
will be deleted by RocksDB once callback is made to update the results or Poll is
called to get the results.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9623
Test Plan: Add a new unit test.
Reviewed By: anand1976
Differential Revision: D34403529
Pulled By: akankshamahajan15
fbshipit-source-id: ea185a5f4c7bec334631e4f781ea7ba4135645f0
Summary:
Thread-pool pops a thread function and then run the function,
which may cause thread-pool is empty but the last function is still
running.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9502
Test Plan:
`gtest-parallel ./env_test
--gtest_filter=DefaultEnvWithoutDirectIO/EnvPosixTestWithParam.RunMany/0
-r 10000 -w 1000`
Reviewed By: ajkr
Differential Revision: D34011184
Pulled By: jay-zhuang
fbshipit-source-id: 8c38bef155205bef96fd1c988dcc643a6b2ac270
Summary:
Added a CountedFileSystem that tracks a number of file operations (opens, closes, deletes, renames, flushes, syncs, fsyncs, reads, writes). This class was based on the ReportFileOpEnv from db_bench.
This is a stepping stone PR to be able to change the SpecialEnv into a SpecialFileSystem, where several of the file varieties wish to do operation counting.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9283
Reviewed By: pdillinger
Differential Revision: D33062004
Pulled By: mrambacher
fbshipit-source-id: d0d297a7fb9c48c06cbf685e5fa755c27193b6f5
Summary:
We see:
[ RUN ] ChrootEnvWithDirectIO/EnvPosixTestWithParam.RunMany/0
env/env_test.cc:464: Failure
Expected equality of these values:
4
cur
Which is: 0
The suspicious is that the wait time is not long enough. Increase the wait time to 10s and allows earlier check.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9413
Test Plan: Run the test
Reviewed By: riversand963
Differential Revision: D33697715
fbshipit-source-id: 3d71715562a8cceb694b773276dd9e4e451a18bc
Summary:
In order to support old-style regex function registration, restored the original "Register<T>(string, Factory)" method using regular expressions. The PatternEntry methods were left in place but renamed to AddFactory. The goal is to allow for the deprecation of the original regex Registry method in an upcoming release.
Added modes to the PatternEntry kMatchZeroOrMore and kMatchAtLeastOne to match * or +, respectively (kMatchAtLeastOne was the original behavior).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9362
Reviewed By: pdillinger
Differential Revision: D33432562
Pulled By: mrambacher
fbshipit-source-id: ed88ab3f9a2ad0d525c7bd1692873f9bb3209d02
Summary:
Allows the Env to have options (Configurable) and loads like other Customizable classes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9293
Reviewed By: pdillinger, zhichao-cao
Differential Revision: D33181591
Pulled By: mrambacher
fbshipit-source-id: 55e823886c654d214eda9eedd45ccdc54dac14d7
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
Summary:
This header file was including everything and the kitchen sink when it did not need to. This resulted in many places including this header when they needed other pieces instead.
Cleaned up this header to only include what was needed and fixed up the remaining code to include what was now missing.
Hopefully, this sort of code hygiene cleanup will speed up the builds...
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8930
Reviewed By: pdillinger
Differential Revision: D31142788
Pulled By: mrambacher
fbshipit-source-id: 6b45de3f300750c79f751f6227dece9cfd44085d
Summary:
Made SystemClock into a Customizable class, complete with CreateFromString.
Cleaned up some of the existing SystemClock implementations that were redundant (NoSleep was the same as the internal one for MockEnv).
Changed MockEnv construction to allow Clock to be passed to the Memory/MockFileSystem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8636
Reviewed By: zhichao-cao
Differential Revision: D30483360
Pulled By: mrambacher
fbshipit-source-id: cd0e3a876c39f8c98fe13374c06e8edbd5b9f2a1
Summary:
Failure to create the lock file (e.g. out of space) could
prevent future LockFile attempts in the same process on the same file
from succeeding.
Also added DEBUG code to fail assertion if PosixFileLock is destroyed
without using UnlockFile (which is a risk because FileLock is in the
public API with virtual destructor).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8747
Test Plan: test added
Reviewed By: ajkr
Differential Revision: D30732543
Pulled By: pdillinger
fbshipit-source-id: 4c30a959566d91f778d6fad3fbbd5f3941b097c1
Summary:
Env::GenerateUniqueId() works fine on Windows and on POSIX
where /proc/sys/kernel/random/uuid exists. Our other implementation is
flawed and easily produces collision in a new multi-threaded test.
As we rely more heavily on DB session ID uniqueness, this becomes a
serious issue.
This change combines several individually suitable entropy sources
for reliable generation of random unique IDs, with goal of uniqueness
and portability, not cryptographic strength nor maximum speed.
Specifically:
* Moves code for getting UUIDs from the OS to port::GenerateRfcUuid
rather than in Env implementation details. Callers are now told whether
the operation fails or succeeds.
* Adds an internal API GenerateRawUniqueId for generating high-quality
128-bit unique identifiers, by combining entropy from three "tracks":
* Lots of info from default Env like time, process id, and hostname.
* std::random_device
* port::GenerateRfcUuid (when working)
* Built-in implementations of Env::GenerateUniqueId() will now always
produce an RFC 4122 UUID string, either from platform-specific API or
by converting the output of GenerateRawUniqueId.
DB session IDs now use GenerateRawUniqueId while DB IDs (not as
critical) try to use port::GenerateRfcUuid but fall back on
GenerateRawUniqueId with conversion to an RFC 4122 UUID.
GenerateRawUniqueId is declared and defined under env/ rather than util/
or even port/ because of the Env dependency.
Likely follow-up: enhance GenerateRawUniqueId to be faster after the
first call and to guarantee uniqueness within the lifetime of a single
process (imparting the same property onto DB session IDs).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708
Test Plan:
A new mini-stress test in env_test checks the various public
and internal APIs for uniqueness, including each track of
GenerateRawUniqueId individually. We can't hope to verify anywhere close
to 128 bits of entropy, but it can at least detect flaws as bad as the
old code. Serial execution of the new tests takes about 350 ms on
my machine.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D30563780
Pulled By: pdillinger
fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364
Summary:
Made the EncryptionProvider and BlockCipher classes inherit from Customizable. Added/fixed the CreateFromString method to these classes to create instances from builtin or registered classes. Added tests to verify that instances can be registered and retrieved as appropriate.
Added the ability to configure the builtin (CTR, ROT13) classes from configurable properties. Added the appropriate tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8354
Reviewed By: zhichao-cao
Differential Revision: D29558949
Pulled By: mrambacher
fbshipit-source-id: c20286b32d179777e060f51a58943e9b0cf81d04
Summary:
The two new tests added to env_test don't clear sync points, so if tests are run in continuous mode, rather than parallel mode, the next test will trigger previous sync point and fail. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8319
Test Plan: Run the tests in continuous mode which used to fail and see them passing.
Reviewed By: pdillinger
Differential Revision: D28542562
fbshipit-source-id: 4052d487635188fe68a2a9df4b03d97b23f96720
Summary:
Fix typo in comments in env_test and add PermitUncheckedError() to two statuses.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8317
Reviewed By: jay-zhuang
Differential Revision: D28525093
fbshipit-source-id: 7a1ed3e45b6f500b8d2ae19fa339c9368111e922
Summary:
Right now return codes by io_uring_submit_and_wait() and io_uring_wait_cqe() are not handled. It is not the good practice. Although these two functions are not supposed to return non-0 values in normal exeuction, people suspect that they might return non-0 value when an interruption happens, and the code might cause hanging.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8311
Test Plan: Make sure at least normal test cases still pass.
Reviewed By: anand1976
Differential Revision: D28500828
fbshipit-source-id: 8a76cea9cafbd041102e0b6a8eef9d0bfed7c211
Summary:
`strerror()` is not thread-safe, using `strerror_r()` instead. The API could be different on the different platforms, used the code from 0deef031cb/folly/String.cpp (L457)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8087
Reviewed By: mrambacher
Differential Revision: D27267151
Pulled By: jay-zhuang
fbshipit-source-id: 4b8856d1ec069d5f239b764750682c56e5be9ddb
Summary:
Add the new Append and PositionedAppend API to env WritableFile. User is able to benefit from the write checksum handoff API when using the legacy Env classes. FileSystem already implemented the checksum handoff API.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8071
Test Plan: make check, added new unit test.
Reviewed By: anand1976
Differential Revision: D27177043
Pulled By: zhichao-cao
fbshipit-source-id: 430c8331fc81099fa6d00f4fff703b68b9e8080e
Summary:
For performance purposes, the lower level routines were changed to use a SystemClock* instead of a std::shared_ptr<SystemClock>. The shared ptr has some performance degradation on certain hardware classes.
For most of the system, there is no risk of the pointer being deleted/invalid because the shared_ptr will be stored elsewhere. For example, the ImmutableDBOptions stores the Env which has a std::shared_ptr<SystemClock> in it. The SystemClock* within the ImmutableDBOptions is essentially a "short cut" to gain access to this constant resource.
There were a few classes (PeriodicWorkScheduler?) where the "short cut" property did not hold. In those cases, the shared pointer was preserved.
Using db_bench readrandom perf_level=3 on my EC2 box, this change performed as well or better than 6.17:
6.17: readrandom : 28.046 micros/op 854902 ops/sec; 61.3 MB/s (355999 of 355999 found)
6.18: readrandom : 32.615 micros/op 735306 ops/sec; 52.7 MB/s (290999 of 290999 found)
PR: readrandom : 27.500 micros/op 871909 ops/sec; 62.5 MB/s (367999 of 367999 found)
(Note that the times for 6.18 are prior to revert of the SystemClock).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8033
Reviewed By: pdillinger
Differential Revision: D27014563
Pulled By: mrambacher
fbshipit-source-id: ad0459eba03182e454391b5926bf5cdd45657b67
Summary:
Introduces and uses a SystemClock class to RocksDB. This class contains the time-related functions of an Env and these functions can be redirected from the Env to the SystemClock.
Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead. There are likely more places that can be changed, but this is a start to show what can/should be done. Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.
There are several Env classes that implement these functions. Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR. It would be good to unify many of the Mock Timer implementations, so that they behave similarly and be tested similarly (some override Sleep, some use a MockSleep, etc).
Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858
Reviewed By: pdillinger
Differential Revision: D26006406
Pulled By: mrambacher
fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
Summary:
Added "no-elide-constructors to the ASSERT_STATUS_CHECK builds. This flag gives more errors/warnings for some of the Status checks where an inner class checks a Status and later returns it. In this case, without the elide check on, the returned status may not have been checked in the caller, thereby bypassing the checked code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7798
Reviewed By: jay-zhuang
Differential Revision: D25680451
Pulled By: pdillinger
fbshipit-source-id: c3f14ed9e2a13f0a8c54d839d5fb4d1fc1e93917
Summary:
A user who extended `Logger` recently pointed out it is unusual to
require they implement the two-argument `Logv()` overload when they've
already implemented the three-argument `Logv()` overload. I agree with
that and think we can fix it by only calling the two-argument overload
from the default implementation of the three-argument overload. Then
when the three-argument overload is overridden, RocksDB would not
rely on the two-argument overload. Only `Logger::LogHeader()` needed
adjustment to achieve this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7605
Reviewed By: riversand963
Differential Revision: D24584749
Pulled By: ajkr
fbshipit-source-id: 9aabe040ac761c4c0dbebc4be046967403ecaf21
Summary:
This test uses database functionality and required more extensive work to get it to pass than the other tests. The DB functionality required for this test now passes the check.
When it was unclear what the proper behavior was for unchecked status codes, a TODO was added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7283
Reviewed By: akankshamahajan15
Differential Revision: D23251497
Pulled By: ajkr
fbshipit-source-id: 52b79629bdafa0a58de8ead1d1d66f141b331523
Summary:
Make (most of) the env*_test pass when ASSERT_STATUS_CHECKED is enabled.
One test that opens a database is currently disabled in this mode, as there are many errors that need revisited for DB tests and status checks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7176
Reviewed By: cheng-chang
Differential Revision: D22799278
Pulled By: ajkr
fbshipit-source-id: 16d8a02eaeecd6df1060249b6a5811292801f2ed
Summary:
Cleans up some of the dependencies on test code in the Makefile while building tools:
- Moves the test::RandomString, DBBaseTest::RandomString into Random
- Moves the test::RandomHumanReadableString into Random
- Moves the DestroyDir method into file_utils
- Moves the SetupSyncPointsToMockDirectIO into sync_point.
- Moves the FaultInjection Env and FS classes under env
These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies. By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.
Tested both release and debug builds via Make and CMake for both static and shared libraries.
More work remains to clean up how the tools are built and remove some unnecessary dependencies. There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097
Reviewed By: riversand963
Differential Revision: D22463160
Pulled By: pdillinger
fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
Summary:
Some tests directly uses TmpDir() as temporary directory without adding any randomize factor. This would cause failures when tests run in parallel. Fix it by moving some of them to test::PerThreadDBPath()
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7030
Test Plan: Watch existing tests pass
Reviewed By: zhichao-cao
Differential Revision: D22224710
fbshipit-source-id: 28c9932fede0a4a64670e5b5fdb08f4fb5dccdd0
Summary:
It's useful to build RocksDB using a more recent clang version in CI. Add a CircleCI build and fix some issues with it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7025
Test Plan: See all tests pass.
Reviewed By: pdillinger
Differential Revision: D22215700
fbshipit-source-id: 914a729c2cd3f3ac4a627cc0ac58d4691dca2168
Summary:
`Env::LowerThreadPoolCPUPriority` takes a new parameter `CpuPriority` to be able to lower to a specific priority such as `CpuPriority::kIdle`, previously, the priority is always lowered to `CpuPriority::kLow`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6969
Test Plan: unit test `EnvPosixTest::LowerThreadPoolCpuPriority` added to `env_test.cc`.
Reviewed By: siying
Differential Revision: D22011169
Pulled By: cheng-chang
fbshipit-source-id: 568878c24a924912e35cef00c552d4a63431cdf4
Summary:
When run */RunMany/* tests individually, e.g. ChrootEnvWithDirectIO/EnvPosixTestWithParam.RunMany/0, they hang. It's because they insert to background thread pool without initializing them. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6931
Test Plan: Run ChrootEnvWithDirectIO/EnvPosixTestWithParam.RunMany/0 by itself and see it passes.
Reviewed By: riversand963
Differential Revision: D21875603
fbshipit-source-id: 7f848174c1a660254a2b1f7e11cca5370793ba30
Summary:
This reverts commit 8d87e9cea1.
Based on offline discussions, it's too early to upgrade to gtest 1.10, as it prevents some developers from using an older version of gtest to integrate to some other systems. Revert it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6923
Reviewed By: pdillinger
Differential Revision: D21864799
fbshipit-source-id: d0726b1ff649fc911b9378f1763316200bd363fc
Summary:
x.size() -1 or y - 1 can overflow to an extremely large value when x.size() pr y is 0 when they are unsigned type. The end condition of i in the for loop will be extremely large, potentially causes segment fault. Fix them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6902
Test Plan: pass make asan_check
Reviewed By: ajkr
Differential Revision: D21843767
Pulled By: zhichao-cao
fbshipit-source-id: 5b8b88155ac5a93d86246d832e89905a783bb5a1
Summary:
IsDirectory() is a common API to check whether a path is a regular file or
directory.
POSIX: call stat() and use S_ISDIR(st_mode)
Windows: PathIsDirectoryA() and PathIsDirectoryW()
HDFS: FileSystem.IsDirectory()
Java: File.IsDirectory()
...
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6711
Test Plan: make check
Reviewed By: anand1976
Differential Revision: D21053520
Pulled By: riversand963
fbshipit-source-id: 680aadfd8ce982b63689190cf31b3145d5a89e27
Summary:
The current Env/FileSystem API separation has a couple of issues -
1. It requires the user to specify 2 options - ```Options::env``` and ```Options::file_system``` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is ```PosixEnv``` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the ```PosixEnv``` implementation rather than the file_system implementation.
2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes.
This PR solves the above issues and simplifies the migration in the following ways -
1. Embed a shared_ptr to the ```FileSystem``` in the ```Env```, and remove ```Options::file_system``` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a ```LegacyFileSystemWrapper``` as the embedded ```FileSystem```.
1a. - This also makes it more robust by ensuring that even if RocksDB
has some stray calls to Env APIs rather than FileSystem, they will go
through the same object and thus there is no risk of getting out of
sync.
2. Provide a ```NewCompositeEnv()``` API that can be used to construct a
PosixEnv with a custom FileSystem implementation. This eliminates an
indirection to call Env APIs, and relieves the FileSystem developer of
the burden of having to implement wrappers for the Env APIs.
3. Add a couple of missing FileSystem APIs - ```SanitizeEnvOptions()``` and
```NewLogger()```
Tests:
1. New unit tests
2. make check and make asan_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552
Reviewed By: riversand963
Differential Revision: D20592038
Pulled By: anand1976
fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7
Summary:
MultiRead tests in env_test cannot simulate the io_uring case when queries need to be submitted in multiple rounds. Add a new unit test to cover up more requests per MultiRead
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6452
Test Plan: Run it and see it pass when liburing is enabled or not enabled.
Differential Revision: D20078924
fbshipit-source-id: 6cff7fe345a4c5aa47135186e6181bf00df02b68
Summary:
Make kPageSize extern const size_t (used in draft https://github.com/facebook/rocksdb/issues/6427)
Make kLitteEndian constexpr bool
Clarify a couple of comments
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6443
Test Plan: make check, CI
Differential Revision: D20044558
Pulled By: pdillinger
fbshipit-source-id: e0c5cc13229c82726280dc0ddcba4078346b8418
Summary:
The logic that handles io_uring partial results was wrong. Fix the logic by putting it into a queue and continue reading.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6441
Test Plan: Make sure this patch fixes the application test case where the bug was discovered; in env_test, add a unit test that simulates partial results and make sure the results are still correct.
Differential Revision: D20018616
fbshipit-source-id: 5398a7e34d74c26d52aa69dfd604e93e95d99c62
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
Differential Revision: D19977691
fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
Summary:
The DecreaseNumBgThreads test keeps failing on Windows in AppVeyor.
It fails because it depends on a timed wait for the tasks to be dequeued from the threadpool's internal queue, but within the specified time, the task might have not been scheduled onto the newly created threads.
https://github.com/facebook/rocksdb/pull/6232 tries to fix this by waiting for longer time to let the threads scheduled.
This PR tries to fix this by replacing the timed wait with a synchronization on the task's internal conditional variable.
When the number of threads increases, instead of guessing the time needed for the task to be scheduled, it directly blocks on the conditional variable until the task starts running.
But when thread number is reduced, it still does a timed wait, but this does not lead to the flakiness now, will try to remove these timed waits in a future PR.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6393
Test Plan: Wait to see whether AppVeyor tests pass.
Differential Revision: D19890928
Pulled By: cheng-chang
fbshipit-source-id: 4e56e4addf625c98c0876e62d9d57a6f0a156f76
Summary:
Since we do not evict a file's blocks from block cache before that file
is deleted, we require a file's cache ID prefix is both unique and
non-reusable. However, the Windows functionality we were relying on only
guaranteed uniqueness. That meant a newly created file could be assigned
the same cache ID prefix as a deleted file. If the newly created file
had block offsets matching the deleted file, full cache keys could be
exactly the same, resulting in obsolete data blocks returned from cache
when trying to read from the new file.
We noticed this when running on FAT32 where compaction was writing out
of order keys due to reading obsolete blocks from its input files. The
functionality is documented as behaving the same on NTFS, although I
wasn't able to repro it there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5844
Test Plan:
we had a reliable repro of out-of-order keys on FAT32 that
was fixed by this change
Differential Revision: D17752442
fbshipit-source-id: 95d983f9196cf415f269e19293b97341edbf7e00
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830
Test Plan: Run all existing tests.
Differential Revision: D17488031
fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
Summary:
For our default block cache, each additional entry has extra memory overhead. It include LRUHandle (72 bytes currently) and the cache key (two varint64, file id and offset). The usage is not negligible. For example for block_size=4k, the overhead accounts for an extra 2% memory usage for the cache. The patch charging the cache for the extra usage, reducing untracked memory usage outside block cache. The feature is enabled by default and can be disabled by passing kDontChargeCacheMetadata to the cache constructor.
This PR builds up on https://github.com/facebook/rocksdb/issues/4258
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5797
Test Plan:
- Existing tests are updated to either disable the feature when the test has too much dependency on the old way of accounting the usage or increasing the cache capacity to account for the additional charge of metadata.
- The Usage tests in cache_test.cc are augmented to test the cache usage under kFullChargeCacheMetadata.
Differential Revision: D17396833
Pulled By: maysamyabandeh
fbshipit-source-id: 7684ccb9f8a40ca595e4f5efcdb03623afea0c6f
Summary:
This affects some TSAN builds:
env/env_test.cc: In member function ‘virtual void rocksdb::EnvPosixTestWithParam_MultiRead_Test::TestBody()’:
env/env_test.cc:1126:76: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
auto data = NewAligned(kSectorSize * 8, static_cast<const char>(i + 1));
^
env/env_test.cc:1154:77: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
auto buf = NewAligned(kSectorSize * 8, static_cast<const char>(i*2 + 1));
^
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5432
Differential Revision: D15727277
Pulled By: ltamasi
fbshipit-source-id: dc0e687b123e7c4d703ccc0c16b7167e07d1c9b0
Summary:
Define the Env:: MultiRead() method to allow callers to request multiple block reads in one shot. The underlying Env implementation can parallelize it if it chooses to in order to reduce the overall IO latency.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5311
Differential Revision: D15502172
Pulled By: anand1976
fbshipit-source-id: 2b228269c2e11b5f54694d6b2bb3119c8a8ce2b9
Summary:
This change adds a Dynamic Library class to the RocksDB Env. Dynamic libraries are populated via the Env::LoadLibrary method.
The addition of dynamic library support allows for a few different features to be developed:
1. The compression code can be changed to use dynamic library support. This would allow RocksDB to determine at run-time what compression packages were installed. This change would eliminate the need to make sure the build-time and run-time environment had the same library set. It would also simplify some of the Java build issues (where it attempts to build and include various packages inside the RocksDB jars).
2. Along with other features (to be provided in a subsequent PR), this change would allow code/configurations to be added to RocksDB at run-time. For example, the build system includes code for building an "rados" environment and adding "Cassandra" features. Instead of these extensions being built into the base RocksDB code, these extensions could be loaded at run-time as required/appropriate, either by configuration or explicitly.
We intend to push out other changes in support of the extending RocksDB at run-time via configurations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5281
Differential Revision: D15447613
Pulled By: riversand963
fbshipit-source-id: 452cd4f54511c0bceee18f6d9d919aae9fd25fef
Summary:
Many logging related source files are under util/. It will be more structured if they are together.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387
Differential Revision: D15579036
Pulled By: siying
fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377
Differential Revision: D15551366
Pulled By: siying
fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c