Summary:
block_based_table_reader.cc is a giant file, which makes it hard for users to navigate the code. Divide the files to multiple files.
Some class templates cannot be moved to .cc file. They are moved to .h files. It is still better than including them all in block_based_table_reader.cc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6527
Test Plan: "make all check" and "make release". Also build using cmake.
Differential Revision: D20428455
fbshipit-source-id: ca713c698469f07f35bc0c271358c0874ed4eb28
Summary:
This patch based on https://github.com/facebook/rocksdb/issues/5932 offers a better solution to add arm64 to TravisCI matrix.
Really thank adamretter for initiating Arm CI setup.
Difference comparing to amd64:
1. For CMake, as no official arm64 release ready on Kitware page,
a third party (conda-forge) released one is used instead of
building from source. The main reason is to save CI time.
2. Explicit export JAVA_HOME on arm64
3. Disable mingw test
Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6436
Differential Revision: D20428505
Pulled By: pdillinger
fbshipit-source-id: 81ef02435e41480bb71710b783d85ebf452ce926
Summary:
tcc gtest runner need to know the location of the binary in order to collect coverage. We can give them the location in an environment variable.
Note that all these tests will break in tpx currently, though this is a bug in rocksdb's wrapper script, not tpx.
Reviewed By: siying
Differential Revision: D20430043
fbshipit-source-id: c77d5f70bbc28f6011c6f91906bce2ceecc2f167
Summary:
In DBLogicalBlockSizeCacheTest, do not test OpenForReadOnly in LITE mode.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6523
Test Plan: watch test for LITE mode
Differential Revision: D20420321
Pulled By: cheng-chang
fbshipit-source-id: e45bf6f2800206d6f8ce9af7308e76a08de80643
Summary:
In the `.travis.yml` file the `jdk: openjdk7` element is ignored when `language: cpp`. So whatever version of the JDK that was installed in the Travis container was used - typically JDK 11.
To ensure our RocksJava builds are working, we now instead install and use OpenJDK 8. Ideally we would use OpenJDK 7, as RocksJava supports Java 7, but many of the newer Travis containers don't support Java 7, so Java 8 is the next best thing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6512
Differential Revision: D20388296
Pulled By: pdillinger
fbshipit-source-id: 8bbe6b59b70cfab7fe81ff63867d907fefdd2df1
Summary:
fix a few build warnings that are treated as failures with more strict MSVC warning settings
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6517
Differential Revision: D20401325
Pulled By: pdillinger
fbshipit-source-id: b44979dfaafdc7b3b8cb44a565400a99b331dd30
Summary:
Remove copy of pairs from the for range loop
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6514
Test Plan: make check
Differential Revision: D20389688
Pulled By: cheng-chang
fbshipit-source-id: 1c772091f955be33267514010f3596c61a6f46b5
Summary:
In Linux, when reopening DB with many SST files, profiling shows that 100% system cpu time spent for a couple of seconds for `GetLogicalBufferSize`. This slows down MyRocks' recovery time when site is down.
This PR introduces two new APIs:
1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` lets `DB` tell the env when it starts or stops using its database directories . The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes.
2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes.
Other modifications:
1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms.
2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`.
3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457
Test Plan:
1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`.
2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`.
`make check`
Differential Revision: D20131243
Pulled By: cheng-chang
fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04
Summary:
When users fail to open a DB with file lock failure, it is sometimes hard for users to debug. We now include the time the lock is acquired and the thread ID that acquired the lock, to help users debug problems like this. Default Env's thread ID is used.
Since type of lockedFiles is changed, rename it to follow naming convention too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6507
Test Plan: Add a unit test and improve an existing test to validate the case.
Differential Revision: D20378333
fbshipit-source-id: 312fe0e9733fd1d1e9969c321b90ce523cf4708a
Summary:
* **macOS version:** 10.15.2 (Catalina)
* **XCode/Clang version:** Apple clang version 11.0.0 (clang-1100.0.33.16)
Before this bugfix the error generated is:
```
In file included from ./util/compression.h:23:
./snappy-1.1.7/snappy.h:76:59: error: unknown type name 'string'; did you mean 'std::string'?
size_t Compress(const char* input, size_t input_length, string* output);
^~~~~~
std::string
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iosfwd:211:65: note: 'std::string' declared here
typedef basic_string<char, char_traits<char>, allocator<char> > string;
^
In file included from db/builder.cc:10:
In file included from ./db/builder.h:12:
In file included from ./db/range_tombstone_fragmenter.h:15:
In file included from ./db/pinned_iterators_manager.h:12:
In file included from ./table/internal_iterator.h:13:
In file included from ./table/format.h:25:
In file included from ./options/cf_options.h:14:
In file included from ./util/compression.h:23:
./snappy-1.1.7/snappy.h:85:19: error: unknown type name 'string'; did you mean 'std::string'?
string* uncompressed);
^~~~~~
std::string
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iosfwd:211:65: note: 'std::string' declared here
typedef basic_string<char, char_traits<char>, allocator<char> > string;
^
2 errors generated.
make: *** [jls/db/builder.o] Error 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6496
Differential Revision: D20389254
Pulled By: pdillinger
fbshipit-source-id: 2864245c8d0dba7b2ab81294241a62f2adf02e20
Summary:
This helps to diagnose errors in the CMake build where it tries to retrieve dependencies.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6511
Differential Revision: D20387392
Pulled By: pdillinger
fbshipit-source-id: 7028dfd62704bcc747f39ff864ea9c9bf51cd1be
Summary:
It's never too soon to refactor something. The patch splits the recently
introduced (`VersionEdit` related) `BlobFileState` into two classes
`BlobFileAddition` and `BlobFileGarbage`. The idea is that once blob files
are closed, they are immutable, and the only thing that changes is the
amount of garbage in them. In the new design, `BlobFileAddition` contains
the immutable attributes (currently, the count and total size of all blobs, checksum
method, and checksum value), while `BlobFileGarbage` contains the mutable
GC-related information elements (count and total size of garbage blobs). This is a
better fit for the GC logic and is more consistent with how SST files are handled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6502
Test Plan: `make check`
Differential Revision: D20348352
Pulled By: ltamasi
fbshipit-source-id: ff93f0121e80ab15e0e0a6525ba0d6af16a0e008
Summary:
Add sequence_number_ptr to the checkpoint interface to expose the sequence number during taking the checkpoint. The number will be consistent with the seq # in rocksdb log.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5528
Test Plan: make check -j64
Reviewed By: Winger1994
Differential Revision: D16080209
fbshipit-source-id: 6dc3c7680287ee97d673c5e61f89aae1f43e33df
Summary:
Allow user to specify options.max_open_files != -1 with FIFO compaction.
If max_open_files != -1, not all table files are kept open.
In the past, FIFO style compaction requires all table files to be open in order
to read file creation time from table properties. Later, we added file creation
time to MANIFEST, making it possible to read file creation time without opening
file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6503
Test Plan: make check
Differential Revision: D20353758
Pulled By: riversand963
fbshipit-source-id: ba5c61a648419e47e9ef6d74e0e280e3ee24f296
Summary:
Preliminary support for iterator with user timestamp. Current implementation does not consider merge operator and reverse iterator. Auto compaction is also disabled in unit tests.
Create an iterator with timestamp.
```
...
read_opts.timestamp = &ts;
auto* iter = db->NewIterator(read_opts);
// target is key without timestamp.
for (iter->Seek(target); iter->Valid(); iter->Next()) {}
for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {}
delete iter;
read_opts.timestamp = &ts1;
// lower_bound and upper_bound are without timestamp.
read_opts.iterate_lower_bound = &lower_bound;
read_opts.iterate_upper_bound = &upper_bound;
auto* iter1 = db->NewIterator(read_opts);
// Do Seek or SeekToFirst()
delete iter1;
```
Test plan (dev server)
```
$make check
```
Simple benchmarking (dev server)
1. The overhead introduced by this PR even when timestamp is disabled.
key size: 16 bytes
value size: 100 bytes
Entries: 1000000
Data reside in main memory, and try to stress iterator.
Repeated three times on master and this PR.
- Seek without next
```
./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3
```
master: 159047.0 ops/sec
this PR: 158922.3 ops/sec (2% drop in throughput)
- Seek and next 10 times
```
./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 -seek_nexts=10
```
master: 109539.3 ops/sec
this PR: 107519.7 ops/sec (2% drop in throughput)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6255
Differential Revision: D19438227
Pulled By: riversand963
fbshipit-source-id: b66b4979486f8474619f4aa6bdd88598870b0746
Summary:
In direct IO mode, RandomAccessFileReader::Read allocates an internal aligned buffer, and then copies the result into the scratch buffer. If the result is only temporarily used inside a function, there is no need to do the memcpy and just let the result Slice refer to the internally allocated buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6455
Test Plan: make check
Differential Revision: D20106753
Pulled By: cheng-chang
fbshipit-source-id: 44f505843837bba47a56e3fa2c4dd3bd76486b58
Summary:
In most places in the code the variable names are spelled correctly as
COMMITTED but in a couple places not. This fixes them and ensures the
variable is always called COMMITTED everywhere.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6481
Differential Revision: D20306776
Pulled By: pdillinger
fbshipit-source-id: b6c1bfe41db559b4bc6955c530934460c07f7022
Summary:
`TruncateLastLogAfterRecoverWithoutFlush` case depends on fallocate support
of underlying file system.
On a file system which lacks of this feature, like zfs, it will fail to allocate predefined file size as this test case intends to do;
So a check block is added to detect fallocate support and skip test if not.
The related work is done by JunHe77. Thanks!
Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6437
Differential Revision: D20145032
Pulled By: pdillinger
fbshipit-source-id: c8b691dc508e95acfa2a004ddbc07e2faa76680d
Summary:
In CompactRange, if there is no key in memtable falling in the specified range, then flush is skipped.
This PR extends this skipping logic to SST file levels: it starts compaction from the highest level (starting from L0) that has files with key falling in the specified range, instead of always starts from L0.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6482
Test Plan:
A new test ManualCompactionTest::SkipLevel is added.
Also updated a test related to statistics of index block cache hit in db_test2, the index cache hit is increased by 1 in this PR because when checking overlap for the key range in L0, OverlapWithLevelIterator will do a seek in the table cache iterator, which will read from the cached index.
Also updated db_compaction_test and db_test to use correct range for full compaction.
Differential Revision: D20251149
Pulled By: cheng-chang
fbshipit-source-id: f822157cf4796972bd5035d9d7178d8dfb7af08b
Summary:
In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR https://github.com/facebook/rocksdb/issues/5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.
A set of ErrorHandlerFSTests are introduced for testing
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6414
Test Plan: pass make asan_check, pass error_handler_fs_test.
Differential Revision: D20252421
Pulled By: zhichao-cao
fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
Summary:
Check for sys/auxv.h and getauxval before using them as they are not
always available (for example on uclibc)
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6359
Differential Revision: D20239797
fbshipit-source-id: 175a098094d81545628c2372e7c388e70a32fd48
Summary:
this silences following warning from clang-11
```
rocksdb/db/db_impl/db_impl_compaction_flush.cc:1040:21: warning: loop variable 'newf' of type 'const std::pair<int, rocksdb::FileMetaData>' creates a copy from type 'const
std::pair<int\
, rocksdb::FileMetaData>' [-Wrange-loop-analysis]
for (const auto newf : c->edit()->GetNewFiles()) {
^
rocksdb/db/db_impl/db_impl_compaction_flush.cc:1040:10: note: use reference type 'const std::pair<int, rocksdb::FileMetaData> &' to prevent copying
for (const auto newf : c->edit()->GetNewFiles()) {
^~~~~~~~~~~~~~~~~
&
```
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6477
Differential Revision: D20211850
Pulled By: ltamasi
fbshipit-source-id: 3e89e13a12bba79f1b934d46b7c4c0576cdafb01
Summary:
When DBImpl::GetCreationTimeOfOldestFile() calls Version::GetCreationTimeOfOldestFile(), the version is not directly or indirectly referenced, so an event like compaction can race with the operation and cause DBImpl::GetCreationTimeOfOldestFile() to access delocated data. This was caught by an ASAN run:
==268==ERROR: AddressSanitizer: heap-use-after-free on address 0x612000b7d198 at pc 0x000018332913 bp 0x7f391510d310 sp 0x7f391510d308
READ of size 8 at 0x612000b7d198 thread T845 (store_load-33)
SCARINESS: 51 (8-byte-read-heap-use-after-free)
#0 0x18332912 in rocksdb::Version::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/src/db/version_set.cc:1488
https://github.com/facebook/rocksdb/issues/1 0x1803ddaa in rocksdb::DBImpl::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/src/db/db_impl/db_impl.cc:4499
https://github.com/facebook/rocksdb/issues/2 0xe24ca09 in rocksdb::StackableDB::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/utilities/stackable_db.h:392
......
0x612000b7d198 is located 216 bytes inside of 296-byte region [0x612000b7d0c0,0x612000b7d1e8)
freed by thread T28 here:
......
https://github.com/facebook/rocksdb/issues/5 0x1832c73f in std::vector<rocksdb::FileMetaData*, std::allocator<rocksdb::FileMetaData*> >::~vector() third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/stl_vector.h:435
https://github.com/facebook/rocksdb/issues/6 0x1832c73f in rocksdb::VersionStorageInfo::~VersionStorageInfo() rocksdb/src/db/version_set.cc:734
https://github.com/facebook/rocksdb/issues/7 0x1832cf42 in rocksdb::Version::~Version() rocksdb/src/db/version_set.cc:758
https://github.com/facebook/rocksdb/issues/8 0x9d1bb5 in rocksdb::Version::Unref() rocksdb/src/db/version_set.cc:2869
https://github.com/facebook/rocksdb/issues/9 0x183e7631 in rocksdb::Compaction::~Compaction() rocksdb/src/db/compaction/compaction.cc:275
https://github.com/facebook/rocksdb/issues/10 0x9e6de6 in std::default_delete<rocksdb::Compaction>::operator()(rocksdb::Compaction*) const third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:78
https://github.com/facebook/rocksdb/issues/11 0x9e6de6 in std::unique_ptr<rocksdb::Compaction, std::default_delete<rocksdb::Compaction> >::reset(rocksdb::Compaction*) third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:376
https://github.com/facebook/rocksdb/issues/12 0x9e6de6 in rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2826
https://github.com/facebook/rocksdb/issues/13 0x9ac3b8 in rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2320
https://github.com/facebook/rocksdb/issues/14 0x9abff7 in rocksdb::DBImpl::BGWorkCompaction(void*) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2096
......
Fix the issue by reference the super version and use the referenced version from it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6473
Test Plan: Run ASAN for all existing tests.
Differential Revision: D20196416
fbshipit-source-id: 5f4a7918110fc7b8dd7841932d376bc9d1e59d6f
Summary:
In the current code base, we can use Directory from Env to manage directory (e.g, Fsync()). The PR https://github.com/facebook/rocksdb/issues/5761 introduce the File System as a new Env API. So we further replace the Directory class in DB with FSDirectory such that we can have more IO information from IOStatus returned by FSDirectory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6468
Test Plan: pass make asan_check
Differential Revision: D20195261
Pulled By: zhichao-cao
fbshipit-source-id: 93962cb9436852bfcfb76e086d9e7babd461cbe1
Summary:
Added new Get() methods that return timestamp. Dummy implementation is given so that classes derived from DB don't need to be touched to provide their implementation. MultiGet is not included.
ReadRandom perf test (10 minutes) on the same development machine ram drive with the same DB data shows no regression (within marge of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
base line (commit 72ee067b9):
101.712 micros/op 314602 ops/sec; 36.0 MB/s (5658999 of 5658999 found)
This PR:
100.288 micros/op 319071 ops/sec; 36.5 MB/s (5674999 of 5674999 found)
./db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=readrandom --use_existing_db=1 --num=25000000 --threads=32
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6409
Differential Revision: D20200086
Pulled By: riversand963
fbshipit-source-id: 490edd74d924f62bd8ae9c29c2a6bbbb8410ca50
Summary:
Both clangd and cquery-language-server requires a compile_commands.json file to
index the project. This file can be ignored by git.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6472
Differential Revision: D20194899
Pulled By: riversand963
fbshipit-source-id: ea1587f2e5d10b7591147073b61efe262a1cf747
Summary:
Some combinatino of --index_with_first_key and --index_shortening_mode can signifcantly improve performance for large values. Expose them in db_bench.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5859
Test Plan: Run them with the new options and observe the behavior.
Differential Revision: D20104434
fbshipit-source-id: 21d48a732a9caf20b82312c7d7557d747ea3c304
Summary:
MultiRead tests in env_test cannot simulate the io_uring case when queries need to be submitted in multiple rounds. Add a new unit test to cover up more requests per MultiRead
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6452
Test Plan: Run it and see it pass when liburing is enabled or not enabled.
Differential Revision: D20078924
fbshipit-source-id: 6cff7fe345a4c5aa47135186e6181bf00df02b68
Summary:
Today `WriteUnpreparedTxn::RollbackInternal` will write the rollback batch assuming that there is only a single subbatch. However, because untracked_keys_ are currently not deduplicated, it's possible for duplicate keys to exist, and thus split the batch. Also, tracked_keys_ also does not support compators outside of the bytewise comparators, so it's possible for duplicates to occur there as well.
To solve this, just pass in the correct subbatch count.
Also, removed `WriteUnpreparedRollbackPreReleaseCallback` to unify the Commit/Rollback codepaths some more.
Also, fixed a bug in `CommitInternal` where if 1. two_write_queue is true and 2. include_data is true, then `WriteUnpreparedCommitEntryPreReleaseCallback` ends up calling `AddCommitted` on the commit time write batch a second time on the second write. To fix, `WriteUnpreparedCommitEntryPreReleaseCallback` is re-initialized.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6463
Differential Revision: D20150153
Pulled By: lth
fbshipit-source-id: df0b42d39406c75af73df995aa1138f0db539cd1
Summary:
Threw assert error at assert(isOwningHandle()) in ColumnFamilyHandle.getDescriptor(),
because default CF don't own a handle, due to [RocksDB.getDefaultColumnFamily()](3a408eeae9/java/src/main/java/org/rocksdb/RocksDB.java (L3702)) called cfHandle.disOwnNativeHandle().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6006
Differential Revision: D19031448
fbshipit-source-id: 2420c45e835bda0e552e919b1b63708472b91538
Summary:
If file is not mmaped, CuckooTableReader should not try to read table properties from the file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6453
Test Plan: Added a new unit test
Differential Revision: D20103334
Pulled By: cheng-chang
fbshipit-source-id: 48539f14d93f6c1ebe12c3df5a14719e9d7b8726
Summary:
Original author: jeffrey-xiao
If we are writing a global seqno for an ingested file, the range
tombstone metablock gets accessed and put into the cache during
ingestion preparation. At the time, the global seqno of the ingested
file has not yet been determined, so the cached block will not have a
global seqno. When the file is ingested and we read its range tombstone
metablock, it will be returned from the cache with no global seqno. In
that case, we use the actual seqnos stored in the range tombstones,
which are all zero, so the tombstones cover nothing.
This commit removes global_seqno_ variable from Block. When iterating
over a block, the global seqno for the block is determined by the
iterator instead of storing this mutable attribute in Block.
Additionally, this commit adds a regression test to check that keys are
deleted when ingesting a file with a global seqno and range deletion
tombstones.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6429
Differential Revision: D19961563
Pulled By: ajkr
fbshipit-source-id: 5cf777397fa3e452401f0bf0364b0750492487b7
Summary:
BlobDB currently does not keep track of blob files: no records are written to
the manifest when a blob file is added or removed, and upon opening a database,
the list of blob files is populated simply based on the contents of the blob directory.
This means that lost blob files cannot be detected at the moment. We plan to solve
this issue by making blob files a part of `Version`; as a first step, this patch makes
it possible to store information about blob files in `VersionEdit`. Currently, this information
includes blob file number, total number and size of all blobs, and total number and size
of garbage blobs. However, the format is extensible: new fields can be added in
both a forward compatible and a forward incompatible manner if needed (similarly
to `kNewFile4`).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6416
Test Plan: `make check`
Differential Revision: D19894234
Pulled By: ltamasi
fbshipit-source-id: f9753e1f2aedf6dadb70c09b345207cb9c58c329
Summary:
The known bug of liburing has been fixed. Now we can re-enable liburing under Linux
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6451
Test Plan: Watch internal CI
Differential Revision: D20079009
fbshipit-source-id: 04a6f53a900ff721f9a62a188cf906771b5d68d2
Summary:
Make kPageSize extern const size_t (used in draft https://github.com/facebook/rocksdb/issues/6427)
Make kLitteEndian constexpr bool
Clarify a couple of comments
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6443
Test Plan: make check, CI
Differential Revision: D20044558
Pulled By: pdillinger
fbshipit-source-id: e0c5cc13229c82726280dc0ddcba4078346b8418
Summary:
The logic that handles io_uring partial results was wrong. Fix the logic by putting it into a queue and continue reading.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6441
Test Plan: Make sure this patch fixes the application test case where the bug was discovered; in env_test, add a unit test that simulates partial results and make sure the results are still correct.
Differential Revision: D20018616
fbshipit-source-id: 5398a7e34d74c26d52aa69dfd604e93e95d99c62
Summary:
Cleanup some code without any real change in functionality.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6440
Differential Revision: D20015891
Pulled By: riversand963
fbshipit-source-id: 33e18754b0f002006a6d4805e9aaf84c0c8ad25a
Summary:
Adding a C API function to set `row_cache` on `rocksdb_options_t` as this functionality is missing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6442
Differential Revision: D20036813
Pulled By: riversand963
fbshipit-source-id: c1fa95ea343345fbc1e57961d0d048e0e79be373
Summary:
Currently, a new MANIFEST file is assigned a new file number when 1) no
MANIFEST is open, or 2) current MANIFEST file size exceeds a threshold. This is
not sufficient. There are cases when the caller explicitly specifies that a new
MANIFEST be created. For example, if user sets options.write_dbid_to_manifest = true,
and there are WAL files, then RocksDB will run into an issue during recovery.
`DBImpl::Recover()` will call `LogAndApply()` to write dbid. At this point, the db being
recovered creates a new MANIFEST, say, MANIFEST-000003. Since there are WALs,
`DBImpl::RecoverLogFiles` will be called. Towards the end of this function, we call
`LogAndApply(new_descriptor_log=true)`, which explicitly creates a new MANIFEST.
However, the manifest_file_number is wrong before this fix. Consequently, RocksDB
opens an existing, non-empty file for append, effectively truncating the file to zero.
If a crash occurs, then there will be data loss.
Test Plan (devserver):
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6426
Test Plan: make check
Differential Revision: D19951866
Pulled By: riversand963
fbshipit-source-id: 4b1b9fc28d4fe2ac12764b388ef9e61f05e766da