Summary:
BlockBasedTableOptions.hash_index_allow_collision is already deprecated and has no effect. Delete it for preparing 7.0 release.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9454
Test Plan: Run all existing tests.
Reviewed By: ajkr
Differential Revision: D33805827
fbshipit-source-id: ed8a436d1d083173ec6aef2a762ba02e1eefdc9d
Summary:
Fix g++ -march=native detection and reenable s390x in travis
This PR fixes s390x assembler messages:
```
Error: invalid switch -march=z14
Error: unrecognized option -march=z14
```
The s390x travis build was failing with gcc-7 because the assembler on
ubuntu 16.04 is too old to recognize the z14 model so it doesn't work
with -march=native on a z14 machine. It fixes the check for the
-march=native flag so that the assembler will get called and correctly
fail on ubuntu 16.04 which will cause the build to fall back to
-march=z196 which works.
The other changes are needed so builds work more consistently on
s390x:
1. Set make parallelism to 1 for s390x: The default was 4 previously
but I saw frequent internal compiler errors on travis probably due to
low resources. The `platform_dependent` job works more consistently
but is roughly 10 minutes slower although it varies.
2. Remove status_checked jobs, as we are relying on CircleCI for
these now and do not really need platform coverage on them.
Fixes https://github.com/facebook/rocksdb/issues/9524
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9631
Test Plan: CI
Reviewed By: ajkr
Differential Revision: D34553989
Pulled By: pdillinger
fbshipit-source-id: a6e3a7276446721c4c0bebc4ed217c2ca2b53f11
Summary:
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.12.5 to 1.13.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/releases">nokogiri's releases</a>.</em></p>
<blockquote>
<h2>1.13.3 / 2022-02-21</h2>
<h3>Fixed</h3>
<ul>
<li>[CRuby] Revert a HTML4 parser bug in libxml 2.9.13 (introduced in Nokogiri v1.13.2). The bug causes libxml2's HTML4 parser to fail to recover when encountering a bare <code><</code> character in some contexts. This version of Nokogiri restores the earlier behavior, which is to recover from the parse error and treat the <code><</code> as normal character data (which will be serialized as <code>&lt;</code> in a text node). The bug (and the fix) is only relevant when the <code>RECOVER</code> parse option is set, as it is by default. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2461">https://github.com/facebook/rocksdb/issues/2461</a>]</li>
</ul>
<hr />
<p>SHA256 checksums:</p>
<pre><code>025a4e333f6f903072a919f5f75b03a8f70e4969dab4280375b73f9d8ff8d2c0 nokogiri-1.13.3-aarch64-linux.gem
b9cb59c6a6da8cf4dbee5dbb569c7cc95a6741392e69053544e0f40b15ab9ad5 nokogiri-1.13.3-arm64-darwin.gem
e55d18cee64c19d51d35ad80634e465dbcdd46ac4233cb42c1e410307244ebae nokogiri-1.13.3-java.gem
53e2d68116cd00a873406b8bdb90c78a6f10e00df7ddf917a639ac137719b67b nokogiri-1.13.3-x64-mingw-ucrt.gem
b5f39ebb662a1be7d1c61f8f0a2a683f1bb11690a6f00a99a1aa23a071f80145 nokogiri-1.13.3-x64-mingw32.gem
7c0de5863aace4bbbc73c4766cf084d1f0b7a495591e46d1666200cede404432 nokogiri-1.13.3-x86-linux.gem
675cc3e7d7cca0d6790047a062cd3aa3eab59e3cb9b19374c34f98bade588c66 nokogiri-1.13.3-x86-mingw32.gem
f445596a5a76941a9d1980747535ab50d3399d1b46c32989bc26b7dd988ee498 nokogiri-1.13.3-x86_64-darwin.gem
3f6340661c2a283b337d227ea224f859623775b2f5c09a6bf197b786563958df nokogiri-1.13.3-x86_64-linux.gem
bf1b1bceff910abb0b7ad825535951101a0361b859c2ad1be155c010081ecbdc nokogiri-1.13.3.gem
</code></pre>
<h2>1.13.2 / 2022-02-21</h2>
<h3>Security</h3>
<ul>
<li>[CRuby] Vendored libxml2 is updated from 2.9.12 to 2.9.13. This update addresses <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-23308">CVE-2022-23308</a>.</li>
<li>[CRuby] Vendored libxslt is updated from 1.1.34 to 1.1.35. This update addresses <a href="https://nvd.nist.gov/vuln/detail/CVE-2021-30560">CVE-2021-30560</a>.</li>
</ul>
<p>Please see <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-fq42-c5rg-92c2">GHSA-fq42-c5rg-92c2</a> for more information about these CVEs.</p>
<h3>Dependencies</h3>
<ul>
<li>[CRuby] Vendored libxml2 is updated from 2.9.12 to 2.9.13. Full changelog is available at <a href="https://download.gnome.org/sources/libxml2/2.9/libxml2-2.9.13.news">https://download.gnome.org/sources/libxml2/2.9/libxml2-2.9.13.news</a></li>
<li>[CRuby] Vendored libxslt is updated from 1.1.34 to 1.1.35. Full changelog is available at <a href="https://download.gnome.org/sources/libxslt/1.1/libxslt-1.1.35.news">https://download.gnome.org/sources/libxslt/1.1/libxslt-1.1.35.news</a></li>
</ul>
<hr />
<p>SHA256 checksums:</p>
<pre><code>63469a9bb56a21c62fbaea58d15f54f8f167ff6fde51c5c2262072f939926fdd nokogiri-1.13.2-aarch64-linux.gem
2986617f982f645c06f22515b721e6d2613dd69493e5c41ddd03c4830c3b3065 nokogiri-1.13.2-arm64-darwin.gem
aca1d66206740b29d0d586b1d049116adcb31e6cdd7c4dd3a96eb77da215a0c4 nokogiri-1.13.2-java.gem
b9e4eea1a200d9a927a5bc7d662c427e128779cba0098ea49ddbdb3ffc3ddaec nokogiri-1.13.2-x64-mingw-ucrt.gem
48d5493fec495867c5516a908a068c1387a1d17c5aeca6a1c98c089d9d9fdcf8 nokogiri-1.13.2-x64-mingw32.gem
62034d7aaaa83fbfcb8876273cc5551489396841a66230d3200b67919ef76cf9 nokogiri-1.13.2-x86-linux.gem
e07237b82394017c2bfec73c637317ee7dbfb56e92546151666abec551e46d1d nokogiri-1.13.2-x86-mingw32.gem
</tr></table>
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md">nokogiri's changelog</a>.</em></p>
<blockquote>
<h2>1.13.3 / 2022-02-21</h2>
<h3>Fixed</h3>
<ul>
<li>[CRuby] Revert a HTML4 parser bug in libxml 2.9.13 (introduced in Nokogiri v1.13.2). The bug causes libxml2's HTML4 parser to fail to recover when encountering a bare <code><</code> character in some contexts. This version of Nokogiri restores the earlier behavior, which is to recover from the parse error and treat the <code><</code> as normal character data (which will be serialized as <code>&lt;</code> in a text node). The bug (and the fix) is only relevant when the <code>RECOVER</code> parse option is set, as it is by default. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2461">https://github.com/facebook/rocksdb/issues/2461</a>]</li>
</ul>
<h2>1.13.2 / 2022-02-21</h2>
<h3>Security</h3>
<ul>
<li>[CRuby] Vendored libxml2 is updated from 2.9.12 to 2.9.13. This update addresses <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-23308">CVE-2022-23308</a>.</li>
<li>[CRuby] Vendored libxslt is updated from 1.1.34 to 1.1.35. This update addresses <a href="https://nvd.nist.gov/vuln/detail/CVE-2021-30560">CVE-2021-30560</a>.</li>
</ul>
<p>Please see <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-fq42-c5rg-92c2">GHSA-fq42-c5rg-92c2</a> for more information about these CVEs.</p>
<h3>Dependencies</h3>
<ul>
<li>[CRuby] Vendored libxml2 is updated from 2.9.12 to 2.9.13. Full changelog is available at <a href="https://download.gnome.org/sources/libxml2/2.9/libxml2-2.9.13.news">https://download.gnome.org/sources/libxml2/2.9/libxml2-2.9.13.news</a></li>
<li>[CRuby] Vendored libxslt is updated from 1.1.34 to 1.1.35. Full changelog is available at <a href="https://download.gnome.org/sources/libxslt/1.1/libxslt-1.1.35.news">https://download.gnome.org/sources/libxslt/1.1/libxslt-1.1.35.news</a></li>
</ul>
<h2>1.13.1 / 2022-01-13</h2>
<h3>Fixed</h3>
<ul>
<li>Fix <code>Nokogiri::XSLT.quote_params</code> regression in v1.13.0 that raised an exception when non-string stylesheet parameters were passed. Non-string parameters (e.g., integers and symbols) are now explicitly supported and both keys and values will be stringified with <code>#to_s</code>. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2418">https://github.com/facebook/rocksdb/issues/2418</a>]</li>
<li>Fix CSS selector query regression in v1.13.0 that raised an <code>Nokogiri::XML::XPath::SyntaxError</code> when parsing XPath attributes mixed into the CSS query. Although this mash-up of XPath and CSS syntax previously worked unintentionally, it is now an officially supported feature and is documented as such. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2419">https://github.com/facebook/rocksdb/issues/2419</a>]</li>
</ul>
<h2>1.13.0 / 2022-01-06</h2>
<h3>Notes</h3>
<h4>Ruby</h4>
<p>This release introduces native gem support for Ruby 3.1. Please note that Windows users should use the <code>x64-mingw-ucrt</code> platform gem for Ruby 3.1, and <code>x64-mingw32</code> for Ruby 2.6–3.0 (see <a href="https://rubyinstaller.org/2021/12/31/rubyinstaller-3.1.0-1-released.html">RubyInstaller 3.1.0 release notes</a>).</p>
<p>This release ends support for:</p>
<ul>
<li>Ruby 2.5, for which <a href="https://www.ruby-lang.org/en/downloads/branches/">official support ended 2021-03-31</a>.</li>
<li>JRuby 9.2, which is a Ruby 2.5-compatible release.</li>
</ul>
<h4>Faster, more reliable installation: Native Gem for ARM64 Linux</h4>
<p>This version of Nokogiri ships experimental native gem support for the <code>aarch64-linux</code> platform, which should support AWS Graviton and other ARM Linux platforms. We don't yet have CI running for this platform, and so we're interested in hearing back from y'all whether this is working, and what problems you're seeing. Please send us feedback here: <a href="https://github.com/sparklemotion/nokogiri/discussions/2359">Feedback: Have you used the <code>aarch64-linux</code> native gem?</a></p>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="7d74cedf27"><code>7d74ced</code></a> version bump to v1.13.3</li>
<li><a href="5970fd95c8"><code>5970fd9</code></a> fix: revert libxml2 regression with HTML4 recovery</li>
<li><a href="49b86631b7"><code>49b8663</code></a> version bump to v1.13.2</li>
<li><a href="4729133787"><code>4729133</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2457">https://github.com/facebook/rocksdb/issues/2457</a> from sparklemotion/flavorjones-libxml-2.9.13-v1.13.x</li>
<li><a href="379f757ef5"><code>379f757</code></a> dev(package): work around gnome mirrors with expired certs</li>
<li><a href="95cf66ca9f"><code>95cf66c</code></a> dep: upgrade libxml2 2.9.12 → 2.9.13</li>
<li><a href="d37dd02ea5"><code>d37dd02</code></a> dep: upgrade libxslt 1.1.34 → 1.1.35</li>
<li><a href="59a93986ec"><code>59a9398</code></a> dep: upgrade mini_portile 2.7 to 2.8</li>
<li><a href="e885463285"><code>e885463</code></a> dev(package): handle either .tar.gz or .tar.xz archive names</li>
<li><a href="7957c7b009"><code>7957c7b</code></a> style: rubocop</li>
<li>Additional commits viewable in <a href="https://github.com/sparklemotion/nokogiri/compare/v1.12.5...v1.13.3">compare view</a></li>
</ul>
</details>
<br />
[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=nokogiri&package-manager=bundler&previous-version=1.12.5&new-version=1.13.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/facebook/rocksdb/network/alerts).
</details>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9636
Reviewed By: anand1976
Differential Revision: D34556272
Pulled By: jay-zhuang
fbshipit-source-id: 76aa7e92ca3fcf5d34c53091b94bfe5b0af7b55d
Summary:
We should use the released clang-format version instead of the one from
dev branch. Otherwise the format report could be inconsistent with local
development env and CI. e.g.: https://github.com/facebook/rocksdb/issues/9644
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9646
Test Plan: CI
Reviewed By: riversand963
Differential Revision: D34554065
Pulled By: jay-zhuang
fbshipit-source-id: b841bc400becb4272be18c803eb03a7a1172da6f
Summary:
Related to: https://github.com/facebook/rocksdb/pull/9215
* Adds build_detect_platform support for RISCV on Linux (at least on SiFive Unmatched platforms)
This still leaves some linking issues on RISCV remaining (e.g. when building `db_test`):
```
/usr/bin/ld: ./librocksdb_debug.a(memtable.o): in function `__gnu_cxx::new_allocator<char>::deallocate(char*, unsigned long)':
/usr/include/c++/10/ext/new_allocator.h:133: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(memtable.o): in function `std::__atomic_base<bool>::compare_exchange_weak(bool&, bool, std::memory_order, std::memory_order)':
/usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(memtable.o):/usr/include/c++/10/bits/atomic_base.h:464: more undefined references to `__atomic_compare_exchange_1' follow
/usr/bin/ld: ./librocksdb_debug.a(db_impl.o): in function `rocksdb::DBImpl::NewIteratorImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyData*, unsigned long, rocksdb::ReadCallback*, bool, bool)':
/home/adamretter/rocksdb/db/db_impl/db_impl.cc:3019: undefined reference to `__atomic_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(write_thread.o): in function `rocksdb::WriteThread::Writer::CreateMutex()':
/home/adamretter/rocksdb/./db/write_thread.h:205: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(write_thread.o): in function `rocksdb::WriteThread::SetState(rocksdb::WriteThread::Writer*, unsigned char)':
/home/adamretter/rocksdb/db/write_thread.cc:222: undefined reference to `__atomic_compare_exchange_1'
collect2: error: ld returned 1 exit status
make: *** [Makefile:1449: db_test] Error 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9366
Reviewed By: jay-zhuang
Differential Revision: D34377664
Pulled By: mrambacher
fbshipit-source-id: c86f9d0cd1cb0c18de72b06f1bf5847f23f51118
Summary:
In crash test with fault injection, we were seeing stack traces like the following:
```
https://github.com/facebook/rocksdb/issues/3 0x00007f75f763c533 in __GI___assert_fail (assertion=assertion@entry=0x1c5b2a0 "end_offset >= start_offset", file=file@entry=0x1c580a0 "table/block_based/block_based_table_reader.cc", line=line@entry=3245,
function=function@entry=0x1c60e60 "virtual uint64_t rocksdb::BlockBasedTable::ApproximateSize(const rocksdb::Slice&, const rocksdb::Slice&, rocksdb::TableReaderCaller)") at assert.c:101
https://github.com/facebook/rocksdb/issues/4 0x00000000010ea9b4 in rocksdb::BlockBasedTable::ApproximateSize (this=<optimized out>, start=..., end=..., caller=<optimized out>) at table/block_based/block_based_table_reader.cc:3224
https://github.com/facebook/rocksdb/issues/5 0x0000000000be61fb in rocksdb::TableCache::ApproximateSize (this=0x60f0000161b0, start=..., end=..., fd=..., caller=caller@entry=rocksdb::kCompaction, internal_comparator=..., prefix_extractor=...) at db/table_cache.cc:719
https://github.com/facebook/rocksdb/issues/6 0x0000000000c3eaec in rocksdb::VersionSet::ApproximateSize (this=<optimized out>, v=<optimized out>, f=..., start=..., end=..., caller=<optimized out>) at ./db/version_set.h:850
https://github.com/facebook/rocksdb/issues/7 0x0000000000c6ebc3 in rocksdb::VersionSet::ApproximateSize (this=<optimized out>, options=..., v=v@entry=0x621000047500, start=..., end=..., start_level=start_level@entry=0, end_level=<optimized out>, caller=<optimized out>)
at db/version_set.cc:5657
https://github.com/facebook/rocksdb/issues/8 0x000000000166e894 in rocksdb::CompactionJob::GenSubcompactionBoundaries (this=<optimized out>) at ./include/rocksdb/options.h:1869
https://github.com/facebook/rocksdb/issues/9 0x000000000168c526 in rocksdb::CompactionJob::Prepare (this=this@entry=0x7f75f3ffcf00) at db/compaction/compaction_job.cc:546
```
The problem occurred in `ApproximateSize()` when the index `Seek()` for the first `ApproximateDataOffsetOf()` encountered an I/O error, while the second `Seek()` did not. In the old code that scenario caused `start_offset == data_size` , thus it was easy to trip the assertion that `end_offset >= start_offset`.
The fix is to set `start_offset == 0` when the first index `Seek()` fails, and `end_offset == data_size` when the second index `Seek()` fails. I doubt these give an "on average correct" answer for how this function is used, but I/O errors in index seeks are hopefully rare, it looked consistent with what was already there, and it was easier to calculate.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9615
Test Plan:
run the repro command for a while and stopped seeing coredumps -
```
$ while ! ./db_stress --block_size=128 --cache_size=32768 --clear_column_family_one_in=0 --column_families=1 --continuous_verification_interval=0 --db=/dev/shm/rocksdb_crashtest --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --index_type=2 --iterpercent=10 --kill_random_test=18887 --max_key=1000000 --max_bytes_for_level_base=2048576 --nooverwritepercent=1 --open_files=-1 --open_read_fault_one_in=32 --ops_per_thread=1000000 --prefixpercent=5 --read_fault_one_in=0 --readpercent=45 --reopen=0 --skip_verifydb=1 --subcompactions=2 --target_file_size_base=524288 --test_batches_snapshots=0 --value_size_mult=32 --write_buffer_size=524288 --writepercent=35 ; do : ; done
```
Reviewed By: pdillinger
Differential Revision: D34383069
Pulled By: ajkr
fbshipit-source-id: fac26c3b20ea962e75387515ba5f2724dc48719f
Summary:
We found a case of cacheline bouncing due to writers locking/unlocking `mutex_` and readers accessing `block_cache_tracer_`. We discovered it only after the issue was fixed by https://github.com/facebook/rocksdb/issues/9462 shifting the `DBImpl` members such that `mutex_` and `block_cache_tracer_` were naturally placed in separate cachelines in our regression testing setup. This PR forces the cacheline alignment of `mutex_` so we don't accidentally reintroduce the problem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9637
Reviewed By: riversand963
Differential Revision: D34502233
Pulled By: ajkr
fbshipit-source-id: 46aa313b7fe83e80c3de254e332b6fb242434c07
Summary:
**Context:**
As part of https://github.com/facebook/rocksdb/pull/6949, file deletion is disabled for faulty database on the IOError of MANIFEST write/sync and [re-enabled again during `DBImpl::Resume()` if all recovery is completed](e66199d848 (diff-d9341fbe2a5d4089b93b22c5ed7f666bc311b378c26d0786f4b50c290e460187R396)). Before re-enabling file deletion, it `assert(versions_->io_status().ok());`, which IMO assumes `versions_` is **the** `version_` in the recovery process.
However, this is not necessarily true due to `s = error_handler_.ClearBGError();` happening before that assertion can unblock some foreground thread by [`EventHelpers::NotifyOnErrorRecoveryEnd()`](3122cb4358/db/error_handler.cc (L552-L553)) as part of the `ClearBGError()`. That foreground thread can do whatever it wants including closing/reopening the db and clean up that same `versions_`.
As a consequence, `assert(versions_->io_status().ok());`, will access `io_status()` of a nullptr and test like `DBErrorHandlingFSTest.MultiCFWALWriteError` becomes flaky. The unblocked foreground thread (in this case, the testing thread) proceeds to [reopen the db](https://github.com/facebook/rocksdb/blob/6.29.fb/db/error_handler_fs_test.cc?fbclid=IwAR1kQOxSbTUmaHQPAGz5jdMHXtDsDFKiFl8rifX-vIz4B23Y0S9jBkssSCg#L1494), where [`versions_` gets reset to nullptr](https://github.com/facebook/rocksdb/blob/6.29.fb/db/db_impl/db_impl.cc?fbclid=IwAR2uRhwBiPKgmE9q_6CM2mzbfwjoRgsGpXOrHruSJUDcAKc9rYZtVSvKdOY#L678) as part of the old db clean-up. If this happens right before `assert(versions_->io_status().ok()); ` gets excuted in the background thread, then we can see error like
```
db/db_impl/db_impl.cc:420:5: runtime error: member call on null pointer of type 'rocksdb::VersionSet'
assert(versions_->io_status().ok());
```
**Summary:**
- I proposed to call `s = error_handler_.ClearBGError();` after we know it's fine to wake up foreground, which I think is right before we LOG `ROCKS_LOG_INFO(immutable_db_options_.info_log, "Successfully resumed DB");`
- As the context, the orignal https://github.com/facebook/rocksdb/pull/3997 introducing `DBImpl::Resume()` calls `s = error_handler_.ClearBGError();` very close to calling `ROCKS_LOG_INFO(immutable_db_options_.info_log, "Successfully resumed DB");` while the later https://github.com/facebook/rocksdb/pull/6949 distances these two calls a bit.
- And it seems fine to me that `s = error_handler_.ClearBGError();` happens after `EnableFileDeletions(/*force=*/true);` at least syntax-wise since these two functions are orthogonal. And it also seems okay to me that we re-enable file deletion before `s = error_handler_.ClearBGError();`, which basically is resetting some state variables.
- In addition, to preserve the previous behavior of https://github.com/facebook/rocksdb/pull/6949 where status of re-enabling file deletion is not taken account into the general status of resuming the db, I separated `enable_file_deletion_s` from the general `s`
- In addition, to make `ROCKS_LOG_INFO(immutable_db_options_.info_log, "Successfully resumed DB");` more clear, I separated it into its own if-block.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9496
Test Plan:
- Manually reproduce the assertion failure in`DBErrorHandlingFSTest.MultiCFWALWriteError` by injecting sleep like below so that it's more likely for `assert(versions_->io_status().ok());` to execute after [reopening the db](https://github.com/facebook/rocksdb/blob/6.29.fb/db/error_handler_fs_test.cc?fbclid=IwAR1kQOxSbTUmaHQPAGz5jdMHXtDsDFKiFl8rifX-vIz4B23Y0S9jBkssSCg#L1494) in the foreground (i.e, testing) thread
```
sleep(1);
assert(versions_->io_status().ok());
```
`python3 gtest-parallel/gtest_parallel.py -r 100 -w 100 rocksdb/error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.MultiCFWALWriteError`
```
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DBErrorHandlingFSTest
[ RUN ] DBErrorHandlingFSTest.MultiCFWALWriteError
Received signal 11 (Segmentation fault)
#0 rocksdb/error_handler_fs_test() [0x5818a4] rocksdb::DBImpl::ResumeImpl(rocksdb::DBRecoverContext) /data/users/huixiao/rocksdb/db/db_impl/db_impl.cc:421
https://github.com/facebook/rocksdb/issues/1 rocksdb/error_handler_fs_test() [0x6379ff] rocksdb::ErrorHandler::RecoverFromBGError(bool) /data/users/huixiao/rocksdb/db/error_handler.cc:600
https://github.com/facebook/rocksdb/issues/2 rocksdb/error_handler_fs_test() [0x7c5362] rocksdb::SstFileManagerImpl::ClearError() /data/users/huixiao/rocksdb/file/sst_file_manager_impl.cc:310
https://github.com/facebook/rocksdb/issues/3 rocksdb/error_handler_fs_test()
```
- The assertion failure does not happen with PR
`python3 gtest-parallel/gtest_parallel.py -r 100 -w 100 rocksdb/error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.MultiCFWALWriteError`
`[100/100] DBErrorHandlingFSTest.MultiCFWALWriteError (43785 ms) `
Reviewed By: riversand963, anand1976
Differential Revision: D33990099
Pulled By: hx235
fbshipit-source-id: 2e0259a471fa8892ff177da91b3e1c0792dd7bab
Summary:
Implement a streaming compression API (compress/uncompress) to use for WAL compression. The log_writer would use the compress class/API to compress a record before writing it out in chunks. The log_reader would use the uncompress class/API to uncompress the chunks and combine into a single record.
Added unit test to verify the API for different sizes/compression types.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9619
Test Plan: make -j24 check
Reviewed By: anand1976
Differential Revision: D34437346
Pulled By: sidroyc
fbshipit-source-id: b180569ad2ddcf3106380f8758b556cc0ad18382
Summary:
**Summary:**
RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO.
This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518
Test Plan:
In this PR, the lru_secondary_cache_test.cc includes the following tests:
1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests.
2. The integration tests with both primary cache and this secondary cache .
**Follow Up:**
1. Statistics (e.g. compression ratio) will be added in another PR.
2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact.
Reviewed By: anand1976
Differential Revision: D34430930
Pulled By: gitbw95
fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
Summary:
This PR supports inserting keys to a `WriteBatchWithIndex` for column families that enable user-defined timestamps
and reading the keys back. **The index does not have timestamps.**
Writing a key to WBWI is unchanged, because the underlying WriteBatch already supports it.
When reading the keys back, we need to make sure to distinguish between keys with and without timestamps before
comparison.
When user calls `GetFromBatchAndDB()`, no timestamp is needed to query the batch, but a timestamp has to be
provided to query the db. The assumption is that data in the batch must be newer than data from the db.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9603
Test Plan: make check
Reviewed By: ltamasi
Differential Revision: D34354849
Pulled By: riversand963
fbshipit-source-id: d25d1f84e2240ce543e521fa30595082fb8db9a0
Summary:
We often see flaky tests due to `DB::Flush()` or `DBImpl::TEST_WaitForFlushMemTable()` not waiting until event listeners complete. For example, https://github.com/facebook/rocksdb/issues/9084, https://github.com/facebook/rocksdb/issues/9400, https://github.com/facebook/rocksdb/issues/9528, plus two new ones this week: "EventListenerTest.OnSingleDBFlushTest" and "DBFlushTest.FireOnFlushCompletedAfterCommittedResult". I ran a `make check` with the below race condition-coercing patch and fixed issues it found besides old BlobDB.
```
diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
index 0e1864788..aaba68c4a 100644
--- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -861,6 +861,8 @@ void DBImpl::NotifyOnFlushCompleted(
mutable_cf_options.level0_stop_writes_trigger);
// release lock while notifying events
mutex_.Unlock();
+ bg_cv_.SignalAll();
+ sleep(1);
{
for (auto& info : *flush_jobs_info) {
info->triggered_writes_slowdown = triggered_writes_slowdown;
```
The reason I did not fix old BlobDB issues is because it appears to have a fundamental (non-test) issue. In particular, it uses an EventListener to keep track of the files. OnFlushCompleted() could be delayed until even after a compaction involving that flushed file completes, causing the compaction to unexpectedly delete an untracked file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9617
Test Plan: `make check` including the race condition coercing patch
Reviewed By: hx235
Differential Revision: D34384022
Pulled By: ajkr
fbshipit-source-id: 2652ded39b415277c5d6a628414345223930514e
Summary:
Valgrind was failing with the below error because we forgot to destroy
the `BackupEngine` object:
```
==421173== Command: ./db_test2 --gtest_filter=DBTest2.BackupFileTemperature
==421173==
Note: Google Test filter = DBTest2.BackupFileTemperature
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DBTest2
[ RUN ] DBTest2.BackupFileTemperature
--421173-- WARNING: unhandled amd64-linux syscall: 425
--421173-- You may be able to write your own handler.
--421173-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--421173-- Nevertheless we consider this a bug. Please report
--421173-- it at http://valgrind.org/support/bug_reports.html.
[ OK ] DBTest2.BackupFileTemperature (3366 ms)
[----------] 1 test from DBTest2 (3371 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3413 ms total)
[ PASSED ] 1 test.
==421173==
==421173== HEAP SUMMARY:
==421173== in use at exit: 13,042 bytes in 195 blocks
==421173== total heap usage: 26,022 allocs, 25,827 frees, 27,555,265 bytes allocated
==421173==
==421173== 8 bytes in 1 blocks are possibly lost in loss record 6 of 167
==421173== at 0x4838DBF: operator new(unsigned long) (vg_replace_malloc.c:344)
==421173== by 0x8D4606: allocate (new_allocator.h:114)
==421173== by 0x8D4606: allocate (alloc_traits.h:445)
==421173== by 0x8D4606: _M_allocate (stl_vector.h:343)
==421173== by 0x8D4606: reserve (vector.tcc:78)
==421173== by 0x8D4606: rocksdb::BackupEngineImpl::Initialize() (backupable_db.cc:1174)
==421173== by 0x8D5473: Initialize (backupable_db.cc:918)
==421173== by 0x8D5473: rocksdb::BackupEngine::Open(rocksdb::BackupEngineOptions const&, rocksdb::Env*, rocksdb::BackupEngine**) (backupable_db.cc:937)
==421173== by 0x50AC8F: Open (backup_engine.h:585)
==421173== by 0x50AC8F: rocksdb::DBTest2_BackupFileTemperature_Test::TestBody() (db_test2.cc:6996)
...
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9610
Test Plan:
```
$ make -j24 ROCKSDBTESTS_SUBSET=db_test2 valgrind_check_some
```
Reviewed By: akankshamahajan15
Differential Revision: D34371210
Pulled By: ajkr
fbshipit-source-id: 68154fcb0c51b28222efa23fa4ee02df8d925a18
Summary:
Change enum SizeApproximationFlags to enum and class and add
overloaded operators for the transition between enum class and uint8_t
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9604
Test Plan: Circle CI jobs
Reviewed By: riversand963
Differential Revision: D34360281
Pulled By: akankshamahajan15
fbshipit-source-id: 6351dfdb717ae3c4530d324c3d37a8ecb01dd1ef
Summary:
Add Temperature hints information from RocksDB in API
`NewSequentialFile()`. backup and checkpoint operations need to open the
source files with `NewSequentialFile()`, which will have the temperature
hints. Other operations are not covered.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9499
Test Plan: Added unittest
Reviewed By: pdillinger
Differential Revision: D34006115
Pulled By: jay-zhuang
fbshipit-source-id: 568b34602b76520e53128672bd07e9d886786a2f
Summary:
This PR adds support for new APIs Async Read that reads the data
asynchronously and Poll API that checks if requested read request has
completed or not.
Usage: In RocksDB, we are currently planning to prefetch data
asynchronously during sequential scanning and RocksDB will call these
APIs to prefetch more data in advanced.
Design:
- ReadAsync API submits the read request to underlying FileSystem in
order to read data asynchronously. When read request is completed,
callback function will be called. cb_arg is used by RocksDB to track the
original request submitted and IOHandle is used by FileSystem to keep track
of IO requests at their level.
- The Poll API is added in FileSystem because the call could end up handling
completions for multiple different files which is not specific to a
FSRandomAccessFile instance. There could be multiple outstanding file reads
from different files in future and they can complete in any order.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9564
Test Plan: Test will be added in separate PR.
Reviewed By: anand1976
Differential Revision: D34226216
Pulled By: akankshamahajan15
fbshipit-source-id: 95e64edafb17f543f7232421d51e2665a3267f69
Summary:
[Compaction::IsTrivialMove](a2b9be42b6/db/compaction/compaction.cc (L318)) checks whether allow_trivial_move is set, and if so it returns the value of is_trivial_move_. The allow_trivial_move option is there for universal compaction. So when this is set and leveled compaction is enabled, then useful code that follows this block never gets a chance to run.
A check that [compaction_style == kCompactionStyleUniversal](320d9a8e8a/db/db_impl/db_impl_compaction_flush.cc (L1030)) should be added to avoid doing the wrong thing for leveled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9586
Test Plan:
To reproduce this:
First edit db/compaction/compaction.cc with
```
diff --git a/db/compaction/compaction.cc b/db/compaction/compaction.cc
index 7ae50b91e..52dd489b1 100644
--- a/db/compaction/compaction.cc
+++ b/db/compaction/compaction.cc
@@ -319,6 +319,8 @@ bool Compaction::IsTrivialMove() const {
// input files are non overlapping
if ((mutable_cf_options_.compaction_options_universal.allow_trivial_move) &&
(output_level_ != 0)) {
+ printf("IsTrivialMove:: return %d because universal allow_trivial_move\n", (int) is_trivial_move_);
+ // abort();
return is_trivial_move_;
}
```
And then run
```
./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/m/rx --wal_dir=/data/m/rx --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --memtablerep=skip_list --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --min_level_to_compress=3 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --allow_concurrent_memtable_write=false --disable_wal=1 --seed=1641328309 --universal_allow_trivial_move=1
```
Example output with the debug code added
```
IsTrivialMove:: return 0 because universal allow_trivial_move
IsTrivialMove:: return 0 because universal allow_trivial_move
```
After this PR, the bug is fixed.
Reviewed By: ajkr
Differential Revision: D34350451
Pulled By: gitbw95
fbshipit-source-id: 3232005cc47c40a7e75d316cfc7960beb5bdff3a
Summary:
Make FilterPolicy into a Customizable class. Allow new FilterPolicy to be discovered through the ObjectRegistry
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9590
Reviewed By: pdillinger
Differential Revision: D34327367
Pulled By: mrambacher
fbshipit-source-id: 37e7edac90ec9457422b72f359ab8ef48829c190
Summary:
RocksDB try to provide temperature information in the event
listener callbacks. The information is not guaranteed, as some operation
like backup won't have these information.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9591
Test Plan: Added unittest
Reviewed By: siying, pdillinger
Differential Revision: D34309339
Pulled By: jay-zhuang
fbshipit-source-id: 4aca4f270f99fa49186d85d300da42594663d6d7
Summary:
`ColumnFamilyOptions::OldDefaults` and `DBOptions::OldDefaults`
now deprecated. Were previously overlooked with `Options::OldDefaults` in https://github.com/facebook/rocksdb/issues/9363
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9594
Test Plan: comments only
Reviewed By: jay-zhuang
Differential Revision: D34318592
Pulled By: pdillinger
fbshipit-source-id: 773c97a61e2a8290ae154f363dd61c1f35a9dd16
Summary:
Extend the plugin architecture to allow for the inclusion, building and testing of Java and JNI components of a plugin. This will cause the JAR built by `$ make rocksdbjava` to include the extra functionality provided by the plugin, and will cause `$ make jtest` to add the java tests provided by the plugin to the tests built and run by Java testing.
The plugin's `<plugin>.mk` file can define:
```
<plugin>_JNI_NATIVE_SOURCES
<plugin>_NATIVE_JAVA_CLASSES
<plugin>_JAVA_TESTS
```
The plugin should provide java/src, java/test and java/rocksjni directories. When a plugin is required to be build it must be named in the ROCKSDB_PLUGINS environment variable (as per the plugin architecture). This now has the effect of adding the files specified by the above definitions to the appropriate parts of the build.
An example of a plugin with a Java component can be found as part of the hdfs plugin in https://github.com/riversand963/rocksdb-hdfs-env - at the time of writing the Java part of this fails tests, and needs a little work to complete, but it builds correctly under the plugin model.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9575
Reviewed By: hx235
Differential Revision: D34253948
Pulled By: riversand963
fbshipit-source-id: b3dde5da06f3d3c25c54246892097ae2a369b42d
Summary:
For RocksDB v7 major release. Remove previously deprecated Java API methods and associated tests
- where equivalent/alternative functionality exists and is already tested AND
- where the core RocksDB function/feature has also been removed
- OR the functionality exists only in Java so the previous deprecation only affected Java methods
RETAIN deprecated Java which reflects functionality which is deprecated by, but also still supported by, the core of RocksDB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9576
Reviewed By: ajkr
Differential Revision: D34314983
Pulled By: jay-zhuang
fbshipit-source-id: 7cf9c17e3e07be9d289beb99f81b71e8e09ac403
Summary:
We don't have any evidence of people using these to build custom
filters. The recommended way of customizing filter handling is to
defer to various built-in policies based on FilterBuildingContext
(e.g. to build Monkey filtering policy). With old API, we have
evidence of people modifying keys going into filter, but most cases
of that can be handled with prefix_extractor.
Having FilterBitsBuilder+Reader in the public API is an ogoing
hinderance to code evolution (e.g. recent new Finish and
MaybePostVerify), and so this change removes them from the public API
for 7.0. Maybe they will come back in some form later, but lacking
evidence of them providing value in the public API, we want to take back
more freedom to evolve these.
With this moved to internal-only, there is no rush to clean up the
complex Finish signatures, or add memory allocator support, but doing so
is much easier with them out of public API, for example to use
CacheAllocationPtr without exposing it in the public API.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9592
Test Plan: cosmetic changes only
Reviewed By: hx235
Differential Revision: D34315470
Pulled By: pdillinger
fbshipit-source-id: 03e03bb66a72c73df2c464d2dbbbae906dd8f99b
Summary:
The NUM_INDEX_AND_FILTER_BLOCKS_READ_PER_LEVEL, NUM_DATA_BLOCKS_READ_PER_LEVEL, and NUM_SST_READ_PER_LEVEL stats were being recorded only when the last file in a level happened to have hits. They are supposed to be updated for every level. Also, there was some overcounting of GetContextStats. This PR fixes both the problems.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9583
Test Plan: Update the unit test in db_basic_test
Reviewed By: akankshamahajan15
Differential Revision: D34308044
Pulled By: anand1976
fbshipit-source-id: b3b36020fda26ba91bc6e0e47d52d58f4d7f656e
Summary:
When WAL compression is enabled, add a record (new record type) to store the compression type to indicate that all subsequent records are compressed. The log reader will store the compression type when this record is encountered and use the type to uncompress the subsequent records. Compress and uncompress to be implemented in subsequent diffs.
Enabled WAL compression in some WAL tests to check for regressions. Some tests that rely on offsets have been disabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9556
Reviewed By: anand1976
Differential Revision: D34308216
Pulled By: sidroyc
fbshipit-source-id: 7f10595e46f3277f1ea2d309fbf95e2e935a8705
Summary:
For RocksJava 7 we will move from requiring Java 7 to Java 8.
* This simplifies the `Makefile` as we no longer need to deal with Java 7; so we no longer use `javah`.
* Added a java-version target which is invoked by the java target, and which exits if the version of java being used is not 8 or greater.
* Enforces java 8 as a minimum.
* Fixed CMake build.
* Fixed broken java event listener test, as the test was broken and the assertions in the callbacks were not causing assertions in the tests. The callbacks now queue up assertion errors for the main thread of the tests to check.
* Fixed C++ dangling pointers in the test code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9541
Reviewed By: pdillinger
Differential Revision: D34214929
Pulled By: jay-zhuang
fbshipit-source-id: fdff348758d0a23a742e83c87d5f54073ce16ca6
Summary:
Users can set the priority for file reads associated with their operation by setting `ReadOptions::rate_limiter_priority` to something other than `Env::IO_TOTAL`. Rate limiting `VerifyChecksum()` and `VerifyFileChecksums()` is the motivation for this PR, so it also includes benchmarks and minor bug fixes to get that working.
`RandomAccessFileReader::Read()` already had support for rate limiting compaction reads. I changed that rate limiting to be non-specific to compaction, but rather performed according to the passed in `Env::IOPriority`. Now the compaction read rate limiting is supported by setting `rate_limiter_priority = Env::IO_LOW` on its `ReadOptions`.
There is no default value for the new `Env::IOPriority` parameter to `RandomAccessFileReader::Read()`. That means this PR goes through all callers (in some cases multiple layers up the call stack) to find a `ReadOptions` to provide the priority. There are TODOs for cases I believe it would be good to let user control the priority some day (e.g., file footer reads), and no TODO in cases I believe it doesn't matter (e.g., trace file reads).
The API doc only lists the missing cases where a file read associated with a provided `ReadOptions` cannot be rate limited. For cases like file ingestion checksum calculation, there is no API to provide `ReadOptions` or `Env::IOPriority`, so I didn't count that as missing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9424
Test Plan:
- new unit tests
- new benchmarks on ~50MB database with 1MB/s read rate limit and 100ms refill interval; verified with strace reads are chunked (at 0.1MB per chunk) and spaced roughly 100ms apart.
- setup command: `./db_bench -benchmarks=fillrandom,compact -db=/tmp/testdb -target_file_size_base=1048576 -disable_auto_compactions=true -file_checksum=true`
- benchmarks command: `strace -ttfe pread64 ./db_bench -benchmarks=verifychecksum,verifyfilechecksums -use_existing_db=true -db=/tmp/testdb -rate_limiter_bytes_per_sec=1048576 -rate_limit_bg_reads=1 -rate_limit_user_ops=true -file_checksum=true`
- crash test using IO_USER priority on non-validation reads with https://github.com/facebook/rocksdb/issues/9567 reverted: `python3 tools/db_crashtest.py blackbox --max_key=1000000 --write_buffer_size=524288 --target_file_size_base=524288 --level_compaction_dynamic_level_bytes=true --duration=3600 --rate_limit_bg_reads=true --rate_limit_user_ops=true --rate_limiter_bytes_per_sec=10485760 --interval=10`
Reviewed By: hx235
Differential Revision: D33747386
Pulled By: ajkr
fbshipit-source-id: a2d985e97912fba8c54763798e04f006ccc56e0c
Summary:
The following sequence of events can cause silent data loss for write-committed
transactions.
```
Time thread 1 bg flush
| db->Put("a")
| txn = NewTxn()
| txn->Put("b", "v")
| txn->Prepare() // writes only to 5.log
| db->SwitchMemtable() // memtable 1 has "a"
| // close 5.log,
| // creates 8.log
| trigger flush
| pick memtable 1
| unlock db mutex
| write new sst
| txn->ctwb->Put("gtid", "1") // writes 8.log
| txn->Commit() // writes to 8.log
| // writes to memtable 2
| compute min_log_number_to_keep_2pc, this
| will be 8 (incorrect).
|
| Purge obsolete wals, including 5.log
|
V
```
At this point, writes of txn exists only in memtable. Close db without flush because db thinks the data in
memtable are backed by log. Then reopen, the writes are lost except key-value pair {"gtid"->"1"},
only the commit marker of txn is in 8.log
The reason lies in `PrecomputeMinLogNumberToKeep2PC()` which calls `FindMinPrepLogReferencedByMemTable()`.
In the above example, when bg flush thread tries to find obsolete wals, it uses the information
computed by `PrecomputeMinLogNumberToKeep2PC()`. The return value of `PrecomputeMinLogNumberToKeep2PC()`
depends on three components
- `PrecomputeMinLogNumberToKeepNon2PC()`. This represents the WAL that has unflushed data. As the name of this method suggests, it does not account for 2PC. Although the keys reside in the prepare section of a previous WAL, the column family references the current WAL when they are actually inserted into the memtable during txn commit.
- `prep_tracker->FindMinLogContainingOutstandingPrep()`. This represents the WAL with a prepare section but the txn hasn't committed.
- `FindMinPrepLogReferencedByMemTable()`. This represents the WAL on which some memtables (mutable and immutable) depend for their unflushed data.
The bug lies in `FindMinPrepLogReferencedByMemTable()`. Originally, this function skips checking the column families
that are being flushed, but the unit test added in this PR shows that they should not be. In this unit test, there is
only the default column family, and one of its memtables has unflushed data backed by a prepare section in 5.log.
We should return this information via `FindMinPrepLogReferencedByMemTable()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9571
Test Plan:
```
./transaction_test --gtest_filter=*/TransactionTest.SwitchMemtableDuringPrepareAndCommit_WC/*
make check
```
Reviewed By: siying
Differential Revision: D34235236
Pulled By: riversand963
fbshipit-source-id: 120eb21a666728a38dda77b96276c6af72b008b1
Summary:
As in
```
db_stress: table/block_based/filter_policy.cc:316: rocksdb::{anonymous}::FastLocalBloomBitsBuilder::FastLocalBloomBitsBuilder(int, std::atomic<long int>*, std::shared_ptr<rocksdb::CacheReservationManager>, bool): Assertion `millibits_per_key >= 1000' failed.
```
This assertion failure was actually happening with our RibbonFilterPolicy
which falls back to Bloom for some cases, often for flush, but was
missing new special logic to skip generating filter for 0 bits per key
case. Fixed by adding the logic in other builtin FilterPolicy
implementations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9585
Test Plan:
Updated db_bloom_filter_test to do more integration testing
of the RibbonFilterPolicy ("auto Ribbon") class, incl regression test
this with SkipFilterOnEssentiallyZeroBpk
Reviewed By: ajkr
Differential Revision: D34295101
Pulled By: pdillinger
fbshipit-source-id: 3488eb207fc1d67bbbd1301313714aa1b6406e6e
Summary:
Opening DB as seconeary instance has been supported in ldb but it is not mentioned in --help. Mention it there. The part of the help message after the modification:
```
commands MUST specify --db=<full_path_to_db_directory> when necessary
commands can optionally specify
--env_uri=<uri_of_environment> or --fs_uri=<uri_of_filesystem> if necessary
--secondary_path=<secondary_path> to open DB as secondary instance. Operations not supported in secondary instance will fail.
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9582
Test Plan: Build and run ldb --help
Reviewed By: riversand963
Differential Revision: D34286427
fbshipit-source-id: e56c5290d0548098ab6acc6dde2167f5a64f34f3
Summary:
Remove deprecated remote compaction APIs
`CompactionService::Start()` and `CompactionService::WaitForComplete()`.
Please use `CompactionService::StartV2()`,
`CompactionService::WaitForCompleteV2()` instead, which provides the
same information plus extra data like priority, db_id, etc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9570
Test Plan: CI
Reviewed By: riversand963
Differential Revision: D34255969
Pulled By: jay-zhuang
fbshipit-source-id: c6376eccdd1123f1c42ab53771b5f65f8160c325
Summary:
Add support for doubles to ObjectLibrary::PatternEntry. This support will allow patterns containing a non-integer number to be parsed correctly.
Added appropriate test cases to cover this new option.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9577
Reviewed By: pdillinger
Differential Revision: D34269763
Pulled By: mrambacher
fbshipit-source-id: b5ce16cbd3665c2974ec0f3412ef2b403ef8b155
Summary:
Add Solana's RocksDB use case in USERS.md.
Solana is a fast, secure, scalable, and decentralized blockchain. It uses RocksDB as the underlying storage for its ledger store.
github: https://github.com/solana-labs/solana
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9558
Reviewed By: jay-zhuang
Differential Revision: D34249087
Pulled By: riversand963
fbshipit-source-id: 7524eff4952e2676e8520ac491ffb6a686fb4d7e
Summary:
Some changes to make it easier to make FilterPolicy
customizable. Especially, create distinct classes for the different
testing-only and user-facing built-in FilterPolicy modes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9567
Test Plan:
tests updated, with no intended difference in functionality
tested. No difference in test performance seen as a result of moving to
string-based filter type configuration.
Reviewed By: mrambacher
Differential Revision: D34234694
Pulled By: pdillinger
fbshipit-source-id: 8a94931a9e04c3bcca863a4f524cfd064aaf0122