rocksdb

Commit Graph

Author	SHA1	Message	Date
Merlin Mao	d10801e983	Allow Replayer to report the results of TraceRecords. (#8657 ) Summary: `Replayer::Execute()` can directly returns the result (e.g, request latency, DB::Get() return code, returned value, etc.) `Replayer::Replay()` reports the results via a callback function. New interface: `TraceRecordResult` in "rocksdb/trace_record_result.h". `DBTest2.TraceAndReplay` and `DBTest2.TraceAndManualReplay` are updated accordingly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8657 Reviewed By: ajkr Differential Revision: D30290216 Pulled By: autopear fbshipit-source-id: 3c8d4e6b180ec743de1a9d9dcaee86064c74f0d6	3 years ago
Merlin Mao	f58d276764	Make TraceRecord and Replayer public (#8611 ) Summary: New public interfaces: `TraceRecord` and `TraceRecord::Handler`, available in "rocksdb/trace_record.h". `Replayer`, available in `rocksdb/utilities/replayer.h`. User can use `DB::NewDefaultReplayer()` to create a Replayer to auto/manual replay a trace file. Unit tests: - `./db_test2 --gtest_filter="DBTest2.TraceAndReplay"`: Updated with the internal API changes. - `./db_test2 --gtest_filter="DBTest2.TraceAndManualReplay"`: New for manual replay. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8611 Reviewed By: ajkr Differential Revision: D30266329 Pulled By: autopear fbshipit-source-id: 1ecb3cbbedae0f6a67c18f0cc82e002b4d81b6f8	3 years ago
Baptiste Lemaire	e3a96c4823	Memtable sampling for mempurge heuristic. (#8628 ) Summary: Changes the API of the MemPurge process: the `bool experimental_allow_mempurge` and `experimental_mempurge_policy` flags have been replaced by a `double experimental_mempurge_threshold` option. This change of API reflects another major change introduced in this PR: the MemPurgeDecider() function now works by sampling the memtables being flushed to estimate the overall amount of useful payload (payload minus the garbage), and then compare this useful payload estimate with the `double experimental_mempurge_threshold` value. Therefore, when the value of this flag is `0.0` (default value), mempurge is simply deactivated. On the other hand, a value of `DBL_MAX` would be equivalent to always going through a mempurge regardless of the garbage ratio estimate. At the moment, a `double experimental_mempurge_threshold` value else than 0.0 or `DBL_MAX` is opnly supported`with the `SkipList` memtable representation. Regarding the sampling, this PR includes the introduction of a `MemTable::UniqueRandomSample` function that collects (approximately) random entries from the memtable by using the new `SkipList::Iterator::RandomSeek()` under the hood, or by iterating through each memtable entry, depending on the target sample size and the total number of entries. The unit tests have been readapted to support this new API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8628 Reviewed By: pdillinger Differential Revision: D30149315 Pulled By: bjlemaire fbshipit-source-id: 1feef5390c95db6f4480ab4434716533d3947f27	3 years ago
Andrew Kryczka	82b81dc8b5	Simplify GenericRateLimiter algorithm (#8602 ) Summary: `GenericRateLimiter` slow path handles requests that cannot be satisfied immediately. Such requests enter a queue, and their thread stays in `Request()` until they are granted or the rate limiter is stopped. These threads are responsible for unblocking themselves. The work to do so is split into two main duties. (1) Waiting for the next refill time. (2) Refilling the bytes and granting requests. Prior to this PR, the slow path logic involved a leader election algorithm to pick one thread to perform (1) followed by (2). It elected the thread whose request was at the front of the highest priority non-empty queue since that request was most likely to be granted. This algorithm was efficient in terms of reducing intermediate wakeups, which is a thread waking up only to resume waiting after finding its request is not granted. However, the conceptual complexity of this algorithm was too high. It took me a long time to draw a timeline to understand how it works for just one edge case yet there were so many. This PR drops the leader election to reduce conceptual complexity. Now, the two duties can be performed by whichever thread acquires the lock first. The risk of this change is increasing the number of intermediate wakeups, however, we took steps to mitigate that. - `wait_until_refill_pending_` flag ensures only one thread performs (1). This\ prevents the thundering herd problem at the next refill time. The remaining\ threads wait on their condition variable with an unbounded duration -- thus we\ must remember to notify them to ensure forward progress. - (1) is typically done by a thread at the front of a queue. This is trivial\ when the queues are initially empty as the first choice that arrives must be\ the only entry in its queue. When queues are initially non-empty, we achieve\ this by having (2) notify a thread at the front of a queue (preferring higher\ priority) to perform the next duty. - We do not require any additional wakeup for (2). Typically it will just be\ done by the thread that finished (1). Combined, the second and third bullet points above suggest the refill/granting will typically be done by a request at the front of its queue. This is important because one wakeup is saved when a granted request happens to be in an already running thread. Note there are a few cases that still lead to intermediate wakeup, however. The first two are existing issues that also apply to the old algorithm, however, the third (including both subpoints) is new. - No request may be granted (only possible when rate limit dynamically\ decreases). - Requests from a different queue may be granted. - (2) may be run by a non-front request thread causing it to not be granted even\ if some requests in that same queue are granted. It can happen for a couple\ (unlikely) reasons. - A new request may sneak in and grab the lock at the refill time, before the\ thread finishing (1) can wake up and grab it. - A new request may sneak in and grab the lock and execute (1) before (2)'s\ chosen candidate can wake up and grab the lock. Then that non-front request\ thread performing (1) can carry over to perform (2). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8602 Test Plan: - Use existing tests. The edge cases listed in the comment are all performance\ related; I could not really think of any related to correctness. The logic\ looks the same whether a thread wakes up/finishes its work early/on-time/late,\ or whether the thread is chosen vs. "steals" the work. - Verified write throughput and CPU overhead are basically the same with and\ without this change, even in a rate limiter heavy workload: Test command: ``` $ rm -rf /dev/shm/dbbench/ && TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -benchmarks=fillrandom -num_multi_db=64 -num_low_pri_threads=64 -num_high_pri_threads=64 -write_buffer_size=262144 -target_file_size_base=262144 -max_bytes_for_level_base=1048576 -rate_limiter_bytes_per_sec=16777216 -key_size=24 -value_size=1000 -num=10000 -compression_type=none -rate_limiter_refill_period_us=1000 ``` Results before this PR: ``` fillrandom : 108.463 micros/op 9219 ops/sec; 9.0 MB/s 7.40user 8.84system 1:26.20elapsed 18%CPU (0avgtext+0avgdata 256140maxresident)k ``` Results after this PR: ``` fillrandom : 108.108 micros/op 9250 ops/sec; 9.0 MB/s 7.45user 8.23system 1:26.68elapsed 18%CPU (0avgtext+0avgdata 255688maxresident)k ``` Reviewed By: hx235 Differential Revision: D30048013 Pulled By: ajkr fbshipit-source-id: 6741bba9d9dfbccab359806d725105817fef818b	3 years ago
sdong	e7c24168d8	Move old files to warm tier in FIFO compactions (#8310 ) Summary: Some FIFO users want to keep the data for longer, but the old data is rarely accessed. This feature allows users to configure FIFO compaction so that data older than a threshold is moved to a warm storage tier. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8310 Test Plan: Add several unit tests. Reviewed By: ajkr Differential Revision: D28493792 fbshipit-source-id: c14824ea634814dee5278b449ab5c98b6e0b5501	3 years ago
mrambacher	d057e8326d	Make MergeOperator+CompactionFilter/Factory into Customizable Classes (#8481 ) Summary: - Changed MergeOperator, CompactionFilter, and CompactionFilterFactory into Customizable classes. - Added Options/Configurable/Object Registration for TTL and Cassandra variants - Changed the StringAppend MergeOperators to accept a string delimiter rather than a simple char. Made the delimiter into a configurable option - Added tests for new functionality Pull Request resolved: https://github.com/facebook/rocksdb/pull/8481 Reviewed By: zhichao-cao Differential Revision: D30136050 Pulled By: mrambacher fbshipit-source-id: 271d1772835935b6773abaf018ee71e42f9491af	3 years ago
Baptiste Lemaire	9501279d5f	Create fillanddeleteuniquerandom benchmark (db_bench), with new option flags. (#8593 ) Summary: Introduction of a new `fillanddeleteuniquerandom` benchmark (`db_bench`) with 5 new option flags to simulate a benchmark where the following sequence is repeated multiple times: "A set of keys S1 is inserted ('`disposable entries`'), then after some delay another set of keys S2 is inserted ('`persistent entries`') and the first set of keys S1 is deleted. S2 artificially represents the insertion of hypothetical results from some undefined computation done on the first set of keys S1. The next sequence can start as soon as the last disposable entry in the set S1 of this sequence is inserted, if the `delay` is non negligible." New flags: - `disposable_entries_delete_delay`: minimum delay in microseconds between insertion of the last `disposable` entry, and the start of the insertion of the first `persistent` entry. - `disposable_entries_batch_size`: number of `disposable` entries inserted at the beginning of each sequence. - `disposable_entries_value_size`: size of the random `value` string for the `disposable` entries. - `persistent_entries_batch_size`: number of `persistent` entries inserted at the end of each sequence, right before the deletion of the `disposable` entries starts. - `persistent_entries_value_size`: size of the random value string for the `persistent` entries. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8593 Reviewed By: pdillinger Differential Revision: D29974436 Pulled By: bjlemaire fbshipit-source-id: f578033e5b45e8268ba6fa6f38f4770c2e6e801d	3 years ago
mrambacher	3aee4fbd41	Make EventListener into a Customizable Class (#8473 ) Summary: - Added Type/CreateFromString - Added ability to load EventListeners to DBOptions - Since EventListeners did not previously have a Name(), defaulted to "". If there is no name, the listener cannot be loaded from the ObjectRegistry. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8473 Reviewed By: zhichao-cao Differential Revision: D29901488 Pulled By: mrambacher fbshipit-source-id: 2d3a4aa6db1562ac03e7ad41b360e3521d486254	3 years ago
Baptiste Lemaire	4361d6d163	Add simple heuristics for experimental mempurge. (#8583 ) Summary: Add `experimental_mempurge_policy` option flag and introduce two new `MemPurge` (Memtable Garbage Collection) policies: 'ALWAYS' and 'ALTERNATE'. Default value: ALTERNATE. `ALWAYS`: every flush will first go through a `MemPurge` process. If the output is too big to fit into a single memtable, then the mempurge is aborted and a regular flush process carries on. `ALWAYS` is designed for user that need to reduce the number of L0 SST file created to a strict minimum, and can afford a small dent in performance (possibly hits to CPU usage, read efficiency, and maximum burst write throughput). `ALTERNATE`: a flush is transformed into a `MemPurge` except if one of the memtables being flushed is the product of a previous `MemPurge`. `ALTERNATE` is a good tradeoff between reduction in number of L0 SST files created and performance. `ALTERNATE` perform particularly well for completely random garbage ratios, or garbage ratios anywhere in (0%,50%], and even higher when there is a wild variability in garbage ratios. This PR also includes support for `experimental_mempurge_policy` in `db_bench`. Testing was done locally by replacing all the `MemPurge` policies of the unit tests with `ALTERNATE`, as well as local testing with `db_crashtest.py` `whitebox` and `blackbox`. Overall, if an `ALWAYS` mempurge policy passes the tests, there is no reasons why an `ALTERNATE` policy would fail, and therefore the mempurge policy was set to `ALWAYS` for all mempurge unit tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8583 Reviewed By: pdillinger Differential Revision: D29888050 Pulled By: bjlemaire fbshipit-source-id: e2cf26646d66679f6f5fb29842624615610759c1	3 years ago
leipeng	2febf1c45c	db_bench_tool.cc: fix copy - paste (#8553 ) Summary: PR https://github.com/facebook/rocksdb/issues/8519 fix db_bench_tool.cc for MSVC build errors by simply copy-paste, this PR fix the copy-paste while also works for MSVC. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8553 Reviewed By: ajkr Differential Revision: D29838056 Pulled By: jay-zhuang fbshipit-source-id: 0cd60c146b87a355c3dc1061dfe813169d75cea4	3 years ago
Baptiste Lemaire	6b4cdacf41	Add overwrite_probability for filluniquerandom benchmark in db_bench (#8569 ) Summary: Add flags `overwrite_probability` and `overwrite_window_size` flag to `db_bench`. Add the possibility of performing a `filluniquerandom` benchmark with an overwrite probability. For each write operation, there is a probability _p_ that the write is an overwrite (_p_=`overwrite_probability`). When an overwrite is decided, the key is randomly chosen from the last _N_ keys previously inserted into the DB (with _N_=`overwrite_window_size`). When a pure write is decided, the key inserted into the DB is unique and therefore will not be an overwrite. The `overwrite_window_size` is used so that the user can decide if the overwrite are mostly targeting recently inserted keys (when `overwrite_window_size` is small compared to the total number of writes), or can also target keys inserted "a long time ago" (when `overwrite_window_size` is comparable to total number of writes). Note that total number of writes = # of unique insertions + # of overwrites. No unit test specifically added. Local testing show the following throughputs for `filluniquerandom` with 1M total writes: - bypass the code inserts (no `overwrite_probability` flag specified): ~14.0MB/s - `overwrite_probability=0.99`, `overwrite_window_size=10`: ~17.0MB/s - `overwrite_probability=0.10`, `overwrite_window_size=10`: ~14.0MB/s - `overwrite_probability=0.99`, `overwrite_window_size=1M`: ~14.5MB/s - `overwrite_probability=0.10`, `overwrite_window_size=1M`: ~14.0MB/s Pull Request resolved: https://github.com/facebook/rocksdb/pull/8569 Reviewed By: pdillinger Differential Revision: D29818631 Pulled By: bjlemaire fbshipit-source-id: d472b4ea4e457a4da7c4ee4f14b40cccd6a4587a	3 years ago
sdong	bbc85a5f22	Fix minor wrong variable name in db_bench (#8549 ) Summary: Fix a minor variable name that is not accurate. This is recently introduced in https://github.com/facebook/rocksdb/pull/7818 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8549 Reviewed By: zhichao-cao Differential Revision: D29745585 fbshipit-source-id: 6268b348878fdf99a162b2cc3d5876fbd9bb10d9	3 years ago
Baptiste Lemaire	f4529a54bb	Add experimental_allow_mempurge flag to benchmark. (#8546 ) Summary: Tiny PR to add the `experimental_allow_mempurge` to the `db_bench` tool (`Mempurge` is the current prototype for memtable garbage collection). This is useful to benchmark the prototype of this new feature, stress test it and help find new meaningful heuristics for GC. By default, the flag to allow `mempurge` is set to `false`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8546 Reviewed By: anand1976 Differential Revision: D29738338 Pulled By: bjlemaire fbshipit-source-id: 01892883a2f1c714c110718674da05992d6e2dd6	3 years ago
sdong	1e5b631e51	db_bench seekrandom with multiDB should only create iterators queried (#7818 ) Summary: Right now, db_bench with seekrandom and multiple DB setup creates iterator for all DBs just to query one of them. It's different from most real workloads. Fix it by only creating iterators that will be queried. Also fix a bug that DBs are not destroyed in multi-DB mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7818 Test Plan: Run db_bench with single/multiDB X using/not using tailing iterator with ASAN build, and validate the behavior is expected. Reviewed By: ajkr Differential Revision: D25720226 fbshipit-source-id: c2ff7ff7120e5ba64287a30b057c5d29b2cbe20b	3 years ago
sherriiiliu	7b9ecd4067	fix several MSVC build errors (#8519 ) Summary: Fixed a few MSVC (VCToolsVersion=14.0) build errors and warnings * `DEFINE_string` is a macro and VC compiler complains that it cannot put [ifdef-inside-define](https://stackoverflow.com/questions/5586429/ifdef-inside-define) * `sleep()` is not a recognizable function. Use `FLAGS_env->SleepForMicroseconds` instead * Define precise type in comparison to avoid mismatch warning Pull Request resolved: https://github.com/facebook/rocksdb/pull/8519 Reviewed By: jay-zhuang Differential Revision: D29683086 fbshipit-source-id: 8c80941472089f8daba84ae29597e75e603850e4	3 years ago
mrambacher	da90e23998	Improvements to benchmark.sh script (#8346 ) Summary: 1. Fix printing of stats when there are no writes (wamp=0). Previously had a div0 error 2. Added multireadrandom command as a valid target 3. Added ability to pass additional command line options to db_bench. Now can say things like benchmark.sh readrandom --mmap_read and the option will be passed to db_bench. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8346 Reviewed By: zhichao-cao Differential Revision: D29500436 Pulled By: mrambacher fbshipit-source-id: 54e90708aae9133be3a903e35efdf8f8abbd86fa	3 years ago
Adam Retter	5afd1e309c	Correct CVS -> CSV typo (#8513 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8513 Reviewed By: jay-zhuang Differential Revision: D29654066 Pulled By: mrambacher fbshipit-source-id: b8f492fe21edd37fe1f1c5a4a0e9153f58bbf3e2	3 years ago
mrambacher	570248aeff	Make SecondaryCache Customizable (#8480 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8480 Reviewed By: zhichao-cao Differential Revision: D29528740 Pulled By: mrambacher fbshipit-source-id: fd0f70d15f66611c8498257a9973f7e98ca13839	3 years ago
Peter (Stig) Edwards	b20737709f	Add -report_open_timing to db_bench (#8464 ) Summary: Hello and thanks for RocksDB, This PR adds support for ```-report_open_timing true``` to ```db_bench```. It can be useful when tuning RocksDB on filesystem/env with high latencies for file level operations (create/delete/rename...) seen during ```((Optimistic)Transaction)DB::Open```. Some examples: ``` > db_bench -benchmarks updaterandom -num 1 -db /dev/shm/db_bench > db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true -readonly true 2>&1 \| grep OpenDb OpenDb: 3.90133 milliseconds > db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true -use_secondary_db true 2>&1 \| grep OpenDb OpenDb: 3.33414 milliseconds > db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true 2>&1 \| grep -A1 OpenDb OpenDb: 6.05423 milliseconds > db_bench -benchmarks updaterandom -num 1 > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_open_timing true -readonly true 2>&1 \| grep OpenDb OpenDb: 4.06859 milliseconds > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_open_timing true -use_secondary_db true 2>&1 \| grep OpenDb OpenDb: 2.85794 milliseconds > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_open_timing true 2>&1 \| grep OpenDb OpenDb: 6.46376 milliseconds > db_bench -benchmarks updaterandom -num 1 -db /clustered_fs/db_bench > db_bench -benchmarks updaterandom -num 0 -db /clustered_fs/db_bench -use_existing_db true -report_open_timing true -readonly true 2>&1 \| grep OpenDb OpenDb: 3.79805 milliseconds > db_bench -benchmarks updaterandom -num 0 -db /clustered_fs/db_bench -use_existing_db true -report_open_timing true -use_secondary_db true 2>&1 \| grep OpenDb OpenDb: 3.00174 milliseconds > db_bench -benchmarks updaterandom -num 0 -db /clustered_fs/db_bench -use_existing_db true -report_open_timing true 2>&1 \| grep OpenDb OpenDb: 24.8732 milliseconds ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8464 Reviewed By: hx235 Differential Revision: D29398096 Pulled By: zhichao-cao fbshipit-source-id: 8f05dc3284f084612a3f30234e39e1c37548f50c	3 years ago
Akanksha Mahajan	be8199cdb9	Run Merge with Integrated BlobDB in stress, crash and db_bench (#8461 ) Summary: Run Merge with Intergrated BlobDB in stress tests, crash tests and db_bench. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8461 Test Plan: 1. python3 -u tools/db_crashtest.py --simple whitebox ---use_merge=1 --enable_blob_files=1 2. ./db_bench --benchmarks="readwhilemerging" --merge_operator=uint64add --enable_blob_files=true Reviewed By: ltamasi Differential Revision: D29394824 Pulled By: akankshamahajan15 fbshipit-source-id: 0a8e492b13129673e088fb8af3402ab678bb473a	3 years ago
Peter (Stig) Edwards	75741eb0ce	Add more ops to: db_bench -report_file_operations (#8448 ) Summary: Hello and thanks for RocksDB, Here is a PR to add file deletes, renames and ```Flush()```, ```Sync()```, ```Fsync()``` and ```Close()``` to file ops report. The reason is to help tune RocksDB options when using an env/filesystem with high latencies for file level ("metadata") operations, typically seen during ```DB::Open``` (```db_bench -num 0``` also see https://github.com/facebook/rocksdb/pull/7203 where IOTracing does not trace ```DB::Open```). Before: ``` > db_bench -benchmarks updaterandom -num 0 -report_file_operations true ... Entries: 0 ... Num files opened: 12 Num Read(): 6 Num Append(): 8 Num bytes read: 6216 Num bytes written: 6289 ``` After: ``` > db_bench -benchmarks updaterandom -num 0 -report_file_operations true ... Entries: 0 ... Num files opened: 12 Num files deleted: 3 Num files renamed: 4 Num Flush(): 10 Num Sync(): 5 Num Fsync(): 1 Num Close(): 2 Num Read(): 6 Num Append(): 8 Num bytes read: 6216 Num bytes written: 6289 ``` Before: ``` > db_bench -benchmarks updaterandom -report_file_operations true ... Entries: 1000000 ... Num files opened: 18 Num Read(): 396339 Num Append(): 1000058 Num bytes read: 892030224 Num bytes written: 187569238 ``` After: ``` > db_bench -benchmarks updaterandom -report_file_operations true ... Entries: 1000000 ... Num files opened: 18 Num files deleted: 5 Num files renamed: 4 Num Flush(): 1000068 Num Sync(): 9 Num Fsync(): 1 Num Close(): 6 Num Read(): 396339 Num Append(): 1000058 Num bytes read: 892030224 Num bytes written: 187569238 ``` Another example showing how using ```DB::OpenForReadOnly``` reduces file operations compared to ```((Optimistic)Transaction)DB::Open```: ``` > db_bench -benchmarks updaterandom -num 1 > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -readonly true -report_file_operations true ... Entries: 0 ... Num files opened: 8 Num files deleted: 0 Num files renamed: 0 Num Flush(): 0 Num Sync(): 0 Num Fsync(): 0 Num Close(): 0 Num Read(): 13 Num Append(): 0 Num bytes read: 374 Num bytes written: 0 ``` ``` > db_bench -benchmarks updaterandom -num 1 > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_file_operations true ... Entries: 0 ... Num files opened: 14 Num files deleted: 3 Num files renamed: 4 Num Flush(): 14 Num Sync(): 5 Num Fsync(): 1 Num Close(): 3 Num Read(): 11 Num Append(): 10 Num bytes read: 7291 Num bytes written: 7357 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8448 Reviewed By: anand1976 Differential Revision: D29333818 Pulled By: zhichao-cao fbshipit-source-id: a06a8c87f799806462319115195b3e94faf5f542	3 years ago
Akanksha Mahajan	5ba1b6e549	Cache warming data blocks during flush (#8242 ) Summary: This PR prepopulates warm/hot data blocks which are already in memory into block cache at the time of flush. On a flush, the data block that is in memory (in memtables) get flushed to the device. If using Direct IO, additional IO is incurred to read this data back into memory again, which is avoided by enabling newly added option. Right now, this is enabled only for flush for data blocks. We plan to expand this option to cover compactions in the future and for other types of blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8242 Test Plan: Add new unit test Reviewed By: anand1976 Differential Revision: D28521703 Pulled By: akankshamahajan15 fbshipit-source-id: 7219d6958821cedce689a219c3963a6f1a9d5f05	3 years ago
Peter Dillinger	865a25101d	Mark Ribbon filter and optimize_filters_for_memory as production (#8408 ) Summary: Marked the Ribbon filter and optimize_filters_for_memory features as production-ready, each enabling memory savings for Bloom-like filters. Use `NewRibbonFilterPolicy` in place of `NewBloomFilterPolicy` to use Ribbon filters instead of Bloom, or `ribbonfilter` in place of `bloomfilter` in configuration string. Some small refactoring in db_stress. Removed/refactored unused code in db_bench, in part preparing for future default possibly being different from "disabled." Pull Request resolved: https://github.com/facebook/rocksdb/pull/8408 Test Plan: Lots of prior automated, ad-hoc, and "real world" testing. Updated tests for new API names. Quick db_bench test: bloom fillrandom 77730 ops/sec rocksdb.block.cache.filter.bytes.insert COUNT : 89929384 ribbon fillrandom 71492 ops/sec rocksdb.block.cache.filter.bytes.insert COUNT : 64531384 Reviewed By: mrambacher Differential Revision: D29140805 Pulled By: pdillinger fbshipit-source-id: d742c922722421678f95ad85eeb0aaebc9f5e49a	3 years ago
mrambacher	281ac9c89e	Add CreateFrom methods to Env/FileSystem (#8174 ) Summary: - Added CreateFromString method to Env and FilesSystem to replace LoadEnv/Load. This method/signature is a precursor to making these classes extend Customizable. - Added CreateFromSystem to Env. This method standardizes creating an Env from the environment variables. Previously, some places would check TEST_ENV_URI and others would also check TEST_FS_URI. Now the code is more command/standardized. - Added CreateFromFlags to Env. These method allows Env to be create from string options (such as GFLAGS options) in a more standard way. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8174 Reviewed By: zhichao-cao Differential Revision: D28999603 Pulled By: mrambacher fbshipit-source-id: 88e6911e7e91f908458a7fe10a20e93ecbc275fb	3 years ago
Peter Dillinger	311a544c2a	Use deleters to label cache entries and collect stats (#8297 ) Summary: This change gathers and publishes statistics about the kinds of items in block cache. This is especially important for profiling relative usage of cache by index vs. filter vs. data blocks. It works by iterating over the cache during periodic stats dump (InternalStats, stats_dump_period_sec) or on demand when DB::Get(Map)Property(kBlockCacheEntryStats), except that for efficiency and sharing among column families, saved data from the last scan is used when the data is not considered too old. The new information can be seen in info LOG, for example: Block cache LRUCache@0x7fca62229330 capacity: 95.37 MB collections: 8 last_copies: 0 last_secs: 0.00178 secs_since: 0 Block cache entry stats(count,size,portion): DataBlock(7092,28.24 MB,29.6136%) FilterBlock(215,867.90 KB,0.888728%) FilterMetaBlock(2,5.31 KB,0.00544%) IndexBlock(217,180.11 KB,0.184432%) WriteBuffer(1,256.00 KB,0.262144%) Misc(1,0.00 KB,0%) And also through DB::GetProperty and GetMapProperty (here using ldb just for demonstration): $ ./ldb --db=/dev/shm/dbbench/ get_property rocksdb.block-cache-entry-stats rocksdb.block-cache-entry-stats.bytes.data-block: 0 rocksdb.block-cache-entry-stats.bytes.deprecated-filter-block: 0 rocksdb.block-cache-entry-stats.bytes.filter-block: 0 rocksdb.block-cache-entry-stats.bytes.filter-meta-block: 0 rocksdb.block-cache-entry-stats.bytes.index-block: 178992 rocksdb.block-cache-entry-stats.bytes.misc: 0 rocksdb.block-cache-entry-stats.bytes.other-block: 0 rocksdb.block-cache-entry-stats.bytes.write-buffer: 0 rocksdb.block-cache-entry-stats.capacity: 8388608 rocksdb.block-cache-entry-stats.count.data-block: 0 rocksdb.block-cache-entry-stats.count.deprecated-filter-block: 0 rocksdb.block-cache-entry-stats.count.filter-block: 0 rocksdb.block-cache-entry-stats.count.filter-meta-block: 0 rocksdb.block-cache-entry-stats.count.index-block: 215 rocksdb.block-cache-entry-stats.count.misc: 1 rocksdb.block-cache-entry-stats.count.other-block: 0 rocksdb.block-cache-entry-stats.count.write-buffer: 0 rocksdb.block-cache-entry-stats.id: LRUCache@0x7f3636661290 rocksdb.block-cache-entry-stats.percent.data-block: 0.000000 rocksdb.block-cache-entry-stats.percent.deprecated-filter-block: 0.000000 rocksdb.block-cache-entry-stats.percent.filter-block: 0.000000 rocksdb.block-cache-entry-stats.percent.filter-meta-block: 0.000000 rocksdb.block-cache-entry-stats.percent.index-block: 2.133751 rocksdb.block-cache-entry-stats.percent.misc: 0.000000 rocksdb.block-cache-entry-stats.percent.other-block: 0.000000 rocksdb.block-cache-entry-stats.percent.write-buffer: 0.000000 rocksdb.block-cache-entry-stats.secs_for_last_collection: 0.000052 rocksdb.block-cache-entry-stats.secs_since_last_collection: 0 Solution detail - We need some way to flag what kind of blocks each entry belongs to, preferably without changing the Cache API. One of the complications is that Cache is a general interface that could have other users that don't adhere to whichever convention we decide on for keys and values. Or we would pay for an extra field in the Handle that would only be used for this purpose. This change uses a back-door approach, the deleter, to indicate the "role" of a Cache entry (in addition to the value type, implicitly). This has the added benefit of ensuring proper code origin whenever we recognize a particular role for a cache entry; if the entry came from some other part of the code, it will use an unrecognized deleter, which we simply attribute to the "Misc" role. An internal API makes for simple instantiation and automatic registration of Cache deleters for a given value type and "role". Another internal API, CacheEntryStatsCollector, solves the problem of caching the results of a scan and sharing them, to ensure scans are neither excessive nor redundant so as not to harm Cache performance. Because code is added to BlocklikeTraits, it is pulled out of block_based_table_reader.cc into its own file. This is a reformulation of https://github.com/facebook/rocksdb/issues/8276, without the type checking option (could still be added), and with actual stat gathering. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8297 Test Plan: manual testing with db_bench, and a couple of basic unit tests Reviewed By: ltamasi Differential Revision: D28488721 Pulled By: pdillinger fbshipit-source-id: 472f524a9691b5afb107934be2d41d84f2b129fb	4 years ago
anand76	13232e11d4	Allow cache_bench/db_bench to use a custom secondary cache (#8312 ) Summary: This PR adds a ```-secondary_cache_uri``` option to the cache_bench and db_bench tools to allow the user to specify a custom secondary cache URI. The object registry is used to create an instance of the ```SecondaryCache``` object of the type specified in the URI. The main cache_bench code is packaged into a separate library, similar to db_bench. An example invocation of db_bench with a secondary cache URI - ```db_bench --env_uri=ws://ws.flash_sandbox.vll1_2/ -db=anand/nvm_cache_2 -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=67108864 -cache_index_and_filter_blocks=true -secondary_cache_uri='cachelibwrapper://filename=/home/anand76/nvm_cache/cache_file;size=2147483648;regionSize=16777216;admPolicy=random;admProbability=1.0;volatileSize=8388608;bktPower=20;lockPower=12' -partition_index_and_filters=true -duration=1800``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8312 Reviewed By: zhichao-cao Differential Revision: D28544325 Pulled By: anand1976 fbshipit-source-id: 8f209b9af900c459dc42daa7a610d5f00176eeed	4 years ago
sdong	c3ff14e2c1	Hint temperature of bottommost level files to FileSystem (#8222 ) Summary: As the first part of the effort of having placing different files on different storage types, this change introduces several things: (1) An experimental interface in FileSystem that specify temperature to a new file created. (2) A test FileSystemWrapper, SimulatedHybridFileSystem, that simulates HDD for a file of "warm" temperature. (3) A simple experimental feature ColumnFamilyOptions.bottommost_temperature. RocksDB would pass this value to FileSystem when creating any bottommost file. (4) A db_bench parameter that applies the (2) and (3) to db_bench. The motivation of the change is to introduce minimal changes that allow us to evolve tiered storage development. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8222 Test Plan: ./db_bench --benchmarks=fillrandom --write_buffer_size=2000000 -max_bytes_for_level_base=20000000 -level_compaction_dynamic_level_bytes --reads=100 -compaction_readahead_size=20000000 --reads=100000 -num=10000000 followed by ./db_bench --benchmarks=readrandom,stats --write_buffer_size=2000000 -max_bytes_for_level_base=20000000 -simulate_hybrid_fs_file=/tmp/warm_file_list -level_compaction_dynamic_level_bytes -compaction_readahead_size=20000000 --reads=500 --threads=16 -use_existing_db --num=10000000 and see results as expected. Reviewed By: ajkr Differential Revision: D28003028 fbshipit-source-id: 4724896d5205730227ba2f17c3fecb11261744ce	4 years ago
David Carlier	728e5f5750	db_bench_tool: basic sys infos for FreeBSD. (#8169 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8169 Reviewed By: riversand963 Differential Revision: D27672457 Pulled By: ajkr fbshipit-source-id: b40a7ad5d09a754154f28c2574ef9f77c8a131bb	4 years ago
Yanqin Jin	2d8518f5ea	Reset pinnable slice before using it in Get() (#8154 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/6548. If we do not reset the pinnable slice before calling get, we will see the following assertion failure while running the test with multiple column families. ``` db_bench: ./include/rocksdb/slice.h:168: void rocksdb::PinnableSlice::PinSlice(const rocksdb::Slice&, rocksdb::Cleanable*): Assertion `!pinned_' failed. ``` This happens in `BlockBasedTable::Get()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8154 Test Plan: ./db_bench --benchmarks=fillseq -num_column_families=3 ./db_bench --benchmarks=readrandom -use_existing_db=1 -num_column_families=3 Reviewed By: ajkr Differential Revision: D27587589 Pulled By: riversand963 fbshipit-source-id: 7379e7649ba40f046d6a4014c9ad629cb3f9a786	4 years ago
mrambacher	1be3867689	Fix check in db_bench for num shard bits to match check in LRUCache (#8110 ) Summary: The check in db_bench for table_cache_numshardbits was 0 < bits <= 20, whereas the check in LRUCache was 0 < bits < 20. Changed the two values to match to avoid a crash in db_bench on a null cache. Fixes https://github.com/facebook/rocksdb/issues/7393 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8110 Reviewed By: zhichao-cao Differential Revision: D27353522 Pulled By: mrambacher fbshipit-source-id: a414bd23b5bde1f071146b34cfca5e35c02de869	4 years ago
junhan lee	06bb45a65a	fix typo (#8088 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8088 Reviewed By: ajkr Differential Revision: D27270378 Pulled By: zhichao-cao fbshipit-source-id: 05af12c63855d00cc57bab9866fc8193c03a404e	4 years ago
Zhichao Cao	dd0447ae2c	Add new Append API with DataVerificationInfo to Env WritableFile (#8071 ) Summary: Add the new Append and PositionedAppend API to env WritableFile. User is able to benefit from the write checksum handoff API when using the legacy Env classes. FileSystem already implemented the checksum handoff API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8071 Test Plan: make check, added new unit test. Reviewed By: anand1976 Differential Revision: D27177043 Pulled By: zhichao-cao fbshipit-source-id: 430c8331fc81099fa6d00f4fff703b68b9e8080e	4 years ago
Mark Callaghan	326670d265	Add new db_bench --benchmarks options for controlling compaction (#8027 ) Summary: The new options are: * compact0 - compact L0 into L1 using one thread * compact1 - compact L1 into L2 using one thread * flush - flush memtable * waitforcompaction - wait for compaction to finish These are useful for reproducible benchmarks to help get the LSM tree shape into a deterministic state. I wrote about this at: http://smalldatum.blogspot.com/2021/02/read-only-benchmarks-with-lsm-are.html Pull Request resolved: https://github.com/facebook/rocksdb/pull/8027 Reviewed By: riversand963 Differential Revision: D27053861 Pulled By: ajkr fbshipit-source-id: 1646f35584a3db03740fbeb47d91c3f00fb35d6e	4 years ago
mrambacher	3dff28cf9b	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 ) Summary: For performance purposes, the lower level routines were changed to use a SystemClock* instead of a std::shared_ptr<SystemClock>. The shared ptr has some performance degradation on certain hardware classes. For most of the system, there is no risk of the pointer being deleted/invalid because the shared_ptr will be stored elsewhere. For example, the ImmutableDBOptions stores the Env which has a std::shared_ptr<SystemClock> in it. The SystemClock* within the ImmutableDBOptions is essentially a "short cut" to gain access to this constant resource. There were a few classes (PeriodicWorkScheduler?) where the "short cut" property did not hold. In those cases, the shared pointer was preserved. Using db_bench readrandom perf_level=3 on my EC2 box, this change performed as well or better than 6.17: 6.17: readrandom : 28.046 micros/op 854902 ops/sec; 61.3 MB/s (355999 of 355999 found) 6.18: readrandom : 32.615 micros/op 735306 ops/sec; 52.7 MB/s (290999 of 290999 found) PR: readrandom : 27.500 micros/op 871909 ops/sec; 62.5 MB/s (367999 of 367999 found) (Note that the times for 6.18 are prior to revert of the SystemClock). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8033 Reviewed By: pdillinger Differential Revision: D27014563 Pulled By: mrambacher fbshipit-source-id: ad0459eba03182e454391b5926bf5cdd45657b67	4 years ago
Yanqin Jin	1f11d07f24	Enable compact filter for blob in dbstress and dbbench (#8011 ) Summary: As title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8011 Test Plan: ``` ./db_bench -enable_blob_files=1 -use_keep_filter=1 -disable_auto_compactions=1 /db_stress -enable_blob_files=1 -enable_compaction_filter=1 -acquire_snapshot_one_in=0 -compact_range_one_in=0 -iterpercent=0 -test_batches_snapshots=0 -readpercent=10 -prefixpercent=20 -writepercent=55 -delpercent=15 -continuous_verification_interval=0 ``` Reviewed By: ltamasi Differential Revision: D26736061 Pulled By: riversand963 fbshipit-source-id: 1c7834903c28431ce23324c4f259ed71255614e2	4 years ago
Andrew Kryczka	d904233d2f	Limit buffering for collecting samples for compression dictionary (#7970 ) Summary: For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file. However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage. Related changes include: - Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks - Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary - Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970 Test Plan: - updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level - looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set. Reviewed By: pdillinger Differential Revision: D26467994 Pulled By: ajkr fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465	4 years ago
Levi Tamasi	0743eba0c4	Add support for the integrated BlobDB to db_bench (#7956 ) Summary: The patch adds the configuration options of the new BlobDB implementation to `db_bench` and adjusts the help messages of the old (`StackableDB`-based) BlobDB's options to make it clear which implementation they pertain to. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7956 Test Plan: Ran `make check` and `db_bench` with the new options. Reviewed By: jay-zhuang Differential Revision: D26384808 Pulled By: ltamasi fbshipit-source-id: b4405bb2c56cfd3506d4c32e3329c08dfdf69c94	4 years ago
David CARLIER	14fbb43f3e	db_bench: dump cpu info for Mac. (#7932 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7932 Reviewed By: jay-zhuang Differential Revision: D26316480 Pulled By: zhichao-cao fbshipit-source-id: 3e002e49fcb7f60bc9270550a6b3e182fe197551	4 years ago
anand76	4ee991b1e6	Cleanup multiple DBs after running db_bench in multi-DB mode (#7891 ) Summary: Currently, db_bench cleanup only deletes the main DB, if there's one. Multiple DBs that are opened when --num_multi_db is specified are not deleted, which can lead to crashes due to running compaction threads on process exit. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7891 Test Plan: Run regression test Reviewed By: jay-zhuang Differential Revision: D26049914 Pulled By: anand1976 fbshipit-source-id: acef2821001ca5e208a96a6a273c724e56353316	4 years ago
anand76	d7738666b0	Fix db_bench duration for multireadrandom benchmark (#7817 ) Summary: The multireadrandom benchmark, when run for a specific number of reads (--reads argument), should base the duration on the actual number of keys read rather than number of batches. Tests: Run db_bench multireadrandom benchmark Pull Request resolved: https://github.com/facebook/rocksdb/pull/7817 Reviewed By: zhichao-cao Differential Revision: D25717230 Pulled By: anand1976 fbshipit-source-id: 13f4d8162268cf9a34918655e60302d0aba3864b	4 years ago
Peter Dillinger	239d17a19c	Support optimize_filters_for_memory for Ribbon filter (#7774 ) Summary: Primarily this change refactors the optimize_filters_for_memory code for Bloom filters, based on malloc_usable_size, to also work for Ribbon filters. This change also replaces the somewhat slow but general BuiltinFilterBitsBuilder::ApproximateNumEntries with implementation-specific versions for Ribbon (new) and Legacy Bloom (based on a recently deleted version). The reason is to emphasize speed in ApproximateNumEntries rather than 100% accuracy. Justification: ApproximateNumEntries (formerly CalculateNumEntry) is only used by RocksDB for range-partitioned filters, called each time we start to construct one. (In theory, it should be possible to reuse the estimate, but the abstractions provided by FilterPolicy don't really make that workable.) But this is only used as a heuristic estimate for hitting a desired partitioned filter size because of alignment to data blocks, which have various numbers of unique keys or prefixes. The two factors lead us to prioritize reasonable speed over 100% accuracy. optimize_filters_for_memory adds extra complication, because precisely calculating num_entries for some allowed number of bytes depends on state with optimize_filters_for_memory enabled. And the allocator-agnostic implementation of optimize_filters_for_memory, using malloc_usable_size, means we would have to actually allocate memory, many times, just to precisely determine how many entries (keys) could be added and stay below some size budget, for the current state. (In a draft, I got this working, and then realized the balance of speed vs. accuracy was all wrong.) So related to that, I have made CalculateSpace, an internal-only API only used for testing, non-authoritative also if optimize_filters_for_memory is enabled. This simplifies some code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7774 Test Plan: unit test updated, and for FilterSize test, range of tested values is greatly expanded (still super fast) Also tested `db_bench -benchmarks=fillrandom,stats -bloom_bits=10 -num=1000000 -partition_index_and_filters -format_version=5 [-optimize_filters_for_memory] [-use_ribbon_filter]` with temporary debug output of generated filter sizes. Bloom+optimize_filters_for_memory: 1 Filter size: 197 (224 in memory) 134 Filter size: 3525 (3584 in memory) 107 Filter size: 4037 (4096 in memory) Total on disk: 904,506 Total in memory: 918,752 Ribbon+optimize_filters_for_memory: 1 Filter size: 3061 (3072 in memory) 110 Filter size: 3573 (3584 in memory) 58 Filter size: 4085 (4096 in memory) Total on disk: 633,021 (-30.0%) Total in memory: 634,880 (-30.9%) Bloom (no offm): 1 Filter size: 261 (320 in memory) 1 Filter size: 3333 (3584 in memory) 240 Filter size: 3717 (4096 in memory) Total on disk: 895,674 (-1% on disk vs. +offm; known tolerable overhead of offm) Total in memory: 986,944 (+7.4% vs. +offm) Ribbon (no offm): 1 Filter size: 2949 (3072 in memory) 1 Filter size: 3381 (3584 in memory) 167 Filter size: 3701 (4096 in memory) Total on disk: 624,397 (-30.3% vs. Bloom) Total in memory: 690,688 (-30.0% vs. Bloom) Note that optimize_filters_for_memory is even more effective for Ribbon filter than for cache-local Bloom, because it can close the unused memory gap even tighter than Bloom filter, because of 16 byte increments for Ribbon vs. 64 byte increments for Bloom. Reviewed By: jay-zhuang Differential Revision: D25592970 Pulled By: pdillinger fbshipit-source-id: 606fdaa025bb790d7e9c21601e8ea86e10541912	4 years ago
Mammo, Mulugeta	1861de455e	Add arena_block_size flag to db_bench (#7654 ) Summary: db_bench currently does not allow overriding the default `arena_block_size `calculation ([memtable size/8](https://github.com/facebook/rocksdb/blob/master/db/column_family.cc#L216)). For memtables whose size is in gigabytes, the `arena_block_size` defaults to hundreds of megabytes (affecting performance). Exposing this option in db_bench would allow us to test the workloads with various `arena_block_size` values. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7654 Reviewed By: jay-zhuang Differential Revision: D24996812 Pulled By: ajkr fbshipit-source-id: a5e3d2c83d9f89e1bb8382f2e8dd476c79e33bef	4 years ago
Yanqin Jin	394210f280	Remove unused includes (#7604 ) Summary: This is a PR generated semi-automatically by an internal tool to remove unused includes and `using` statements. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604 Test Plan: make check Reviewed By: ajkr Differential Revision: D24579392 Pulled By: riversand963 fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd	4 years ago
Levi Tamasi	30fb9dd50f	Introduce a helper method UncompressData (#7434 ) Summary: The patch introduces a helper method in `util/compression.h` called `UncompressData` that dispatches calls to the correct uncompression method based on type, and changes `UncompressBlockContentsForCompressionType` and `Benchmark::Uncompress` in `db_bench` so they are implemented in terms of the new method. This eliminates some code duplication. (`Benchmark::Compress` is also updated to use the previously introduced `CompressData` helper.) In addition, the patch brings the implementation of `Snappy_Uncompress` into sync with the other uncompression methods by making the method compute the buffer size and allocate the buffer itself. Finally, the patch eliminates some potentially risky back-and-forth conversions between various unsigned and signed integer types by exposing the size of the allocated buffer as a `size_t` instead of an `int`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7434 Test Plan: `make check` `./db_bench -benchmarks=compress,uncompress --compression_type ...` Reviewed By: riversand963 Differential Revision: D23900011 Pulled By: ltamasi fbshipit-source-id: b25df63ceec4639889be94acb22eb53e530c54e0	4 years ago
Yanqin Jin	a28df7a75a	Add basic support for user-defined timestamp to db_bench (#7389 ) Summary: Update db_bench so that we can run it with user-defined timestamp. Currently, only 64-bit timestamp is supported, while others are disabled by assertion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7389 Test Plan: ./db_bench -benchmarks=fillseq,fillrandom,readrandom,readsequential,....., -user_timestamp_size=8 Reviewed By: ltamasi Differential Revision: D23720830 Pulled By: riversand963 fbshipit-source-id: 486eacbb82de9a5441e79a61bfa9beef6581608a	4 years ago
mrambacher	7d472accdc	Bring the Configurable options together (#5753 ) Summary: This PR merges the functionality of making the ColumnFamilyOptions, TableFactory, and DBOptions into Configurable into a single PR, resolving any merge conflicts Pull Request resolved: https://github.com/facebook/rocksdb/pull/5753 Reviewed By: ajkr Differential Revision: D23385030 Pulled By: zhichao-cao fbshipit-source-id: 8b977a7731556230b9b8c5a081b98e49ee4f160a	4 years ago
Peter Dillinger	9de912de3f	Fix some errors showing up in Travis builds (#7359 ) Summary: Also enables a pull request to trigger all the Travis configurations by writing FULL_CI in the commit message. (See what I did there?) First issue make: *** No rule to make target 'jl/util/crc32c_ppc_asm.o', needed by 'rocksdbjava'. Stop. Second issue tools/db_bench_tool.cc:5514:38: error: ‘gen_exp.rocksdb::Benchmark::GenerateTwoTermExpKeys::keyrange_size_’ may be used uninitialized in this function Pull Request resolved: https://github.com/facebook/rocksdb/pull/7359 Test Plan: CI Reviewed By: zhichao-cao Differential Revision: D23582132 Pulled By: pdillinger fbshipit-source-id: 06d794673fd522ba11cf6398385387e6bd97ef89	4 years ago
Hans Holmberg	679a413f11	Close databases on benchmark error exits in db_bench (#7327 ) Summary: Delete database instances to make sure there are no loose threads running before exit(). This fixes segfaults seen when running workloads through CompositeEnvs with custom file systems. For further background on the issues arising when using CompositeEnvs, see the discussion in: https://github.com/facebook/rocksdb/pull/6878 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7327 Reviewed By: cheng-chang Differential Revision: D23433244 Pulled By: ajkr fbshipit-source-id: 4e19cf2067e3fe68c2a3fe1823f24b4091336bbe	4 years ago
Hans Holmberg	2a0d3c7054	Add a file system parameter: --fs_uri to db_stress and db_bench (#6878 ) Summary: This pull request adds the parameter --fs_uri to db_bench and db_stress, creating a composite env combining the default env with a specified registered rocksdb file system. This makes it easier to develop and test new RocksDB FileSystems. The pull request also registers the posix file system for testing purposes. Examples: ``` $./db_bench --fs_uri=posix:// --benchmarks=fillseq $./db_stress --fs_uri=zenfs://nullb1 ``` zenfs is a RocksDB FileSystem I'm developing to add support for zoned block devices, and in that case the zoned block device is specified in the uri (a zoned null block device in the above example). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6878 Reviewed By: siying Differential Revision: D23023063 Pulled By: ajkr fbshipit-source-id: 8b3fe7193ce45e683043b021779b7a4d547af247	4 years ago
Aaron Kabcenell	56ed601df3	Compaction Read/Write Stats by Compaction Type (#7165 ) Summary: Adds compaction statistics (total bytes read and written) for compactions that occur for delete-triggered, periodic, and TTL compaction reasons. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7165 Test Plan: TTL and periodic can be checked by runnning db_bench with the options activated: /db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -periodic_compaction_seconds=1 ./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -fifo_compaction_ttl=1 Setting the time to one second causes non-zero bytes read/written for those compaction reasons. Disabling them or setting them to times longer than the test run length causes the stats to return to zero as expected. Delete-triggered compaction counting is tested in DBTablePropertiesTest.DeletionTriggeredCompactionMarking Reviewed By: ajkr Differential Revision: D22693050 Pulled By: akabcenell fbshipit-source-id: d15cef4d94576f703015c8942d5f0d492f69401d	4 years ago

1 2 3 4 5 ...

291 Commits (b9e98728192d56cd26008daa6f80a7fa4c1e1225)