Summary:
Give a name for every kill point, and allow users to disable some kill points based on prefixes. The kill points can be passed by db_stress through a command line paramter. This provides a way for users to boost the chance of triggering low frequency kill points
This allow follow up changes in crash test scripts to improve crash test coverage.
Test Plan:
Manually run db_stress with variable values of --kill_random_test and --kill_prefix_blacklist. Like this:
--kill_random_test=2 --kill_prefix_blacklist=Posix,WritableFileWriter::Append,WritableFileWriter::WriteBuffered,WritableFileWriter::Sync
Reviewers: igor, kradhakrishnan, rven, IslamAbdelRahman, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D48735
Summary:
Fixed the following tsan warning in db_stress.cc
WARNING: ThreadSanitizer: data race (pid=3163194)
Read of size 8 at 0x7fd1797cb518 by thread T32:
#0 VerifyDb tools/db_stress.cc:1731 (db_stress+0x000000040674)
#1 rocksdb::StressTest::ThreadBody(void*) tools/db_stress.cc:1191 (db_stress+0x0000000625a9)
#2 StartThreadWrapper util/env_posix.cc:1648 (db_stress+0x00000028bbbd)
Previous write of size 8 at 0x7fd1797cb518 by thread T31:
#0 VerifyDb tools/db_stress.cc:1726 (db_stress+0x00000004072a)
#1 rocksdb::StressTest::ThreadBody(void*) tools/db_stress.cc:1191 (db_stress+0x0000000625a9)
#2 StartThreadWrapper util/env_posix.cc:1648 (db_stress+0x00000028bbbd)
The cause is that in VerifyDb(), the static local const variable long max_key
can be read and written at the same time. This patch fixed it by making it
non-static.
Test Plan: db_stress
Reviewers: igor, sdong, IslamAbdelRahman, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47703
Summary:
This patch fixes#7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).
In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.
Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.
Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
this patch supported this so it could be resurrected if needed)
Test Plan: make all check
Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
Reviewed By: igor
Subscribers: maykov, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43179
Summary: Add ZSTD compression type. The same way as adding LZ4.
Test Plan: run all tests. Generate files in db_bench. Make sure reads succeed. But the SST files cannot be opened in older versions. Also some other adhoc tests.
Reviewers: rven, anthony, IslamAbdelRahman, kradhakrishnan, igor
Reviewed By: igor
Subscribers: MarkCallaghan, maykov, yoshinorim, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45747
Summary:
Up until this point we had DbOptions.num_subcompactions, but
it is semantically more correct to call this max_subcompactions since
we will schedule *up to* DbOptions.max_subcompactions smaller compactions
at a time during a compaction job.
I also added a --subcompactions option to db_bench
Test Plan: make all make check
Reviewers: sdong, igor, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D45069
Summary:
In prepration for running multiple threads at the same time during
a compaction job, this patch assigns each subcompaction its own state
(instead of sharing the one global CompactionState). Each subcompaction then
uses this state to update its statistics, keep track of its snapshots, etc.
during the course of execution. Then at the end of all the executions the
statistics are aggregated across the subcompactions so that the final result
is the same as if only one larger compaction had run.
Test Plan: ./db_test ./db_compaction_test ./compaction_job_test
Reviewers: sdong, anthony, igor, noetzli, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43239
Summary:
While working on single delete support for db_bench, I realized that
db_bench/db_stress contain a bunch of duplicate code related to
copmression and found some typos. This patch removes duplicate code,
typos and a redundant #ifndef in internal_stats.cc.
Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress
Reviewers: yhchiang, sdong, rven, anthony, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43965
Summary:
When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.
We have a special case that when you set max_background_flushes to 0, we
schedule the flush to execute on the compaction thread.
Test Plan: make check (there might be some unit tests that depend on this behavior)
Reviewers: IslamAbdelRahman, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41931
Summary: This option is guarding the feature implemented 2 and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn this feature off. I found no reference in fbcode.
Test Plan: none
Reviewers: sdong, yhchiang, anthony, dhruba
Reviewed By: dhruba
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42063
Summary: This helps Windows port to format their changes, as discussed. Might have formatted some other codes too becasue last 10 commits include more.
Test Plan: Build it.
Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D41961
Summary: Make RocksDb build and run on Windows to be functionally
complete and performant. All existing test cases run with no
regressions. Performance numbers are in the pull-request.
Test plan: make all of the existing unit tests pass, obtain perf numbers.
Co-authored-by: Praveen Rao praveensinghrao@outlook.com
Co-authored-by: Sherlock Huang baihan.huang@gmail.com
Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com
Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com
Summary:
EventListener::OnFlushCompleted() now passes a structure instead
of a list of parameters. This minimizes the API change in the
future.
Test Plan:
listener_test
compact_files_test
example/compact_files_example
Reviewers: kradhakrishnan, sdong, IslamAbdelRahman, rven, igor
Reviewed By: rven, igor
Subscribers: IslamAbdelRahman, rven, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39543
Summary:
Add EventListener::OnTableFileCreated(), which will be called
when a table file is created. This patch is part of the
EventLogger and EventListener integration.
Test Plan: Augment existing test in db/listener_test.cc
Reviewers: anthony, kradhakrishnan, rven, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38865
Summary:
Fixed db_stress by correcting the verification of column family
names in the Listener of db_stress
Test Plan: db_stress
Reviewers: igor, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39255
Summary: Fixed a compile warning in db_stress in NDEBUG mode.
Test Plan: make OPT=-DNDEBUG db_stress
Reviewers: sdong, anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39213
Summary:
Fixed the following compile warning in db_stress:
error: 'OnCompactionCompleted' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
Test Plan: make db_stress
Reviewers: sdong, igor, anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39207
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
Summary:
This diff contains trivial fixes for 6 scan-build warnings:
**db/c_test.c**
`db` variable is never read. Removed assignment.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath
**db/db_iter.cc**
`skipping` local variable is assigned to false. Then in the next switch block the only "non return" case assign `skipping` to true, the rest cases don't use it and all do return.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath
**db/log_reader.cc**
In `bool Reader::SkipToInitialBlock()` `offset_in_block` local variable is assigned to 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed it to `initial_offset_in_block` to avoid confusion.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath
In `bool Reader::ReadRecord(Slice* record, std::string* scratch)` local variable `in_fragmented_record` in switch case `kFullType` block is assigned to false and then does `return` without use. In the other switch case `kFirstType` block the same `in_fragmented_record` is assigned to false, but later assigned to true without prior use. Removed assignment for both cases.
scan-build reprots:
http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPathhttp://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath
**table/plain_table_key_coding.cc**
Local variable `user_key_size` is assigned when declared. But then in both places where it is used assigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value in declaration.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath
**tools/db_stress.cc**
Missing `break` in switch case block. This seems to be a bug. Added missing `break`.
Test Plan:
Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs.
```lang=bash
% make check
% make analyze
```
Reviewers: meyering, igor, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33795
Summary:
Fix the following compile warning in db_stress.cc on Mac
tools/db_stress.cc:1688:52: error: format specifies type 'unsigned long' but the argument has type '::google::uint64' (aka 'unsigned long long') [-Werror,-Wformat]
fprintf(stdout, "DB-write-buffer-size: %lu\n", FLAGS_db_write_buffer_size);
~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
%llu
Test Plan:
make
Summary:
Introduces a new class for managing write buffer memory across column
families. We supplement ColumnFamilyOptions::write_buffer_size with
ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
instance that enforces memory limits before flushing out to disk.
Test Plan: Added SharedWriteBuffer unit test to db_test.cc
Reviewers: sdong, rven, ljin, igor
Reviewed By: igor
Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
Differential Revision: https://reviews.facebook.net/D22581
Summary:
In some environment such as android, the c++ library does not have
std::to_string. This path adds rocksdb::ToString(), which wraps std::to_string
when std::to_string is not available, and implements std::to_string
in the other case.
Test Plan:
make dbg -j32
./db_test
make clean
make dbg OPT=-DOS_ANDROID -j32
./db_test
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29181
Summary:
Make db_stress built for ROCKSDB_LITE.
The test doesn't pass tough. It seg fault quickly. But I took a look and it doesn't seem to be related to lite version. Likely to be a bug inside RocksDB.
Test Plan: make db_stress
Reviewers: yhchiang, rven, ljin, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D28797
Summary:
We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
Test Plan: compiles
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: yhchiang
Subscribers: bobbaldwin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28689
Summary: Previously I made `make check` work with -Wshadow, but there are some tools that are not compiled using `make check`.
Test Plan: make all
Reviewers: yhchiang, rven, ljin, sdong
Reviewed By: ljin, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28497
Summary: as title
Test Plan: ./db_test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28269
Summary:
Added two flags which operate as follows:
in_place_update: enable in_place_update for default column family
set_in_place_one_in: toggles the value of the option inplace_update_support with a probability of 1/N
Test Plan:
Run db_stress with the two flags above set.
Specifically tried in_place_update set to true and set_in_place_one_in set to 10,000.
Reviewers: ljin, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28029
Summary: Allow SetOptions() during db_stress test
Test Plan: make crash_test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25497
Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file.
2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter.
3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type.
4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.
5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc
Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increase for about 30% from 2230002 to 2991411.
Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter = 0
./auto_sanity_test.sh
Reviewers: igor, yhchiang, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20979
Summary:
1. assert db->Put to be true in db_stress
2. begin column family with name "1".
Test Plan: 1. ./db_stress
Reviewers: ljin, yhchiang, dhruba, sdong, igor
Reviewed By: sdong, igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22659
Summary:
I will move compression related options in a separate diff since this
diff is already pretty lengthy.
I guess I will also need to change JNI accordingly :(
Test Plan: make all check
Reviewers: yhchiang, igor, sdong
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21915
Summary:
All public headers need to be under `include/rocksdb` directory. Otherwise, clients include our header files like this:
#include <rocksdb/db.h>
#include <utilities/backupable_db.h> // still our public header!
Also, internally, we include:
#include "utilities/backupable/backupable_db.h" // internal header
#include "utilities/backupable_db.h" // public header
which is confusing.
This way, when we install rocksdb as a system library, we can just copy `include/rocksdb` directory to system's header files. We can't really copy `utilities` directory to system's header files.
Test Plan: compiles
Reviewers: dhruba, ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20409
Summary:
Now that the arena is used to allocate space for hashskiplist's bucket. The bucket size
need to be set small enough to avoid "should_flush_" failure in memtable's assertion.
Test Plan:
make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: igor
Differential Revision: https://reviews.facebook.net/D19371
Summary:
As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases.
This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction.
There is one test case that relied on didIO information. I'll try to find another way to implement it.
Test Plan: make check
Reviewers: sdong, haobo, yhchiang, ljin, dhruba
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19161
Summary: Provide an convenience option to create column families if they are missing from the DB. Task #4460490
Test Plan: added unit test. also, stress test for some time
Reviewers: sdong, haobo, dhruba, ljin, yhchiang
Reviewed By: yhchiang
Subscribers: yhchiang, leveldb
Differential Revision: https://reviews.facebook.net/D18951
Summary: Add an option to indicates the variation of background threads. If the flag is not 0, for every 100 milliseconds, adjust thread pool size to a value within the range.
Test Plan: run db_stress
Reviewers: haobo, dhruba, igor, ljin
Reviewed By: ljin
Subscribers: leveldb, nkg-
Differential Revision: https://reviews.facebook.net/D18759
Summary:
Newer gflags switched from `google` namespace to `gflags` namespace. See: https://github.com/facebook/rocksdb/issues/139 and https://github.com/facebook/rocksdb/issues/102
Unfortunately, they don't define any macro with their namespace, so we need to actually try to compile gflags with two different namespace to figure out which one is the correct one.
Test Plan: works in fbcode environemnt. I'll also try in ubutnu with newer gflags
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18537
Summary:
also add an override option total_order_iteration if you want to use full
iterator with prefix_extractor
Test Plan: make all check
Reviewers: igor, haobo, sdong, yhchiang
Reviewed By: haobo
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D17805
Summary:
Currently, whenever DB Verification fails we bail out by calling `exit(1)`. This is kind of bad since it causes unclean shutdown and spew of error log messages like:
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
This diff adds a new parameter that is set to true when verification fails. It can then use the parameter to bail out safely.
Test Plan: Casued artificail failure. Verified that exit was clean.
Reviewers: dhruba, haobo, ljin
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18243
Summary:
The file tree structure in Version is prebuilt and the range of each file is known.
On the Get() code path, we do binary search in FindFile() by comparing
target key with each file's largest key and also check the range for each L0 file.
With some pre-calculated knowledge, each key comparision that has been done can serve
as a hint to narrow down further searches:
(1) If a key falls within a L0 file's range, we can safely skip the next
file if its range does not overlap with the current one.
(2) If a key falls within a file's range in level L0 - Ln-1, we should only
need to binary search in the next level for files that overlap with the current one.
(1) will be able to skip some files depending one the key distribution.
(2) can greatly reduce the range of binary search, especially for bottom
levels, given that one file most likely only overlaps with N files from
the level below (where N is max_bytes_for_level_multiplier). So on level
L, we will only look at ~N files instead of N^L files.
Some inital results: measured with 500M key DB, when write is light (10k/s = 1.2M/s), this
improves QPS ~7% on top of blocked bloom. When write is heavier (80k/s =
9.6M/s), it gives us ~13% improvement.
Test Plan: make all check
Reviewers: haobo, igor, dhruba, sdong, yhchiang
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17205