Summary: When reporting compaction that was started because of SuggestCompactRange() we should treat it as manual compaction.
Test Plan: none
Reviewers: yhchiang, rven
Reviewed By: rven
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38139
Summary: Don't treat warnings as error when building rocksdbjavastatic
Test Plan: make rocksdbjavastatic -j32
Reviewers: rven, fyrz, adamretter, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38187
Summary: Without this I get bunch of questions when I run `make clean`
Test Plan: no more questions!
Reviewers: rven, yhchiang, meyering, anthony
Reviewed By: meyering, anthony
Subscribers: meyering, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38145
Summary:
In D28521 we removed GarbageCollect() from BackupEngine's constructor. The reason was that opening BackupEngine on HDFS was very slow and in most cases we didn't have any garbage. We allowed the user to call GarbageCollect() when it detects some garbage files in his backup directory.
Unfortunately, this left us vulnerable to an interesting issue. Let's say we started a backup and copied files {1, 3} but the backup failed. On another host, we restore DB from backup and generate {1, 3, 5}. Since {1, 3} is already there, we will not overwrite. However, these files might be from a different database so their contents might be different. See internal task t6781803 for more info.
Now, when we're copying files and we discover a file already there, we check:
1. if the file is not referenced from any backups, we overwrite the file.
2. if the file is referenced from other backups AND the checksums don't match, we fail the backup. This will only happen if user is using a single backup directory for backing up two different databases.
3. if the file is referenced from other backups AND the checksums match, it's all good. We skip the copy and go copy the next file.
Test Plan: Added new test to backupable_db_test. The test fails before this patch.
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37599
Summary:
Couple changes:
1. instead of SnapshotList, just take a vector of snapshots
2. don't take a separate parameter is_snapshots_supported. If there are snapshots in the list, that means they are supported. I actually think we should get rid of this notion of snapshots not being supported.
3. don't pass in mutable_cf_options as a parameter. Lifetime of mutable_cf_options is a bit tricky to maintain, so it's better to not pass it in for the whole compaction job. We only really need it when we install the compaction results.
Test Plan: make check
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D36627
Summary:
Fixes#6840824, running "make check" on centos6 hits
a deadlock in column_family_test
Test Plan:
seq 10000 | parallel --gnu --eta 't=/dev/shm/rdb-{}; rm -rf
$t; mkdir $t && export TEST_TMPDIR=$t; ./column_family_test > $t/log-{}'
Made the test deterministic by narrrowing the window for the flush.
Reviewers: igor, meyering
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38079
Summary: Optimize GetRange Function by checking the level of the files
Test Plan: pass make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37977
Summary:
[noticed a new warning when building with the very latest gcc]
* db/memtablerep_bench.cc (FLAGS_env): Remove declaration
of unused varaible, to avoid this warning/error:
db/memtablerep_bench.cc:135:22: error: ‘FLAGS_env’ defined but not\
used [-Werror=unused-variable]
static rocksdb::Env* FLAGS_env = rocksdb::Env::Default();
^
Test Plan: compile
Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37983
Summary:
The default, use one iter for the whole test, isn't good. This cost me
a few hours of debugging and a few days of tessting. For readonly
that isn't realistic and for read-write that keeps a lot of old sst files around.
I remove the option because nothing uses it and not calling gettimeofday per
loop iteration adds about 3% to QPS at 20 threads.
Task ID: #
Blame Rev:
Test Plan:
run db_bench
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37965
Summary:
CPU profiling reveals GetApproximateSizes as a bottleneck for performance. The current implementation is sub-optimal, it scans every file in every level to compute the result.
We can take advantage of the fact that all levels above 0 are sorted in the increasing order of key ranges and use binary search to locate the starting index. This can reduce the number of comparisons required to compute the result.
Test Plan: We have good test coverage. Run the tests.
Reviewers: sdong, igor, rven, dynamike
Subscribers: dynamike, maykov, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37755
Summary: Before the fix we also marked the bottommost level for compaction. This is wrong because then RocksDB has N+1 levels instead of N as before the compaction.
Test Plan: SuggestCompactRangeTest in db_test
Reviewers: yhchiang, rven
Reviewed By: rven
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37869
Summary: Remove duplicate code. If this diff looks good, I will cleanup other call sites as well.
Test Plan: unit tests
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37761
Summary:
This runs a benchmark for LevelDB similar to what we have
in tools/run_flash_bench.sh. It requires changes to db_bench that I published
in a LevelDB fork on github. Some results are at:
http://smalldatum.blogspot.com/2015/04/comparing-leveldb-and-rocksdb-take-2.html
Sample output:
ops/sec mb/sec usec/op avg p50 Test
525 16.4 1904.5 1904.5 111.0 fillseq.v32768
75187 15.5 13.3 13.3 4.4 fillseq.v200
28328 5.8 35.3 35.3 4.7 overwrite.t1.s0
175438 0.0 5.7 5.7 4.4 readrandom.t1
28490 5.9 35.1 35.1 4.7 overwrite.t1.s0
121951 0.0 8.2 8.2 5.7 readwhilewriting.t1
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37749
Summary:
Added these events:
* Recovery start, finish and also when recovery creates a file
* Trivial move
* Compaction start, finish and when compaction creates a file
* Flush start, finish
Also includes small fix to EventLogger
Also added option ROCKSDB_PRINT_EVENTS_TO_STDOUT which is useful when we debug things. I've spent far too much time chasing LOG files.
Still didn't get sst table properties in JSON. They are written very deeply into the stack. I'll address in separate diff.
TODO:
* Write specification. Let's first use this for a while and figure out what's good data to put here, too. After that we'll write spec
* Write tools that parse and analyze LOGs. This can be in python or go. Good intern task.
Test Plan: Ran db_bench with ROCKSDB_PRINT_EVENTS_TO_STDOUT. Here's the output: https://phabricator.fb.com/P19811976
Reviewers: sdong, yhchiang, rven, MarkCallaghan, kradhakrishnan, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37521
Test Plan: Verified that valgrind build passes for cache_test
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37665
Summary: Since we enabled jemalloc for open source builds, Travis looks like it's dying. Don't install jemalloc when running in travis
Test Plan: none
Reviewers: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37659
Summary: Added keyword override for SetCapacity()
Test Plan: Fixes build
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37647
Summary:
When new capacity is larger than existing capacity, simply update the capacity to the new valie
When new capacity is less than existing capacity, but more than the usage, simply update the capacity to new value
When new capacity is less than the existing capacity and existing usage both, try to purge entries in LRU if feasible to make usage < capacity
Test Plan: Created unit tests in cache_test.cc
Reviewers: sdong, rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37527
Summary:
Make it build for CYGWIN.
Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions.
Test Plan: Build it and run some unit tests in CYGWIN
Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D37605
Summary:
CompactRange for universal compaction with num_levels > 1 seems to have a bug. The unit test also has a bug so it doesn't capture the problem.
Fix it. Revert the compact range to the logic equivalent to num_levels=1. Always compact all files together.
It should also fix DBTest.IncreaseUniversalCompactionNumLevels. The issue was that options.write_buffer_size = 100 << 10 and options.write_buffer_size = 100 << 10 are not used in later test scenarios. So write_buffer_size of 4MB was used. The compaction trigger condition is not anymore obvious as expected.
Test Plan: Run the new test and all test suites
Reviewers: yhchiang, rven, kradhakrishnan, anthony, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D37551
Summary:
Based on feedback from D37083.
Are all of these correct? In some spaces it seems like we're doing SetMaxPossibleForUserKey() although we want the smallest possible internal key for user key.
Test Plan: make check
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37341
Summary: Currently open source rocksdb only builds with tcmalloc. This diff first checks if jemalloc is available. If it is, it compiles with jemalloc. If it isn't, it checks for tcmalloc.
Test Plan: Tried this out on my Ubuntu virtual machine and confirms that jemalloc is correctly detected and compiled.
Reviewers: MarkCallaghan, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: adamretter, meyering, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D36789
Summary: Reading CompactionPicker I noticed this dangerous substraction of two unsigned integers. We should assert to mark this as safe.
Test Plan: make check
Reviewers: anthony, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37041
Summary:
This diff implements a new `DB` method `PromoteL0` which moves all files in L0
to a given level skipping compaction, provided that the files have disjoint
ranges and all levels up to the target level are empty.
This method provides finer-grain control for trivial compactions, and it is
useful for bulk-loading pre-sorted keys. Compared to D34797, it does not change
the semantics of an existing operation, which can impact existing code.
PromoteL0 is designed to work well in combination with the proposed
`GetSstFileWriter`/`AddFile` interface, enabling to "design" the level structure
by populating one level at a time. Such fine-grained control can be very useful
for static or mostly-static databases.
Test Plan: `make check`
Reviewers: IslamAbdelRahman, philipp, MarkCallaghan, yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37107
Summary: Add more logging to help debugging issues.
Test Plan: Run test suites
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D37401
Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows.
Test Plan: Add a new unit test for it
Reviewers: rven, igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D37335
Summary:
This is done to avoid having each thread use the same seed between runs
of db_bench. Without this we can inflate the OS filesystem cache hit rate on
reads for read heavy tests and generally see the same key sequences get generated
between teste runs.
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37563
Summary:
A couple of times on Travis, we have had the thread status say that there were no compactions done and since we assert for it, the test failed.
We now fix this by waiting till compaction started.
Test Plan:
run DBTEST::*PreShutdown*
d=/tmp/j; rm -rf $d; seq 200 | parallel --gnu --eta 'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_test --gtest_filter=DBTest.PreShutdown* >& '$d'/log-{}'
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37545
Summary:
Without this change, someone on the machine on which
I run "make check" could cause me to overwrite arbitrary
files owned by me, via a symlink attack.
Instead of using a predictable temporary directory and
accepting to use a preexisting one, always create a new
one using mkdtemp. If $TEST_IOCTL_FRIENDLY_TMPDIR is
set and usable, attempt first to find a usable
temporary directory therein. If not, or if unusable,
then try /var/tmp and /tmp. If none of those is usable
abort with a diagnostic.
To do that, I added a new class.
Its constructor finds a suitable directory or aborts,
the sole member prints that directory's name, and the
destructor unlinks what should be an empty directory.
Note that while the code before this did not remove
its temporary directory, there was only one per $UID.
Now, there would be at least one per run or one per
test, depending on implementation, so it is important
to remove them.
Test Plan:
Run this on a fedora rawhide system, where /tmp
is a tmpfs file system, and /var/tmp is ext4.
# This gives a diagnostic that /dev/shm is not suitable
# and ends up using /var/tmp.
TEST_IOCTL_FRIENDLY_TMPDIR=/dev/shm ./env_test
# Uses /var/tmp; same as when envvar not set.
TEST_IOCTL_FRIENDLY_TMPDIR=/var/tmp ./env_test
# Uses /tmp unless it's tmpfs, in which case it gives
# a diagnostic and uses /var/tmp.
TEST_IOCTL_FRIENDLY_TMPDIR=/tmp ./env_test
Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37287
Summary:
This lets the production toolchain libraries get used on devservers and
in production.
Task ID: #6849362
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37533
Summary:
This adds:
1) use of --level_compaction_dynamic_level_bytes=true
2) use of --bytes_per_sync=2M
The second is a big win for disks. The first helps in general.
This also adds a new test, fillseq with 32kb values to increase the peak
ingest and make it more likely that storage limits throughput.
Sample outpout from the first 3 tests - https://gist.github.com/mdcallag/e793bd3038e367b05d6f
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37509
Summary: `echo` correctly interpretes \n on mac, but not on linux. On linux you have to give it `-e` to interpret \n. Unfortunately, `-e` options is not available on Mac. Go back to old way of checking gflags
Test Plan: build_tools/build_detect_platform on mac and linux
Reviewers: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37515