Summary:
We occasionally get write stalls (>1s Write() calls) on HDD under read load. The following timers explain almost all of the stalls:
- perf_context.db_mutex_lock_nanos
- perf_context.db_condition_wait_nanos
- iostats_context.open_time
- iostats_context.allocate_time
- iostats_context.write_time
- iostats_context.range_sync_time
- iostats_context.logger_time
In my experiments each of these occasionally takes >1s on write path under some workload. There are rare cases when Write() takes long but none of these takes long.
Test Plan: Added code to our application to write the listed timings to log for slow writes. They usually add up to almost exactly the time Write() call took.
Reviewers: rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: march, dhruba, tnovak
Differential Revision: https://reviews.facebook.net/D39177
Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported.
Test Plan: Added a new test, but still need to look into stress testing.
Reviewers: yhchiang, igor, rven, sdong
Reviewed By: sdong
Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D33435
Summary:
Rename EventLoggerHelpers EventHelpers, as it's going to include
all event-related helper functions instead of EventLogger only stuffs.
Test Plan: make
Reviewers: sdong, rven, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39093
Summary:
Added a couple functions to WriteBatchWithIndex to make it easier to query the value of a key including reading pending writes from a batch. (This is needed for transactions).
I created write_batch_with_index_internal.h to use to store an internal-only helper function since there wasn't a good place in the existing class hierarchy to store this function (and it didn't seem right to stick this function inside WriteBatchInternal::Rep).
Since I needed to access the WriteBatchEntryComparator, I moved some helper classes from write_batch_with_index.cc into write_batch_with_index_internal.h/.cc. WriteBatchIndexEntry, ReadableWriteBatch, and WriteBatchEntryComparator are all unchanged (just moved to a different file(s)).
Test Plan: Added new unit tests.
Reviewers: rven, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38037
Summary:
Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys).
All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks an issue of blocking writes because we generate too much Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart.
MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction.
Test Plan: added a new unit test
Reviewers: yhchiang, rven, MarkCallaghan, sdong
Reviewed By: sdong
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37083
Summary: Make target for running all xfunc tests
Test Plan: make xfunc
Reviewers: igor, sdong, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D36873
Summary:
* src.mk (JNI_NATIVE_SOURCES): New variable, so we don't have to use
a glob in Makefile
* Makefile (JNI_NATIVE_SOURCES): Remove glob-using definition, now
that the explicit list of sources is in src.mk.
Test Plan:
Run this:
JAVA_HOME=/usr/local/jdk-7u67-64 PATH=$JAVA_HOME/bin:$PATH \
make rocksdbjava
Reviewers: yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D36633
Summary:
Just couple of small changes:
1. removed signal_test, since it doesn't seem useful and we don't even run it as part of `make check`
2. moved perf_context_test to TESTS instead of PROGRAMS
3. `make release` probably shouldn't compile benchmarks. We currently rely on `make release` building db_bench (via Jenkins), so I left db_bench there.
This is just a minor cleanup. We need to rethink our targets since they are a bit messy right now. We can do this during our tech debt week.
Test Plan: make release
Reviewers: anthony, rven, yhchiang, sdong, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D36171
Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code.
Test Plan: Compiles and runs. Didn't test flashback code, as I don't have flashback device and most if it is c/p
Reviewers: MarkCallaghan, sdong
Reviewed By: sdong
Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35391
Summary: WriteBatch and WriteBatchWithIndex now both inherit from a common abstract base class. This makes it easier to write code that is agnostic toward the implementation of the particular write batch. In particular, I plan on utilizing this abstraction to allow transactions to support using either implementation of a write batch.
Test Plan: modified existing WriteBatchWithIndex tests to test new functions. Running all tests.
Reviewers: igor, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34017
Summary:
Adds gtest fused source code into `third-party` directory. No manual changes.
gtest latest released 1.7 has clang dev compilation errors. Trunk version requires only one disabled warning (-Wno-missing-field-initializers)
Fused code is made as described here https://fburl.com/90806322
Details about why we need gtest source code instead of precompiled library https://fburl.com/90805763
Source used from http://googletest.googlecode.com/svn/trunk
Test Plan:
Build and notice no errors. Also check in logs that gtest-all.o being compiled gtest-all.o.
```lang=bash
% USE_CLANG=1 make all
```
Reviewers: lgalanis, yufei.zhu, rven, sdong, igor, meyering
Reviewed By: meyering
Subscribers: meyering, yhchiang, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33345
Summary:
Here's my proposal for making our LOGs easier to read by machines.
The idea is to dump all events as JSON objects. JSON is easy to read by humans, but more importantly, it's easy to read by machines. That way, we can parse this, load into SQLite/mongo and then query or visualize.
I started with table_create and table_delete events, but if everybody agrees, I'll continue by adding more events (flush/compaction/etc etc)
Test Plan:
Ran db_bench. Observed:
2015/01/15-14:13:25.788019 1105ef000 EVENT_LOG_v1 {"time_micros": 1421360005788015, "event": "table_file_creation", "file_number": 12, "file_size": 1909699}
2015/01/15-14:13:25.956500 110740000 EVENT_LOG_v1 {"time_micros": 1421360005956498, "event": "table_file_deletion", "file_number": 12}
Reviewers: yhchiang, rven, dhruba, MarkCallaghan, lgalanis, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31647
Summary:
The build process now requires new source files to be added to
src.mk. Adding convenience.cc to src.mk
Test Plan: Build rocksdb
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34815
Summary:
* src.mk (LIB_SOURCES): Add this file:
utilities/document/json_document_builder.cc
to avoid link errors.
Blame Rev: D33849
Test Plan: run "make"
Reviewers: ljin, igor.sugak, rven, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34593
Summary:
Any time one would modify a dependent of any *test*.cc file,
"make" would fail to rebuild the affected test binaries,
e.g., db_test. That was due to the fact that we deliberately
excluded those test-related files from the definition of SOURCES
and only $(SOURCES) was used to create the automatically-generated
.d dependency files. The fix is to generate a .d file for every
source file.
* src.mk: New file. Defines LIB_SOURCES, MOCK_SOURCES
and TEST_BENCH_SOURCES.
* Makefile: Include src.mk.
Reflect s/SOURCES/LIB_SOURCES/ renaming.
* build_tools/build_detect_platform: Remove the code
that was used to generate SOURCES= and MOCK_SOURCES=
definitions in make_config.mk. Those lists of files
are now hard-coded in src.mk. Hard-coding this list of
sources is desirable, because without that, one risks
including stray .cc files in a build. Not reproducible.
Test Plan:
Touch a file used by db_test's dependent .o files and ensure that
they are all recompiled. Before, none would be:
$ touch db/db_impl.h && make db_test
CC db/db_test.o
CC db/column_family.o
CC db/db_filesnapshot.o
CC db/db_impl.o
CC db/db_impl_debug.o
CC db/db_impl_readonly.o
CC db/forward_iterator.o
CC db/internal_stats.o
CC db/managed_iterator.o
CC db/repair.o
CC db/write_batch.o
CC utilities/compacted_db/compacted_db_impl.o
CC utilities/ttl/db_ttl_impl.o
CC util/ldb_cmd.o
CC util/ldb_tool.o
CC util/sst_dump_tool.o
CC util/xfunc.o
CCLD db_test
Reviewers: ljin, igor.sugak, igor, rven, sdong
Reviewed By: sdong
Subscribers: yhchiang, adamretter, fyrz, dhruba
Differential Revision: https://reviews.facebook.net/D33849