Summary:
This diff replaces compaction_filter_args and CompactionFilter with a single compaction_filter parameter. It gives CompactionFilter better encapsulation and a similar look to Comparator and MergeOpertor, which improves consistency of the overall interface.
The change is not backward compatible. Nevertheless, the two references in fbcode are not in production yet.
Test Plan: make check
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb, zshao
Differential Revision: https://reviews.facebook.net/D10773
Summary:
This diff introduces a new Merge operation into rocksdb.
The purpose of this review is mostly getting feedback from the team (everyone please) on the design.
Please focus on the four files under include/leveldb/, as they spell the client visible interface change.
include/leveldb/db.h
include/leveldb/merge_operator.h
include/leveldb/options.h
include/leveldb/write_batch.h
Please go over local/my_test.cc carefully, as it is a concerete use case.
Please also review the impelmentation files to see if the straw man implementation makes sense.
Note that, the diff does pass all make check and truly supports forward iterator over db and a version
of Get that's based on iterator.
Future work:
- Integration with compaction
- A raw Get implementation
I am working on a wiki that explains the design and implementation choices, but coding comes
just naturally and I think it might be a good idea to share the code earlier. The code is
heavily commented.
Test Plan: run all local tests
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb, zshao, sheki, emayanke, MarkCallaghan
Differential Revision: https://reviews.facebook.net/D9651
Summary:
Mark's task description from #2316777
Env::Default() comes from util/env_posix.cc
This is a static global.
static PosixEnv default_env;
Env* Env::Default() {
return &default_env;
}
-----
These globals assume default_env was initialized first. I don't think that is safe or correct to do (http://stackoverflow.com/questions/1005685/c-static-initialization-order)
const string AutoRollLoggerTest::kTestDir(
test::TmpDir() + "/db_log_test");
const string AutoRollLoggerTest::kLogFile(
test::TmpDir() + "/db_log_test/LOG");
Env* AutoRollLoggerTest::env = Env::Default();
Test Plan:
run make clean && make && make check
But how can I know if it works in Ubuntu?
Reviewers: MarkCallaghan, chip
Reviewed By: chip
CC: leveldb, dhruba, haobo
Differential Revision: https://reviews.facebook.net/D10491
Summary:
RocksDB doesn't build on Ubuntu VM .. shoudl be fixed with this patch.
g++ --version
g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
util/env_posix.cc:68:24: sorry, unimplemented: non-static data member initializers
util/env_posix.cc:68:24: error: ISO C++ forbids in-class initialization of non-const static member ‘use_os_buffer’
util/env_posix.cc:113:24: sorry, unimplemented: non-static data member initializers
util/env_posix.cc:113:24: error: ISO C++ forbids in-class initialization of non-const static member ‘use_os_buffer
Test Plan: make check
Reviewers: sheki, leveldb
Reviewed By: sheki
Differential Revision: https://reviews.facebook.net/D10461
Summary:
Adds the --writes_per_second rate limit for the readwhilewriting test.
The purpose is to optionally avoid saturating storage with writes & compaction
and test read response time when some writes are being done.
Changes the histogram code to also print the p99.99 value
Task ID: #
Blame Rev:
Test Plan:
make check, ran db_bench with it
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: haobo
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10305
Summary: forgot to include signal_test.cc
Test Plan: make check
Reviewers: sheki
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10281
Summary:
This diff provides the ability to print out a stacktrace when the process receives certain signals.
Currently, we enable this for the following signals (program error related):
SIGILL SIGSEGV SIGBUS SIGABRT
Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode).
Sample output:
Received signal 11 (Segmentation fault)
#0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4
#1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24
#2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0
#3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113
Segmentation fault (core dumped)
For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now.
Test Plan: signal_test.cc
Reviewers: dhruba, MarkCallaghan
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10173
Summary: As title. Found out this when testing stack_trace.cc portability.
Test Plan: make check; manual test 'non-linux' build by forcing OS_LINUX2
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10263
Summary: Primarily a refactor. Introduced LDBTool interface to which customers can plug in their options and this will create their own version of ldb tool.
Test Plan: made ldb tool and tried it.
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10191
Summary:
The background compaction threads are never exitted and therefore caused
memory-leaks while running rpcksdb tests. Have changed the PosixEnv destructor to exit and join them and changed the tests likewise
The memory leaked has reduced from 320 bytes to 64 bytes in all the tests. The 64
bytes is relating to
pthread_exit, but still have to figure out why. The stack-trace right now with
table_test.cc = 64 bytes in 1 blocks are possibly lost in loss record 4 of 5
at 0x475D8C: malloc (jemalloc.c:914)
by 0x400D69E: _dl_map_object_deps (dl-deps.c:505)
by 0x4013393: dl_open_worker (dl-open.c:263)
by 0x400F015: _dl_catch_error (dl-error.c:178)
by 0x4013B2B: _dl_open (dl-open.c:569)
by 0x5D3E913: do_dlopen (dl-libc.c:86)
by 0x400F015: _dl_catch_error (dl-error.c:178)
by 0x5D3E9D6: __libc_dlopen_mode (dl-libc.c:47)
by 0x5048BF3: pthread_cancel_init (unwind-forcedunwind.c:53)
by 0x5048DC9: _Unwind_ForcedUnwind (unwind-forcedunwind.c:126)
by 0x5046D9F: __pthread_unwind (unwind.c:130)
by 0x50413A4: pthread_exit (pthreadP.h:289)
Test Plan: make all check
Reviewers: dhruba, sheki, haobo
Reviewed By: dhruba
CC: leveldb, chip
Differential Revision: https://reviews.facebook.net/D9573
Summary: as subject. This is causing problem in adsconv. Ideally, this flags should be set in open. But that is only supported in Linux kernel ≥2.6.23 and glibc ≥2.7.
Test Plan:
db_test
run db_test
Reviewers: dhruba, MarkCallaghan, haobo
Reviewed By: dhruba
CC: leveldb, chip
Differential Revision: https://reviews.facebook.net/D10089
Summary:
1. The stock LRUCache nukes itself whenever the working set (the total number of entries not released by client at a certain time) is bigger than the cache capacity.
See https://our.dev.facebook.com/intern/tasks/?t=2252281
2. There's a bug in shard calculation leading to segmentation fault when only one shard is needed.
Test Plan: make check
Reviewers: dhruba, heyongqiang
Reviewed By: heyongqiang
CC: leveldb, zshao, sheki
Differential Revision: https://reviews.facebook.net/D9927
Summary:
1. SetBackgroundThreads was not thread safe
2. queue_size_ does not seem necessary
3. moved condition signal after shared state change. Even though the original
order is in practice ok (because the mutex is still held), it looks fishy
and non-intuitive.
Test Plan: make check
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb, zshao
Differential Revision: https://reviews.facebook.net/D9825
Summary:
Use non mmapd files for Write-Ahead log.
Earlier use of MMaped files. made the log iterator read ahead and miss records.
Now the reader and writer will point to the same physical location.
There is no perf regression :
./db_bench --benchmarks=fillseq --db=/dev/shm/mmap_test --num=$(million 20) --use_existing_db=0 --threads=2
with This diff :
fillseq : 10.756 micros/op 185281 ops/sec; 20.5 MB/s
without this dif :
fillseq : 11.085 micros/op 179676 ops/sec; 19.9 MB/s
Test Plan: unit test included
Reviewers: dhruba, heyongqiang
Reviewed By: heyongqiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9741
Summary:
Earlier Statistics object was a raw pointer. This meant the user had to clear up
the Statistics object after creating the database. In most use cases the database is created in a function and the statistics pointer is out of scope. Hence the statistics object would never be deleted.
Now Using a shared_ptr to manage this.
Want this in before the next release.
Test Plan: make all check.
Reviewers: dhruba, emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9735
Summary: This caused compilation problems on some gcc platforms during the third-partyrelease
Test Plan: make
Reviewers: sheki
Reviewed By: sheki
Differential Revision: https://reviews.facebook.net/D9627
Summary:
This patch allows an application to specify whether to use bufferedio,
reads-via-mmaps and writes-via-mmaps per database. Earlier, there
was a global static variable that was used to configure this functionality.
The default setting remains the same (and is backward compatible):
1. use bufferedio
2. do not use mmaps for reads
3. use mmap for writes
4. use readaheads for reads needed for compaction
I also added a parameter to db_bench to be able to explicitly specify
whether to do readaheads for compactions or not.
Test Plan: make check
Reviewers: sheki, heyongqiang, MarkCallaghan
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9429
Summary: Getting rid of boost in our github codebase which caused problems on third-party
Test Plan: make ldb; python tools/ldb_test.py
Reviewers: sheki, dhruba
Reviewed By: sheki
Differential Revision: https://reviews.facebook.net/D9543
Summary: Was causing error(warning) in third-party saying unused result
Test Plan: make
Reviewers: sheki, dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D9447
Summary: Makefile had options to ignore sign-comparisons and unused-parameters, which should be there. Also fixed the specific errors in the code-base
Test Plan: make
Reviewers: chip, dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9531
Summary: negation of the condition checked currently had to be checkd actually
Test Plan: make ldb; python ldb_test.py
Reviewers: sheki, dhruba
Reviewed By: sheki
Differential Revision: https://reviews.facebook.net/D9459
Summary: boost functions cause complications while deploying to third-party
Test Plan: make
Reviewers: sheki, dhruba
Reviewed By: sheki
Differential Revision: https://reviews.facebook.net/D9441
Summary:
Ftruncate does not throw an error on disk-full. This causes Sig-bus in
the case where the database tries to issue a Put call on a full-disk.
Use posix_fallocate for allocation instead of truncate.
Add a check to use MMaped files only on ext4, xfs and tempfs, as
posix_fallocate is very slow on ext3 and older.
Test Plan: make all check
Reviewers: dhruba, chip
Reviewed By: dhruba
CC: adsharma, leveldb
Differential Revision: https://reviews.facebook.net/D9291
Summary: Fix for memory leaks in rocksdb tests. Also modified the variable NUM_FAILED_TESTS to print the actual number of failed tests.
Test Plan: make <test>; valgrind --leak-check=full ./<test>
Reviewers: sheki, dhruba
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9333
Summary:
1. Create only 2 levels so that manual compactions are fast.
2. Set target file size to a large value
Test Plan: make clean check
Reviewers: kailiu, zshao
Reviewed By: zshao
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9231
Summary:
Add a shortcut function to make it easier for people
to efficiently bulk_load data into RocksDB.
Test Plan:
Tried ldb with "--bulk_load" and "--bulk_load --compact" and verified the outcome.
Needs to consult the team on how to test this automatically.
Reviewers: sheki, dhruba, emayanke, heyongqiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8907
Summary:
This adds the rate_delay_limit_milliseconds option to make the delay
configurable in MakeRoomForWrite when the max compaction score is too high.
This delay is called the Ln slowdown. This change also counts the Ln slowdown
per level to make it possible to see where the stalls occur.
From IO-bound performance testing, the Level N stalls occur:
* with compression -> at the largest uncompressed level. This makes sense
because compaction for compressed levels is much
slower. When Lx is uncompressed and Lx+1 is compressed
then files pile up at Lx because the (Lx,Lx+1)->Lx+1
compaction process is the first to be slowed by
compression.
* without compression -> at level 1
Task ID: #1832108
Blame Rev:
Test Plan:
run with real data, added test
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D9045
Summary:
Rocks accumulates recent writes and deletes in the in-memory memtable.
When the memtable is full, it writes the contents on the memtable to
a file in L0.
This patch removes redundant records at the time of the flush. If there
are multiple versions of the same key in the memtable, then only the
most recent one is dumped into the output file. The purging of
redundant records occur only if the most recent snapshot is earlier
than the earliest record in the memtable.
Should we switch on this feature by default or should we keep this feature
turned off in the default settings?
Test Plan: Added test case to db_test.cc
Reviewers: sheki, vamsi, emayanke, heyongqiang
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8991
Summary: LDB tool to print the deleted/put keys in hex in the wal file.
Test Plan: run ldb on a db to check if output was satisfactory
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8691
Summary:
Changed the Get and Scan options with openForReadOnly mode to have access to the memtable.
Changed the visibility of NewInternalIterator in db_impl from private to protected so that
the derived class db_impl_read_only can call that in its NewIterator function for the
scan case. The previous approach which changed the default for flush_on_destroy_ from false to true
caused many problems in the unit tests due to empty sst files that it created. All
unit tests pass now.
Test Plan: make clean; make all check; ldb put and get and scans
Reviewers: dhruba, heyongqiang, sheki
Reviewed By: dhruba
CC: kosievdmerwe, zshao, dilipj, kailiu
Differential Revision: https://reviews.facebook.net/D8697
Summary:
* Introduce is histogram in statistics.h
* stop watch to measure time.
* introduce two timers as a poc.
Replaced NULL with nullptr to fight some lint errors
Should be useful for google.
Test Plan:
ran db_bench and check stats.
make all check
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8637
Summary:
I missed InitTestDb() in one of my tess. InitTestDb() initializes the test directory, without which the test will throw IO error.
This problem didn't occur before because I've already run the tests before so the test directory is already there.
Test Plan:
Reviewers: dhruba
CC:
Task ID: #
Blame Rev:
Summary:
flush_on_destroy has a default value of false and the memtable is flushed
in the dbimpl-destructor only when that is set to true. Because we want the memtable to be flushed everytime that
the destructor is called(db is closed) and the cases where we work with the memtable only are very less
it is a good idea to give this a default value of true. Thus the put from ldb
wil have its data flushed to disk in the destructor and the next Get will be able to
read it when opened with OpenForReadOnly. The reason that ldb could read the latest value when
the db was opened in the normal Open mode is that the Get from normal Open first reads
the memtable and directly finds the latest value written there and the Get from OpenForReadOnly
doesn't have access to the memtable (which is correct because all its Put/Modify) are disabled
Test Plan: make all; ldb put and get and scans
Reviewers: dhruba, heyongqiang, sheki
Reviewed By: heyongqiang
CC: kosievdmerwe, zshao, dilipj, kailiu
Differential Revision: https://reviews.facebook.net/D8631
Summary: Fix the warning [-Werror=format-security] and [-Werror=unused-result].
Test Plan:
enforced the Werror and run make
Task ID: 2101673
Blame Rev:
Reviewers: heyongqiang
Differential Revision: https://reviews.facebook.net/D8553
Summary:
$SUBJECT -- cosmetic fix for histograms, print P75/P99, and
make sure zlib is enabled for our command line tools.
Test Plan: compile, test db_bench with --compression_type=zlib
Reviewers: heyongqiang
Reviewed By: heyongqiang
CC: adsharma, leveldb
Differential Revision: https://reviews.facebook.net/D8445
Summary:
* Add a SplitByTTLLogger to enable this feature. In this diff I implemented generalized AutoSplitLoggerBase class to simplify the
development of such classes.
* Refactor the existing AutoSplitLogger and fix several bugs.
Test Plan:
* Added a unit tests for different types of "auto splitable" loggers individually.
* Tested the composited logger which allows the log files to be splitted by both TTL and log size.
Reviewers: heyongqiang, dhruba
Reviewed By: heyongqiang
CC: zshao, leveldb
Differential Revision: https://reviews.facebook.net/D8037
Summary:
The existing code did not initialize a few doubles in histogram.cc.
Cropped up when I wrote a unit-test.
Test Plan: make all check
Reviewers: chip
Reviewed By: chip
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8319
Summary:
Earlier way to record in histogram=>
Linear search BucketLimit array to find the bucket and increment the
counter
Current way to record in histogram=>
Store a HistMap statically which points the buckets of each value in the
range [kFirstValue, kLastValue);
In the proccess use vectors instead of array's and refactor some code to
HistogramHelper class.
Test Plan:
run db_bench with histogram=1 and see a histogram being
printed.
Reviewers: dhruba, chip, heyongqiang
Reviewed By: chip
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8265
Summary:
Added function to `RandomAccessFile` to generate an unique ID for that file. Currently only `PosixRandomAccessFile` has this behaviour implemented and only on Linux.
Changed how key is generated in `Table::BlockReader`.
Added tests to check whether the unique ID is stable, unique and not a prefix of another unique ID. Added tests to see that `Table` uses the cache more efficiently.
Test Plan: make check
Reviewers: chip, vamsi, dhruba
Reviewed By: chip
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8145
Summary: fallocate is linux only, so let's protect it with ifdef's
Test Plan: make
Reviewers: sheki, dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8223
Summary:
Previously, if you opened a db with num_levels set lower than
the database, you received the unhelpful message "Corruption:
VersionEdit: new-file entry." Now you get a more verbose message
describing the issue.
Also, fix handling of compression_levels (both the run-over-the-end
issue and the memory management of it).
Lastly, unique_ptr'ify a couple of minor calls.
Test Plan: make check
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8151
Summary:
We continually rebuilt build_version.c because we put the
current date into it, but that's what __DATE__ already is. This makes
builds faster.
This also fixes an issue with 'make clean FOO' not working properly.
Also tweak the build rules to be more consistent, always have warnings,
and add a 'make release' rule to handle flags for release builds.
Test Plan: make, make clean
Reviewers: dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D8139
Summary:
On some filesystems, pre-allocation can be a considerable
amount of space. xfs in our production environment pre-allocates by
1GB, for instance. By using fallocate to inform the kernel of our
expected file sizes, we eliminate this wasteage (that isn't recovered
until the file is closed which, in the case of LOG files, can be a
considerable amount of time).
Test Plan:
created an xfs loopback filesystem, mounted with
allocsize=4M, and ran db_stress. LOG file without this change was 4M,
and with it it was 128k then grew to normal size.
Reviewers: dhruba
Reviewed By: dhruba
CC: adsharma, leveldb
Differential Revision: https://reviews.facebook.net/D7953
Summary:
Replace manual memory management with std::unique_ptr in a
number of places; not exhaustive, but this fixes a few leaks with file
handles as well as clarifies semantics of the ownership of file handles
with log classes.
Test Plan: db_stress, make check
Reviewers: dhruba
Reviewed By: dhruba
CC: zshao, leveldb, heyongqiang
Differential Revision: https://reviews.facebook.net/D8043
Summary:
Check in LogAndApply if the file size is more than the limit set in
Options.
Things to consider : will this be expensive?
Test Plan: make all check. Inputs on a new unit test?
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7701