rocksdb

Author	SHA1	Message	Date
Lei Jin	44f0ff31c2	use fallocate(FALLOC_FL_PUNCH_HOLE) to release unused blocks at the end of file Summary: ftruncate does not always free preallocated unused space at the end of file. In some cases, we pin more disk space than we should. Test Plan: env_test Reviewers: sdong, rven, yhchiang, igor Reviewed By: igor Subscribers: nkg-, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D25641	10 years ago
Igor Canadi	965d9d50b8	Fix timing	10 years ago
Igor Canadi	001ce64dc7	Use chrono for timing Summary: Since we depend on C++11, we might as well use it for timing, instead of this platform-dependent code. Test Plan: Ran autovector_test, which reports time and confirmed that output is similar to master Reviewers: ljin, sdong, yhchiang, rven, dhruba Reviewed By: dhruba Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D25587	10 years ago
Lei Jin	7e9f28cb23	limit max bytes that can be read/written per pread/write syscall Summary: BlockBasedTable sst file size can grow to a large size when universal compaction is used. When the index block exceeds 2G, pread seems to fail and return truncated data, which causes a "truncated block" error. I tried to use `#define _FILE_OFFSET_BITS 64` but the problem still persists. Splitting a big write/read into smaller batches seems to solve the problem. Test Plan: successfully compacted a case with resulting sst file at ~90G (2.1G index block size) Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22569	10 years ago
Igor Canadi	d9c0785812	Fix assertion in PosixRandomAccessFile Summary: See https://github.com/facebook/rocksdb/issues/244#issuecomment-53372297 Also see this: https://github.com/facebook/rocksdb/blob/master/util/env_posix.cc#L1075 Test Plan: compiles Reviewers: yhchiang, ljin, sdong Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22419	10 years ago
ZHANG Biao	8dfe2fdd51	fix compile error under Mac OS X	10 years ago
Lei Jin	58c49466d2	Allow env_posix to lower background thread IO priority Summary: This is a linux-specific system call. Test Plan: ran db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: haobo, leveldb Differential Revision: https://reviews.facebook.net/D21183	10 years ago
Lei Jin	534357ca3a	integrate rate limiter into rocksdb Summary: Add an option to plug in a rate limiter for PosixWritableFile. The rate limiter only applies to flush and compaction. WAL and MANIFEST are excluded from this enforcement. Test Plan: db_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19425	10 years ago
Yueh-Hsuan Chiang	90a6aca48e	Finer report I/O stats about Flush and Compaction. Summary: This diff allows the I/O stats about Flush and Compaction to be reported in a more accurate way. Instead of measuring the size of a file, it measures I/O cost on a per-read/write basis. Test Plan: make all check Reviewers: sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19383	10 years ago
Igor Canadi	d3f63f03ad	Fix 32-bit errors Summary: https://www.facebook.com/groups/rocksdb.dev/permalink/590438347721350/ Test Plan: compiles Reviewers: sdong, ljin, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19197	10 years ago
sdong	9899b12780	ThreadID printed when Thread terminating in the same format as posix_logger Summary: `220132b65e` correctly fixed the issue of thread ID printing when terminating a thread. Nothing wrong with it. This diff prints the ID in the same way as in PosixLogger::logv() so that users can more easily correlate them. Test Plan: run env_test and make sure it prints correctly. Reviewers: igor, haobo, ljin, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18819	11 years ago
Chilledheart	81b498bc15	Print pthread_t in a safer way	11 years ago
sdong	bd1105aa5a	Print out thread ID while thread terminates for decreased pool size. Summary: Per request from @nkg-, temporarily print thread ID when a thread terminates. It is a temp solution as we try to minimize stderr messages. Test Plan: env_test Reviewers: haobo, igor, dhruba Reviewed By: igor CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D18753	11 years ago
sdong	3df07d1703	ThreadPool to allow decreasing the number of threads, and to instantly schedule new threads when the number is increased Summary: Add a feature to decrease the number of threads in the thread pool. Also instantly schedule more threads if the number of threads is increased. Here is the way it is implemented: each background thread needs its thread ID. After decreasing the number of threads, all threads are woken up. The thread with the largest thread ID will terminate. If there are more threads to terminate, the thread will wake up all threads again. Another change is made so that when the number of threads is increased, more threads are created and all previously excess threads are woken up to do the work. Test Plan: Add a unit test. Reviewers: haobo, dhruba Reviewed By: haobo CC: yhchiang, igor, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D18675	11 years ago
Igor Canadi	72ff275e3c	Fix TransactionLogIterator EOF caching Summary: When TransactionLogIterator comes to EOF, it calls UnmarkEOF and continues reading. However, if glibc cached the EOF status of the file, it will get EOF again, even though new data might have been written to it. This has been causing errors on Mac OS. Test Plan: test passes, was failing before Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18381	11 years ago
Igor Canadi	c2da9e5997	Flush before Fsync()/Sync() Summary: Calling Fsync()/Sync() on a file should give the guarantee that whatever you have written to the file is now persisted. This is currently not the case, since we might have some data left in the application cache when we call Fsync()/Sync(). For example, BuildTable() calls Fsync() without the flush, assuming all sst data is now persisted, but it's actually not. This may result in big inconsistencies. Test Plan: no test Reviewers: sdong, dhruba, haobo, ljin, yhchiang Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D18159	11 years ago
Yueh-Hsuan Chiang	fa84eb1f7b	Fixed a compile error which tries to check whether a size_t < 0 in env_posix.cc Summary: Fixed a compile error which tries to check whether a size_t < 0 in env_posix.cc util/env_posix.cc:180:16: error: comparison of unsigned expression < 0 is always false [-Werror,-Wtautological-compare] } while (r < 0 && errno == EINTR); ~ ^ ~ 1 error generated. Test Plan: make check all Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17379	11 years ago
Igor Canadi	726c8084cd	Retry FS system calls on EINTR Summary: EINTR means 'please retry'. We don't do that currently. We should. Test Plan: make check, although it doesn't really test the new code. We'll just have to believe in the code! Reviewers: haobo, ljin Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17349	11 years ago
Igor Canadi	64ae6e9eb9	Don't preallocate log files	11 years ago
Igor Canadi	5c44a8db61	fallocate_with_keep_size is false for LogWrites	11 years ago
Igor Canadi	22507aff6c	Fix compile issue in Mac OS Summary: Compile issues are: * Unused variable env_ * Unused fallocate_with_keep_size_ Test Plan: compiles Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17043	11 years ago
Igor Canadi	f26cb0f093	Optimize fallocation Summary: Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads. This diff provides an option for user to not use KEEP_SIZE flag, thus optimizing his sync performance by up to 2x-3x. At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported. Test Plan: make check Reviewers: dhruba, sdong, haobo, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16761	11 years ago
sdong	01dcef114b	Env to add a function to allow users to query waiting queue length Summary: Add a function to Env so that users can query the waiting queue length of each thread pool Test Plan: add a test in env_test Reviewers: haobo Reviewed By: haobo CC: dhruba, igor, yhchiang, ljin, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D16755	11 years ago
Yumikiyo Osanai	056a0286d2	Modify the compile error about ftruncate() Summary: Change to store the return value from ftruncate(). The reason is that ftruncate() has "warn_unused_result" attribute in some environment. Signed-off-by: Yumikiyo Osanai <yumios.art@gmail.com>	11 years ago
Igor Canadi	26ac5603f4	Truncate unused space on PosixWritableFile::Close() Summary: Blocks allocated with fallocate will take extra space on disk even if they are unused and the file is closed. Now we remove the extra blocks at the end of the file by calling `ftruncate`. Test Plan: added a test to env_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16647	11 years ago
Lei Jin	b2795b799e	thread local pointer storage Summary: This is not a generic thread local implementation in the sense that it only takes pointers. But it does support multiple instances per thread and lets the user plug in a function to perform cleanup when a thread exits or an instance gets destroyed. Test Plan: unit test for now Reviewers: haobo, igor, sdong, dhruba Reviewed By: igor CC: leveldb, kailiu Differential Revision: https://reviews.facebook.net/D16131	11 years ago
Igor Canadi	d53b188228	Fix some errors detected by coverity scan Summary: Nothing major, just an extra return line and a possibility of leaking fb in NewRandomRWFile Test Plan: make check Reviewers: kailiu, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15993	11 years ago
Igor Canadi	832158e7f7	Fsync directory after we create a new file Summary: @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder. Should I also add FsyncDir() to new files that get created by a compaction? Test Plan: Confirmed that FsyncDir is returning Status::OK() Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D14751	11 years ago
Mike Lin	4c75e21c20	Eliminate stdout message when launching a posix thread. This seems out of place as it's the only time RocksDB prints to stdout in the normal course of operations. Thread IDs can still be retrieved from the LOG file: cut -d ' ' -f2 LOG | sort | uniq | egrep -x '[0-9a-f]+'	11 years ago
James Golick	c28dd2a891	oops - missed a spot	11 years ago
James Golick	43c386b72e	only try to use fallocate if it's actually present on the system	11 years ago
kailiu	e1d92dfd2e	Fix a bunch of mac compilation issues in performance branch	11 years ago
lovro	45a2f2d8d3	Fix build without glibc Summary: The preprocessor does not follow normal rules of && evaluation, tries to evaluate __GLIBC_PREREQ(2, 12) even though the defined() check fails. This breaks the build if __GLIBC_PREREQ is absent. Test Plan: Try adding #undef __GLIBC_PREREQ above the offending line, build no longer breaks Reviewed By: igor Blame Rev: `4c81383628`	11 years ago
lovro	4c81383628	Set background thread name with pthread_setname_np() Summary: Makes it easier to monitor performance with top Test Plan: ./manual_compaction_test with `top -H` running. Previously was two `manual_compacti`, now one shows `rocksdb:bg0`. Reviewers: igor, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14367	11 years ago
Siying Dong	8aac46d686	[RocksDB Performance Branch] Fix a regression bug of munmap Summary: Fix a stupid bug I just introduced in `b59d4d5a50`, which I didn't even mean to include. GCC might remove the munmap. Test Plan: Run it and make sure munmap succeeds Reviewers: haobo, kailiu Reviewed By: kailiu CC: dhruba, reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14361	11 years ago
Siying Dong	b59d4d5a50	A Simple Plain Table Summary: A simple plain table format. No block structure. When creating the table reader, the full table is scanned to create indexes. Test Plan: Add unit test Reviewers: haobo, dhruba, kailiu CC: Task ID: # Blame Rev:	11 years ago
kailiu	97d8e573a6	make util/env_posix.cc work under mac Summary: This diff involves some more complicated issues in the posix environment. Test Plan: works under mac os. will need to verify dev box. Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14061	11 years ago
Dhruba Borthakur	b4ad5e89ae	Implement a compressed block cache. Summary: Rocksdb can now support an uncompressed block cache, a compressed block cache, or both. Lookups first look for a block in the uncompressed cache; only if it is not found there is it looked up in the compressed cache. If it is found in the compressed cache, then it is uncompressed and inserted into the uncompressed cache. It is possible that the same block resides in the compressed cache as well as the uncompressed cache at the same time. Both caches have their own individual LRU policy. Test Plan: Unit test case attached. Reviewers: kailiu, sdong, haobo, leveldb Reviewed By: haobo CC: xjin, haobo Differential Revision: https://reviews.facebook.net/D12675	11 years ago
Igor Canadi	b572e81f94	Flush Log every 5 seconds Summary: This might help with p99 performance, but does not solve the real problem. More discussion on #2947135 Test Plan: make check Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13809	11 years ago
Mayank Agarwal	9b50106f9a	Dbid feature Summary: Create a new type of file called DBID on startup if it doesn't already exist. This will store a unique number generated from boost library's uuid header file. The use-case is to identify the case of a db losing all its data and coming back up either empty or from an image (backup/live replica's recovery). The key point to note is that DBID is not stored in a backup or db snapshot. It's preferable to use Boost for uuid because: 1) A non-standard way of generating uuid is not good 2) /proc/sys/kernel/random/uuid generates a uuid but only on linux environments and the solution would not be clean 3) c++ doesn't have any direct way to get a uuid 4) Boost is a very good library that was already linked into rocksdb from third-party Note: I had to update the TOOLCHAIN_REV in build files to get the latest version of boost from third-party as the older version had a bug. I had to put Wno-uninitialized in Makefile because boost-1.51 has an uninitialized variable and rocksdb would not compile otherwise. Latest open-source for boost is 1.54 but is not there in third-party. I have notified the concerned people in fbcode about it. @kailiu : While releasing to third-party, an additional dependency will need to be created for boost in TARGETS file. I can help identify. Test Plan: Expand db_test to test 2 cases 1) Restarting db with Id file present - verify that no change to Id 2) Restarting db with Id file deleted - verify that a different Id is there after reopen Also run make all check Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13587	11 years ago
Dhruba Borthakur	9cd221094c	Add appropriate LICENSE and Copyright message. Summary: Add appropriate LICENSE and Copyright message. Test Plan: make check Reviewers: CC: Task ID: # Blame Rev:	11 years ago
Igor Canadi	d0beadd456	Env class that can randomly read and write Summary: I have implemented basic simple use case that I need for External Value Store I'm working on. There is a potential for making this prettier by refactoring/combining WritableFile and RandomAccessFile, avoiding some copypasta. However, I decided to implement just the basic functionality, so I can continue working on the other diff. Test Plan: Added a unittest Reviewers: dhruba, haobo, kailiu Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13365	11 years ago
Dhruba Borthakur	4463b11cad	Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Summary: Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Test Plan: make check Reviewers: emayanke, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13311	11 years ago
Dhruba Borthakur	a143ef9b38	Change namespace from leveldb to rocksdb Summary: Change namespace from leveldb to rocksdb. This allows a single application to link in open-source leveldb code as well as rocksdb code into the same process. Test Plan: compile rocksdb Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13287	11 years ago
Dhruba Borthakur	87d6eb2f6b	Implement apis in the Environment to clear out pages in the OS cache. Summary: Added a new api to the Environment that allows clearing out not-needed pages from the OS cache. This will be helpful when the compressed block cache replaces the OS cache. Test Plan: EnvPosixTest.InvalidateCache Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13041	11 years ago
Rajat Goel	11c65021fb	Revert "Minor fixes found while trying to compile it using clang on Mac OS X" This reverts commit `5f2c136c32`.	11 years ago
Rajat Goel	5f2c136c32	Minor fixes found while trying to compile it using clang on Mac OS X	11 years ago
Haobo Xu	1565dab809	[RocksDB] Enhance Env to support two thread pools LOW and HIGH Summary: this is the groundwork for separating memtable flush jobs into their own thread pool. Both SetBackgroundThreads and Schedule take a third parameter Priority to indicate which thread pool they are working on. The names LOW and HIGH are just identifiers for two different thread pools, and do not indicate a real difference in 'priority'. We can set the number of threads in the pools independently. The thread pool implementation is refactored. Test Plan: make check Reviewers: dhruba, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D12885	11 years ago
Haobo Xu	f2f4c8072f	[RocksDB] Added nano second stopwatch and new perf counters to track block read cost Summary: The purpose of this diff is to expose per user-call level precise timing of block read, so that we can answer questions like: a Get() costs me 100ms, is that somehow related to loading blocks from file system, or something else? We will answer that with EXACTLY how many blocks have been read, how much time was spent on transferring the bytes from os, how much time was spent on checksum verification and how much time was spent on block decompression, just for that one Get. A nano second stopwatch was introduced to track time with higher precision. The cost/precision of the stopwatch is also measured in unit-test. On my dev box, retrieving one time instance costs about 30ns, on average. The deviation of timing results is good enough to track 100ns-1us level events. And the overhead could be safely ignored for 100us level events (10000 instances/s), for example, a viewstate thrift call. Test Plan: perf_context_test, also testing with viewstate shadow traffic. Reviewers: dhruba Reviewed By: dhruba CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D12351	11 years ago
Dhruba Borthakur	1186192ed1	Replace include/leveldb with include/rocksdb. Summary: Replace include/leveldb with include/rocksdb. Test Plan: make clean; make check make clean; make release Differential Revision: https://reviews.facebook.net/D12489	11 years ago

106 Commits (c2999f54bd775ede3a37b9648b263b608f9b31fa)
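
Commits 726c8084cd and fa84eb1f7b together describe retrying file-system calls that fail with EINTR, and a clang error caused by testing an unsigned `size_t` result with `< 0`. A minimal sketch of the pattern, assuming plain POSIX `write(2)`; `WriteFully` is a hypothetical name, not a RocksDB function:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper (not RocksDB's API): writes all n bytes to fd, retrying
// when write(2) is interrupted by a signal (EINTR) or returns a short count.
// Returns true on success; on failure errno holds the error from write(2).
bool WriteFully(int fd, const char* data, size_t n) {
  assert(data != nullptr || n == 0);
  while (n > 0) {
    // Keep the result signed: write(2) returns -1 on error, and storing it in
    // a size_t would make the `r < 0` test below always false.
    ssize_t r;
    do {
      r = ::write(fd, data, n);
    } while (r < 0 && errno == EINTR);
    if (r < 0) {
      return false;
    }
    data += r;
    n -= static_cast<size_t>(r);
  }
  return true;
}
```

The `-Wtautological-compare` error quoted in fa84eb1f7b is what the same `do { ... } while (r < 0 && errno == EINTR)` loop produces when `r` is unsigned.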
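
Commit 7e9f28cb23 reports that splitting one very large pread/write into smaller batches avoided truncated reads of a 2.1G index block. A sketch of a chunked read loop, assuming a 1 GiB per-call cap chosen only for illustration; `PreadChunked` and `kMaxBytesPerCall` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Assumed per-call cap for illustration; the commit does not state its value.
constexpr size_t kMaxBytesPerCall = size_t{1} << 30;  // 1 GiB

// Hypothetical helper: reads up to n bytes starting at offset, issuing pread(2)
// calls of at most max_chunk bytes each. Short reads are continued, EINTR is
// retried, and EOF ends the loop early. Returns bytes read, or -1 on error.
ssize_t PreadChunked(int fd, char* buf, size_t n, off_t offset,
                     size_t max_chunk = kMaxBytesPerCall) {
  assert(max_chunk > 0);
  size_t total = 0;
  while (total < n) {
    size_t want = std::min(n - total, max_chunk);
    ssize_t r = ::pread(fd, buf + total, want, offset + static_cast<off_t>(total));
    if (r < 0) {
      if (errno == EINTR) continue;  // interrupted: retry the same chunk
      return -1;
    }
    if (r == 0) break;  // reached end of file
    total += static_cast<size_t>(r);
  }
  return static_cast<ssize_t>(total);
}
```

On Linux, the read(2) manual notes that a single call transfers at most 0x7ffff000 bytes, so code that treats one short pread as the whole result would see truncated data above roughly 2 GiB; that may explain the symptom, and a loop like this handles it either way. The same shape applies to pwrite.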
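
Commit c2da9e5997 describes data still sitting in an application buffer when Fsync()/Sync() is called. This sketch models the fix with a hypothetical `BufferedFile` class (not RocksDB's `PosixWritableFile`), assuming a plain file descriptor and `fsync(2)`:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical buffered writer. Append() only copies into a user-space
// buffer, so Fsync() must Flush() first: fsync(2) can only persist bytes the
// kernel has already been handed.
class BufferedFile {
 public:
  explicit BufferedFile(int fd) : fd_(fd) { assert(fd_ >= 0); }

  void Append(const std::string& data) { buf_ += data; }

  // Hands buffered bytes to the kernel with write(2).
  bool Flush() {
    size_t off = 0;
    while (off < buf_.size()) {
      ssize_t r = ::write(fd_, buf_.data() + off, buf_.size() - off);
      if (r < 0) {
        if (errno == EINTR) continue;
        buf_.erase(0, off);  // drop what was written, keep the rest for a retry
        return false;
      }
      off += static_cast<size_t>(r);
    }
    buf_.clear();
    return true;
  }

  // Flush, then persist. Calling fsync(2) without the Flush() is the gap
  // that c2da9e5997 describes.
  bool Fsync() { return Flush() && ::fsync(fd_) == 0; }

  size_t BufferedBytes() const { return buf_.size(); }

 private:
  int fd_;
  std::string buf_;
};
```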
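
Commit f26cb0f093 describes two choices: calling `fallocate` without `FALLOC_FL_KEEP_SIZE` for faster `fdatasync()` in append-only workloads, and trying `fallocate` before falling back to `posix_fallocate`, which may write out zero bytes. A Linux-oriented sketch of that order of calls; `Preallocate` and its return convention (0 or an error number) are assumptions for illustration:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // fallocate(2) and FALLOC_FL_* on glibc
#endif
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reserves [offset, offset + len) for fd. With
// keep_size=true the reported file size is unchanged (FALLOC_FL_KEEP_SIZE);
// with keep_size=false the size grows to cover the range.
// Returns 0 on success or an error number.
int Preallocate(int fd, off_t offset, off_t len, bool keep_size) {
  assert(fd >= 0 && offset >= 0 && len > 0);
#if defined(__linux__)
  int mode = keep_size ? FALLOC_FL_KEEP_SIZE : 0;
  if (::fallocate(fd, mode, offset, len) == 0) return 0;
  if (errno != EOPNOTSUPP && errno != ENOSYS) return errno;  // a real failure
#else
  (void)keep_size;
#endif
  // Fallback: may write zeros to allocate storage (slow), and it always
  // extends the file size, so KEEP_SIZE cannot be honored here.
  return ::posix_fallocate(fd, offset, len);
}
```

Without KEEP_SIZE the file's reported size covers the reserved range, so a writer using that mode has to track its logical size separately and restore it, for example with `ftruncate`, before the file is read.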
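
Commits 26ac5603f4, 056a0286d2 and 44f0ff31c2 cover releasing preallocated space when a file is closed: calling `ftruncate`, checking its return value, and, because `ftruncate` does not always free the unused tail, punching it out with `fallocate(FALLOC_FL_PUNCH_HOLE)`. A sketch assuming the caller tracks both the logical size and how far space was reserved; `ReleaseUnusedTail` is hypothetical:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // fallocate(2) and FALLOC_FL_PUNCH_HOLE on glibc
#endif
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical close-time cleanup. logical_size is the number of bytes the
// writer actually appended; reserved_end is how far space was preallocated
// with FALLOC_FL_KEEP_SIZE. Returns 0 on success, -1 if ftruncate() failed.
int ReleaseUnusedTail(int fd, off_t logical_size, off_t reserved_end) {
  assert(fd >= 0 && logical_size >= 0);
  // Use the result: some toolchains mark ftruncate() warn_unused_result.
  if (::ftruncate(fd, logical_size) != 0) return -1;
#if defined(__linux__) && defined(FALLOC_FL_PUNCH_HOLE)
  if (reserved_end > logical_size) {
    // Best effort: free blocks reserved past EOF. PUNCH_HOLE must be
    // combined with KEEP_SIZE; failures (e.g. unsupported fs) are ignored.
    (void)::fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      logical_size, reserved_end - logical_size);
  }
#else
  (void)reserved_end;
#endif
  return 0;
}
```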
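
Commit 832158e7f7 adds an fsync of the directory after a new file is created, so the new directory entry is durable and not just the file's contents. On POSIX systems this is usually done by opening the directory read-only and calling `fsync(2)` on that descriptor. A free-function sketch named after the FsyncDir() mentioned in the commit, not its actual implementation:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: makes directory-entry changes in `dir` (such as a
// newly created file) durable. Returns 0 on success or an errno value.
int FsyncDir(const char* dir) {
  assert(dir != nullptr);
  int fd = ::open(dir, O_RDONLY | O_DIRECTORY);
  if (fd < 0) return errno;
  int rc = 0;
  while (::fsync(fd) != 0) {
    if (errno != EINTR) {  // retry only on EINTR
      rc = errno;
      break;
    }
  }
  ::close(fd);
  return rc;
}
```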
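
Commit 3df07d1703 explains how the pool shrinks: every worker knows its thread ID, all workers are woken, and the one with the largest ID exits, waking the rest again if more must go. This C++11 sketch follows that description with `std::thread` and a condition variable, and adds a queue-length query in the spirit of 01dcef114b. The class is hypothetical; `SetBackgroundThreads` and `Schedule` echo names from 1565dab809 but the signatures are simplified (no Priority argument):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical resizable pool. A worker's id is its index in threads_.
class ResizablePool {
 public:
  ~ResizablePool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      exit_all_ = true;  // queued jobs that have not started are dropped
    }
    cv_.notify_all();
    for (auto& t : threads_) t.join();
    for (auto& t : retired_) t.join();
  }

  void SetBackgroundThreads(size_t n) {
    std::lock_guard<std::mutex> lock(mu_);
    target_ = n;
    while (threads_.size() < target_) {  // grow: new workers start right away
      size_t id = threads_.size();
      threads_.emplace_back([this, id] { Work(id); });
    }
    cv_.notify_all();  // shrink: wake everyone so the excess workers notice
  }

  void Schedule(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push_back(std::move(job));
    }
    cv_.notify_one();
  }

  size_t GetQueueLen() {  // jobs still waiting to run
    std::lock_guard<std::mutex> lock(mu_);
    return queue_.size();
  }

  size_t NumThreads() {
    std::lock_guard<std::mutex> lock(mu_);
    return threads_.size();
  }

 private:
  // Caller holds mu_. True only for the highest-numbered worker above target.
  bool ShouldRetire(size_t id) const {
    return id >= target_ && id + 1 == threads_.size();
  }

  void Work(size_t id) {
    std::unique_lock<std::mutex> lock(mu_);
    while (true) {
      cv_.wait(lock, [&] { return exit_all_ || ShouldRetire(id) || !queue_.empty(); });
      if (exit_all_) return;
      if (ShouldRetire(id)) {
        retired_.push_back(std::move(threads_.back()));  // joined in destructor
        threads_.pop_back();
        cv_.notify_all();  // the next-highest worker may need to retire too
        return;
      }
      std::function<void()> job = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();
      job();
      lock.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<std::thread> threads_;
  std::vector<std::thread> retired_;
  std::deque<std::function<void()>> queue_;
  size_t target_ = 0;
  bool exit_all_ = false;
};
```

Retiring workers hand their own `std::thread` handle to `retired_` instead of detaching, so the destructor can join them and no worker touches the mutex after the pool is gone.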
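
Commits 4c81383628 and 45a2f2d8d3 name background threads with `pthread_setname_np()` and guard the glibc version check so builds without `__GLIBC_PREREQ` still compile. Inside `#if`, an undefined function-like macro name is replaced by `0`, leaving `0(2, 12)`, which fails to parse even after a false `defined()` test; nesting the version check avoids that. A sketch assuming glibc's two-argument `pthread_setname_np`; `HAVE_PTHREAD_SETNAME_NP` and `NameBackgroundThread` are hypothetical names:

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>

// The version test is only reached when __GLIBC_PREREQ is actually defined.
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 12)
#define HAVE_PTHREAD_SETNAME_NP 1  // hypothetical feature macro
#endif
#endif

// Names the calling thread "rocksdb:bg<index>" so it shows up in `top -H`.
// Linux limits thread names to 15 characters plus the terminating NUL.
// Returns true if the name was applied.
bool NameBackgroundThread(int index) {
  assert(index >= 0);
#ifdef HAVE_PTHREAD_SETNAME_NP
  char name[16];
  std::snprintf(name, sizeof(name), "rocksdb:bg%d", index);
  return pthread_setname_np(pthread_self(), name) == 0;
#else
  (void)index;
  return false;
#endif
}
```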
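
Commit 81b498bc15 prints `pthread_t` more safely. `pthread_t` is an opaque type (an integer on glibc, a pointer on some other platforms), so casting it straight to an integer is not portable. One common approach, shown here as an assumption rather than as what that commit did, copies its bytes into a zero-initialized `uint64_t`; `ThreadIdForLogging` is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <pthread.h>

// Hypothetical helper: turns an opaque pthread_t into a printable integer
// without assuming its underlying type. If pthread_t is wider than 8 bytes,
// only its first 8 bytes are kept.
uint64_t ThreadIdForLogging(pthread_t tid) {
  uint64_t id = 0;
  std::memcpy(&id, &tid, std::min(sizeof(id), sizeof(tid)));
  return id;
}
```

The result can be printed with the `PRIx64` format macro from `<cinttypes>`.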
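
Commit 87d6eb2f6b adds an Env API for clearing unneeded pages from the OS cache. On POSIX systems the usual primitive for that is `posix_fadvise(..., POSIX_FADV_DONTNEED)`; this sketch assumes that primitive and a hypothetical helper name, `DropPageCache`:

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: advises the kernel that cached pages for
// [offset, offset + length) are no longer needed; length == 0 means "to the
// end of the file". Returns 0 or an error number (posix_fadvise does not
// set errno).
int DropPageCache(int fd, off_t offset, off_t length) {
  assert(fd >= 0 && offset >= 0 && length >= 0);
  return ::posix_fadvise(fd, offset, length, POSIX_FADV_DONTNEED);
}
```

The advice is only a hint: dirty pages that have not been written back yet may stay cached, so a successful return does not guarantee the pages are gone.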