rocksdb

Commit Graph

Author	SHA1	Message	Date
Dhruba Borthakur	116ec527f2	Renamed 'hybrid_compaction' tp be "Universal Compaction'. Summary: All the universal compaction parameters are encapsulated in a new file universal_compaction.h Test Plan: make check	12 years ago
Dhruba Borthakur	47c4191fe8	Reduce write amplification by merging files in L0 back into L0 Summary: There is a new option called hybrid_mode which, when switched on, causes HBase style compactions. Files from L0 are compacted back into L0. This meat of this compaction algorithm is in PickCompactionHybrid(). All files reside in L0. That means all files have overlapping keys. Each file has a time-bound, i.e. each file contains a range of keys that were inserted around the same time. The start-seqno and the end-seqno refers to the timeframe when these keys were inserted. Files that have contiguous seqno are compacted together into a larger file. All files are ordered from most recent to the oldest. The current compaction algorithm starts to look for candidate files starting from the most recent file. It continues to add more files to the same compaction run as long as the sum of the files chosen till now is smaller than the next candidate file size. This logic needs to be debated and validated. The above logic should reduce write amplification to a large extent... will publish numbers shortly. Test Plan: dbstress runs for 6 hours with no data corruption (tested so far). Differential Revision: https://reviews.facebook.net/D11289	12 years ago
Dhruba Borthakur	554c06dd18	Reduce write amplification by merging files in L0 back into L0 Summary: There is a new option called hybrid_mode which, when switched on, causes HBase style compactions. Files from L0 are compacted back into L0. This meat of this compaction algorithm is in PickCompactionHybrid(). All files reside in L0. That means all files have overlapping keys. Each file has a time-bound, i.e. each file contains a range of keys that were inserted around the same time. The start-seqno and the end-seqno refers to the timeframe when these keys were inserted. Files that have contiguous seqno are compacted together into a larger file. All files are ordered from most recent to the oldest. The current compaction algorithm starts to look for candidate files starting from the most recent file. It continues to add more files to the same compaction run as long as the sum of the files chosen till now is smaller than the next candidate file size. This logic needs to be debated and validated. The above logic should reduce write amplification to a large extent... will publish numbers shortly. Test Plan: dbstress runs for 6 hours with no data corruption (tested so far). Differential Revision: https://reviews.facebook.net/D11289	12 years ago
Abhishek Kona	5ef6bb8c37	[rocksdb][refactor] statistic printing code to one place Summary: $title Test Plan: db_bench --statistics=1 Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11373	12 years ago
Haobo Xu	3cc1af2062	[RocksDB] Option for incremental sync Summary: This diff added an option to control the incremenal sync frequency. db_bench has a new flag bytes_per_sync for easy tuning exercise. Test Plan: make check; db_bench Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11295	12 years ago
Abhishek Kona	79f4fd2b62	[Rocksdb] Simplify Printing code in db_bench Summary: simplify the printing code in db_bench use TickersMap and HistogramsNameMap introduced in previous diffs. Test Plan: ./db_bench --statistics=1 and see if all the statistics are printed Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11355	12 years ago
Dhruba Borthakur	6acbe0fc45	Compact multiple memtables before flushing to storage. Summary: Merge multiple multiple memtables in memory before writing it out to a file in L0. There is a new config parameter min_write_buffer_number_to_merge that specifies the number of write buffers that should be merged together to a single file in storage. The system will not flush wrte buffers to storage unless at least these many buffers have accumulated in memory. The default value of this new parameter is 1, which means that a write buffer will be immediately flushed to disk as soon it is ready. Test Plan: make check Differential Revision: https://reviews.facebook.net/D11241	12 years ago
Abhishek Kona	bff718d81c	[Rocksdb] Implement filluniquerandom Summary: Use a bit set to keep track of which random number is generated. Currently only supports single-threaded. All our perf tests are run with threads=1 Copied over bitset implementation from common/datastructures Test Plan: printed the generated keys, and verified all keys were present. Reviewers: MarkCallaghan, haobo, dhruba Reviewed By: MarkCallaghan CC: leveldb Differential Revision: https://reviews.facebook.net/D11247	12 years ago
Deon Nicholas	2a52e1dcb6	Fix db_bench for release build. Test Plan: make release Reviewers: haobo, dhruba, jpaton Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11307	12 years ago
Deon Nicholas	4985a9f73b	[Rocksdb] [Multiget] Introduced multiget into db_bench Summary: Preliminary! Introduced the --use_multiget=1 and --keys_per_multiget=n flags for db_bench. Also updated and tested the ReadRandom() method to include an option to use multiget. By default, keys_per_multiget=100. Preliminary tests imply that multiget is at least 1.25x faster per key than regular get. Will continue adding Multiget for ReadMissing, ReadHot, RandomWithVerify, ReadRandomWriteRandom; soon. Will also think about ways to better verify benchmarks. Test Plan: 1. make db_bench 2. ./db_bench --benchmarks=fillrandom 3. ./db_bench --benchmarks=readrandom --use_existing_db=1 --use_multiget=1 --threads=4 --keys_per_multiget=100 4. ./db_bench --benchmarks=readrandom --use_existing_db=1 --threads=4 5. Verify ops/sec (and 1000000 of 1000000 keys found) Reviewers: haobo, MarkCallaghan, dhruba Reviewed By: MarkCallaghan CC: leveldb Differential Revision: https://reviews.facebook.net/D11127	12 years ago
Haobo Xu	bdf1085944	[RocksDB] cleanup EnvOptions Summary: This diff simplifies EnvOptions by treating it as POD, similar to Options. - virtual functions are removed and member fields are accessed directly. - StorageOptions is removed. - Options.allow_readahead and Options.allow_readahead_compactions are deprecated. - Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite Test Plan: make check; db_stress Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11175	12 years ago
Abhishek Kona	e982b5a489	[Rocksdb] measure table open io in a histogram Summary: as title Test Plan: db_bench --statistics=1 check for statistic. Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11109	12 years ago
Abhishek Kona	d91b42ee27	[Rocksdb] Measure all FSYNC/SYNC times Summary: Add stop watches around all sync calls. Test Plan: db_bench check if respective histograms are printed Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11073	12 years ago
Haobo Xu	d897d33bf1	[RocksDB] Introduce Fast Mutex option Summary: This diff adds an option to specify whether PTHREAD_MUTEX_ADAPTIVE_NP will be enabled for the rocksdb single big kernel lock. db_bench also have this option now. Quickly tested 8 thread cpu bound 100 byte random read. No fast mutex: ~750k/s ops With fast mutex: ~880k/s ops Test Plan: make check; db_bench; db_stress Reviewers: dhruba CC: MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D11031	12 years ago
Haobo Xu	ab8d2f6ab2	[RocksDB] [Performance] Allow different posix advice to be applied to the same table file Summary: Current posix advice implementation ties up the access pattern hint with the creation of a file. It is not possible to apply different advice for different access (random get vs compaction read), without keeping two open files for the same table. This patch extended the RandomeAccessFile interface to accept new access hint at anytime. Particularly, we are able to set different access hint on the same table file based on when/how the file is used. Two options are added to set the access hint, after the file is first opened and after the file is being compacted. Test Plan: make check; db_stress; db_bench Reviewers: dhruba Reviewed By: dhruba CC: MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D10905	12 years ago
Haobo Xu	c2e2460f8a	[RocksDB] Expose DBStatistics Summary: Make Statistics usable by client Test Plan: make check; db_bench Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D10899	12 years ago
Dhruba Borthakur	d1aaaf718c	Ability to set different size fanout multipliers for every level. Summary: There is an existing field Options.max_bytes_for_level_multiplier that sets the multiplier for the size of each level in the database. This patch introduces the ability to set different multipliers for every level in the database. The size of a level is determined by using both max_bytes_for_level_multiplier as well as the per-level fanout. size of level[i] = size of level[i-1] * max_bytes_for_level_multiplier * fanout[i-1] The default value of fanout is 1, so that it is backward compatible. Test Plan: make check Reviewers: haobo, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D10863	12 years ago
Abhishek Kona	fb96ec1686	[RocksDB] Print all internally collected histograms in db_bench. Also print p95 Summary: $title Test Plan: make db_bench . run db_bench and check for expected output Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D10521	12 years ago
Abhishek Kona	344e832f55	[RocksDB] Fix ReadMissing in db_bench Summary: D8943 Broke read_missing. Fix it by adding a "." at the end of the generated key Test Plan: generate, print and check the key has a "." Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10455	12 years ago
Mark Callaghan	b1ff9ac9c5	Add --writes_per_second rate limit, print p99.99 in histogram Summary: Adds the --writes_per_second rate limit for the readwhilewriting test. The purpose is to optionally avoid saturating storage with writes & compaction and test read response time when some writes are being done. Changes the histogram code to also print the p99.99 value Task ID: # Blame Rev: Test Plan: make check, ran db_bench with it Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D10305	12 years ago
Haobo Xu	1255dcd446	[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173	12 years ago
Abhishek Kona	63f216ee0a	memory manage statistics Summary: Earlier Statistics object was a raw pointer. This meant the user had to clear up the Statistics object after creating the database. In most use cases the database is created in a function and the statistics pointer is out of scope. Hence the statistics object would never be deleted. Now Using a shared_ptr to manage this. Want this in before the next release. Test Plan: make all check. Reviewers: dhruba, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D9735	12 years ago
Dhruba Borthakur	ad96563b79	Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. Summary: This patch allows an application to specify whether to use bufferedio, reads-via-mmaps and writes-via-mmaps per database. Earlier, there was a global static variable that was used to configure this functionality. The default setting remains the same (and is backward compatible): 1. use bufferedio 2. do not use mmaps for reads 3. use mmap for writes 4. use readaheads for reads needed for compaction I also added a parameter to db_bench to be able to explicitly specify whether to do readaheads for compactions or not. Test Plan: make check Reviewers: sheki, heyongqiang, MarkCallaghan Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D9429	12 years ago
Mayank Agarwal	487168cdcf	Fixed sign-comparison in rocksdb code-base and fixed Makefile Summary: Makefile had options to ignore sign-comparisons and unused-parameters, which should be there. Also fixed the specific errors in the code-base Test Plan: make Reviewers: chip, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D9531	12 years ago
Mark Callaghan	72d14eafd3	add --benchmarks=levelstats option to db_bench, prevent "nan" in stats output Summary: Add --benchmarks=levelstats option to report per-level stats (#files, #bytes) Change readwhilewriting test to report response time for writes but exclude them from the stats merged by all threads. Prevent "NaN" in stats output by preventing division by 0. Remove "o" file I committed by mistake. Task ID: # Blame Rev: Test Plan: make check Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D9513	12 years ago
Mark Callaghan	5a8c8845a9	Enhance db_bench Summary: Add --benchmarks=updaterandom for read-modify-write workloads. This is different from --benchmarks=readrandomwriterandom in a few ways. First, an "operation" is the combined time to do the read & write rather than treating them as two ops. Second, the same key is used for the read & write. Change RandomGenerator to support rows larger than 1M. That was using "assert" to fail and assert is compiled-away when -DNDEBUG is used. Add more options to db_bench --duration - sets the number of seconds for tests to run. When not set the operation count continues to be the limit. This is used by random operation tests. --use_snapshot - when set GetSnapshot() is called prior to each random read. This is to measure the overhead from using snapshots. --get_approx - when set GetApproximateSizes() is called prior to each random read. This is to measure the overhead for a query optimizer. Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D9267	12 years ago
Mark Callaghan	993543d1be	Add rate_delay_limit_milliseconds Summary: This adds the rate_delay_limit_milliseconds option to make the delay configurable in MakeRoomForWrite when the max compaction score is too high. This delay is called the Ln slowdown. This change also counts the Ln slowdown per level to make it possible to see where the stalls occur. From IO-bound performance testing, the Level N stalls occur: * with compression -> at the largest uncompressed level. This makes sense because compaction for compressed levels is much slower. When Lx is uncompressed and Lx+1 is compressed then files pile up at Lx because the (Lx,Lx+1)->Lx+1 compaction process is the first to be slowed by compression. * without compression -> at level 1 Task ID: #1832108 Blame Rev: Test Plan: run with real data, added test Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D9045	12 years ago
bil	4992633751	enable the ability to set key size in db_bench in rocksdb Summary: 1. the default value for key size is still 16 2. enable the ability to set the key size via command line --key_size= Test Plan: build & run db_banch and pass some value via command line. verify it works correctly. Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D8943	12 years ago
Abhishek Kona	a9866b721b	Refactor statistics. Remove individual functions like incNumFileOpens Summary: Use only the counter mechanism. Do away with incNumFileOpens, incNumFileClose, incNumFileErrors s/NULL/nullptr/g in db/table_cache.cc Test Plan: make clean check Reviewers: dhruba, heyongqiang, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8841	12 years ago
Vamsi Ponnekanti	6abb30d4d0	[Missed adding cmdline parsing for new flags added in D8685] Summary: I had added FLAGS_numdistinct and FLAGS_deletepercent for randomwithverify but forgot to add cmdline parsing for those flags. Test Plan: [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify --numdistinct=500 LevelDB: version 1.5 Date: Thu Feb 21 10:34:40 2013 CPU: 24 * Intel(R) Xeon(R) CPU X5650 @ 2.67GHz CPUCache: 12288 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Compression: snappy WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Created bg thread 0x7fbf90bff700 randomwithverify : 4.693 micros/op 213098 ops/sec; ( get:900000 put:80000 del:20000 total:1000000 found:714556) [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify --deletepercent=5 LevelDB: version 1.5 Date: Thu Feb 21 10:35:03 2013 CPU: 24 * Intel(R) Xeon(R) CPU X5650 @ 2.67GHz CPUCache: 12288 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Compression: snappy WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Created bg thread 0x7fe14dfff700 randomwithverify : 4.883 micros/op 204798 ops/sec; ( get:900000 put:50000 del:50000 total:1000000 found:443847) [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify --deletepercent=5 --numdistinct=500 LevelDB: version 1.5 Date: Thu Feb 21 10:36:18 2013 CPU: 24 * Intel(R) Xeon(R) CPU X5650 @ 2.67GHz CPUCache: 12288 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Compression: snappy WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Created bg thread 0x7fc31c7ff700 randomwithverify : 4.920 micros/op 203233 ops/sec; ( get:900000 put:50000 del:50000 total:1000000 found:445522) Revert Plan: OK Task ID: # Reviewers: dhruba, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8769	12 years ago
Vamsi Ponnekanti	945d2b59b9	[Add randomwithverify benchmark option] Summary: Added RandomWithVerify benchmark option. Test Plan: This whole diff is to test. [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify LevelDB: version 1.5 Date: Tue Feb 19 17:50:28 2013 CPU: 24 * Intel(R) Xeon(R) CPU X5650 @ 2.67GHz CPUCache: 12288 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Compression: snappy WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Created bg thread 0x7fa9c3fff700 randomwithverify : 5.004 micros/op 199836 ops/sec; ( get:900000 put:80000 del:20000 total:1000000 found:711992) Revert Plan: OK Task ID: # Reviewers: dhruba, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8685	12 years ago
Abhishek Kona	fe10200ddc	Introduce histogram in statistics.h Summary: * Introduce is histogram in statistics.h * stop watch to measure time. * introduce two timers as a poc. Replaced NULL with nullptr to fight some lint errors Should be useful for google. Test Plan: ran db_bench and check stats. make all check Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8637	12 years ago
Chip Turner	0b83a83191	Fix poor error on num_levels mismatch and few other minor improvements Summary: Previously, if you opened a db with num_levels set lower than the database, you received the unhelpful message "Corruption: VersionEdit: new-file entry." Now you get a more verbose message describing the issue. Also, fix handling of compression_levels (both the run-over-the-end issue and the memory management of it). Lastly, unique_ptr'ify a couple of minor calls. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8151	12 years ago
Chip Turner	2fdf91a4f8	Fix a number of object lifetime/ownership issues Summary: Replace manual memory management with std::unique_ptr in a number of places; not exhaustive, but this fixes a few leaks with file handles as well as clarifies semantics of the ownership of file handles with log classes. Test Plan: db_stress, make check Reviewers: dhruba Reviewed By: dhruba CC: zshao, leveldb, heyongqiang Differential Revision: https://reviews.facebook.net/D8043	12 years ago
Dhruba Borthakur	628dc2aad9	db_bench should use the default value for max_grandparent_overlap_factor. Summary: This was a peformance regression caused by https://reviews.facebook.net/D6729. The default value of max_grandparent_overlap_factor was erroneously set to 0 in db_bench. This was causing compactions to create really really small files because the max_grandparent_overlap_factor was erroneously set to zero in the benchmark. Test Plan: Run --benchmarks=overwrite Reviewers: heyongqiang, emayanke, sheki, MarkCallaghan Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D7797	12 years ago
Mark Callaghan	4069f66cc7	Add --seed, --read_range to db_bench Summary: Adds the option --seed to db_bench to specify the base for the per-thread RNG. When not set each thread uses the same value across runs of db_bench which defeats IO stress testing. Adds the option --read_range. When set to a value > 1 an iterator is created and each query done for the randomread benchmark will do a range scan for that many rows. When not set or set to 1 the existing behavior (a point lookup) is done. Fixes a bug where a printf format string was missing. Test Plan: run db_bench Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7749	12 years ago
sheki	d4627e6de4	Move WAL files to archive directory, instead of deleting. Summary: Create a directory "archive" in the DB directory. During DeleteObsolteFiles move the WAL files (*.log) to the Archive directory, instead of deleting. Test Plan: Created a DB using DB_Bench. Reopened it. Checked if files move. Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6975	12 years ago
Abhishek Kona	d29f181923	Fix all the lint errors. Summary: Scripted and removed all trailing spaces and converted all tabs to spaces. Also fixed other lint errors. All lint errors from this point of time should be taken seriously. Test Plan: make all check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7059	12 years ago
Dhruba Borthakur	7632fdb5cb	Support taking a configurable number of files from the same level to compact in a single compaction run. Summary: The compaction process takes some files from LevelK and merges it into LevelK+1. The number of files it picks from LevelK was capped such a way that the total amount of data picked does not exceed the maxfilesize of that level. This essentially meant that only one file from LevelK is picked for a single compaction. For bulkloads, we would like to take many many file from LevelK and compact them using a single compaction run. This patch introduces a option called the 'source_compaction_factor' (similar to expanded_compaction_factor). It is a multiplier that is multiplied by the maxfilesize of that level to arrive at the limit that is used to throttle the number of source files from LevelK. For bulk loads, set source_compaction_factor to a very high number so that multiple files from the same level are picked for compaction in a single compaction. The default value of source_compaction_factor is 1, so that we can keep backward compatibilty with existing compaction semantics. Test Plan: make clean check Reviewers: emayanke, sheki Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D6867	12 years ago
Dhruba Borthakur	fbb73a4ac3	Support to disable background compactions on a database. Summary: This option is needed for fast bulk uploads. The goal is to load all the data into files in L0 without any interference from background compactions. Test Plan: make clean check Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6849	12 years ago
Dhruba Borthakur	e988c11f58	Enhance db_bench to be able to specify a grandparent_overlap_factor. Summary: The value specified in max_grandparent_overlap_factor is used to limit the file size in a compaction run. This patch makes it configurable when using db_bench. Test Plan: make clean db_bench Reviewers: MarkCallaghan, heyongqiang Reviewed By: heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D6729	12 years ago
Dhruba Borthakur	a785e029f7	The db_bench utility was broken in 1.5.4.fb because of a signed-unsigned comparision. Summary: The db_bench utility was broken in 1.5.4.fb because of a signed-unsigned comparision. The static variable FLAGS_min_level_to_compress was recently changed from int to 'unsigned in' but it is initilized to a nagative value -1. The segfault is of this type: Program received signal SIGSEGV, Segmentation fault. Open (this=0x7fffffffdee0) at db/db_bench.cc:939 939 db/db_bench.cc: No such file or directory. (gdb) where Test Plan: run db_bench with no options. Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6663	12 years ago
Abhishek Kona	0f8e4721a5	Metrics: record compaction drop's and bloom filter effectiveness Summary: Record BloomFliter hits and drop off reasons during compaction. Test Plan: Unit tests work. Reviewers: dhruba, heyongqiang Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6591	12 years ago
heyongqiang	20d18a89a3	disable size compaction in ldb reduce_levels and added compression and file size parameter to it Summary: disable size compaction in ldb reduce_levels, this will avoid compactions rather than the manual comapction, added --compression=none\|snappy\|zlib\|bzip2 and --file_size= per-file size to ldb reduce_levels command Test Plan: run ldb Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: sheki, emayanke Differential Revision: https://reviews.facebook.net/D6597	12 years ago
Abhishek Kona	391885c4e4	stat's collection in leveldb Summary: Prototype stat's collection. Diff is a good estimate of what the final code will look like. A few assumptions : * Used a global static instance of the statistics object. Plan to pass it to each internal function. Static allows metrics only at app level. * In the Ticker's do not do any locking. Depend on the mutex at each function of LevelDB. If we ever remove the mutex, we should change here too. The other option is use atomic objects anyways as there won't be any contention as they will be always acquired only by one thread. * The counters are dumb, increment through lifecycle. Plan to use ods etc to get last5min stat etc. Test Plan: made changes in db_bench Ran ./db_bench --statistics=1 --num=10000 --cache_size=5000 This will print the cache hit/miss stats. Reviewers: dhruba, heyongqiang Differential Revision: https://reviews.facebook.net/D6441	12 years ago
heyongqiang	3fcf533ed0	Add a readonly db Summary: as subject Test Plan: run db_bench readrandom Reviewers: dhruba Reviewed By: dhruba CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6495	12 years ago
Dhruba Borthakur	aa42c66814	Fix all warnings generated by -Wall option to the compiler. Summary: The default compilation process now uses "-Wall" to compile. Fix all compilation error generated by gcc. Test Plan: make all check Reviewers: heyongqiang, emayanke, sheki Reviewed By: heyongqiang CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6525	12 years ago
amayank	854c66b089	Make compression options configurable. These include window-bits, level and strategy for ZlibCompression Summary: Leveldb currently uses windowBits=-14 while using zlib compression.(It was earlier 15). This makes the setting configurable. Related changes here: https://reviews.facebook.net/D6105 Test Plan: make all check Reviewers: dhruba, MarkCallaghan, sheki, heyongqiang Differential Revision: https://reviews.facebook.net/D6393	12 years ago
heyongqiang	3096fa7534	Add two more options: disable block cache and make table cache shard number configuable Summary: as subject Test Plan: run db_bench and db_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6111	12 years ago
Dhruba Borthakur	321dfdc3ae	Allow having different compression algorithms on different levels. Summary: The leveldb API is enhanced to support different compression algorithms at different levels. This adds the option min_level_to_compress to db_bench that specifies the minimum level for which compression should be done when compression is enabled. This can be used to disable compression for levels 0 and 1 which are likely to suffer from stalls because of the CPU load for memtable flushes and (L0,L1) compaction. Level 0 is special as it gets frequent memtable flushes. Level 1 is special as it frequently gets all:all file compactions between it and level 0. But all other levels could be the same. For any level N where N > 1, the rate of sequential IO for that level should be the same. The last level is the exception because it might not be full and because files from it are not read to compact with the next larger level. The same amount of time will be spent doing compaction at any level N excluding N=0, 1 or the last level. By this standard all of those levels should use the same compression. The difference is that the loss (using more disk space) from a faster compression algorithm is less significant for N=2 than for N=3. So we might be willing to trade disk space for faster write rates with no compression for L0 and L1, snappy for L2, zlib for L3. Using a faster compression algorithm for the mid levels also allows us to reclaim some cpu without trading off much loss in disk space overhead. Also note that little is to be gained by compressing levels 0 and 1. For a 4-level tree they account for 10% of the data. For a 5-level tree they account for 1% of the data. With compression enabled: * memtable flush rate is ~18MB/second * (L0,L1) compaction rate is ~30MB/second With compression enabled but min_level_to_compress=2 * memtable flush rate is ~320MB/second * (L0,L1) compaction rate is ~560MB/second This practicaly takes the same code from https://reviews.facebook.net/D6225 but makes the leveldb api more general purpose with a few additional lines of code. Test Plan: make check Differential Revision: https://reviews.facebook.net/D6261	12 years ago

1 2 3 4 5

220 Commits (b08c39e14f1119ecb462e6264ad0ccf390d1d712)