rocksdb

Commit Graph

Author	SHA1	Message	Date
Dhruba Borthakur	38671c4d54	Fix a race condition while processing tasks by background threads. Summary: Suppose you submit 100 background tasks one after another. The first enqueu task finds that the queue is empty and wakes up one worker thread. Now suppose that all remaining 99 work items are enqueued, they do not wake up any worker threads because the queue is already non-empty. This causes a situation when there are 99 tasks in the task queue but only one worker thread is processing a task while the remaining worker threads are waiting. The fix is to always wakeup one worker thread while enqueuing a task. I also added a check to count the number of elements in the queue to help in debugging. Test Plan: make clean check. Reviewers: chip Reviewed By: chip CC: leveldb Differential Revision: https://reviews.facebook.net/D7203	13 years ago
Zheng Shao	768edfaaed	ldb: Add compression and bloom filter options. Summary: Added the following two options: [--bloom_bits=<int,e.g.:14>] [--compression_type=<no\|snappy\|zlib\|bzip2>] These options will be used when ldb opens the leveldb database. Test Plan: Tried by hand for both success and failure cases. We do need a test framework. Reviewers: dhruba, emayanke, sheki Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7197	13 years ago
Kosie van der Merwe	f69e9f3e04	Fixed off by 1 in tests. Summary: Added 1 to indices where I shouldn't have so overrun array. Test Plan: make check Reviewers: sheki, emayanke, vamsi, dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D7227	13 years ago
Kosie van der Merwe	0eb0c9bb82	Added methods to write small ints to bit streams. Summary: Added BitStreamPutInt() and BitStreamGetInt() which take a stream of chars and can write integers of arbitrary bit sizes to that stream at arbitrary positions. There are also convenience versions of these functions that take std::strings and leveldb::Slices. Test Plan: make check Reviewers: sheki, vamsi, dhruba, emayanke Reviewed By: vamsi CC: leveldb Differential Revision: https://reviews.facebook.net/D7071	13 years ago
sheki	d4627e6de4	Move WAL files to archive directory, instead of deleting. Summary: Create a directory "archive" in the DB directory. During DeleteObsolteFiles move the WAL files (*.log) to the Archive directory, instead of deleting. Test Plan: Created a DB using DB_Bench. Reopened it. Checked if files move. Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6975	13 years ago
Abhishek Kona	d29f181923	Fix all the lint errors. Summary: Scripted and removed all trailing spaces and converted all tabs to spaces. Also fixed other lint errors. All lint errors from this point of time should be taken seriously. Test Plan: make all check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7059	13 years ago
Chip Turner	6caf3b8e4b	Fix broken test; some ldb commands can run without a db_ Summary: It would appear our unit tests make use of code from ldb_cmd, and don't always require a valid database handle. D6855 was not aware db_ could sometimes be NULL for such commands, and so it broke reduce_levels_test. This moves the check elsewhere to (at least) fix the 'ldb dump' case of segfaulting when it couldn't open a database. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D6903	13 years ago
Chip Turner	879e45eb99	Fix ldb segfault and use static libsnappy for all builds Summary: Link statically against snappy, using the gvfs one for facebook environments, and the bundled one otherwise. In addition, fix a few minor segfaults in ldb when it couldn't open the database, and update .gitignore to include a few other build artifacts. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D6855	13 years ago
Dhruba Borthakur	7632fdb5cb	Support taking a configurable number of files from the same level to compact in a single compaction run. Summary: The compaction process takes some files from LevelK and merges it into LevelK+1. The number of files it picks from LevelK was capped such a way that the total amount of data picked does not exceed the maxfilesize of that level. This essentially meant that only one file from LevelK is picked for a single compaction. For bulkloads, we would like to take many many file from LevelK and compact them using a single compaction run. This patch introduces a option called the 'source_compaction_factor' (similar to expanded_compaction_factor). It is a multiplier that is multiplied by the maxfilesize of that level to arrive at the limit that is used to throttle the number of source files from LevelK. For bulk loads, set source_compaction_factor to a very high number so that multiple files from the same level are picked for compaction in a single compaction. The default value of source_compaction_factor is 1, so that we can keep backward compatibilty with existing compaction semantics. Test Plan: make clean check Reviewers: emayanke, sheki Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D6867	13 years ago
Dhruba Borthakur	fbb73a4ac3	Support to disable background compactions on a database. Summary: This option is needed for fast bulk uploads. The goal is to load all the data into files in L0 without any interference from background compactions. Test Plan: make clean check Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6849	13 years ago
Abhishek Kona	661dc15721	Fix LDB dumpwal to print the messages as in the file. Summary: StringStream.clear() does not clear the stream. It sets some flags. Who knew? Fixing that is not printing the stuff again and again. Test Plan: ran it on a local db Reviewers: dhruba, emayanke Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6795	13 years ago
Abhishek Kona	30742e1692	LDB can read WAL. Summary: Add option to read WAL and print a summary for each record. facebook task => #1885013 E.G. Output : ./ldb dump_wal --walfile=/tmp/leveldbtest-5907/dbbench/026122.log --header Sequence,Count,ByteSize 49981,1,100033 49981,1,100033 49982,1,100033 49981,1,100033 49982,1,100033 49983,1,100033 49981,1,100033 49982,1,100033 49983,1,100033 49984,1,100033 49981,1,100033 49982,1,100033 Test Plan: Works run ./ldb read_wal --wal-file=/tmp/leveldbtest-5907/dbbench/000078.log --header Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: emayanke, leveldb, zshao Differential Revision: https://reviews.facebook.net/D6675	13 years ago
Abhishek Kona	b648401adb	Fix LDB dumpwal to print the messages as in the file. Summary: StringStream.clear() does not clear the stream. It sets some flags. Who knew? Fixing that is not printing the stuff again and again. Test Plan: ran it on a local db Reviewers: dhruba, emayanke Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6795	13 years ago
Abhishek Kona	f5cdf931a0	LDB can read WAL. Summary: Add option to read WAL and print a summary for each record. facebook task => #1885013 E.G. Output : ./ldb dump_wal --walfile=/tmp/leveldbtest-5907/dbbench/026122.log --header Sequence,Count,ByteSize 49981,1,100033 49981,1,100033 49982,1,100033 49981,1,100033 49982,1,100033 49983,1,100033 49981,1,100033 49982,1,100033 49983,1,100033 49984,1,100033 49981,1,100033 49982,1,100033 Test Plan: Works run ./ldb read_wal --wal-file=/tmp/leveldbtest-5907/dbbench/000078.log --header Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: emayanke, leveldb, zshao Differential Revision: https://reviews.facebook.net/D6675	13 years ago
Dhruba Borthakur	5d16e503a6	Improved CompactionFilter api: pass in a opaque argument to CompactionFilter invocation. Summary: There are applications that operate on multiple leveldb instances. These applications will like to pass in an opaque type for each leveldb instance and this type should be passed back to the application with every invocation of the CompactionFilter api. Test Plan: Enehanced unit test for opaque parameter to CompactionFilter. Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan, sheki, emayanke Differential Revision: https://reviews.facebook.net/D6711	13 years ago
heyongqiang	c64796fd34	Fix test failure of reduce_num_levels Summary: I changed the reduce_num_levels logic to avoid "compactRange()" call if the current number of levels in use (levels that contain files) is smaller than the new num of levels. And that change breaks the assert in reduce_levels_test Test Plan: run reduce_levels_test Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: emayanke, sheki Differential Revision: https://reviews.facebook.net/D6651	13 years ago
Dhruba Borthakur	9c6c232e47	Compilation error while compiling with OPT=-g Summary: make clean check OPT=-g fails leveldb::DBStatistics::getTickerCount(leveldb::Tickers)’: ./db/db_statistics.h:34: error: ‘MAX_NO_TICKERS’ was not declared in this scope util/ldb_cmd.cc:255: warning: left shift count >= width of type Test Plan: make clean check OPT=-g Reviewers: CC: Task ID: # Blame Rev:	13 years ago
heyongqiang	20d18a89a3	disable size compaction in ldb reduce_levels and added compression and file size parameter to it Summary: disable size compaction in ldb reduce_levels, this will avoid compactions rather than the manual comapction, added --compression=none\|snappy\|zlib\|bzip2 and --file_size= per-file size to ldb reduce_levels command Test Plan: run ldb Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: sheki, emayanke Differential Revision: https://reviews.facebook.net/D6597	13 years ago
Dhruba Borthakur	aa42c66814	Fix all warnings generated by -Wall option to the compiler. Summary: The default compilation process now uses "-Wall" to compile. Fix all compilation error generated by gcc. Test Plan: make all check Reviewers: heyongqiang, emayanke, sheki Reviewed By: heyongqiang CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6525	13 years ago
Dhruba Borthakur	5273c81483	Ability to invoke application hook for every key during compaction. Summary: There are certain use-cases where the application intends to delete older keys aftre they have expired a certian time period. One option for those applications is to periodically scan the entire database and delete appropriate keys. A better way is to allow the application to hook into the compaction process. This patch allows the application to set a method callback for every key that is being compacted. If this method returns true, then the key is not preserved in the output of the compaction. Test Plan: This is mostly to preview the proposed new public api. Since it is a public api, please do due diligence on reviewing it. I will be writing test cases for this api in mynext version of this patch. Reviewers: MarkCallaghan, heyongqiang Reviewed By: heyongqiang CC: sheki, adsharma Differential Revision: https://reviews.facebook.net/D6285	13 years ago
heyongqiang	d55c2ba305	Add a tool to change number of levels Summary: as subject. Test Plan: manually test it, will add a testcase Reviewers: dhruba, MarkCallaghan Differential Revision: https://reviews.facebook.net/D6345	13 years ago
amayank	854c66b089	Make compression options configurable. These include window-bits, level and strategy for ZlibCompression Summary: Leveldb currently uses windowBits=-14 while using zlib compression.(It was earlier 15). This makes the setting configurable. Related changes here: https://reviews.facebook.net/D6105 Test Plan: make all check Reviewers: dhruba, MarkCallaghan, sheki, heyongqiang Differential Revision: https://reviews.facebook.net/D6393	13 years ago
heyongqiang	3096fa7534	Add two more options: disable block cache and make table cache shard number configuable Summary: as subject Test Plan: run db_bench and db_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6111	13 years ago
Dhruba Borthakur	321dfdc3ae	Allow having different compression algorithms on different levels. Summary: The leveldb API is enhanced to support different compression algorithms at different levels. This adds the option min_level_to_compress to db_bench that specifies the minimum level for which compression should be done when compression is enabled. This can be used to disable compression for levels 0 and 1 which are likely to suffer from stalls because of the CPU load for memtable flushes and (L0,L1) compaction. Level 0 is special as it gets frequent memtable flushes. Level 1 is special as it frequently gets all:all file compactions between it and level 0. But all other levels could be the same. For any level N where N > 1, the rate of sequential IO for that level should be the same. The last level is the exception because it might not be full and because files from it are not read to compact with the next larger level. The same amount of time will be spent doing compaction at any level N excluding N=0, 1 or the last level. By this standard all of those levels should use the same compression. The difference is that the loss (using more disk space) from a faster compression algorithm is less significant for N=2 than for N=3. So we might be willing to trade disk space for faster write rates with no compression for L0 and L1, snappy for L2, zlib for L3. Using a faster compression algorithm for the mid levels also allows us to reclaim some cpu without trading off much loss in disk space overhead. Also note that little is to be gained by compressing levels 0 and 1. For a 4-level tree they account for 10% of the data. For a 5-level tree they account for 1% of the data. With compression enabled: * memtable flush rate is ~18MB/second * (L0,L1) compaction rate is ~30MB/second With compression enabled but min_level_to_compress=2 * memtable flush rate is ~320MB/second * (L0,L1) compaction rate is ~560MB/second This practicaly takes the same code from https://reviews.facebook.net/D6225 but makes the leveldb api more general purpose with a few additional lines of code. Test Plan: make check Differential Revision: https://reviews.facebook.net/D6261	13 years ago
Mark Callaghan	70c42bf05f	Adds DB::GetNextCompaction and then uses that for rate limiting db_bench Summary: Adds a method that returns the score for the next level that most needs compaction. That method is then used by db_bench to rate limit threads. Threads are put to sleep at the end of each stats interval until the score is less than the limit. The limit is set via the --rate_limit=$double option. The specified value must be > 1.0. Also adds the option --stats_per_interval to enable additional metrics reported every stats interval. Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6243	13 years ago
Kai Liu	8965c8d0b9	Add the missing util/auto_split_logger.h Summary: Test Plan: Reviewers: CC: Task ID: 1803577 Blame Rev:	13 years ago
Kai Liu	d50f8eb603	Enable LevelDb to create a new log file if current log file is too large. Summary: Enable LevelDb to create a new log file if current log file is too large. Test Plan: Write a script and manually check the generated info LOG. Task ID: 1803577 Blame Rev: Reviewers: dhruba, heyongqiang Reviewed By: heyongqiang CC: zshao Differential Revision: https://reviews.facebook.net/D6003	13 years ago
Dhruba Borthakur	cf5adc8016	db_bench was not correctly initializing the value for delete_obsolete_files_period_micros option. Summary: The parameter delete_obsolete_files_period_micros controls the periodicity of deleting obsolete files. db_bench was reading in this parameter intoa local variable called 'l' but was incorrectly using another local variable called 'n' while setting it in the db.options data structure. This patch also logs the value of delete_obsolete_files_period_micros in the LOG file at db startup time. I am hoping that this will improve the overall write throughput drastically. Test Plan: run db_bench Reviewers: MarkCallaghan, heyongqiang Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6099	13 years ago
Dhruba Borthakur	1ca0584345	This is the mega-patch multi-threaded compaction published in https://reviews.facebook.net/D5997. Summary: This patch allows compaction to occur in multiple background threads concurrently. If a manual compaction is issued, the system falls back to a single-compaction-thread model. This is done to ensure correctess and simplicity of code. When the manual compaction is finished, the system resumes its concurrent-compaction mode automatically. The updates to the manifest are done via group-commit approach. Test Plan: run db_bench	13 years ago
Dhruba Borthakur	aa73538f2a	The deletion of obsolete files should not occur very frequently. Summary: The method DeleteObsolete files is a very costly methind, especially when the number of files in a system is large. It makes a list of all live-files and then scans the directory to compute the diff. By default, this method is executed after every compaction run. This patch makes it such that DeleteObsolete files is never invoked twice within a configured period. Test Plan: run all unit tests Reviewers: heyongqiang, MarkCallaghan Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6045	13 years ago
Dhruba Borthakur	f7975ac733	Implement RowLocks for assoc schema Summary: Each assoc is identified by (id1, assocType). This is the rowkey. Each row has a read/write rowlock. There is statically allocated array of 2000 read/write locks. A rowkey is murmur-hashed to one of the read/write locks. assocPut and assocDelete acquires the rowlock in Write mode. The key-updates are done within the rowlock with a atomic nosync batch write to leveldb. Then the rowlock is released and a write-with-sync is done to sync leveldb transaction log. Test Plan: added unit test Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5859	13 years ago
Dhruba Borthakur	c1006d4276	An configurable option to write data using write instead of mmap. Summary: We have seen that reading data via the pread call (instead of mmap) is much faster on Linux 2.6.x kernels. This patch makes an equivalent option to switch off mmaps for the write path as well. db_bench --mmap_write=0 will use write() instead of mmap() to write data to a file. This change is backward compatible, the default option is to continue using mmap for writing to a file. Test Plan: "make check all" Differential Revision: https://reviews.facebook.net/D5781	13 years ago
Dhruba Borthakur	a58d48de79	Implement ReadWrite locks for leveldb Summary: Implement ReadWrite locks for leveldb. These will be helpful to implement a read-modify-write operation (e.g. atomic increments). Test Plan: does not modify any existing code Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D5787	13 years ago
Dhruba Borthakur	72c45c66c6	Print the block cache size in the LOG. Summary: Print the block cache size in the LOG. Test Plan: run db_bench and look at LOG. This is helpful while I was debugging one use-case. Reviewers: heyongqiang, MarkCallaghan Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5739	13 years ago
Dhruba Borthakur	ae36e509f8	The BackupAPI should also list the length of the manifest file. Summary: The GetLiveFiles() api lists the set of sst files and the current MANIFEST file. But the database continues to append new data to the MANIFEST file even when the application is backing it up to the backup location. This means that the database-version that is stored in the MANIFEST FILE in the backup location does not correspond to the sst files returned by GetLiveFiles. This API adds a new parameter to GetLiveFiles. This new parmeter returns the current size of the MANIFEST file. Test Plan: Unit test attached. Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5631	13 years ago
Dhruba Borthakur	9e84834eb4	Allow a configurable number of background threads. Summary: The background threads are necessary for compaction. For slower storage, it might be necessary to have more than one compaction thread per DB. This patch allows creating a configurable number of worker threads. The default reamins at 1 (to maintain backward compatibility). Test Plan: run all unit tests. changes to db-bench coming in a separate patch. Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D5559	13 years ago
heyongqiang	a8464ed820	add an option to disable seek compaction Summary: as subject. This diff should be good for benchmarking. will send another diff to make it better in the case the seek compaction is enable. In that coming diff, will not count a seek if the bloomfilter filters. Test Plan: build Reviewers: dhruba, MarkCallaghan Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D5481	13 years ago
heyongqiang	b85cdca690	add a global var leveldb::useMmapRead to enable mmap Summary: Summary: as subject. this can be used for benchmarking. If we want it for some cases, we can do more changes to make this part of the option. Test Plan: db_test Reviewers: dhruba CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D5451	13 years ago
Mark Callaghan	33323f2111	Remove use of mmap for random reads Summary: Reads via mmap on concurrent workloads are much slower than pread. For example on a 24-core server with storage that can do 100k IOPS or more I can get no more than 10k IOPS with mmap reads and 32+ threads. Test Plan: db_bench benchmarks Reviewers: dhruba, heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5433	13 years ago
Dhruba Borthakur	93f4952089	Ability to switch off filesystem read-aheads Summary: Ability to switch off filesystem read-aheads. This change is backward-compatible: the default setting is to allow file system read-aheads. Test Plan: run benchmarks Reviewers: heyongqiang, adsharma Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5391	13 years ago
Dhruba Borthakur	4028ae7d31	Do not cache readahead-pages in the OS cache. Summary: When posix_fadvise(offset, offset) is usedm it frees up only those pages in that specified range. But the filesystem could have done some read-aheads and those get cached in the OS cache. Do not cache readahead-pages in the OS cache. Test Plan: run db_bench benchmark. Reviewers: vamsi, heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5379	13 years ago
Dhruba Borthakur	407727b75f	Fix compiler warnings. Use uint64_t instead of uint. Summary: Fix compiler warnings. Use uint64_t instead of uint. Test Plan: build using -Wall Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5355	13 years ago
heyongqiang	0f43aa474e	put log in a seperate dir Summary: added a new option db_log_dir, which points the log dir. Inside that dir, in order to make log names unique, the log file name is prefixed with the leveldb data dir absolute path. Test Plan: db_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D5205	13 years ago
Dhruba Borthakur	fe93631678	Clean up compiler warnings generated by -Wall option. Summary: Clean up compiler warnings generated by -Wall option. make clean all OPT=-Wall This is a pre-requisite before making a new release. Test Plan: compile and run unit tests Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5019	13 years ago
Dhruba Borthakur	e5fe80e4e3	The sharding of the block cache is limited to 220 pieces. Summary: The numbers of shards that the block cache is divided into is configurable. However, if the user specifies that he/she wants the block cache to be divided into more than 220 pieces, then the system will rey to allocate a huge array of that size) that could fail. It is better to limit the sharding of the block cache to an upper bound. The default sharding is 16 shards (i.e. 24) and the maximum is now 2 million shards (i.e. 2*20). Also, fixed a bug with the LRUCache where the numShardBits should be a private member of the LRUCache object rather than a static variable. Test Plan: run db_bench with --cache_numshardbits=64. Task ID: # Blame Rev: Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5013	13 years ago
heyongqiang	a4f9b8b49e	merge 1.5 Summary: as subject Test Plan: db_test table_test Reviewers: dhruba	13 years ago
Dhruba Borthakur	fc20273e73	Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync). Summary: Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync). This is needed for data durability when running on ext3 filesystems. Added options to the benchmark db_bench to generate performance numbers with either fsync or fdatasync enabled. Cleaned up Makefile to build leveldb_shell only when building the thrift leveldb server. Test Plan: build and run benchmark Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D4911	13 years ago
Dhruba Borthakur	e5a7c8e580	Log the open-options to the LOG. Summary: Log the open-options to the LOG. Use options_ instead of options because SanitizeOptions could modify the max_file_open limit. Test Plan: num db_bench Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D4833	13 years ago
heyongqiang	af6fa308b0	regression for trigger compaction logic Summary: as subject Test Plan: manually run db_bench confirmed Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D4809	13 years ago
heyongqiang	21082fa13c	regression for trigger compaction logic Summary: as subject Test Plan: manually run db_bench confirmed Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D4809	13 years ago

... 16 17 18 19 20 ...

1023 Commits (9babaeed16cb671eba3adb02ccf7a535190e6002)