rocksdb

Commit Graph

Author	SHA1	Message	Date
Dhruba Borthakur	6d5f6a4b1a	A bare-bones rocksdb logo. Summary: A hand-crafted rocksdb logo. Test Plan: Reviewers: CC: Task ID: # Blame Rev:	12 years ago
Dhruba Borthakur	3c37955a2f	Remove obsolete namespace mappings. Summary: The previous release 2.4 had a mapping to alias the older namespace to rocksdb. This mapping is not needed in the new release. Test Plan: make check make release Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13359	12 years ago
Naman Gupta	cbf4a06427	Add option for storing transaction logs in a separate dir Summary: In some cases, you might not want to store the data log (write ahead log) files in the same dir as the sst files. An example use case is leaf, which stores sst files in tmpfs and would like to save the log files in a separate dir (disk) to save memory. Test Plan: make all. Ran db_test test. A few tests are failing. P2785018. If you guys don't see an obvious problem with the code, maybe somebody from the rocksdb team could help me debug the issue here. Running this on leaf worked well. I could see logs stored on disk, and deleted appropriately after compactions. Obviously this is only one set of options. The unit tests cover different options. Seems like I'm missing some edge cases. Reviewers: dhruba, haobo, leveldb CC: xinyaohu, sumeet Differential Revision: https://reviews.facebook.net/D13239	12 years ago
Naman Gupta	116071411b	Make db_test more robust Summary: While working on D13239, I noticed that the same options are not used for opening and destroying a db. So adding that. Also added asserts for successful DestroyDB calls. Test Plan: Ran unit tests. At least 1 unit test is failing. The failures are a result of some past logic change. I'm not really planning to fix those, but I would like to check this in, and hopefully the respective unit test owners can fix the broken tests. Reviewers: leveldb, haobo CC: xinyaohu, sumeet, dhruba Differential Revision: https://reviews.facebook.net/D13329	12 years ago
Kai Liu	1f8ade6bd6	Fix a bug in table builder Summary: In table.cc, when reading the metablock, it uses BytewiseComparator(). However, in table_builder.cc, we use r->options.comparator. After tracing the creation of r->options.comparator, I found this comparator is an InternalKeyComparator, which wraps the user defined comparator (details can be found in DBImpl::SanitizeOptions()). I encountered this problem when adding metadata about "bloom filter" before. With different comparators, we may fail to do the binary sort. Current code works well since there is only one entry in meta block. Test Plan: make all check I've also tested this change in https://reviews.facebook.net/D8283 before. Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13335	12 years ago
Igor Canadi	fa46ddb41f	Move delete and free outside of critical section Summary: Split Unref into two parts -> cheap and expensive. Try to call expensive Unref outside of critical section to decrease lock contention. Test Plan: unittests Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, kailiu Differential Revision: https://reviews.facebook.net/D13299	12 years ago
Dhruba Borthakur	1a8c1b0817	Unit test failure in DBTest.NumImmutableMemTable. Summary: Previous patch introduced a unit test failure in DBTest.NumImmutableMemTable because of change in property names. Test Plan: Reviewers: CC: Task ID: # Blame Rev:	12 years ago
Dhruba Borthakur	4463b11cad	Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Summary: Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Test Plan: make check Reviewers: emayanke, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13311	12 years ago
Haobo Xu	bf89edf78b	[RocksDB] Added a property "leveldb.num-immutable-mem-table" so that Flush can be called without blocking, and application still has a way to check when it's done also without blocking. Summary: as title Test Plan: DBTest.NumImmutableMemTable Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13305	12 years ago
Dhruba Borthakur	0a9f873f4b	Removed scribe, thrift and java modules. Summary: Removed scribe, thrift and java modules. Test Plan: make release make check Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13293	12 years ago
Mayank Agarwal	aad2110823	Updating README.fb to have newest version 2.4 Summary: Test Plan: visual	12 years ago
Dhruba Borthakur	a143ef9b38	Change namespace from leveldb to rocksdb Summary: Change namespace from leveldb to rocksdb. This allows a single application to link in open-source leveldb code as well as rocksdb code into the same process. Test Plan: compile rocksdb Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13287	12 years ago
Mayank Agarwal	b3ed08129b	Add a statistic to count the number of calls to GetUpdatesSince Summary: This is useful to keep track of refreshes in transaction log iterator Test Plan: make; db_stress --statistics=1 shows it Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13281	12 years ago
Mayank Agarwal	854d236361	Add backward compatible option in GetLiveFiles to choose whether not to Flush first Summary: As explained in comments in GetLiveFiles in db.h, this option will cause flush to be skipped in GetLiveFiles because some use-cases use GetSortedWalFiles after GetLiveFiles to generate more complete snapshots. Using GetSortedWalFiles after GetLiveFiles allows us to not Flush in GetLiveFiles first because wals have everything. Note: file deletions will be disabled before calling GLF or GSWF so live logs will not move to archive logs or get deleted. Note: Manifest file is truncated to a proper value in GLF, so it will always replay from the proper wal files on a restart Test Plan: make Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13257	12 years ago
Haobo Xu	200c05a23f	[RocksDB] Still honor DisableFileDeletions when purge_log_after_memtable_flush is on Summary: as title Test Plan: make check Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13263	12 years ago
Haobo Xu	fa798e9e28	[Rocksdb] Submit mem table flush job in a different thread pool Summary: As title. This is just a quick hack and not ready for commit. It fails a lot of unit tests. I will test/debug it directly in ViewState shadow. Test Plan: Try it in shadow test. Reviewers: dhruba, xjin CC: leveldb Differential Revision: https://reviews.facebook.net/D12933	12 years ago
Xing Jin	658a3ce2fa	Fix SIGSEGV issue in universal compaction Summary: We saw SIGSEGV when set options.num_levels=1 in universal compaction style. Dug into this issue for a while, and finally found the root cause (thank Haobo for discussion). Test Plan: Add new unit test. It throws SIGSEGV without this change. Also run "make all check". Reviewers: haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13251	12 years ago
Mayank Agarwal	6b34021fc2	Triggering verify for gets also Summary: Will use iterators to verify keys in the db for half of its keys and Gets for the other half. Test Plan: ./db_stress --max_key=1000 --ops_per_thread=100 Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13227	12 years ago
Haobo Xu	71046971f0	[RocksDB] Added perf counters to track skipped internal keys during iteration Summary: as title. unit test not polished. this is for a quick live test Test Plan: live Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13221	12 years ago
Kai Liu	861f6e48e4	Remove the hard-coded enum value in statistics.h Summary: I am planning to add more to statistics classes but found the current way of using enum is very verbose and unnecessarily increases the difficulty of adding new statistics. In this diff I removed the code that explicitly specifies the value of each enum entry. This will help us add new statistic items without manually incrementing the value of other enum entries by one. Test Plan: make; make check; Reviewers: haobo, dhruba, xjin, emayanke, vamsi CC: leveldb Differential Revision: https://reviews.facebook.net/D13197	12 years ago
Natalie Hildebrandt	7edb92b843	Phase 2 of iterator stress test Summary: Using an iterator instead of the Get method, each thread goes through a portion of the database and verifies values by comparing to the shared state. Test Plan: ./db_stress --db=/tmp/tmppp --max_key=10000 --ops_per_thread=10000 To test some basic cases, the following lines can be added (each set in turn) to the verifyDb method with the following expected results: // Should abort with "Unexpected value found" shared.Delete(start); // Should abort with "Value not found" WriteOptions write_opts; db_->Delete(write_opts, Key(start)); // Should succeed WriteOptions write_opts; shared.Delete(start); db_->Delete(write_opts, Key(start)); // Should abort with "Value not found" WriteOptions write_opts; db_->Delete(write_opts, Key(start + (end-start)/2)); // Should abort with "Value not found" db_->Delete(write_opts, Key(end-1)); // Should abort with "Unexpected value" shared.Delete(end-1); // Should abort with "Unexpected value" shared.Delete(start + (end-start)/2); // Should abort with "Value not found" db_->Delete(write_opts, Key(start)); shared.Delete(start); db_->Delete(write_opts, Key(end-1)); db_->Delete(write_opts, Key(end-2)); To test the out of range abort, change the key in the for loop to Key(i+1), so that the key defined by the index i is now outside of the supposed range of the database. Reviewers: emayanke Reviewed By: emayanke CC: dhruba, xjin Differential Revision: https://reviews.facebook.net/D13071	12 years ago
Haobo Xu	22bb7c754b	[RocksDB] print the name of options.memtable_factory in LOG so we know Summary: as title Test Plan: make check Reviewers: dhruba, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13179	12 years ago
Xing Jin	8eb552bf4d	New unit test for iterator with snapshot Summary: I played with the reported bug about iterator with snapshot: https://code.google.com/p/leveldb/issues/detail?id=200. I turned the original test program (https://code.google.com/p/leveldb/issues/attachmentText?id=200&aid=2000000000&name=test.cc&token=7uOUQW-HFlbAFMUm7EqtaAEy7Tw%3A1378320724136) into a new unit test, but I cannot reproduce the problem. Notice lines 31-34 in above link. I have run the new test with and without such Put() operations. Both succeed. So this diff simply adds the test, without changing any source code. Test Plan: run new test. Reviewers: dhruba, haobo, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12735	12 years ago
Haobo Xu	0c4040681a	[RocksDB] Move last_sequence and last_flushed_sequence_ update back into lock protected area Summary: A previous diff moved these outside of lock protected area. Moved back in now. Also moved tmp_batch_ update outside of lock protected area, as only the single write thread can access it. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13137	12 years ago
Haobo Xu	08740b15a4	[RocksDB] Fix skiplist sequential insertion optimization Summary: The original optimization missed updating links other than the lowest level. Test Plan: make check; perf_context_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb, adsharma Differential Revision: https://reviews.facebook.net/D13119	12 years ago
Haobo Xu	e0aa19a94e	[RocksDB] Add an option to enable set based memtable for perf_context_test Summary: as title. Some results: -- Sequential insertion of 1M key/value with stock skip list (all in one memtable) time ./perf_context_test --total_keys=1000000 --use_set_based_memetable=0 Inserting 1000000 key/value pairs ... Put uesr key comparison: Count: 1000000 Average: 8.0179 StdDev: 176.34 Min: 0.0000 Median: 2.5555 Max: 88933.0000 Percentiles: P50: 2.56 P75: 2.83 P99: 58.21 P99.9: 133.62 P99.99: 987.50 Get uesr key comparison: Count: 1000000 Average: 43.4465 StdDev: 379.03 Min: 2.0000 Median: 36.0195 Max: 88939.0000 Percentiles: P50: 36.02 P75: 43.66 P99: 112.98 P99.9: 824.84 P99.99: 7615.38 real 0m21.345s user 0m14.723s sys 0m5.677s -- Sequential insertion of 1M key/value with set based memtable (all in one memtable) time ./perf_context_test --total_keys=1000000 --use_set_based_memetable=1 Inserting 1000000 key/value pairs ... Put uesr key comparison: Count: 1000000 Average: 61.5022 StdDev: 6.49 Min: 0.0000 Median: 62.4295 Max: 71.0000 Percentiles: P50: 62.43 P75: 66.61 P99: 71.00 P99.9: 71.00 P99.99: 71.00 Get uesr key comparison: Count: 1000000 Average: 29.3810 StdDev: 3.20 Min: 1.0000 Median: 29.1801 Max: 34.0000 Percentiles: P50: 29.18 P75: 32.06 P99: 34.00 P99.9: 34.00 P99.99: 34.00 real 0m28.875s user 0m21.699s sys 0m5.749s Worst case comparison for a Put is 88933 (skiplist) vs 71 (set based memtable). Of course, there are other inefficiencies in the set based memtable implementation, which lead to the overall worse performance. However, the P99 behavior advantage is very obvious. Test Plan: ./perf_context_test and viewstate shadow testing Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13095	12 years ago
Dhruba Borthakur	f1a60e5c3e	The vector rep implementation was segfaulting because of incorrect initialization of vector. Summary: The constructor for Vector memtable has a parameter called 'count' that specifies the capacity of the vector to be reserved at allocation time. It was incorrectly used to initialize the size of the vector. Test Plan: Enhanced db_test. Reviewers: haobo, xjin, emayanke Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13083	12 years ago
Dhruba Borthakur	87d6eb2f6b	Implement apis in the Environment to clear out pages in the OS cache. Summary: Added a new api to the Environment that allows clearing out not-needed pages from the OS cache. This will be helpful when the compressed block cache replaces the OS cache. Test Plan: EnvPosixTest.InvalidateCache Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13041	12 years ago
Natalie Hildebrandt	9262061b0d	Fixing crashing tests to include iterpercent param Summary: Adding in the iterpercent flag to tests. Test Plan: make crash_test Reviewers: emayanke Reviewed By: emayanke Differential Revision: https://reviews.facebook.net/D13035	12 years ago
Dhruba Borthakur	5e9f3a9aa7	Better locking in vectorrep that increases throughput to match speed of storage. Summary: There is a use-case where we want to insert data into rocksdb as fast as possible. Vector rep is used for this purpose. The background flush thread needs to flush the vectorrep to storage. It acquires the dblock then sorts the vector, releases the dblock and then writes the sorted vector to storage. This is suboptimal because the lock is held during the sort, which prevents new writes from occurring. This patch moves the sorting of the vector rep to outside the db mutex. Performance is now as fast as the underlying storage system. If you are doing buffered writes to rocksdb files, then you can observe throughput upwards of 200 MB/sec writes. This is an early draft and not yet ready to be reviewed. Test Plan: make check Task ID: # Blame Rev: Reviewers: haobo Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D12987	12 years ago
Natalie Hildebrandt	433541823c	Phase 1 of an iterator stress test Summary: Added MultiIterate() which does a seek and some Next/Prev calls. Iterator status is checked only, no data integrity check Test Plan: make db_stress ./db_stress --iterpercent=<nonzero value> --readpercent=, etc. Reviewers: emayanke, dhruba, xjin Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D12915	12 years ago
Haobo Xu	4734dbb742	[RocksDB] Unit test to show Seek key comparison number Summary: Added SeekKeyComparison to show the user key comparison incurred by Seek. Test Plan: make perf_context_test export LEVELDB_TESTS=DBTest.SeekKeyComparison ./perf_context_test --write_buffer_size=500000 --total_keys=10000 ./perf_context_test --write_buffer_size=250000 --total_keys=10000 Reviewers: dhruba, xjin Reviewed By: xjin CC: leveldb Differential Revision: https://reviews.facebook.net/D12843	12 years ago
Haobo Xu	72fcbf055d	[RocksDB] Fix DBTest.UniversalCompactionSizeAmplification too Summary: as title Test Plan: make db_test; ./db_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13005	12 years ago
Haobo Xu	5b76338c01	[RocksDB] Fix DBTest.UniversalCompactionTrigger to reflect the correct compaction trigger condition. Summary: as title Test Plan: make db_test; ./db_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12981	12 years ago
Rajat Goel	11c65021fb	Revert "Minor fixes found while trying to compile it using clang on Mac OS X" This reverts commit `5f2c136c32`.	12 years ago
Haobo Xu	1d8c57db23	[RocksDB] Universal compaction trigger condition minor fix Summary: Currently, when total number of files reaches level0_file_num_compaction_trigger, universal compaction will schedule a compaction job, but the job will not honor the compaction until the total number of files is level0_file_num_compaction_trigger+1. Fixed the condition for consistent behavior (start compaction on reaching level0_file_num_compaction_trigger). Test Plan: make check; db_stress Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12945	12 years ago
Rajat Goel	5f2c136c32	Minor fixes found while trying to compile it using clang on Mac OS X	12 years ago
Haobo Xu	8866448001	[RocksDB] fix build env_test Summary: move the TwoPools test to the end of thread related tests. Otherwise, the SetBackgroundThreads call would increase the Low pool size and affect the result of other tests. Test Plan: make env_test; ./env_test Reviewers: dhruba, emayanke, xjin Reviewed By: xjin CC: leveldb Differential Revision: https://reviews.facebook.net/D12939	12 years ago
Dhruba Borthakur	4012ca1c7b	Added a parameter to limit the maximum space amplification for universal compaction. Summary: Added a new field called max_size_amplification_ratio in the CompactionOptionsUniversal structure. This determines the maximum percentage overhead of space amplification. The size amplification is defined to be the ratio between the size of the oldest file to the sum of the sizes of all other files. If the size amplification exceeds the specified value, then min_merge_width and max_merge_width are ignored and a full compaction of all files is done. A value of 10 means that a database that stores 100 bytes of user data could occupy 110 bytes of physical storage. Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added. Reviewers: haobo, emayanke, xjin Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12825	12 years ago
Mayank Agarwal	e2a093a6c3	Fix delete in db_ttl.cc Summary: should delete the proper variable Test Plan: make all check Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12921	12 years ago
Mayank Agarwal	eeb90c7ee9	Update README file for public interface Summary: public interface is in include/* Test Plan: visual Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12927	12 years ago
Mayank Agarwal	5e73c4d4ad	Update README file and check arc diff with proxy Summary: export http_proxy='http://172.31.255.99:8080' export https_proxy="$http_proxy" in bashrc makes arc work. Also README file needed to be updated Test Plan: visual Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12903	12 years ago
Haobo Xu	1565dab809	[RocksDB] Enhance Env to support two thread pools LOW and HIGH Summary: this is the groundwork for separating memtable flush jobs to their own thread pool. Both SetBackgroundThreads and Schedule take a third parameter Priority to indicate which thread pool they are working on. The names LOW and HIGH are just identifiers for two different thread pools, and do not indicate a real difference in 'priority'. We can set the number of threads in the pools independently. The thread pool implementation is refactored. Test Plan: make check Reviewers: dhruba, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D12885	12 years ago
Haobo Xu	0e422308aa	[RocksDB] Remove Log file immediately after memtable flush Summary: As title. The DB log file life cycle is tied up with the memtable it backs. Once the memtable is flushed to sst and committed, we should be able to delete the log file, without holding the mutex. This is part of the bigger change to avoid FindObsoleteFiles at runtime. It deals with log files. sst files will be dealt with later. Test Plan: make check; db_bench Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11709	12 years ago
Mayank Agarwal	6e2b5809f6	Updating readme file for version 2.3 Summary: Test Plan: Reviewers: CC: Task ID: # Blame Rev:	12 years ago
Haobo Xu	f2f4c8072f	[RocksDB] Added nano second stopwatch and new perf counters to track block read cost Summary: The purpose of this diff is to expose per user-call level precise timing of block read, so that we can answer questions like: a Get() costs me 100ms, is that somehow related to loading blocks from file system, or something else? We will answer that with EXACTLY how many blocks have been read, how much time was spent on transferring the bytes from os, how much time was spent on checksum verification and how much time was spent on block decompression, just for that one Get. A nano second stopwatch was introduced to track time with higher precision. The cost/precision of the stopwatch is also measured in unit-test. On my dev box, retrieving one time instance costs about 30ns, on average. The deviation of timing results is good enough to track 100ns-1us level events. And the overhead could be safely ignored for 100us level events (10000 instances/s), for example, a viewstate thrift call. Test Plan: perf_context_test, also testing with viewstate shadow traffic. Reviewers: dhruba Reviewed By: dhruba CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D12351	12 years ago
Dhruba Borthakur	32c965d417	Flush was hanging because the configured options specified that more than 1 memtable needs to be merged. Summary: There is a config option called Options.min_write_buffer_number_to_merge that specifies the minimum number of write buffers to merge in memory before flushing to a file in L0. But in the case when the db is being closed, we should not be using this config, instead we should flush whatever write buffers were available at that time. Test Plan: Unit test attached. Reviewers: haobo, emayanke Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12717	12 years ago
Dhruba Borthakur	197034e4c3	An iterator may automatically invoke reseeks. Summary: An iterator invokes reseek if the number of sequential skips over the same userkey exceeds a configured number. This makes iter->Next() faster (because of fewer key compares) if a large number of adjacent internal keys in a table (sst or memtable) have the same userkey. Test Plan: Unit test DBTest.IterReseek. Reviewers: emayanke, haobo, xjin Reviewed By: xjin CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D11865	12 years ago
Mayank Agarwal	de98c1d9aa	Update documentation for backups and LogData Summary: LogData doesn't consume sequence numbers and doesn't increase the count of the write-batch. Also it was discussed that GetLiveFiles will have to be followed by GetSortedWalFiles to get a lossless backup Test Plan: visual Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12753	12 years ago
Mayank Agarwal	4b785aab05	Add logdata to ttl Summary: Ttl-write makes a new writebatch and calls Write on the base db. It should recognize LogData also Test Plan: make Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12747	12 years ago

1160 Commits (d45d17b2a3a6fe6e456989cbeb11a666ddc3b42d)