rocksdb

Commit Graph

Author	SHA1	Message	Date
Yueh-Hsuan Chiang	b7a2369fb2	Revert "Replace std::priority_queue in MergingIterator with custom heap" Summary: This patch reverts "Replace std::priority_queue in MergingIterator with custom heap" (commit commit `b6655a679d`) as it causes db_stress failure. Test Plan: ./db_stress --test_batches_snapshots=1 --threads=32 --write_buffer_size=4194304 --destroy_db_initially=0 --reopen=20 --readpercent=45 --prefixpercent=5 --writepercent=35 --delpercent=5 --iterpercent=10 --db=/tmp/rocksdb_crashtest_KdCI5F --max_key=100000000 --mmap_read=0 --block_size=16384 --cache_size=1048576 --open_files=500000 --verify_checksum=1 --sync=0 --progress_reports=0 --disable_wal=0 --disable_data_sync=1 --target_file_size_base=2097152 --target_file_size_multiplier=2 --max_write_buffer_number=3 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --filter_deletes=0 --memtablerep=prefix_hash --prefix_size=7 --ops_per_thread=200 --kill_random_test=97 Reviewers: igor, anthony, lovro, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41343	10 years ago
lovro	b6655a679d	Replace std::priority_queue in MergingIterator with custom heap Summary: While profiling compaction in our service I noticed a lot of CPU (~15% of compaction) being spent in MergingIterator and key comparison. Looking at the code I found MergingIterator was (understandably) using std::priority_queue for the multiway merge. Keys in our dataset include sequence numbers that increase with time. Adjacent keys in an L0 file are very likely to be adjacent in the full database. Consequently, compaction will often pick a chunk of rows from the same L0 file before switching to another one. It would be great to avoid the O(log K) operation per row while compacting. This diff replaces std::priority_queue with a custom binary heap implementation. It has a "replace top" operation that is cheap when the new top is the same as the old one (i.e. the priority of the top entry is decreased but it still stays on top). Test Plan: make check To test the effect on performance, I generated databases with data patterns that mimic what I describe in the summary (rows have a mostly increasing sequence number). I see a 10-15% CPU decrease for compaction (and a matching throughput improvement on tmpfs). The exact improvement depends on the number of L0 files and the amount of locality. Performance on randomly distributed keys seems on par with the old code. Reviewers: kailiu, sdong, igor Reviewed By: igor Subscribers: yoshinorim, dhruba, tnovak Differential Revision: https://reviews.facebook.net/D29133	10 years ago
Ari Ekmekji	35cd75c379	Introduce InfoLogLevel::HEADER_LEVEL Summary: Introduced a new category in the enum InfoLogLevel in env.h. Modifed Log() in env.cc to use the Header() when the InfoLogLevel == HEADER_LEVEL. Updated tests in auto_roll_logger_test to ensure the header is handled properly in these cases. Test Plan: Augment existing tests in auto_roll_logger_test Reviewers: igor, sdong, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41067	10 years ago
Aaron Feldman	a69bc91e37	Multithreaded backup and restore in BackupEngineImpl Summary: Add a new field: BackupableDBOptions.max_background_copies. CreateNewBackup() and RestoreDBFromBackup() will use this number of threads to perform copies. If there is a backup rate limit, then max_background_copies must be 1. Update backupable_db_test.cc to test multi-threaded backup and restore. Update backupable_db_test.cc to test backups when the backup environment is not the same as the database environment. Test Plan: Run ./backupable_db_test Run valgrind ./backupable_db_test Run with TSAN and ASAN Reviewers: yhchiang, rven, anthony, sdong, igor Reviewed By: igor Subscribers: yhchiang, anthony, sdong, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40725	10 years ago
Giuseppe Ottaviano	782a1590f9	Implement a table-level row cache Summary: Implementation of a table-level row cache. It only caches point queries done through the `DB::Get` interface, queries done through the `Iterator` interface will completely skip the cache. Supports snapshots and merge operations. Test Plan: Ran `make valgrind_check commit-prereq` Reviewers: igor, philipp, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39849	10 years ago
krad	de85e4cadf	Introduce WAL recovery consistency levels Summary: The "one size fits all" approach with WAL recovery will only introduce inconvenience for our varied clients as we go forward. The current recovery is a bit heuristic. We introduce the following levels of consistency while replaying the WAL. 1. RecoverAfterRestart (kTolerateCorruptedTailRecords) This mocks the current recovery mode. 2. RecoverAfterCleanShutdown (kAbsoluteConsistency) This is ideal for unit test and cases where the store is shutdown cleanly. We tolerate no corruption or incomplete writes. 3. RecoverPointInTime (kPointInTimeRecovery) This is ideal when using devices with controller cache or file systems which can loose data on restart. We recover upto the point were is no corruption or incomplete write. 4. RecoverAfterDisaster (kSkipAnyCorruptRecord) This is ideal mode to recover data. We tolerate corruption and incomplete writes, and we hop over those sections that we cannot make sense of salvaging as many records as possible. Test Plan: (1) Run added unit test to cover all levels. (2) Run make check. Reviewers: leveldb, sdong, igor Subscribers: yoshinorim, dhruba Differential Revision: https://reviews.facebook.net/D38487	10 years ago
krad	7015fd81c4	Add read_nanos to IOStatsContext. Summary: MyRocks need a mechanism to track read outliers. We need to expose this stat. Test Plan: None Reviewers: sdong CC: leveldb Task ID: #7152512 Blame Rev:	10 years ago
Aaron Feldman	18cc5018b7	Fix memory leaks in PinnedUsageTest Summary: See title Test Plan: Run valgrind ./cache_test Reviewers: igor Reviewed By: igor Subscribers: anthony, dhruba Differential Revision: https://reviews.facebook.net/D40419	10 years ago
Yueh-Hsuan Chiang	df719d4964	Make autovector_test runnable in ROCKSDB_LITE Summary: Make autovector_test runnable in ROCKSDB_LITE Test Plan: autovector_test Reviewers: sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40245	10 years ago
Igor Canadi	760e9a94de	Fail DB::Open() when the requested compression is not available Summary: Currently RocksDB silently ignores this issue and doesn't compress the data. Based on discussion, we agree that this is pretty bad because it can cause confusion for our users. This patch fails DB::Open() if we don't support the compression that is specified in the options. Test Plan: make check with LZ4 not present. If Snappy is not present all tests will just fail because Snappy is our default library. We should make Snappy the requirement, since without it our default DB::Open() fails. Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39687	10 years ago
Aaron Feldman	69bb210d58	Add Cache.GetPinnedUsageUsage() Summary: Add the funcion Cache.GetPinnedUsage() to return the memory size of entries that are in use by the system (that is, all the entries not in the LRU list). Test Plan: Run ./cache_test and examine PinnedUsageTest. Reviewers: tnovak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40305	10 years ago
Igor Canadi	4b8bb62f0a	Don't dump DBOptions for each column family Summary: Currently we dump DBOptions for each column family options we dump. This leads to duplicate lines in our LOG file. This diff fixes that. Test Plan: Check out the LOG Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: IslamAbdelRahman, yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39729	10 years ago
Islam AbdelRahman	12e030a992	Use CompactRangeOptions for CompactRange Summary: This diff update DB::CompactRange to use RangeCompactionOptions instead of using multiple parameters Old CompactRange is still available but deprecated Test Plan: make all check make rocksdbjava USE_CLANG=1 make all OPT=-DROCKSDB_LITE make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40209	10 years ago
Yueh-Hsuan Chiang	1369f015ee	Only initialize the ThreadStatusData when necessary. Summary: Before this patch, any function call to ThreadStatusUtil might automatically initialize and register the thread status data. However, if it is the user-thread making this call, the allocated thread-status-data will never be released as such threads are not managed by rocksdb. In this patch, I remove the automatic-initialization part. Thread-status data is only initialized and uninitialized in Env during the thread creation and destruction. Test Plan: db_test thread_list_test listener_test Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40017	10 years ago
sdong	40f562e747	Allow GetApproximateSize() to include mem table size if it is skip list memtable Summary: Add an option in GetApproximateSize() so that the result will include estimated sizes in mem tables. To implement it, implement an estimated count from the beginning to a key in skip list. The approach is to count to find the entry, how many Next() is issued from each level, and sum them with a weight that is <branching factor> ^ <level>. Test Plan: Add a test case Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40119	10 years ago
Yueh-Hsuan Chiang	bee8d033f4	Removed two unused macros in iostats_context Summary: Removed two unused macros in iostats_context Test Plan: make all check Reviewers: sdong, rven, IslamAbdelRahman, kradhakrishnan, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40005	10 years ago
sdong	7842920be5	Slow down writes by bytes written Summary: We slow down data into the database to the rate of options.delayed_write_rate (a new option) with this patch. The thread synchronization approach I take is to still synchronize write controller by DB mutex and GetDelay() is inside DB mutex. Try to minimize the frequency of getting time in GetDelay(). I verified it through db_bench and it seems to work hard_rate_limit is deprecated. options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up. Test Plan: Add new unit tests in db_test Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor Reviewed By: igor Subscribers: ikabiljo, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36351	10 years ago
Islam AbdelRahman	ab455ce495	fix clang build	10 years ago
Yueh-Hsuan Chiang	3eddd1abe9	Add Env::GetThreadID(), which returns the ID of the current thread. Summary: Add Env::GetThreadID(), which returns the ID of the current thread. In addition, make GetThreadList() and InfoLog use same unique ID for the same thread. Test Plan: db_test listener_test Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39735	10 years ago
Islam AbdelRahman	643bbbf081	Use nullptr for default compaction_filter_factory Summary: Replacing the default value for compaction_filter_factory and compaction_filter_factory_v2 to be nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2 The reason for this is to be able to determine easily if we have compaction filter factory or not without depending on RTTI Test Plan: make check Reviewers: yoshinorim, ott, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39693	11 years ago
Yueh-Hsuan Chiang	7647df8f9e	Fixed the tsan failure in util/compaction_job_stats_impl.cc Summary: The type of smallest_output_key_prefix and largest_output_key_prefix have been changed to std::string in https://reviews.facebook.net/D39537. As a result, we shouldn't do smallest_output_key_prefix[0] = 0 in the initialization. Test Plan: compile db_test with tsan enabled and repeat DBTest.CompactionDeletionTrigger test to verify the tsan issue has been gone. Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39645	11 years ago
Yueh-Hsuan Chiang	fe5c6321cb	Allow EventListener::OnCompactionCompleted to return CompactionJobStats. Summary: Allow EventListener::OnCompactionCompleted to return CompactionJobStats, which contains useful information about a compaction. Example CompactionJobStats returned by OnCompactionCompleted(): smallest_output_key_prefix 05000000 largest_output_key_prefix 06990000 elapsed_time 42419 num_input_records 300 num_input_files 3 num_input_files_at_output_level 2 num_output_records 200 num_output_files 1 actual_bytes_input 167200 actual_bytes_output 110688 total_input_raw_key_bytes 5400 total_input_raw_value_bytes 300000 num_records_replaced 100 is_manual_compaction 1 Test Plan: Developed a mega test in db_test which covers 20 variables in CompactionJobStats. Reviewers: rven, igor, anthony, sdong Reviewed By: sdong Subscribers: tnovak, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38463	11 years ago
Mike Kolupaev	ec7a944360	more times in perf_context and iostats_context Summary: We occasionally get write stalls (>1s Write() calls) on HDD under read load. The following timers explain almost all of the stalls: - perf_context.db_mutex_lock_nanos - perf_context.db_condition_wait_nanos - iostats_context.open_time - iostats_context.allocate_time - iostats_context.write_time - iostats_context.range_sync_time - iostats_context.logger_time In my experiments each of these occasionally takes >1s on write path under some workload. There are rare cases when Write() takes long but none of these takes long. Test Plan: Added code to our application to write the listed timings to log for slow writes. They usually add up to almost exactly the time Write() call took. Reviewers: rven, yhchiang, sdong Reviewed By: sdong Subscribers: march, dhruba, tnovak Differential Revision: https://reviews.facebook.net/D39177	11 years ago
Mike Kolupaev	2ecac9f96d	add rocksdb::WritableFileWrapper similar to rocksdb::EnvWrapper Summary: It used to be no good (known to me) non-intrusive way to wrap WritableFile - you can't call protected virtual methods of the wrapped pointer to WritableFile. This diff adds a convenience class WritableFileWrapper that makes wrapping WritableFile both possible and easy. Test Plan: `make clean; make -j release`, `make clean; OPT=-DROCKSDB_LITE make release`, `make clean; USE_CLANG=1 make -j all`. Reviewers: sdong, yhchiang, rven Reviewed By: rven Subscribers: dhruba, tnovak, march Differential Revision: https://reviews.facebook.net/D39147	11 years ago
agiardullo	dc9d70de65	Optimistic Transactions Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported. Test Plan: Added a new test, but still need to look into stress testing. Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: adamretter, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33435	11 years ago
Yueh-Hsuan Chiang	9ffc8ba024	Include EventListener in stress test. Summary: Include EventListener in stress test. Test Plan: make blackbox_crash_test whitebox_crash_test Reviewers: anthony, igor, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39105	11 years ago
agiardullo	c815351038	Support saving history in memtable_list Summary: For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit. After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much. This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit). However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions. Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here. Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37443	11 years ago
Yueh-Hsuan Chiang	672dda9b3b	[API Change] Move listeners from ColumnFamilyOptions to DBOptions Summary: Move listeners from ColumnFamilyOptions to DBOptions Test Plan: listener_test compact_files_test Reviewers: rven, anthony, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39087	11 years ago
Reed Allman	328ad902ab	update an import path to fit in with the rest of the kids	11 years ago
Yueh-Hsuan Chiang	dc81efe415	Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL Summary: Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL Test Plan: Use db_bench to verify the log level. Sample output: 2015/05/22-00:20:39.778064 7fff75b41300 [WARN] RocksDB version: 3.11.0 2015/05/22-00:20:39.778095 7fff75b41300 [WARN] Git sha rocksdb_build_git_sha:7fee8775a459134c4cb04baae5bd1687e268f2a0 2015/05/22-00:20:39.778099 7fff75b41300 [WARN] Compile date May 22 2015 2015/05/22-00:20:39.778101 7fff75b41300 [WARN] DB SUMMARY 2015/05/22-00:20:39.778145 7fff75b41300 [WARN] SST files in /tmp/rocksdbtest-691931916/dbbench dir, Total Num: 0, files: 2015/05/22-00:20:39.778148 7fff75b41300 [WARN] Write Ahead Log file in /tmp/rocksdbtest-691931916/dbbench: 2015/05/22-00:20:39.778150 7fff75b41300 [WARN] Options.error_if_exists: 0 2015/05/22-00:20:39.778152 7fff75b41300 [WARN] Options.create_if_missing: 1 2015/05/22-00:20:39.778153 7fff75b41300 [WARN] Options.paranoid_checks: 1 Reviewers: MarkCallaghan, igor, kradhakrishnan Reviewed By: igor Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38835	11 years ago
Yueh-Hsuan Chiang	7fee8775a4	Allow EventLogger to directly log from a JSONWriter. Summary: Allow EventLogger to directly log from a JSONWriter. This allows the JSONWriter to be shared by EventLogger and potentially EventListener, which is an important step to integrate EventLogger and EventListener. This patch also rewrites EventLoggerHelpers::LogTableFileCreation(), which uses the new API to generate identical log. Test Plan: Run db_bench in debug mode and make sure the log is correct and no assertions fail. Reviewers: sdong, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38709	11 years ago
Igor Canadi	4cb4d546cd	Set stats_dump_period_sec to 600 by default Summary: Having stats in our LOG more often will help a lot with perf debugging. Test Plan: none Reviewers: sdong, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38781	11 years ago
Yueh-Hsuan Chiang	d1a978ae3d	Rename JSONWritter to JSONWriter Summary: Rename JSONWritter to JSONWriter Test Plan: make Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38733	11 years ago
Igor Canadi	4a855c0799	Add an option wal_bytes_per_sync to control sync_file_range for WAL files Summary: sync_file_range is not always asyncronous and thus can block writes if we do this for WAL in the foreground thread. See more here: http://yoshinorimatsunobu.blogspot.com/2014/03/how-syncfilerange-really-works.html Some users don't want us to call sync_file_range on WALs. Some other do. Thus, I'm adding a separate option wal_bytes_per_sync to control calling sync_file_range on WAL files. bytes_per_sync will apply only to table files now. Test Plan: no more sync_file_range for WAL as evidenced by strace Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38253	11 years ago
Yueh-Hsuan Chiang	74f3832d85	Fixed compile errors due to some gcc does not have std::map::emplace Summary: Fixed the following compile errors due to some gcc does not have std::map::emplace util/thread_status_impl.cc: In static member function ‘static std::map<std::basic_string<char>, long unsigned int> rocksdb::ThreadStatus::InterpretOperationProperties(rocksdb::ThreadStatus::OperationType, const uint64_t)’: util/thread_status_impl.cc:88:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:90:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:94:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:96:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:98:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:101:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ make: ** [util/thread_status_impl.o] Error 1 Test Plan: make db_bench Reviewers: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38643	11 years ago
stash93	0c8017dbae	Remove duplicated code Summary: Call Flush() function instead Test Plan: make all check Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D38583	11 years ago
Yueh-Hsuan Chiang	3f0867c0fe	Allow GetThreadList to report Flush properties. Summary: Allow GetThreadList to report Flush properties, which includes: * job id * number of bytes that has been written since flush started. * total size of input mem-tables Test Plan: ./db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=100 --value_size=1000 Sample output from db_bench which tracks same flush job ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140213879898240 High Pri default Flush 5789 us FlushJob::WriteLevel0Table BytesMemtables 4112835 \| BytesWritten 577104 \| JobID 8 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140213879898240 High Pri default Flush 30.634 ms FlushJob::WriteLevel0Table BytesMemtables 4112835 \| BytesWritten 1734865 \| JobID 8 \| Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38505	11 years ago
Yueh-Hsuan Chiang	714fcc067d	Make ThreadStatus::InterpretOperationProperties take const uint64_t* Summary: Make ThreadStatus::InterpretOperationProperties take const uint64_t* Test Plan: make make OPT=-DROCKSDB_LITE shared_lib Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38445	11 years ago
Igor Canadi	dbd95b7532	Add more table properties to EventLogger Summary: Example output: {"time_micros": 1431463794310521, "job": 353, "event": "table_file_creation", "file_number": 387, "file_size": 86937, "table_info": {"data_size": "81801", "index_size": "9751", "filter_size": "0", "raw_key_size": "23448", "raw_average_key_size": "24.000000", "raw_value_size": "990571", "raw_average_value_size": "1013.890481", "num_data_blocks": "245", "num_entries": "977", "filter_policy_name": "", "kDeletedKeys": "0"}} Also fixed a bug where BuildTable() in recovery was passing Env::IOHigh argument into paranoid_checks_file parameter. Test Plan: make check + check out the output in the log Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38343	11 years ago
Yueh-Hsuan Chiang	77a5a543a5	Allow GetThreadList() to report basic compaction operation properties. Summary: Now we're able to show more details about a compaction in GetThreadList() :) This patch allows GetThreadList() to report basic compaction operation properties. Basic compaction properties include: 1. job id 2. compaction input / output level 3. compaction property flags (is_manual, is_deletion, .. etc) 4. total input bytes 5. the number of bytes has been read currently. 6. the number of bytes has been written currently. Flush operation properties will be done in a seperate diff. Test Plan: /db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1 Sample output of tracking same job: ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 31.357 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 59.440 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 226.375 ms CompactionJob::Install BaseInputLevel 1 \| BytesRead 3958013 \| BytesWritten 3621940 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37653	11 years ago
Igor Canadi	7f47ba0e26	Fix possible SIGSEGV in CompactRange (github issue #596 ) Summary: For very detailed explanation of what's happening read this: https://github.com/facebook/rocksdb/issues/596 Test Plan: make check + new unit test Reviewers: yhchiang, anthony, rven Reviewed By: rven Subscribers: adamretter, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37779	11 years ago
Igor Canadi	1bb4928da9	Include bunch of more events into EventLogger Summary: Added these events: * Recovery start, finish and also when recovery creates a file * Trivial move * Compaction start, finish and when compaction creates a file * Flush start, finish Also includes small fix to EventLogger Also added option ROCKSDB_PRINT_EVENTS_TO_STDOUT which is useful when we debug things. I've spent far too much time chasing LOG files. Still didn't get sst table properties in JSON. They are written very deeply into the stack. I'll address in separate diff. TODO: * Write specification. Let's first use this for a while and figure out what's good data to put here, too. After that we'll write spec * Write tools that parse and analyze LOGs. This can be in python or go. Good intern task. Test Plan: Ran db_bench with ROCKSDB_PRINT_EVENTS_TO_STDOUT. Here's the output: https://phabricator.fb.com/P19811976 Reviewers: sdong, yhchiang, rven, MarkCallaghan, kradhakrishnan, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37521	11 years ago
Aashish Pant	3db81d535a	Fix memory leak in cache_test introduced in the previous commit Test Plan: Verified that valgrind build passes for cache_test Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37665	11 years ago
clark.kang	6ede020dc4	fix typos	11 years ago
Aashish Pant	242f9b4c26	Fix CLANG build issue introduced in previous commit Summary: Added keyword override for SetCapacity() Test Plan: Fixes build Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37647	11 years ago
Aashish Pant	794ccfde89	Task 6532943: Rocksdb - SetCapacity() can dynamically change cache capacity if feasible Summary: When new capacity is larger than existing capacity, simply update the capacity to the new valie When new capacity is less than existing capacity, but more than the usage, simply update the capacity to new value When new capacity is less than the existing capacity and existing usage both, try to purge entries in LRU if feasible to make usage < capacity Test Plan: Created unit tests in cache_test.cc Reviewers: sdong, rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37527	11 years ago
sdong	98a44559d5	Build for CYGWIN Summary: Make it build for CYGWIN. Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions. Test Plan: Build it and run some unit tests in CYGWIN Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37605	11 years ago
Igor Canadi	e003d3864c	Abstract out SetMaxPossibleForUserKey() and SetMinPossibleForUserKey Summary: Based on feedback from D37083. Are all of these correct? In some spaces it seems like we're doing SetMaxPossibleForUserKey() although we want the smallest possible internal key for user key. Test Plan: make check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37341	11 years ago
sdong	397b6588bd	options.paranoid_file_checks to read all rows after writing to a file. Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows. Test Plan: Add a new unit test for it Reviewers: rven, igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37335	11 years ago
Jim Meyering	0a91bca5db	test: avoid vuln-inducing use of temporary directory Summary: Without this change, someone on the machine on which I run "make check" could cause me to overwrite arbitrary files owned by me, via a symlink attack. Instead of using a predictable temporary directory and accepting to use a preexisting one, always create a new one using mkdtemp. If $TEST_IOCTL_FRIENDLY_TMPDIR is set and usable, attempt first to find a usable temporary directory therein. If not, or if unusable, then try /var/tmp and /tmp. If none of those is usable abort with a diagnostic. To do that, I added a new class. Its constructor finds a suitable directory or aborts, the sole member prints that directory's name, and the destructor unlinks what should be an empty directory. Note that while the code before this did not remove its temporary directory, there was only one per $UID. Now, there would be at least one per run or one per test, depending on implementation, so it is important to remove them. Test Plan: Run this on a fedora rawhide system, where /tmp is a tmpfs file system, and /var/tmp is ext4. # This gives a diagnostic that /dev/shm is not suitable # and ends up using /var/tmp. TEST_IOCTL_FRIENDLY_TMPDIR=/dev/shm ./env_test # Uses /var/tmp; same as when envvar not set. TEST_IOCTL_FRIENDLY_TMPDIR=/var/tmp ./env_test # Uses /tmp unless it's tmpfs, in which case it gives # a diagnostic and uses /var/tmp. TEST_IOCTL_FRIENDLY_TMPDIR=/tmp ./env_test Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37287	11 years ago

1 2 3 4 5 ...

883 Commits (7189e90c2206c6825ae0459e5f424e437723d1bc)