rocksdb

Commit Graph

Author	SHA1	Message	Date
Aaron Gao	dda6c72ac8	Add DestroyColumnFamilyHandle(ColumnFamilyHandle) to db.h Summary: add DestroyColumnFamilyHandle(ColumnFamilyHandle) to close a column family instead of deleting cfh*. Users should call this to close a cf, and then we can detect the deletion in this function. Test Plan: make all check -j64 Reviewers: andrewkr, yiwu, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60765	8 years ago
Andrew Kryczka	56222f57df	Avoid FileMetaData copy Summary: as titled Test Plan: unit tests Reviewers: sdong, lightmark Reviewed By: lightmark Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60597	8 years ago
Yi Wu	6ea41f8527	Fix deadlock when trying to update options when write stalls Summary: When writes stall because auto compaction is disabled or the stop write trigger is reached, the user may change these two options to unblock writes. Unfortunately, we had an issue where the write thread would block the attempt to persist the options, thus creating a deadlock. This diff fixes the issue and adds two test cases to detect such a deadlock. Test Plan: Run unit tests. Also, revert db_impl.cc to master (but don't revert `DBImpl::BackgroundCompaction:Finish` sync point) and run db_options_test. Both tests should hit deadlock. Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60627	8 years ago
Jay Edgar	efd013d6d8	Miscellaneous performance improvements Summary: I was investigating performance issues in the SstFileWriter and found all of the following: - The SstFileWriter::Add() function created a local InternalKey every time it was called, generating an allocation and free each time. Changed to have an InternalKey member variable that can be reset with the new InternalKey::Set() function. - In SstFileWriter::Add() the smallest_key and largest_key values were assigned the result of a ToString() call, but it is simpler to just assign them directly from the user's key. - The Slice class had no move constructor so each time one was returned from a function a new one had to be allocated, the old data copied to the new, and the old one was freed. I added the move constructor which also required a copy constructor and assignment operator. - The BlockBuilder::CurrentSizeEstimate() function calculates the current estimate size, but was being called 2 or 3 times for each key added. I changed the class to maintain a running estimate (equal to the original calculation) so that the function can return an already calculated value. - The code in BlockBuilder::Add() that calculated the shared bytes between the last key and the new key duplicated what Slice::difference_offset does, so I replaced it with the standard function. - BlockBuilder::Add() had code to copy just the changed portion into the last key value (and asserted that it now matched the new key). It is more efficient just to copy the whole new key over. - Moved this same code up into the 'if (use_delta_encoding_)' since the last key value is only needed when delta encoding is on. - FlushBlockBySizePolicy::BlockAlmostFull calculated a standard deviation value each time it was called, but this information would only change if block_size or block_size_deviation changed, so I created a member variable to hold the value to avoid the calculation each time. - Each PutVarint??() function has a buffer and calls std::string::append(). Two or three calls in a row could share a buffer and a single call to std::string::append(). Some of these will be helpful outside of the SstFileWriter. I'm not 100% sure the addition of the move constructor is appropriate as I wonder why this wasn't done before - maybe because of compiler compatibility? I tried it on gcc 4.8 and 4.9. Test Plan: The changes should not affect the results so the existing tests should all still work and no new tests were added. The value of the changes was seen by manually testing the SstFileWriter class through MyRocks and adding timing code to identify problem areas. Reviewers: sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59607	8 years ago
Aaron Gao	816ae098ea	fix test failure Summary: fix Rocksdb Unit Test USER_FAILURE Test Plan: make all check -j64 Reviewers: sdong, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60603	8 years ago
Aaron Gao	8e6b38d895	update DB::AddFile to ingest list of sst files Summary: The DB::AddFile(std::string file_path) API allows users to ingest an SST file created using SstFileWriter. We want to update this interface to accept a list of files to be ingested: DB::AddFile(std::vector<std::string> file_path_list). Test Plan: Add test case `AddExternalSstFileList` in `DBSSTTest`. It makes sure that: 1. the files' key ranges do not overlap with each other 2. each file's key range does not overlap with the DB key range 3. no snapshots are held Reviewers: andrewkr, sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D58587	8 years ago
Yi Wu	296545a2c7	Fix clang analyzer errors Summary: Fixing errors reported by clang static analyzer. * Removing some unused variables. * Adding assertions to fix false positives reported by clang analyzer. * Adding `__clang_analyzer__` macro to suppress false positive warnings. Test Plan: USE_CLANG=1 OPT=-g make analyze -j64 Reviewers: andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60549	8 years ago
sdong	907f24d0e1	Concurrent memtable inserter to update counters and flush state after all inserts Summary: In the concurrent memtable insert case, updating counters in MemTable::Add() can account for 5% of CPU usage. By batching all the counters and updating them at the end of the write batch, the CPU overhead is reduced in use cases where more than one key is updated in one write batch. Test Plan: Write throughput increases 12% with this benchmark setting: TEST_TMPDIR=/dev/shm/ ./db_bench --benchmarks=fillrandom -disable_auto_compactions -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -num=10000000 --writes=1000000 -max_background_flushes=16 -max_write_buffer_number=16 --threads=64 --batch_size=128 -allow_concurrent_memtable_write -enable_write_thread_adaptive_yield Reviewers: andrewkr, IslamAbdelRahman, ngbronson, igor Reviewed By: ngbronson Subscribers: ngbronson, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60495	8 years ago
Andrew Kryczka	e1b3ee8a79	Cleanup auto-roll logger flush-while-rolling test Summary: Use @omegaga's awesome feature to avoid use of callbacks for ensuring SyncPoints happen in a particular thread. Depends on D60375. Test Plan: $ ./auto_roll_logger_test Reviewers: omegaga, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, omegaga, leveldb Differential Revision: https://reviews.facebook.net/D60471	8 years ago
omegaga	cd4178a015	Add a new feature to enforce a sync point only active on a thread Summary: Add markers to sync points. A marked sync point will only be active when it is on the same thread as the marker sync point. Test Plan: Write a unit test to validate. Reviewers: sdong, IslamAbdelRahman, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60375	8 years ago
Gunnar Kudrjavets	b954847fca	Fix release build for MyRocks by using debug-only code only in debug builds Summary: MyRocks release integration build breaks because we treat warnings caused by unused variables as errors. Variable `edit` is only used in debug builds. Therefore we need to guard it using `#ifndef NDEBUG` check. Test Plan: - `[p]arc diff --preview` for the default validation. - Verify that release build fails before this fix and passes after applying it. Reviewers: andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60423	8 years ago
sdong	a00bf1b3cf	Add More Logging to track total_log_size Summary: We saw instances where total_log_size is off the real value, but I'm not able to reproduce it. Add more logging to help debugging when it happens again. Test Plan: Run the unit test and see the logging. Reviewers: andrewkr, yhchiang, igor, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60081	8 years ago
sdong	32df9733d1	Add options.write_buffer_manager: control total memtable size across DB instances Summary: Add option write_buffer_manager to help users control total memory spent on memtables across multiple DB instances. Test Plan: Add a new unit test. Reviewers: yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: adela, benj, sumeet, muthu, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59925	8 years ago
Aaron Gao	5aaef91d4a	group multiple batch of flush into one manifest file (one call to LogAndApply) Summary: Currently, if several flush outputs are committed together, we issue each manifest write per batch (1 batch = 1 flush = 1 sst file = 1+ continuous memtables). Each manifest write requires one fsync and one fsync to parent directory. In some cases, it becomes the bottleneck of write. We should batch them and write in one manifest write when possible. Test Plan: ` ./db_bench -benchmarks="fillseq" -max_write_buffer_number=16 -max_background_flushes=16 -disable_auto_compactions=true -min_write_buffer_number_to_merge=1 -write_buffer_size=65536 -level0_stop_writes_trigger=10000 -level0_slowdown_writes_trigger=10000` Before ``` Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 4.9 Date: Fri Jul 1 15:38:17 2016 CPU: 32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz CPUCache: 20480 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Write rate: 0 bytes/second Compression: Snappy Memtablerep: skip_list Perf Level: 1 WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags DB path: [/tmp/rocksdbtest-112628/dbbench] fillseq : 166.277 micros/op 6014 ops/sec; 0.7 MB/s ``` After ``` Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 4.9 Date: Fri Jul 1 15:35:05 2016 CPU: 32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz CPUCache: 20480 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Write rate: 0 bytes/second Compression: Snappy Memtablerep: skip_list Perf Level: 1 WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags DB path: [/tmp/rocksdbtest-112628/dbbench] fillseq : 52.328 micros/op 19110 ops/sec; 2.1 MB/s ``` Reviewers: andrewkr, IslamAbdelRahman, yhchiang, sdong Reviewed By: sdong Subscribers: igor, andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D60075	8 years ago
omegaga	a45ee83181	Fix a bug that accesses invalid address in iterator cleanup function Summary: Reported in T11889874. When registering the cleanup function we should copy the option so that we can still access it if ReadOptions is deleted. Test Plan: Add a unit test to reproduce this bug. Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60087	8 years ago
Gunnar Kudrjavets	bdb1d19a69	Fix UBSan build break caused by variable not initialized Summary: UBSan is unhappy because `cfd` is not initialized. This breaks UBSan build which in turn breaks MyRocks continuous integration with RocksDB which in turns makes me unhappy :-) Fix this. Test Plan: - `[p]arc diff --preview` + Sandcastle. - Verify that `COMPILE_WITH_UBSAN=1 OPT=-g make J=1 ubsan_check` gets past the break. Reviewers: andrewkr, hermanlee4, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60117	9 years ago
sdong	c4cef07f1b	Update DBTestUniversalCompaction.UniversalCompactionSingleSortedRun to use max_size_amplification_percent = 0 Summary: With max_size_amplification_percent = 0 to make sure that DBTestUniversalCompaction.UniversalCompactionSingleSortedRun tests the configuration to compact to one single sorted run. Test Plan: Run all existing tests Reviewers: yhchiang, andrewkr, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D60021	9 years ago
charsyam	4f2b0946d1	fix simple typos (#1183)	9 years ago
Andrew Kryczka	3b7ed677de	ColumnFamilyOptions API [CF + RepairDB part 3/3] Summary: Overload RepairDB to take vector-of-ColumnFamilyDescriptor, which tells us CF name + options. Also takes a ColumnFamilyOptions for unspecified column families encountered during the repair. One potentially confusing thing is that we store options in the constructor and don't invoke AddColumnFamily() until discovering the CF in ScanTable. This is because we don't know the CF ID until we find a table belonging to that CF. Depends on D59781. Test Plan: $ ./repair_test Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D59853	9 years ago
Andrew Kryczka	56ac686292	Detect column family from properties [CF + RepairDB part 2/3] Summary: This diff uses the CF ID and CF name properties in the SST file to associate recovered data with the proper column family. Depends on D59775. - In ScanTable(), create column families in VersionSet each time a new one is discovered (via reading SST file properties) - In ConvertLogToTable(), dump an SST file for every column family with data in the WAL - In AddTables(), make a VersionEdit per-column family that adds all of that CF's tables Test Plan: $ ./repair_test Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D59781	9 years ago
Andrew Kryczka	343507afb1	Refactor to use VersionSet [CF + RepairDB part 1/3] Summary: To support column families, it is easiest to use VersionSet to manage our column families (if we don't have Versions then ColumnFamilyData always behaves as a dummy column family). This diff only refactors the existing repair logic to use VersionSet; the next two parts will add support for multiple column families. Test Plan: $ ./repair_test Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D59775	9 years ago
omegaga	c4e19b77e8	Add a read option to enable background purge when cleaning up iterators Summary: Add a read option `background_purge_on_iterator_cleanup` to avoid deleting files in foreground when destroying iterators. Instead, a job is scheduled in high priority queue and would be executed in a separate background thread. Test Plan: Add a variant of PurgeObsoleteFileTest. Turn on background purge option in the new test, and use sleeping task to ensure files are deleted in background. Reviewers: IslamAbdelRahman, sdong Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59499	9 years ago
Islam AbdelRahman	fa813f7478	Update DB::AddFile() to ingest the file to the lowest possible level Summary: DB::AddFile() currently always adds the ingested file to L0. Update the logic to add the file to the lowest possible level. Test Plan: unit tests Reviewers: jkedgar, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D59637	9 years ago
sdong	7b79238b65	Deprecate filter_deletes Summary: filter_deletes is not a frequently used feature. Remove it. Test Plan: Run all test suites. Reviewers: igor, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59427	9 years ago
Islam AbdelRahman	30a24f2d3d	Add InternalStats and logging for AddFile() Summary: We don't report the bytes that we ingested from AddFile, which makes the write amplification numbers incorrect. Update InternalStats and add logging for AddFile(). Test Plan: Make sure the code compiles and existing tests pass Reviewers: lightmark, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59763	9 years ago
sdong	249e796dfc	Fix Flaky DBCompactionTest.SkipStatsUpdateTest Summary: DBCompactionTest.SkipStatsUpdateTest sometimes fails. I don't see any verification related to the deletes issued. Remove them to avoid the uncertainty. Test Plan: Run the test. Reviewers: IslamAbdelRahman, andrewkr, yhchiang Reviewed By: yhchiang Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59613	9 years ago
Islam AbdelRahman	f5177c761f	Remove wasteful instrumentation in FullMerge (stacked on D59577) Summary: [ This diff is stacked on top of D59577 ] We keep calling timer.ElapsedNanos() on every call to MergeOperator::FullMerge even when statistics are disabled; this is wasteful. I ran the readseq benchmark on a DB containing 100K merge operands for 100K keys (1 operand per key) with 1GB block cache, and I see a slight performance improvement. Original results ``` $ ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=100000 --num=100000 --db="/dev/shm/100K_merge_compacted/" --cache_size=1073741824 --use_existing_db --disable_auto_compactions ------------------------------------------------ DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.498 micros/op 2006597 ops/sec; 222.0 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.295 micros/op 3393627 ops/sec; 375.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.285 micros/op 3511155 ops/sec; 388.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.286 micros/op 3500470 ops/sec; 387.2 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.283 micros/op 3530751 ops/sec; 390.6 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.289 micros/op 3464811 ops/sec; 383.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.277 micros/op 3612814 ops/sec; 399.7 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.283 micros/op 3539640 ops/sec; 391.6 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.285 micros/op 3503766 ops/sec; 387.6 MB/s ``` After patch ``` $ ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=100000 --num=100000 --db="/dev/shm/100K_merge_compacted/" --cache_size=1073741824 --use_existing_db --disable_auto_compactions ------------------------------------------------ DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.476 micros/op 2100119 ops/sec; 232.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.278 micros/op 3600887 ops/sec; 398.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.275 micros/op 3636698 ops/sec; 402.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.271 micros/op 3691661 ops/sec; 408.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.273 micros/op 3661534 ops/sec; 405.1 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.276 micros/op 3627106 ops/sec; 401.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.272 micros/op 3682635 ops/sec; 407.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.266 micros/op 3758331 ops/sec; 415.8 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.266 micros/op 3761907 ops/sec; 416.2 MB/s ``` Test Plan: make check -j64 Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59583	9 years ago
Islam AbdelRahman	7c919deccc	Reuse TimedFullMerge instead of FullMerge + instrumentation Summary: We have a lot of code duplication: whenever we call FullMerge, we keep duplicating the instrumentation and statistics code. This is a simple diff to refactor the code to use TimedFullMerge instead of FullMerge. Test Plan: COMPILE_WITH_ASAN=1 make check -j64 Reviewers: andrewkr, yhchiang, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59577	9 years ago
Yi Wu	bc8af90e8c	add option to not flush memtable on open() Summary: Add option to not flush memtable on open(). If the option is enabled, don't delete existing log files by not updating log numbers to MANIFEST. Will still flush if we need to (e.g. memtable full in the middle). In that case we also flush the final memtable. If wal_recovery_mode = kPointInTimeRecovery, do not halt immediately after encountering corruption. Instead, check if the seq id of the next log file is last_log_sequence + 1. In that case we continue recovery. Test Plan: See unit test. Reviewers: dhruba, horuff, sdong Reviewed By: sdong Subscribers: benj, yhchiang, andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57813	9 years ago
sdong	6faddd7c55	Merge db/slice.cc into util/slice.cc Summary: It confuses some compilers to have slice.cc under multiple directories. Merge them. Test Plan: Run existing tests Reviewers: andrewkr, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59409	9 years ago
sdong	5009b5326b	BlockBasedTable::FullFilterKeyMayMatch() Should skip prefix bloom if full key bloom exists Summary: Currently, if users define both full key bloom and prefix bloom in SST files, then during Get(), if the full key bloom shows the key may exist, we still go ahead and check the prefix bloom. This is wasteful. If a bloom filter for full keys exists, we should always ignore the prefix bloom in Get(). Test Plan: Run existing tests Reviewers: yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D57825	9 years ago
sdong	20699df843	memtable_prefix_bloom_bits -> memtable_prefix_bloom_bits_ratio and deprecate memtable_prefix_bloom_probes Summary: memtable_prefix_bloom_probes is not a critical option. Remove it to reduce the number of options. It's easier for users to make mistakes with memtable_prefix_bloom_bits, so turn it into memtable_prefix_bloom_bits_ratio. Test Plan: Run all existing tests Reviewers: yhchiang, igor, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59199	9 years ago
Wanning Jiang	56887f6cb8	Backup Options Summary: Back up the options file to the private directory. Test Plan: backupable_db_test.cc, BackupOptions. Modify DB options by calling OpenDB 3 times. Check that the latest options file is in the right place. Also check that no redundant files are backed up. Reviewers: andrewkr Reviewed By: andrewkr Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D59373	9 years ago
Anirban Rahut	a73b26f601	Adding test for contiguous WAL detection Summary: Add a test to detect that when WAL gets truncated, seq no's are checked to be contiguous. This test is put in ColumnFamilyTest as it has the necessary infrastructure/functions for flushing column families, which we use to ensure 2 active WAL files. Test Plan: This is a test; no feature has been added. This test fails today and is hence disabled. Reviewers: sdong Reviewed By: sdong Subscribers: lgalanis, dhruba, andrewkr, pritamdamania Differential Revision: https://reviews.facebook.net/D59253	9 years ago
Aaron Gao	e532877940	Add statistics field to show total size of index and filter blocks in block cache Summary: With `table_options.cache_index_and_filter_blocks = true`, index and filter blocks are stored in the block cache. People are then curious how much of the block cache's total size is used by indexes and bloom filters. It would be nice to have a way to report that. It can help people tune performance and plan for an optimized hardware setting. We add several enum values for db Statistics. BLOCK_CACHE_INDEX/FILTER_BYTES_INSERT - BLOCK_CACHE_INDEX/FILTER_BYTES_ERASE = current INDEX/FILTER total block size in bytes. Test Plan: write a test case called `DBBlockCacheTest.IndexAndFilterBlocksStats`. The result is: ``` [gzh@dev9927.prn1 ~/local/rocksdb] make db_block_cache_test -j64 && ./db_block_cache_test --gtest_filter=DBBlockCacheTest.IndexAndFilterBlocksStats Makefile:101: Warning: Compiling in debug mode. Don't use the resulting binary in production GEN util/build_version.cc make: `db_block_cache_test' is up to date. Note: Google Test filter = DBBlockCacheTest.IndexAndFilterBlocksStats [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from DBBlockCacheTest [ RUN ] DBBlockCacheTest.IndexAndFilterBlocksStats [ OK ] DBBlockCacheTest.IndexAndFilterBlocksStats (689 ms) [----------] 1 test from DBBlockCacheTest (689 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (689 ms total) [ PASSED ] 1 test. ``` Reviewers: IslamAbdelRahman, andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D58677	9 years ago
Jan Doms	02ec8154e5	allow updating block cache capacity from C (#1149)	9 years ago
Andrew Kryczka	842958651f	Fix race condition in SwitchMemtable Summary: MemTableList::current_ could be written by background flush thread and simultaneously read in the user thread (NumNotFlushed() is used in SwitchMemtable()). Use the lock to prevent this case. Found the error from tsan. Related: D58833 Test Plan: $ OPT=-g COMPILE_WITH_TSAN=1 make -j64 db_test $ TEST_TMPDIR=/dev/shm/rocksdb ./db_test --gtest_filter=DBTest.RepeatedWritesToSameKey Reviewers: lightmark, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D59139	9 years ago
PraveenSinghRao	3a276b0cbe	Add a callback for when memtable is moved to immutable (#1137) * Create a callback for memtable becoming immutable Create a callback for memtable becoming immutable Create a callback for memtable becoming immutable moved notification outside the lock Move sealed notification to unlocked portion of SwitchMemtable * fix lite build	9 years ago
Mike Kolupaev	936973d145	Small tweaks to logging to track the number of immutable memtables Summary: We see some write stalls because of number of unflushed memtables. With existing logging I couldn't figure out what's happening exactly. See internal task t11446054 for details if interested. This diff adds: - logging of memtable creation at info level; I wanted it on multiple occasions for different reasons; also include number of immutable memtables, - logging of number of remaining immutable memtables after a flush. Test Plan: ran tests Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58833	9 years ago
siddontang	21c047ab49	add readahead size option (#1146)	9 years ago
Reid Horuff	5d85fdb2c5	add missing lock	9 years ago
sdong	345fd73faf	Fix flaky DBTestDynamicLevel.DynamicLevelMaxBytesBase2 Summary: We added more table properties for each SST file, so when using 2KB SST file size, the estimated size of SST files is off by almost half, causing the LSM tree structure to differ from what is expected. Fix it by making the file size, as well as the LSM base size, 4x what it was previously. Also avoid sleep-based synchronization and use sync points instead. Test Plan: Run parallel unit tests multiple times and make sure they always pass. Reviewers: IslamAbdelRahman, kradhakrishnan Reviewed By: kradhakrishnan Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58749	9 years ago
krad	8fc75de327	Minor fix to disable DynamicLevelMaxBytesBase2	9 years ago
Ashish Shenoy	99765ed855	Clean up the ComputeCompactionScore() API Summary: Make CompactionOptionsFIFO a part of mutable_cf_options Test Plan: UT Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, lgalanis, dhruba Differential Revision: https://reviews.facebook.net/D58653	9 years ago
Shen Li	def2f7bd0e	Expose report_bg_io_stats option in the C API. (#1131)	9 years ago
siddontang	8f1214531e	C API: Expose DeleteFileInRange (#1132)	9 years ago
Sage Weil	11f329bd40	db/db_impl: restrict WALRecoveryMode when using recycled log files kPointInTimeRecovery is indistinguishable from kTolerateCorruptedTailRecords in recycle mode since we define the "end" of the log as the first corrupt record we encounter. kAbsoluteConsistency doesn't make sense because even a clean shutdown leaves old junk at the end of the log file. Signed-off-by: Sage Weil <sage@redhat.com>	9 years ago
Sage Weil	2b2a898e0b	db/log_reader: combine kBadRecord{Len,Checksum} for readability These vary only by the corruption string reported. Signed-off-by: Sage Weil <sage@redhat.com>	9 years ago
Sage Weil	34df1c94d5	db/log_reader: treat bad record length or checksum as EOF If we are in kTolerateCorruptedTailRecords, treat these errors as the end of the log. This is particularly important for recycled logs, where we will regularly see corrupted headers (bad length or checksum) when replaying a log. If we are aligned with a block boundary or get lucky, we will land on an old header and see the log number mismatch, but more commonly we will land midway through some previous block and record and effectively see noise. These must be treated as the end of the log in order for recycling to work. This makes the LogTest.Recycle/1 test pass. We also modify a number of existing tests because the recycled log files behave fundamentally differently in that they always stop when they reach the first bad record. Signed-off-by: Sage Weil <sage@redhat.com>	9 years ago
Sage Weil	7947aba68c	db/log_reader: move kBadRecord{Len,Checksum} handling into ReadRecord The behavior here needs to depend on the WAL recovery mode. No functional change in this patch. Signed-off-by: Sage Weil <sage@redhat.com>	9 years ago

3169 Commits (01bcc348966566e9cd465a04d8f620d7b68f7785)