rocksdb

Commit Graph

Author	SHA1	Message	Date
sdong	49628c9a83	Use std::numeric_limits<> (#9954 ) Summary: Right now we still don't fully use std::numeric_limits but use a macro, mainly for supporting VS 2013. Right now we only support VS 2017 and up so it is not a problem. The code comment claims that MinGW still needs it. We don't have a CI running MinGW so it's hard to validate. Since we now require C++17, it's hard to imagine MinGW would still build RocksDB but doesn't support std::numeric_limits<>. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9954 Test Plan: See CI Runs. Reviewed By: riversand963 Differential Revision: D36173954 fbshipit-source-id: a35a73af17cdcae20e258cdef57fcf29a50b49e0	3 years ago
Peter Dillinger	41237dd306	Add "no compression" job to CircleCI (#9850 ) Summary: Since they operate at distinct abstraction layers, I thought it was prudent to combine with EncryptedEnv CI test for each PR, for efficiency in testing. Also added supported compressions to sst_dump --help output so that CI job can verify no compiled-in compression support. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9850 Test Plan: CI, some manual stuff Reviewed By: riversand963 Differential Revision: D35682346 Pulled By: pdillinger fbshipit-source-id: be9879c1533fed304ee32c89fd9ba4b07c2b90cc	3 years ago
Peter Dillinger	cff0d1e8e6	New backup meta schema, with file temperatures (#9660 ) Summary: The primary goal of this change is to add support for backing up and restoring (applying on restore) file temperature metadata, without committing to either the DB manifest or the FS reported "current" temperatures being exclusive "source of truth". To achieve this goal, we need to add temperature information to backup metadata, which requires updated backup meta schema. Fortunately I prepared for this in https://github.com/facebook/rocksdb/issues/8069, which began forward compatibility in version 6.19.0 for this kind of schema update. (Previously, backup meta schema was not extensible! Making this schema update public will allow some other "nice to have" features like taking backups with hard links, and avoiding crc32c checksum computation when another checksum is already available.) While schema version 2 is newly public, the default schema version is still 1. Until we change the default, users will need to set to 2 to enable features like temperature data backup+restore. New metadata like temperature information will be ignored with a warning in versions before this change and since 6.19.0. The metadata is considered ignorable because a functioning DB can be restored without it. Some detail: * Some renaming because "future schema" is now just public schema 2. * Initialize some atomics in TestFs (linter reported) * Add temperature hint support to SstFileDumper (used by BackupEngine) Pull Request resolved: https://github.com/facebook/rocksdb/pull/9660 Test Plan: related unit test majorly updated for the new functionality, including some shared testing support for tracking temperatures in a FS. Some other tests and testing hooks into production code also updated for making the backup meta schema change public. Reviewed By: ajkr Differential Revision: D34686968 Pulled By: pdillinger fbshipit-source-id: 3ac1fa3e67ee97ca8a5103d79cc87d872c1d862a	3 years ago
mrambacher	281ac9c89e	Add CreateFrom methods to Env/FileSystem (#8174 ) Summary: - Added CreateFromString method to Env and FileSystem to replace LoadEnv/Load. This method/signature is a precursor to making these classes extend Customizable. - Added CreateFromSystem to Env. This method standardizes creating an Env from the environment variables. Previously, some places would check TEST_ENV_URI and others would also check TEST_FS_URI. Now the code is more common/standardized. - Added CreateFromFlags to Env. This method allows an Env to be created from string options (such as GFLAGS options) in a more standard way. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8174 Reviewed By: zhichao-cao Differential Revision: D28999603 Pulled By: mrambacher fbshipit-source-id: 88e6911e7e91f908458a7fe10a20e93ecbc275fb	3 years ago
Hans Holmberg	670567db09	Add support for custom file systems to ldb and sst_dump (#8010 ) Summary: This PR adds support for custom file systems to ldb and sst_dump by adding command line options for specifying --fs_uri and --backup_fs_uri (for ldb backup/restore commands). fs_uri is already supported in db_bench and db_stress, and there is already support in ldb and db_stress for specifying customized envs. The PR also fixes what looks like a bug in the ldb backup/restore commands. As it is right now, backups can only be made from and to the same environment/file system, which does not seem to be the intended behavior. This PR makes it possible to make and restore backups between different envs/file systems. Example: `./ldb backup --fs_uri=zenfs://dev:nvme2n1 --backup_fs_uri=posix:// --backup_dir=/tmp/my_rocksdb_backup --db=rocksdbtest/dbbench` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8010 Reviewed By: jay-zhuang Differential Revision: D26904654 Pulled By: ajkr fbshipit-source-id: 9b695ed8b944fcc6b27c4daaa9f52e87ee2c1fb4	4 years ago
Andrew Kryczka	d904233d2f	Limit buffering for collecting samples for compression dictionary (#7970 ) Summary: For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file. However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage. Related changes include: - Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks - Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary - Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string. 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970 Test Plan: - updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level - looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set. Reviewed By: pdillinger Differential Revision: D26467994 Pulled By: ajkr fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465	4 years ago
mrambacher	0a9a05ae12	Make builds reproducible (#7866 ) Summary: Closes https://github.com/facebook/rocksdb/issues/7035 Changed how build_version.cc was generated: - Included the GIT tag/branch in the build_version file - Changed the "Build Date" to be: - If the GIT branch is "clean" (no changes), the date of the last git commit - If the branch is not clean, the current date - Added APIs to access the "build information", rather than accessing the strings directly. The build_version.cc file is now regenerated whenever the library objects are rebuilt. Verified that the built files remain the same size across builds on a "clean build" and the same information is reported by sst_dump --version Pull Request resolved: https://github.com/facebook/rocksdb/pull/7866 Reviewed By: pdillinger Differential Revision: D26086565 Pulled By: mrambacher fbshipit-source-id: 6fcbe47f6033989d5cf26a0ccb6dfdd9dd239d7f	4 years ago
Ramkumar Vadivelu	9a690a74e1	In ParseInternalKey(), include corrupt key info in Status (#7515 ) Summary: Fixes Issue https://github.com/facebook/rocksdb/issues/7497 When the allow_data_in_errors DB option is set, log error key details in `ParseInternalKey()`. Most of the calls have been fixed. A few TODOs are still pending because deeper changes are needed to pass in the allow_data_in_errors flag; those will be done in a separate PR later. Tests: - make check - some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test - some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test Example of new status returns: - Key too small - `Corrupted Key: Internal Key too small. Size=5` - Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3` - Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515 Reviewed By: ajkr Differential Revision: D24240264 Pulled By: ramvadiv fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7	4 years ago
Ramkumar Vadivelu	e04a50923d	Change ParseInternalKey() to return Status instead of bool (#7457 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/7430 Change ParseInternalKey() to return Status instead of bool. db_bench (seekrandom) based before/after results with value sizes of 100 bytes and 16 bytes can be found at (tests were run on a udb server): https://www.dropbox.com/s/47bwamdy5ozngph/PIK_ret_Status_results.xlsx?dl=0 ![db_bench_results](https://user-images.githubusercontent.com/62277872/94642825-2a21a800-029a-11eb-88f2-124136c83fd3.png) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7457 Reviewed By: ajkr Differential Revision: D24002433 Pulled By: ramvadiv fbshipit-source-id: ac253ecf577a29044c47c3fe254a01e71404c44c	4 years ago
Jay Zhuang	27aa443a15	Add sst_file_dumper status check (#7315 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7315 Test Plan: `ASSERT_STATUS_CHECKED=1 make sst_dump_test && ./sst_dump_test` And manually run `./sst_dump --file=*.sst` before and after the change. Reviewed By: pdillinger Differential Revision: D23361669 Pulled By: jay-zhuang fbshipit-source-id: 5bf51a2a90ee35c8c679e5f604732ec2aef5949a	4 years ago
Andrew Kryczka	af54c4092a	fix SstFileWriter with dictionary compression (#7323 ) Summary: In block-based table builder, the cut-over from buffered to unbuffered mode involves sampling the buffered blocks and generating a dictionary. There was a bug where `SstFileWriter` passed zero as the `target_file_size` causing the cutover to happen immediately, so there were no samples available for generating the dictionary. This PR changes the meaning of `target_file_size == 0` to mean buffer the whole file before cutting over. It also adds dictionary compression support to `sst_dump --command=recompress` for easy evaluation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7323 Reviewed By: cheng-chang Differential Revision: D23412158 Pulled By: ajkr fbshipit-source-id: 3b232050e70ef3c2ee85a4b5f6fadb139c569873	4 years ago
Zitan Chen	be41c61f22	Add a new option for BackupEngine to store table files under shared_checksum using DB session id in the backup filenames (#6997 ) Summary: `BackupableDBOptions::new_naming_for_backup_files` is added. This option is false by default. When it is true, backup table filenames under directory shared_checksum are of the form `<file_number>_<crc32c>_<db_session_id>.sst`. Note that when this option is true, it comes into effect only when both `share_files_with_checksum` and `share_table_files` are true. Three new test cases are added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6997 Test Plan: Passed make check. Reviewed By: ajkr Differential Revision: D22098895 Pulled By: gg814 fbshipit-source-id: a1d9145e7fe562d71cde7ac995e17cb24fd42e76	4 years ago
Peter Dillinger	edf74d1cb1	Add --version and --help to ldb and sst_dump (#6951 ) Summary: as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/6951 Test Plan: tests included + manual Reviewed By: ajkr Differential Revision: D21918540 Pulled By: pdillinger fbshipit-source-id: 79d4991f2a831214fc7e477a839ec19dbbace6c5	4 years ago
Zitan Chen	119b26fac0	Implement a new subcommand "identify" for sst_dump (#6943 ) Summary: Implemented a subcommand of sst_dump called identify, which determines whether a file is an SST file or identifies and lists all the SST files in a directory; This update also fixes the problem that sst_dump exits with a success state even if target file/directory does not exist/is not an SST file/is empty/is corrupted. One test is added to sst_dump_test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6943 Test Plan: Passed make check and a few manual tests Reviewed By: pdillinger Differential Revision: D21928985 Pulled By: gg814 fbshipit-source-id: 9a8b48e0cf1a0e96b13f42b690aba8ad981afad3	4 years ago
mrambacher	e85cbdb4e8	Fix two core dumps when files are missing (#6922 ) Summary: The LDB create and drop column family commands failed to check if there was a valid database prior to dereferencing it, leading to a core dump. The SstFileDumper prefetch code would dereference a file when the file did not exist as part of the Prefetch code. This dereference was moved inside an st.ok() check. Tests were added for both failure conditions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6922 Reviewed By: gg814 Differential Revision: D21884024 Pulled By: pdillinger fbshipit-source-id: bddd45c299aa9dc7e928c17a37a96521f8c9149e	4 years ago
sdong	4a4b8a1344	sst_dump to reduce number of file reads (#6836 ) Summary: sst_dump can issue many file reads from the file system. This doesn't work well with file systems without an OS cache, especially remote file systems. In order to mitigate this problem, several improvements are made: 1. --readahead_size is added, so that users can specify readahead size when scanning the data. 2. Force a 512KB tail readahead, which prevents three I/Os for footer, meta index and property blocks and hopefully index and filter blocks too. 3. Consolidate SSTDump's I/Os before opening the file for read. Use the same file prefetch buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6836 Test Plan: Add a test that covers this new feature. Reviewed By: pdillinger Differential Revision: D21516607 fbshipit-source-id: 3ae43526286f67b2f4a5bdedfbc92719d579b87e	5 years ago
Akanksha Mahajan	75b13ea94a	Allow sst_dump to check size of different compression levels and report time (#6634 ) Summary: 1. Add two arguments --compression_level_from and --compression_level_to to check the compression size with different compression levels in the given range. Users must specify one compression type, or it will error out. Both from and to levels must also be specified together. 2. Display the time taken to compress each file with different compressions by default. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6634 Test Plan: make -j64 check Reviewed By: anand1976 Differential Revision: D20810282 Pulled By: akankshamahajan15 fbshipit-source-id: ac9098d3c079a1fad098f6678dbedb4d888a791b	5 years ago
Mike Kolupaev	e45673dece	Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621 ) Summary: Context: Index type `kBinarySearchWithFirstKey` added the ability for sst file iterator to sometimes report a key from index without reading the corresponding data block. This is useful when sst blocks are cut at some meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report error instead. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and kBinarySearchWithFirstKey implementation was considered a prototype. Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling. It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. 
The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas. Note that the deferred value loading only happens for internal iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621 Test Plan: make -j5 check . Will also deploy to some logdevice test clusters and look at stats. Reviewed By: siying Differential Revision: D20786930 Pulled By: al13n321 fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee	5 years ago
Connor1996	c8c739a877	Fix sst_dump not able to open ingested file (#6673 ) Summary: When investigating https://github.com/facebook/rocksdb/issues/6666, we encountered an error when using sst_dump to dump an ingested SST file with global seqno. ``` Corruption: An external sst file with version 2 have global seqno property with value ��/, while largest seqno in the file is 0） ``` As in https://github.com/facebook/rocksdb/pull/5097, this happens because SstFileReader doesn't know the largest seqno of a file, so it fails this check when it opens a file with global seqno. `ca89ac2ba9/table/block_based_table_reader.cc (L730)` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6673 Test Plan: run it manually Reviewed By: cheng-chang Differential Revision: D20937546 Pulled By: ajkr fbshipit-source-id: c3fd04d60916a738533ee1885f3ea844669a9479	5 years ago
Levi Tamasi	c15e85bdcb	Move BlobDB related files under db/ to db/blob/ (#6519 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6519 Test Plan: ``` make all make check ``` Differential Revision: D20400691 Pulled By: ltamasi fbshipit-source-id: 20ef911cf1c2c92c7f71ef0b493f9be64f2eef94	5 years ago
sdong	fdf882ded2	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 ) Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e	5 years ago
anand76	afa2420c2b	Introduce a new storage specific Env API (#5761 ) Summary: The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether it's retryable or not, the scope (i.e., fault domain) of the error, etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy, etc. This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO. The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before. This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection. The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761 Differential Revision: D18868376 Pulled By: anand1976 fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f	5 years ago
Peter Dillinger	fe464bca5c	Fix PlainTableReader not to crash sst_dump (#5940 ) Summary: Plain table SSTs could crash sst_dump because of a bug in PlainTableReader that can leave table_properties_ as null. Even if it was intended not to keep the table properties in some cases, they were leaked on the offending code path. Steps to reproduce: $ db_bench --benchmarks=fillrandom --num=2000000 --use_plain_table --prefix-size=12 $ sst_dump --file=0000xx.sst --show_properties from [] to [] Process /dev/shm/dbbench/000014.sst Sst file format: plain table Raw user collected properties ------------------------------ Segmentation fault (core dumped) Also added missing unit testing of plain table full_scan_mode, and an assertion in NewIterator to check for regression. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5940 Test Plan: new unit test, manual, make check Differential Revision: D18018145 Pulled By: pdillinger fbshipit-source-id: 4310c755e824c4cd6f3f86a3abc20dfa417c5e07	5 years ago
Levi Tamasi	fdc1cb43a6	Support decoding blob indexes in sst_dump (#5926 ) Summary: The patch adds a new command line parameter --decode_blob_index to sst_dump. If this switch is specified, sst_dump prints blob indexes in a human readable format, printing the blob file number, offset, size, and expiration (if applicable) for blob references, and the blob value (and expiration) for inlined blobs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5926 Test Plan: Used db_bench's BlobDB mode to generate SST files containing blob references with and without expiration, as well as inlined blobs with and without expiration (note: the latter are stored as plain values), and confirmed sst_dump correctly prints all four types of records. Differential Revision: D17939077 Pulled By: ltamasi fbshipit-source-id: edc5f58fee94ba35f6699c6a042d5758f5b3963d	5 years ago
Yanqin Jin	167cdc9f17	Support custom env in sst_dump (#5845 ) Summary: This PR allows for the creation of custom env when using sst_dump. If the user does not set options.env or set options.env to nullptr, then sst_dump will automatically try to create a custom env depending on the path to the sst file or db directory. In order to use this feature, the user must call ObjectRegistry::Register() beforehand. Test Plan (on devserver): ``` $make all && make check ``` All tests must pass to ensure this change does not break anything. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5845 Differential Revision: D17678038 Pulled By: riversand963 fbshipit-source-id: 58ecb4b3f75246d52b07c4c924a63ee61c1ee626	5 years ago
sdong	c06b54d0c6	Apply formatter on recent 45 commits. (#5827 ) Summary: Some recent commits might not have passed through the formatter. I formatted recent 45 commits. The script hangs for more commits so I stopped there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5827 Test Plan: Run all existing tests. Differential Revision: D17483727 fbshipit-source-id: af23113ee63015d8a43d89a3bc2c1056189afe8f	5 years ago
Peter (Stig) Edwards	2ed91622fb	sst_dump recompress show #blocks compressed and not compressed (#5791 ) Summary: Closes https://github.com/facebook/rocksdb/issues/1474 Helps show when the 12.5% threshold for GoodCompressionRatio (originally from ldb) is hit. Example output: ``` > ./sst_dump --file=/tmp/test.sst --command=recompress from [] to [] Process /tmp/test.sst Sst file format: block-based Block Size: 16384 Compression: kNoCompression Size: 122579836 Blocks: 2300 Compressed: 0 ( 0.0%) Not compressed (ratio): 2300 (100.0%) Not compressed (abort): 0 ( 0.0%) Compression: kSnappyCompression Size: 46289962 Blocks: 2300 Compressed: 2119 ( 92.1%) Not compressed (ratio): 181 ( 7.9%) Not compressed (abort): 0 ( 0.0%) Compression: kZlibCompression Size: 29689825 Blocks: 2300 Compressed: 2301 (100.0%) Not compressed (ratio): 0 ( 0.0%) Not compressed (abort): 0 ( 0.0%) Unsupported compression type: kBZip2Compression. Compression: kLZ4Compression Size: 44785490 Blocks: 2300 Compressed: 1950 ( 84.8%) Not compressed (ratio): 350 ( 15.2%) Not compressed (abort): 0 ( 0.0%) Compression: kLZ4HCCompression Size: 37498895 Blocks: 2300 Compressed: 2301 (100.0%) Not compressed (ratio): 0 ( 0.0%) Not compressed (abort): 0 ( 0.0%) Unsupported compression type: kXpressCompression. Compression: kZSTD Size: 32208707 Blocks: 2300 Compressed: 2301 (100.0%) Not compressed (ratio): 0 ( 0.0%) Not compressed (abort): 0 ( 0.0%) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5791 Differential Revision: D17347870 fbshipit-source-id: af10849c010b46b20e54162b70123c2805ffe526	5 years ago
sdong	e1c468d16f	Do readahead in VerifyChecksum() (#5713 ) Summary: Right now VerifyChecksum() doesn't do read-ahead. In some use cases, users won't be able to achieve good performance. With this change, by default, RocksDB will do a default readahead, and users will be able to override the readahead size by passing in a ReadOptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5713 Test Plan: Add a new unit test. Differential Revision: D16860874 fbshipit-source-id: 0cff0fe79ac855d3d068e6ccd770770854a68413	5 years ago
Levi Tamasi	3bde41b5a3	Move the filter readers out of the block cache (#5504 ) Summary: Currently, when the block cache is used for the filter block, it is not really the block itself that is stored in the cache but a FilterBlockReader object. Since this object is not pure data (it has, for instance, pointers that might dangle, including in one case a back pointer to the TableReader), it's not really sharable. To avoid the issues around this, the current code erases the cache entries when the TableReader is closed (which, BTW, is not sufficient since a concurrent TableReader might have picked up the object in the meantime). Instead of doing this, the patch moves the FilterBlockReader out of the cache altogether, and decouples the filter reader object from the filter block. In particular, instead of the TableReader owning, or caching/pinning the FilterBlockReader (based on the customer's settings), with the change the TableReader unconditionally owns the FilterBlockReader, which in turn owns/caches/pins the filter block. This change also enables us to reuse the code paths historically used for data blocks for filters as well. Note: Eviction statistics for filter blocks are temporarily broken. We plan to fix this in a separate phase. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5504 Test Plan: make asan_check Differential Revision: D16036974 Pulled By: ltamasi fbshipit-source-id: 770f543c5fb4ed126fd1e04bfd3809cf4ff9c091	5 years ago
haoyuhuang	705b8eecb4	Add more callers for table reader. (#5454 ) Summary: This PR adds more callers for table readers. This information is only used for block cache analysis so that we can know which caller accesses a block. 1. It renames the BlockCacheLookupCaller to TableReaderCaller as passing the caller from upstream requires changes to table_reader.h and TableReaderCaller is a more appropriate name. 2. It adds more table reader callers in table/table_reader_caller.h, e.g., kCompactionRefill, kExternalSSTIngestion, and kBuildTable. This PR is long as it requires modification of interfaces in table_reader.h, e.g., NewIterator. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5454 Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32. Differential Revision: D15819451 Pulled By: HaoyuHuang fbshipit-source-id: b6caa704c8fb96ddd15b9a934b7e7ea87f88092d	5 years ago
Zhongyi Xie	d68f9f4580	simplify include directive involving inttypes (#5402 ) Summary: When using `PRIu64` type of printf specifier, current code base does the following: ``` #ifndef __STDC_FORMAT_MACROS #define __STDC_FORMAT_MACROS #endif #include <inttypes.h> ``` However, this can be simplified to ``` #include <cinttypes> ``` as long as flag `-std=c++11` is used. This should solve issues like https://github.com/facebook/rocksdb/issues/5159 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5402 Differential Revision: D15701195 Pulled By: miasantreble fbshipit-source-id: 6dac0a05f52aadb55e9728038599d3d2e4b59d03	6 years ago
Vijay Nadimpalli	50e470791d	Organizing rocksdb/table directory by format Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5373 Differential Revision: D15559425 Pulled By: vjnadimpalli fbshipit-source-id: 5d6d6d615582bedd96a4b879bb25d429a6de8b55	6 years ago
Shobhit Dayal	b45b1cde3e	Feature for sampling and reporting compressibility (#4842 ) Summary: This is a feature to sample data-block compressibility and report it as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms: 1. lz4 or snappy for fast compression 2. zstd or zlib for slow but higher compression. The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType. The db_bench_tool now has a command line option for specifying the sampling rate. Its default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool, varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842 Differential Revision: D13629011 Pulled By: shobhitdayal fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa	6 years ago
Andrew Kryczka	62f70f6d14	Reduce scope of compression dictionary to single SST (#4952 ) Summary: Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio. So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include: - The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called. - After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up. - Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952 Differential Revision: D13967980 Pulled By: ajkr fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f	6 years ago
Huachao Huang	74f7d7551e	tools: use provided options instead of the default (#4839 ) Summary: The current implementation hardcodes the default options in different places, which makes it impossible to support other environments (like encrypted environment). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4839 Differential Revision: D13573578 Pulled By: sagar0 fbshipit-source-id: 76b58b4b758902798d10ff2f52d9f39abff015e7	6 years ago
DorianZheng	2670fe8c73	Get `CompactionJobInfo` from CompactFiles Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716 Differential Revision: D13207677 Pulled By: ajkr fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b	6 years ago
Huachao Huang	5e72bc113a	Add SstFileReader to read sst files (#4717 ) Summary: A user friendly sst file reader is useful when we want to access sst files outside of RocksDB. For example, we can generate an sst file with SstFileWriter and send it to other places, then use SstFileReader to read the file and process the entries in other ways. Also rename the original SstFileReader to SstFileDumper because of name conflict, and seems SstFileDumper is more appropriate for tools. TODO: there is only a very simple test now, because I want to get some feedback first. If the changes look good, I will add more tests soon. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717 Differential Revision: D13212686 Pulled By: ajkr fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56	6 years ago
Sagar Vemuri	dc3528077a	Update all unique/shared_ptr instances to be qualified with namespace std (#4638 ) Summary: Ran the following commands to recursively change all the files under RocksDB: ``` find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} + find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} + find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} + find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} + ``` Running `make format` updated some formatting on the files touched. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638 Differential Revision: D12934992 Pulled By: sagar0 fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8	6 years ago
Abhishek Madan	eaaf1a6f05	Promote rocksdb.{deleted.keys,merge.operands} to main table properties (#4594 ) Summary: Since the number of range deletions are reported in TableProperties, it is confusing to not report the number of merge operands and point deletions as top-level properties; they are accessible through the public API, but since they are not the "main" properties, they do not appear in aggregated table properties, or the string representation of table properties. This change promotes those two property keys to `rocksdb/table_properties.h`, adds corresponding uint64 members for them, deprecates the old access methods `GetDeletedKeys()` and `GetMergeOperands()` (though they are still usable for now), and removes `InternalKeyPropertiesCollector`. The property key strings are the same as before this change, so this should be able to read DBs written from older versions (though I haven't tested this yet). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4594 Differential Revision: D12826893 Pulled By: abhimadan fbshipit-source-id: 9e4e4fbdc5b0da161c89582566d184101ba8eb68	6 years ago
Yanqin Jin	bb5dcea98e	Add path to WritableFileWriter. (#4039 ) Summary: We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039 Differential Revision: D8670178 Pulled By: riversand963 fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a	6 years ago
Maysam Yabandeh	235ab9dd32	Pin mmap files in ReadOnlyDB (#4053 ) Summary: https://github.com/facebook/rocksdb/pull/3881 fixed a bug where PinnableSlice pins mmap files which could be deleted by background compaction. This is however a non-issue for ReadOnlyDB when there is no compaction running and max_open_files is -1. This patch reenables the pinning feature for that case. Closes https://github.com/facebook/rocksdb/pull/4053 Differential Revision: D8662546 Pulled By: maysamyabandeh fbshipit-source-id: 402962602eb0f644e17822748332999c3af029fd	6 years ago
Zhongyi Xie	c3ebc75843	Move prefix_extractor to MutableCFOptions Summary: Currently it is not possible to change bloom filter config without restarting the DB, which causes a lot of operational complexity for users. This PR aims to make it possible to dynamically change bloom filter config. Closes https://github.com/facebook/rocksdb/pull/3601 Differential Revision: D7253114 Pulled By: miasantreble fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c	7 years ago
Andrew Kryczka	5d68243e61	Comment out unused variables Summary: Submitting on behalf of another employee. Closes https://github.com/facebook/rocksdb/pull/3557 Differential Revision: D7146025 Pulled By: ajkr fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e	7 years ago
Igor Sugak	aba3409740	Back out "[codemod] - comment out unused parameters" Reviewed By: igorsugak fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37	7 years ago
David Lai	f4a030ce81	- comment out unused parameters Reviewed By: everiq, igorsugak Differential Revision: D7046710 fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461	7 years ago
Dmitri Smirnov	ebab2e2d42	Enable MSVC W4 with a few exceptions. Fix warnings and bugs Summary: Closes https://github.com/facebook/rocksdb/pull/3018 Differential Revision: D6079011 Pulled By: yiwu-arbug fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57	7 years ago
Amy Xu	5785b1fcb8	Fix naming in InternalKey Summary: - Switched all instances of SetMinPossibleForUserKey and SetMaxPossibleForUserKey in accordance to InternalKeyComparator's comparison logic Closes https://github.com/facebook/rocksdb/pull/2868 Differential Revision: D5804152 Pulled By: axxufb fbshipit-source-id: 80be35e04f2e8abc35cc64abe1fecb03af24e183	7 years ago
Andrew Kryczka	8254e9b57c	make sst_dump compression size command consistent Summary: - like other subcommands, reporting compression sizes should be specified with the `--command` CLI arg. - also added `--compression_types` arg as it's useful to restrict the types of compression used, at least in my dictionary compression experiments. Closes https://github.com/facebook/rocksdb/pull/2706 Differential Revision: D5589520 Pulled By: ajkr fbshipit-source-id: 305bb4ebcc95eecc8a85523cd3b1050619c9ddc5	7 years ago
Siying Dong	666a005f9b	Support prefetch last 512KB with direct I/O in block based file reader Summary: Right now, if direct I/O is enabled, prefetching the last 512KB cannot be applied, except for compaction inputs or when readahead is enabled for iterators. This can create a lot of I/O for HDD cases. To solve the problem, the 512KB is prefetched in block based table if direct I/O is enabled. The prefetched buffer is passed in together with random access file reader, so that we try to read from the buffer before reading from the file. This can be extended in the future to support flexible user iterator readahead too. Closes https://github.com/facebook/rocksdb/pull/2708 Differential Revision: D5593091 Pulled By: siying fbshipit-source-id: ee36ff6d8af11c312a2622272b21957a7b5c81e7	7 years ago
Aaron G	7848f0b24c	add VerifyChecksum() to db.h Summary: We need a tool to check any sst file corruption in the db. It will check all the sst files in current version and read all the blocks (data, meta, index) with checksum verification. If any verification fails, the function will return non-OK status. Closes https://github.com/facebook/rocksdb/pull/2498 Differential Revision: D5324269 Pulled By: lightmark fbshipit-source-id: 6f8a272008b722402a772acfc804524c9d1a483b	7 years ago

77 Commits (e66e6d2faaff38e6338497268041c6957716faf9)