From b085ee13e033647f928c6833b3b2d4cee9c56430 Mon Sep 17 00:00:00 2001 From: xinyuliu Date: Tue, 23 Feb 2021 21:42:55 -0800 Subject: [PATCH] Append all characters not captured by xsputn() in overflow() function (#7991) Summary: In the adapter class `WritableFileStringStreamAdapter`, which wraps WritableFile to be used for std::ostream, previouly only `std::endl` is considered a special case because `endl` is written by `os.put()` directly without going through `xsputn()`. `os.put()` will call `sputc()` and if we further check the internal implementation of `sputc()`, we will see it is ``` int_type __CLR_OR_THIS_CALL sputc(_Elem _Ch) { // put a character return 0 < _Pnavail() ? _Traits::to_int_type(*_Pninc() = _Ch) : overflow(_Traits::to_int_type(_Ch)); ``` As we explicitly disabled buffering, _Pnavail() is always 0. Thus every write, not captured by xsputn, becomes an overflow. When I run tests on Windows, I found not only `std::endl` will drop into this case, writing an unsigned long long will also call `os.put()` then followed by `sputc()` and eventually call `overflow()`. Therefore, instead of only checking `std::endl`, we should try to append other characters as well unless the appending operation fails. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7991 Reviewed By: jay-zhuang Differential Revision: D26615692 Pulled By: ajkr fbshipit-source-id: 4c0003de1645b9531545b23df69b000e07014468 --- HISTORY.md | 3 +++ table/block_based/block_based_table_reader.h | 20 ++++++++++++++------ 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/HISTORY.md b/HISTORY.md index 5e535f7c4..9a53d581e 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -1,5 +1,8 @@ # Rocksdb Change Log ## Unreleased +### Bug Fixes +* Fixed the truncation error found in APIs/tools when dumping block-based SST files in a human-readable format. After fix, the block-based table can be fully dumped as a readable file. + ### Public API change * Add a new option BlockBasedTableOptions::max_auto_readahead_size. RocksDB does auto-readahead for iterators on noticing more than two reads for a table file if user doesn't provide readahead_size. The readahead starts at 8KB and doubles on every additional read upto max_auto_readahead_size and now max_auto_readahead_size can be configured dynamically as well. Found that 256 KB readahead size provides the best performance, based on experiments, for auto readahead. Experiment data is in PR #3282. If value is set 0 then no automatic prefetching will be done by rocksdb. Also changing the value will only affect files opened after the change. diff --git a/table/block_based/block_based_table_reader.h b/table/block_based/block_based_table_reader.h index 031206761..1cb978e0d 100644 --- a/table/block_based/block_based_table_reader.h +++ b/table/block_based/block_based_table_reader.h @@ -656,13 +656,21 @@ class WritableFileStringStreamAdapter : public std::stringbuf { explicit WritableFileStringStreamAdapter(WritableFile* writable_file) : file_(writable_file) {} - // This is to handle `std::endl`, `endl` is written by `os.put()` directly - // without going through `xsputn()`. As we explicitly disabled buffering, - // every write, not captured by xsputn, is an overflow. + // Override overflow() to handle `sputc()`. There are cases that will not go + // through `xsputn()` e.g. `std::endl` or an unsigned long long is written by + // `os.put()` directly and will call `sputc()` By internal implementation: + // int_type __CLR_OR_THIS_CALL sputc(_Elem _Ch) { // put a character + // return 0 < _Pnavail() ? _Traits::to_int_type(*_Pninc() = _Ch) : + // overflow(_Traits::to_int_type(_Ch)); + // } + // As we explicitly disabled buffering (_Pnavail() is always 0), every write, + // not captured by xsputn(), becomes an overflow here. int overflow(int ch = EOF) override { - if (ch == '\n') { - file_->Append("\n"); - return ch; + if (ch != EOF) { + Status s = file_->Append(Slice((char*)&ch, 1)); + if (s.ok()) { + return ch; + } } return EOF; }