Clarifying comments for Read() APIs (#8029)

Summary:
I recently discovered the confusing, undocumented semantics of Read() functions in the FileSystem and Env APIs. I have added clarification to the best of my reverse-engineered understanding, and made a note in HISTORY.md for implementors to check their implementations, as a subtly non-adherent implementation could lead to RocksDB quietly ignoring some portion of a file.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8029

Test Plan: no code changes

Reviewed By: anand1976

Differential Revision: D26831698

Pulled By: pdillinger

fbshipit-source-id: 208f97ff6037bc13bb2ef360b987c2640c79bd03
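
For custom implementors, the clarified rule (a result shorter than `n` is allowed only at end of file or with a non-OK status) can be illustrated with a minimal, self-contained sketch. It does not use RocksDB headers: `ChunkySource`, `RawRead`, and `ReadUntilFullOrEof` are hypothetical names invented for this example, standing in for a backend whose raw reads may return fewer bytes than requested before end of file. Error handling is omitted; a real implementation would return a non-OK status on failure.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a storage backend whose raw reads may return
// fewer bytes than requested even when more data is available.
struct ChunkySource {
  std::string data;
  size_t max_chunk;  // largest number of bytes a single raw read returns

  // Copies up to min(n, max_chunk) bytes starting at `offset` into `buf`.
  // Returns the number of bytes copied; 0 means end of file.
  size_t RawRead(size_t offset, size_t n, char* buf) const {
    if (offset >= data.size()) return 0;
    size_t len = std::min({n, max_chunk, data.size() - offset});
    std::memcpy(buf, data.data() + offset, len);
    return len;
  }
};

// Reads up to `n` bytes at `offset` into `scratch`, retrying short raw reads
// so that the returned count is less than `n` only at end of file.
size_t ReadUntilFullOrEof(const ChunkySource& src, size_t offset, size_t n,
                          char* scratch) {
  size_t total = 0;
  while (total < n) {
    size_t got = src.RawRead(offset + total, n - total, scratch + total);
    if (got == 0) break;  // end of file: a short result is now permitted
    total += got;
  }
  return total;
}
```

Returning the first short raw read directly would be the subtly non-adherent behavior the summary warns about, since a caller may treat the short result as end of file and skip the remainder.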
5 years ago · ce391ff84b
parent cb25bc1128
commit ce391ff84b
3 changed files with 31 additions and 2 deletions
--- a/HISTORY.md
+++ b/HISTORY.md
@@ -8,6 +8,7 @@
 * Add support to extend DB::VerifyFileChecksums API to also verify blob files checksum.
 * When using the new BlobDB, the amount of data written by flushes/compactions is now broken down into table files and blob files in the compaction statistics; namely, Write(GB) denotes the amount of data written to table files, while Wblob(GB) means the amount of data written to blob files.
 * Add new SetBufferSize API to WriteBufferManager to allow dynamic management of memory allotted to all write buffers.  This allows user code to adjust memory monitoring provided by WriteBufferManager as process memory needs change and datasets grow and shrink.
+* Clarified the required semantics of Read() functions in FileSystem and Env APIs. Please ensure any custom implementations are compliant.
 * For the new integrated BlobDB implementation, compaction statistics now include the amount of data read from blob files during compaction (due to garbage collection or compaction filters). Write amplification metrics have also been extended to account for data read from blob files.
 ### New Features
--- a/include/rocksdb/env.h
+++ b/include/rocksdb/env.h
@@ -619,6 +619,10 @@ class SequentialFile {
  // "scratch[0..n-1]" must be live when "*result" is used.
  // If an error was encountered, returns a non-OK status.
  //
  // After call, result->size() < n only if end of file has been
  // reached (or non-OK status). Read might fail if called again after
  // first result->size() < n.
  //
  // REQUIRES: External synchronization
  virtual Status Read(size_t n, Slice* result, char* scratch) = 0;
@@ -664,7 +668,8 @@ struct ReadRequest {
  // File offset in bytes
  uint64_t offset;
-  // Length to read in bytes
+  // Length to read in bytes. `result` only returns fewer bytes if end of file
+  // is hit (or `status` is not OK).
  size_t len;
  // A buffer that MultiRead()  can optionally place data in. It can
@@ -693,6 +698,10 @@ class RandomAccessFile {
  // "*result" is used.  If an error was encountered, returns a non-OK
  // status.
  //
  // After call, result->size() < n only if end of file has been
  // reached (or non-OK status). Read might fail if called again after
  // first result->size() < n.
  //
  // Safe for concurrent use by multiple threads.
  // If Direct I/O enabled, offset, n, and scratch should be aligned properly.
  virtual Status Read(uint64_t offset, size_t n, Slice* result,
@@ -977,6 +986,11 @@ class RandomRWFile {
  // Read up to `n` bytes starting from offset `offset` and store them in
  // result, provided `scratch` size should be at least `n`.
  //
  // After call, result->size() < n only if end of file has been
  // reached (or non-OK status). Read might fail if called again after
  // first result->size() < n.
  //
  // Returns Status::OK() on success.
  virtual Status Read(uint64_t offset, size_t n, Slice* result,
                      char* scratch) const = 0;
--- a/include/rocksdb/file_system.h
+++ b/include/rocksdb/file_system.h
@@ -563,6 +563,10 @@ class FSSequentialFile {
  // "scratch[0..n-1]" must be live when "*result" is used.
  // If an error was encountered, returns a non-OK status.
  //
  // After call, result->size() < n only if end of file has been
  // reached (or non-OK status). Read might fail if called again after
  // first result->size() < n.
  //
  // REQUIRES: External synchronization
  virtual IOStatus Read(size_t n, const IOOptions& options, Slice* result,
                        char* scratch, IODebugContext* dbg) = 0;
@@ -609,7 +613,8 @@ struct FSReadRequest {
  // File offset in bytes
  uint64_t offset;
-  // Length to read in bytes
+  // Length to read in bytes. `result` only returns fewer bytes if end of file
+  // is hit (or `status` is not OK).
  size_t len;
  // A buffer that MultiRead()  can optionally place data in. It can
@@ -639,6 +644,10 @@ class FSRandomAccessFile {
  // "*result" is used.  If an error was encountered, returns a non-OK
  // status.
  //
  // After call, result->size() < n only if end of file has been
  // reached (or non-OK status). Read might fail if called again after
  // first result->size() < n.
  //
  // Safe for concurrent use by multiple threads.
  // If Direct I/O enabled, offset, n, and scratch should be aligned properly.
  virtual IOStatus Read(uint64_t offset, size_t n, const IOOptions& options,
@@ -975,6 +984,11 @@ class FSRandomRWFile {
  // Read up to `n` bytes starting from offset `offset` and store them in
  // result, provided `scratch` size should be at least `n`.
  //
  // After call, result->size() < n only if end of file has been
  // reached (or non-OK status). Read might fail if called again after
  // first result->size() < n.
  //
  // Returns Status::OK() on success.
  virtual IOStatus Read(uint64_t offset, size_t n, const IOOptions& options,
                        Slice* result, char* scratch,