Summary:
Add support to allow enabling / disabling user-defined timestamps feature for an existing column family in combination with the in-Memtable only feature.
To do this, this PR includes:
1) Log the `persist_user_defined_timestamps` option per column family in Manifest to facilitate detecting an attempt to enable / disable UDT. This entry is enforced to be logged in the same VersionEdit as the user comparator name entry.
2) User-defined timestamps related options are validated when re-opening a column family, including user comparator name and the `persist_user_defined_timestamps` flag. These type of settings and settings change are considered valid:
a) no user comparator change and no effective `persist_user_defined_timestamp` flag change.
b) switch user comparator to enable UDT provided the immediately effective `persist_user_defined_timestamps` flag
is false.
c) switch user comparator to disable UDT provided that the before-change `persist_user_defined_timestamps` is
already false.
3) when an attempt to enable UDT is detected, we mark all its existing SST files as "having no UDT" by marking its `FileMetaData.user_defined_timestamps_persisted` flag to false and handle their file boundaries `FileMetaData.smallest`, `FileMetaData.largest` by padding a min timestamp.
4) while enabling / disabling UDT feature, timestamp size inconsistency in existing WAL logs are handled to make it compatible with the running user comparator.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11623
Test Plan:
```
make all check
./db_with_timestamp_basic_test --gtest-filter="*EnableDisableUDT*"
./db_wal_test --gtest_filter="*EnableDisableUDT*"
```
Reviewed By: ltamasi
Differential Revision: D47636862
Pulled By: jowlyzhang
fbshipit-source-id: dcd19f67292da3c3cc9584c09ad00331c9ab9322
Summary:
1. Public API change: Replace `use_async_io` API in file_system with `SupportedOps` API which is used by underlying FileSystem to indicate to upper layers whether the FileSystem supports different operations introduced in `enum FSSupportedOps `. Right now operations are `async_io` and whether FS will provide its own buffer during reads or not. The api is changed to extend it to various FileSystem operations in one API rather than creating a separate API for each operation.
2. Provide support for underlying FS to pass their own buffer during Reads (async and sync read) instead of using RocksDB provided `scratch` (buffer) in `FSReadRequest`. Currently only MultiRead supports it and later will be extended to other reads as well (point lookup, scan etc). More details in how to enable in file_system.h
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11324
Test Plan: Tested locally
Reviewed By: anand1976
Differential Revision: D44465322
Pulled By: akankshamahajan15
fbshipit-source-id: 9ec9e08f839b5cc815e75d5dade6cd549998d0ec
Summary:
Start to record the value of the flag `AdvancedColumnFamilyOptions.persist_user_defined_timestamps` in the Manifest and table properties for a SST file when it is created. And use the recorded flag when creating a table reader for the SST file. This flag's default value is true, it is only explicitly recorded if it's false.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11515
Test Plan:
```
make all check
./version_edit_test
```
Reviewed By: ltamasi
Differential Revision: D46920386
Pulled By: jowlyzhang
fbshipit-source-id: 075c20363d3d2cc1368422ecc805617ed135cc26
Summary:
This ticker combined with `rocksdb.files.marked.trash` can help give a better picture of how DeleteScheduler is keeping up.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11540
Test Plan:
```
./delete_scheduler_test
```
Reviewed By: ajkr
Differential Revision: D46746401
Pulled By: jowlyzhang
fbshipit-source-id: f3daa622aa3ddefe7d673e0cc257a47699d506df
Summary:
See "unreleased_history/new_features/obsolete_sst_files_size.md" for description
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11533
Test Plan: updated unit test
Reviewed By: jowlyzhang
Differential Revision: D46703152
Pulled By: ajkr
fbshipit-source-id: ea5e31cd6293eccc154130c13e66b5271f57c102
Summary:
Start logging the timestamp size record in WAL and use the record during recovery. Currently, user comparator cannot be different from what was used to create a column family, so the timestamp size record is just used to confirm it's consistent with the timestamp size the running user comparator indicates.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11471
Test Plan:
```
make all check
./db_secondary_test
./db_wal_test --gtest_filter="*WithTimestamp*"
./repair_test --gtest_filter="*WithTimestamp*"
```
Reviewed By: ltamasi
Differential Revision: D46236769
Pulled By: jowlyzhang
fbshipit-source-id: f6c60b5c8defdb05021c63df302ccc0be1275ad0
Summary:
See new NOTE in HISTORY.md and unreleased_history/README.txt
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11481
Test Plan: some manual testing on my CentOS 8 system
Reviewed By: jaykorean
Differential Revision: D46233342
Pulled By: pdillinger
fbshipit-source-id: daf59cf3dc907f450b469090dcc481a30a7d7c0d