Summary:
This patch addresses a couple of minor TODOs for WritePrepared Txn such as double checking some assert statements at runtime as well, skip extra AddPrepared in non-2pc transactions, and safety check for infinite loops.
Closes https://github.com/facebook/rocksdb/pull/3302
Differential Revision: D6617002
Pulled By: maysamyabandeh
fbshipit-source-id: ef6673c139cb49f64c0879508d2f573b78609aca
Summary:
Previously on a blob db read, we are making a read of the blob value, and then make another read to get CRC checksum. I'm combining the two read into one.
readrandom db_bench with 1G database with base db size of 13M, value size 1k:
`./db_bench --db=/home/yiwu/tmp/db_bench --use_blob_db --value_size=1024 --num=1000000 --benchmarks=readrandom --use_existing_db --cache_size=32000000`
master: throughput 234MB/s, get micros p50 5.984 p95 9.998 p99 20.817 p100 787
this PR: throughput 261MB/s, get micros p50 5.157 p95 9.928 p99 20.724 p100 190
Closes https://github.com/facebook/rocksdb/pull/3301
Differential Revision: D6615950
Pulled By: yiwu-arbug
fbshipit-source-id: 052410c6d8539ec0cc305d53793bbc8f3616baa3
Summary:
DestroyDB that is used in tests loops over the files returned by ::GetChildren and delete them one by one. Such files might be already deleted in the file system (during DeleteObsoleteFileImpl for example) but will get actually deleted with a delay sometimes before ::DeleteFile is called on the file name. We have some test failures where FaultInjectionTestEnv::DeleteFile fails on assert(s.ok()) during DestroyDB. This patch removes the assert statement to fix that.
Closes https://github.com/facebook/rocksdb/pull/3324
Differential Revision: D6659545
Pulled By: maysamyabandeh
fbshipit-source-id: 4c9552fbcd494dcf3e61d475c11fc965c4388b2c
Summary:
We dump blob db options on blob db open, but it was removed by mistake in #3246. Adding it back.
Closes https://github.com/facebook/rocksdb/pull/3298
Differential Revision: D6607177
Pulled By: yiwu-arbug
fbshipit-source-id: 2a4aacbfa52fd8f1878dc9e1fbb95fe48faf80c0
Summary:
Previously, if blob_db_options.bytes_per_sync, there is a background job to call fsync() for every bytes_per_sync bytes written to a blob file. With the change we simply pass bytes_per_sync as env_options_ to blob files so that sync_file_range() will be used instead.
Closes https://github.com/facebook/rocksdb/pull/3297
Differential Revision: D6606994
Pulled By: yiwu-arbug
fbshipit-source-id: 452424be52e32ba92f5ea603b564e9b88929af47
Summary:
A proper implementation of Iterator::Refresh() for WritePreparedTxnDB would require release and acquire another snapshot. Since MyRocks don't make use of Iterator::Refresh(), we just simply mark it as not supported.
Closes https://github.com/facebook/rocksdb/pull/3290
Differential Revision: D6599931
Pulled By: yiwu-arbug
fbshipit-source-id: 4e1632d967316431424f6e458254ecf9a97567cf
Summary:
* Include `unistd.h` for `sleep(3)`
* Include `sys/time.h` for `gettimeofday(3)`
* Include `utils/random.h` for `Random64`
Error messages:
utilities/persistent_cache/hash_table_bench.cc: In constructor ‘rocksdb::HashTableBenchmark::HashTableBenchmark(rocksdb::HashTableImpl<long unsigned int, std::__cxx11::basic_string<char> >*, size_t, size_t, size_t, size_t)’:
utilities/persistent_cache/hash_table_bench.cc:76:28: error: ‘sleep’ was not declared in this scope
/* sleep override */ sleep(1);
^~~~~
utilities/persistent_cache/hash_table_bench.cc:76:28: note: suggested alternative: ‘strsep’
/* sleep override */ sleep(1);
^~~~~
strsep
utilities/persistent_cache/hash_table_bench.cc: In member function ‘void rocksdb::HashTableBenchmark::RunRead()’:
utilities/persistent_cache/hash_table_bench.cc:107:5: error: ‘Random64’ was not declared in this scope
Random64 rgen(time(nullptr));
^~~~~~~~
utilities/persistent_cache/hash_table_bench.cc:107:5: note: suggested alternative: ‘random_r’
Random64 rgen(time(nullptr));
^~~~~~~~
random_r
utilities/persistent_cache/hash_table_bench.cc:110:18: error: ‘rgen’ was not declared in this scope
size_t k = rgen.Next() % max_prepop_key;
^~~~
utilities/persistent_cache/hash_table_bench.cc: In static member function ‘static uint64_t rocksdb::HashTableBenchmark::NowInMillSec()’:
utilities/persistent_cache/hash_table_bench.cc:153:5: error: ‘gettimeofday’ was not declared in this scope
gettimeofday(&tv, /*tz=*/nullptr);
^~~~~~~~~~~~
make[2]: *** [CMakeFiles/hash_table_bench.dir/build.make:63: CMakeFiles/hash_table_bench.dir/utilities/persistent_cache/hash_table_bench.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:3346: CMakeFiles/hash_table_bench.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Closes https://github.com/facebook/rocksdb/pull/3283
Differential Revision: D6594850
Pulled By: ajkr
fbshipit-source-id: fd83957338c210cdfd253763347aafd39476824f
Summary:
Currently non-2pc writes do the 2nd dummy write to actually commit the transaction. This was necessary to ensure that publishing the commit sequence number will be done only from one queue (the queue that does not write to memtable). This is however not necessary when we have only one write queue, which is actually the setup that would be used by non-2pc writes. This patch eliminates the 2nd write when two_write_queues are disabled by updating the commit map in the 1st write.
Closes https://github.com/facebook/rocksdb/pull/3277
Differential Revision: D6575392
Pulled By: maysamyabandeh
fbshipit-source-id: 8ab458f7ca506905962f9166026b2ec81e749c46
Summary:
Previously we store sequence number range of each blob files, and use the sequence number range to check if the file can be possibly visible by a snapshot. But it adds complexity to the code, since the sequence number is only available after a write. (The current implementation get sequence number by calling GetLatestSequenceNumber(), which is wrong.) With the patch, we are not storing sequence number range, and check if snapshot_sequence < obsolete_sequence to decide if the file is visible by a snapshot (previously we check if first_sequence <= snapshot_sequence < obsolete_sequence).
Closes https://github.com/facebook/rocksdb/pull/3274
Differential Revision: D6571497
Pulled By: yiwu-arbug
fbshipit-source-id: ca06479dc1fcd8782f6525b62b7762cd47d61909
Summary:
- check most times after calling snprintf that the buffer didn't fill up. Previously we'd proceed and use `buf_size - len` as the length in subsequent calls, which underflowed as those are unsigned size_t.
- replace some memcpys with snprintf for consistency
Closes https://github.com/facebook/rocksdb/pull/3255
Differential Revision: D6541464
Pulled By: ajkr
fbshipit-source-id: 8610ea6a24f38e0a37c6d17bc65b7c712da6d932
Summary:
There were a few places where MSVC's implicit truncation warnings were getting triggered, which was causing the MSVC build to fail due to warnings being treated as errors. This resolves the issues by making the truncations in some places explicit, and by making it so there are no truncations of literals.
Fixes#3239
Supersedes #3259
Closes https://github.com/facebook/rocksdb/pull/3273
Reviewed By: yiwu-arbug
Differential Revision: D6569204
Pulled By: Orvid
fbshipit-source-id: c188cf1cf98d9acb6d94b71875041cc81f8ff088
Summary:
Refactor BlobDB open logic. List of changes:
Major:
* On reopen, mark blob files found as immutable, do not use them for writing new keys.
* Not to scan the whole file to find file footer. Instead just seek to the end of the file and try to read footer.
Minor:
* Move most of the real logic from blob_db.cc to blob_db_impl.cc.
* Not to hold shared_ptr of event listeners in global maps in blob_db.cc
* Some changes to BlobFile interface.
* Improve logging and error handling.
Closes https://github.com/facebook/rocksdb/pull/3246
Differential Revision: D6526147
Pulled By: yiwu-arbug
fbshipit-source-id: 9dc4cdd63359a2f9b696af817086949da8d06952
Summary:
I started adding gflags support for cmake on linux and got frustrated that I'd need to duplicate the build_detect_platform logic, which determines namespace based on attempting compilation. We can do it differently -- use the GFLAGS_NAMESPACE macro if available, and if not, that indicates it's an old gflags version without configurable namespace so we can simply hardcode "google".
Closes https://github.com/facebook/rocksdb/pull/3212
Differential Revision: D6456973
Pulled By: ajkr
fbshipit-source-id: 3e6d5bde3ca00d4496a120a7caf4687399f5d656
Summary:
Add PreReleaseCallback to be called at the end of WriteImpl but before publishing the sequence number. The callback is used in WritePrepareTxn to i) update the commit map, ii) update the last published sequence number in the 2nd write queue. It also ensures that all the commits will go to the 2nd queue.
These changes will ensure that the commit map is updated before the sequence number is published and used by reading snapshots. If we use two write queues, the snapshots will use the seq number published by the 2nd queue. If we use one write queue (the default, the snapshots will use the last seq number in the memtable, which also indicates the last published seq number.
Closes https://github.com/facebook/rocksdb/pull/3205
Differential Revision: D6438959
Pulled By: maysamyabandeh
fbshipit-source-id: f8b6c434e94bc5f5ab9cb696879d4c23e2577ab9
Summary:
1. Class BackupMeta
```
52 : timestamp_(0), size_(0), meta_filename_(meta_filename),
CID 1168103 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member sequence_number_ is not initialized in this constructor nor in any functions that it calls.
153 file_infos_(file_infos), env_(env) {}
```
2. class BackupEngineImpl
```
513 }
7. uninit_member: Non-static class member latest_backup_id_ is not initialized in this constructor nor in any functions that it calls.
CID 1322803 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
9. uninit_member: Non-static class member latest_valid_backup_id_ is not initialized in this constructor nor in any functions that it calls.
514}
```
3. struct BackupAfterCopyOrCreateWorkItem
```
368 struct BackupAfterCopyOrCreateWorkItem {
369 std::future<CopyOrCreateResult> result;
1. member_decl: Class member declaration for shared.
370 bool shared;
3. member_decl: Class member declaration for needed_to_copy.
371 bool needed_to_copy;
5. member_decl: Class member declaration for backup_env.
372 Env* backup_env;
373 std::string dst_path_tmp;
374 std::string dst_path;
375 std::string dst_relative;
2. uninit_member: Non-static class member shared is not initialized in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member needed_to_copy is not initialized in this constructor nor in any functions that it calls.
CID 1396122 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
6. uninit_member: Non-static class member backup_env is not initialized in this constructor nor in any functions that it calls.
376 BackupAfterCopyOrCreateWorkItem() {}
```
4. struct CopyOrCreateWorkItem
```
318 struct CopyOrCreateWorkItem {
319 std::string src_path;
320 std::string dst_path;
321 std::string contents;
1. member_decl: Class member declaration for src_env.
322 Env* src_env;
3. member_decl: Class member declaration for dst_env.
323 Env* dst_env;
5. member_decl: Class member declaration for sync.
324 bool sync;
7. member_decl: Class member declaration for rate_limiter.
325 RateLimiter* rate_limiter;
9. member_decl: Class member declaration for size_limit.
326 uint64_t size_limit;
327 std::promise<CopyOrCreateResult> result;
328 std::function<void()> progress_callback;
329
2. uninit_member: Non-static class member src_env is not initialized in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member dst_env is not initialized in this constructor nor in any functions that it calls.
6. uninit_member: Non-static class member sync is not initialized in this constructor nor in any functions that it calls.
8. uninit_member: Non-static class member rate_limiter is not initialized in this constructor nor in any functions that it calls.
CID 1396123 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
10. uninit_member: Non-static class member size_limit is not initialized in this constructor nor in any functions that it calls.
330 CopyOrCreateWorkItem() {}
```
5. struct RestoreAfterCopyOrCreateWorkItem
```
struct RestoreAfterCopyOrCreateWorkItem {
410 std::future<CopyOrCreateResult> result;
1. member_decl: Class member declaration for checksum_value.
411 uint32_t checksum_value;
CID 1396153 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member checksum_value is not initialized in this constructor nor in any functions that it calls.
412 RestoreAfterCopyOrCreateWorkItem() {}
```
Closes https://github.com/facebook/rocksdb/pull/3131
Differential Revision: D6428556
Pulled By: sagar0
fbshipit-source-id: a86675444543eff028e3cae6942197a143a112c4
Summary:
```
utilities/persistent_cache/block_cache_tier_file.cc
64struct CacheRecordHeader {
2. uninit_member: Non-static class member magic_ is not initialized in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member crc_ is not initialized in this constructor nor in any functions that it calls.
6. uninit_member: Non-static class member key_size_ is not initialized in this constructor nor in any functions that it calls.
CID 1396161 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
8. uninit_member: Non-static class member val_size_ is not initialized in this constructor nor in any functions that it calls.
65 CacheRecordHeader() {}
66 CacheRecordHeader(const uint32_t magic, const uint32_t key_size,
67 const uint32_t val_size)
68 : magic_(magic), crc_(0), key_size_(key_size), val_size_(val_size) {}
69
1. member_decl: Class member declaration for magic_.
70 uint32_t magic_;
3. member_decl: Class member declaration for crc_.
71 uint32_t crc_;
5. member_decl: Class member declaration for key_size_.
72 uint32_t key_size_;
7. member_decl: Class member declaration for val_size_.
73 uint32_t val_size_;
74};
utilities/simulator_cache/sim_cache.cc:
157 miss_times_(0),
CID 1396124 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
2. uninit_member: Non-static class member stats_ is not initialized in this constructor nor in any functions that it calls.
158 hit_times_(0) {}
159
```
Closes https://github.com/facebook/rocksdb/pull/3155
Differential Revision: D6427237
Pulled By: sagar0
fbshipit-source-id: 97e493da5fc043c5b9a3e0d33103442cffb75aad
Summary:
utilities/blob_db/blob_db_impl.cc
265 : bdb_options_.blob_dir;
3. uninit_member: Non-static class member env_ is not initialized in this constructor nor in any functions that it calls.
5. uninit_member: Non-static class member ttl_extractor_ is not initialized in this constructor nor in any functions that it calls.
7. uninit_member: Non-static class member open_p1_done_ is not initialized in this constructor nor in any functions that it calls.
CID 1418245 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
9. uninit_member: Non-static class member debug_level_ is not initialized in this constructor nor in any functions that it calls.
266}
4. past_the_end: Function end creates an iterator.
CID 1418258 (#1 of 1): Using invalid iterator (INVALIDATE_ITERATOR)
5. deref_iterator: Dereferencing iterator file_nums.end() though it is already past the end of its container.
utilities/col_buf_decoder.h:
nullable_(nullable),
2. uninit_member: Non-static class member remain_runs_ is not initialized in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member run_val_ is not initialized in this constructor nor in any functions that it calls.
CID 1396134 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
6. uninit_member: Non-static class member last_val_ is not initialized in this constructor nor in any functions that it calls.
46 big_endian_(big_endian) {}
Closes https://github.com/facebook/rocksdb/pull/3134
Differential Revision: D6340607
Pulled By: sagar0
fbshipit-source-id: 25c52566e2ff979fe6c7abb0f40c27fc16597054
Summary:
Adding a list of blob db counters.
Also remove WaStats() which doesn't expose the stats and can be substitute by (BLOB_DB_BYTES_WRITTEN / BLOB_DB_BLOB_FILE_BYTES_WRITTEN).
Closes https://github.com/facebook/rocksdb/pull/3193
Differential Revision: D6394216
Pulled By: yiwu-arbug
fbshipit-source-id: 017508c8ff3fcd7ea7403c64d0f9834b24816803
Summary:
This patch implements MultiGet API for WritePreparedTxnDB and update the existing unit tests.
Closes https://github.com/facebook/rocksdb/pull/3196
Differential Revision: D6401493
Pulled By: maysamyabandeh
fbshipit-source-id: 51501a1e32645fc2da8680e77a50035f6530f2cc
Summary:
Garbage collection checks if the offset in blob index matches the offset of the blob value in the file. If it is a mismatch, the value is the current version. However it failed to check if the blob index is an inlined type, which don't even have an offset. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3194
Differential Revision: D6394270
Pulled By: yiwu-arbug
fbshipit-source-id: 7c2b9d795f1116f55f4d728086980f9b6e88ea78
Summary:
Saw some redundant log lines when trying to benchmark blob db. So, removed the lines from blob_file.cc, and let the lines in blob_db_impl.cc take the lead.
Closes https://github.com/facebook/rocksdb/pull/3189
Differential Revision: D6381726
Pulled By: sagar0
fbshipit-source-id: 5f0b1e56fe4bc3b715d89ea9b5749bd935cd0606
Summary:
The sequence number was not properly advanced after a rollback marker. The patch extends the existing unit tests to detect the bug and also fixes it.
Closes https://github.com/facebook/rocksdb/pull/3157
Differential Revision: D6304291
Pulled By: maysamyabandeh
fbshipit-source-id: 1b519c44a5371b802da49c9e32bd00087a8da401
Summary:
The current implementation of PinnableSlice move assignment have an issue #3163. We are moving away from it instead of try to get the move assignment right, since it is too tricky.
Closes https://github.com/facebook/rocksdb/pull/3164
Differential Revision: D6319201
Pulled By: yiwu-arbug
fbshipit-source-id: 8f3279021f3710da4a4caa14fd238ed2df902c48
Summary:
This patch clarifies and refactors the logic around tracked keys in transactions.
Closes https://github.com/facebook/rocksdb/pull/3140
Differential Revision: D6290258
Pulled By: maysamyabandeh
fbshipit-source-id: 03b50646264cbcc550813c060b180fc7451a55c1
Summary:
Add tests to ensure that WritePrepared and WriteCommitted policies are cross compatible when the db WAL is empty. This is important when the admin want to switch between the policies. In such case, before the switch the admin needs to empty the WAL by i) committing/rollbacking all the pending transactions, ii) FlushMemTables
Closes https://github.com/facebook/rocksdb/pull/3118
Differential Revision: D6227247
Pulled By: maysamyabandeh
fbshipit-source-id: bcde3d92c1e89cda3b9cfa69f6a20af5d8993db7
Summary:
Add per-exe execution capability
Add fix parsing of groups/tests
Add timer test exclusion
Fix unit tests
Ifdef threadpool specific tests that do not pass on Vista threadpool.
Remove spurious outout from prefix_test so test case listing works
properly.
Fix not using standard test directories results in file creation errors
in sst_dump_test.
BlobDb fixes:
In C++ end() iterators can not be dereferenced. They are not valid.
When deleting blob_db_ set it to nullptr before any other code executes.
Not fixed:. On Windows you can not delete a file while it is open.
[ RUN ] BlobDBTest.ReadWhileGC
d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options)
IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied
d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options)
IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied
write_batch
Should not call front() if there is a chance the container is empty
Closes https://github.com/facebook/rocksdb/pull/3152
Differential Revision: D6293274
Pulled By: sagar0
fbshipit-source-id: 318c3717c22087fae13b18715dffb24565dbd956
Summary:
A race condition will happen when:
* a user thread writes a value, but it hits the write stop condition because there are too many un-flushed memtables, while holding blob_db_impl.write_mutex_.
* Flush is triggered and call flush begin listener and try to acquire blob_db_impl.write_mutex_.
Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3149
Differential Revision: D6279805
Pulled By: yiwu-arbug
fbshipit-source-id: 0e3c58afb78795ebe3360a2c69e05651e3908c40
Summary:
`compression` shadow the method name in `BlobFile`. Rename it.
Closes https://github.com/facebook/rocksdb/pull/3148
Differential Revision: D6274498
Pulled By: yiwu-arbug
fbshipit-source-id: 7d293596530998b23b6b8a8940f983f9b6343a98
Summary:
To fix the issue of failing to decompress existing value after reopen DB with a different compression settings.
Closes https://github.com/facebook/rocksdb/pull/3142
Differential Revision: D6267260
Pulled By: yiwu-arbug
fbshipit-source-id: c7cf7f3e33b0cd25520abf4771cdf9180cc02a5f
Summary:
Adds two new counters:
`key_lock_wait_count` counts how many times a lock was blocked by another transaction and had to wait, instead of being granted the lock immediately.
`key_lock_wait_time` counts the time spent acquiring locks.
Closes https://github.com/facebook/rocksdb/pull/3107
Differential Revision: D6217332
Pulled By: lth
fbshipit-source-id: 55d4f46da5550c333e523263422fd61d6a46deb9
Summary:
After move assignment, we need to re-initialized the moved PinnableSlice.
Also update blob_db_impl.cc to not reuse the moved PinnableSlice since it is supposed to be in an undefined state after move.
Closes https://github.com/facebook/rocksdb/pull/3127
Differential Revision: D6238585
Pulled By: yiwu-arbug
fbshipit-source-id: bd99f2e37406c4f7de160c7dee6a2e8126bc224e
Summary:
Fix unreleased snapshot at the end of the test.
Closes https://github.com/facebook/rocksdb/pull/3126
Differential Revision: D6232867
Pulled By: yiwu-arbug
fbshipit-source-id: 651ca3144fc573ea2ab0ab20f0a752fb4a101d26
Summary:
After adding expiration to blob index in #3066, we are now able to add a compaction filter to cleanup expired blob index entries.
Closes https://github.com/facebook/rocksdb/pull/3090
Differential Revision: D6183812
Pulled By: yiwu-arbug
fbshipit-source-id: 9cb03267a9702975290e758c9c176a2c03530b83
Summary:
Blob db will keep blob file if data in the file is visible to an active snapshot. Before this patch it checks whether there is an active snapshot has sequence number greater than the earliest sequence in the file. This is problematic since we take snapshot on every read, if it keep having reads, old blob files will not be cleanup. Change to check if there is an active snapshot falls in the range of [earliest_sequence, obsolete_sequence) where obsolete sequence is
1. if data is relocated to another file by garbage collection, it is the latest sequence at the time garbage collection finish
2. otherwise, it is the latest sequence of the file
Closes https://github.com/facebook/rocksdb/pull/3087
Differential Revision: D6182519
Pulled By: yiwu-arbug
fbshipit-source-id: cdf4c35281f782eb2a9ad6a87b6727bbdff27a45
Summary:
Add an option to enable/disable auto garbage collection, where we keep counting how many keys have been evicted by either deletion or compaction and decide whether to garbage collect a blob file.
Default disable auto garbage collection for now since the whole logic is not fully tested and we plan to make major change to it.
Closes https://github.com/facebook/rocksdb/pull/3117
Differential Revision: D6224756
Pulled By: yiwu-arbug
fbshipit-source-id: cdf53bdccec96a4580a2b3a342110ad9e8864dfe
Summary:
The test intent to wait until key being overwritten until proceed with garbage collection. It failed to wait for `PutUntil` finally finish. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3116
Differential Revision: D6222833
Pulled By: yiwu-arbug
fbshipit-source-id: fa9b57a772b92a66cf250b44e7975c43f62f45c5
Summary:
Evict oldest blob file and put it in obsolete_files list when close to blob db size limit. The file will be delete when the `DeleteObsoleteFiles` background job runs next time.
For now I set `kEvictOldestFileAtSize` constant, which controls when to evict the oldest file, at 90%. It could be tweaked or made into an option if really needed; I didn't want to expose it as an option pre-maturely as there are already too many :) .
Closes https://github.com/facebook/rocksdb/pull/3094
Differential Revision: D6187340
Pulled By: sagar0
fbshipit-source-id: 687f8262101b9301bf964b94025a2fe9d8573421
Summary:
Move WritePreparedTxnDB from pessimistic_transaction_db.h to its own header, write_prepared_txn_db.h
Closes https://github.com/facebook/rocksdb/pull/3114
Differential Revision: D6220987
Pulled By: maysamyabandeh
fbshipit-source-id: 18893fb4fdc6b809fe117dabb544080f9b4a301b
Summary:
Implements ValidateSnapshot for WritePrepared txns and also adds a unit test to clarify the contract of this function.
Closes https://github.com/facebook/rocksdb/pull/3101
Differential Revision: D6199405
Pulled By: maysamyabandeh
fbshipit-source-id: ace509934c307ea5d26f4bbac5f836d7c80fd240
Summary:
GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery. So it is not need to be stored in memtable right after each commit.
This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to memtable during recovery when the WAL is replayed. To cover the case when WAL is deleted after memtable flush, the batch is also buffered and written to memtable right before each memtable flush.
Closes https://github.com/facebook/rocksdb/pull/3071
Differential Revision: D6148023
Pulled By: maysamyabandeh
fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7
Summary:
The collapse of duplicate keys in write batch needs to sort the indexes of duplicate keys since it only checks the index in the batch with the head of the list of duplicate keys.
Closes https://github.com/facebook/rocksdb/pull/3093
Differential Revision: D6186800
Pulled By: maysamyabandeh
fbshipit-source-id: abc9ae8c2f1840445a5584f925cf86ecc6f37154
Summary:
* cleanup num_concurrent_simple_blobs. We don't do concurrent writes (by taking write_mutex_) so it doesn't make sense to have multiple non TTL files open. We can revisit later when we want to improve writes.
* cleanup eviction callback. we don't have plan to use it now.
* rename s/open_simple_blob_files_/open_non_ttl_file_/ and s/open_blob_files_/open_ttl_files_/ to avoid confusion.
Closes https://github.com/facebook/rocksdb/pull/3088
Differential Revision: D6182598
Pulled By: yiwu-arbug
fbshipit-source-id: 99e6f5e01fa66d31309cdb06ce48502464bac6ad
Summary:
Changing blob file format and some code cleanup around the change. The change with blob log format are:
* Remove timestamp field in blob file header, blob file footer and blob records. The field is not being use and often confuse with expiration field.
* Blob file header now come with column family id, which always equal to default column family id. It leaves room for future support of column family.
* Compression field in blob file header now is a standalone byte (instead of compact encode with flags field)
* Blob file footer now come with its own crc.
* Key length now being uint64_t instead of uint32_t
* Blob CRC now checksum both key and value (instead of value only).
* Some reordering of the fields.
The list of cleanups:
* Better inline comments in blob_log_format.h
* rename ttlrange_t and snrange_t to ExpirationRange and SequenceRange respectively.
* simplify blob_db::Reader
* Move crc checking logic to inside blob_log_format.cc
Closes https://github.com/facebook/rocksdb/pull/3081
Differential Revision: D6171304
Pulled By: yiwu-arbug
fbshipit-source-id: e4373e0d39264441b7e2fbd0caba93ddd99ea2af
Summary:
Adding the `min_blob_size` option to allow storing small values in base db (in LSM tree) together with the key. The goal is to improve performance for small values, while taking advantage of blob db's low write amplification for large values.
Also adding expiration timestamp to blob index. It will be useful to evict stale blob indexes in base db by adding a compaction filter. I'll work on the compaction filter in future patches.
See blob_index.h for the new blob index format. There are 4 cases when writing a new key:
* small value w/o TTL: put in base db as normal value (i.e. ValueType::kTypeValue)
* small value w/ TTL: put (type, expiration, value) to base db.
* large value w/o TTL: write value to blob log and put (type, file, offset, size, compression) to base db.
* large value w/TTL: write value to blob log and put (type, expiration, file, offset, size, compression) to base db.
Closes https://github.com/facebook/rocksdb/pull/3066
Differential Revision: D6142115
Pulled By: yiwu-arbug
fbshipit-source-id: 9526e76e19f0839310a3f5f2a43772a4ad182cd0
Summary:
I found that we continue accepting writes even when the blob db goes beyond the configured blob directory size limit. Now, we return an error for writes on reaching `blob_dir_size` limit and if `is_fifo` is set to false. (We cannot just drop any file when `is_fifo` is true.)
Deleting the oldest file when `is_fifo` is true will be handled in a later PR.
Closes https://github.com/facebook/rocksdb/pull/3060
Differential Revision: D6136156
Pulled By: sagar0
fbshipit-source-id: 2f11cb3f2eedfa94524fbfa2613dd64bfad7a23c
Summary:
A few simple changes to allow RocksDB to be built on OpenBSD. Let me know if any further changes are needed.
Closes https://github.com/facebook/rocksdb/pull/3061
Differential Revision: D6138800
Pulled By: ajkr
fbshipit-source-id: a13a17b5dc051e6518bd56a8c5efd1d24dd81b0c