|
|
|
@ -7,24 +7,53 @@ |
|
|
|
|
// Use of this source code is governed by a BSD-style license that can be
|
|
|
|
|
// found in the LICENSE file. See the AUTHORS file for names of contributors.
|
|
|
|
|
//
|
|
|
|
|
// We recover the contents of the descriptor from the other files we find.
|
|
|
|
|
// (1) Any log files are first converted to tables
|
|
|
|
|
// (2) We scan every table to compute
|
|
|
|
|
// (a) smallest/largest for the table
|
|
|
|
|
// (b) largest sequence number in the table
|
|
|
|
|
// (3) We generate descriptor contents:
|
|
|
|
|
// - log number is set to zero
|
|
|
|
|
// - next-file-number is set to 1 + largest file number we found
|
|
|
|
|
// - last-sequence-number is set to largest sequence# found across
|
|
|
|
|
// all tables (see 2c)
|
|
|
|
|
// - compaction pointers are cleared
|
|
|
|
|
// - every table file is added at level 0
|
|
|
|
|
// Repairer does best effort recovery to recover as much data as possible after
|
|
|
|
|
// a disaster without compromising consistency. It does not guarantee bringing
|
|
|
|
|
// the database to a time consistent state.
|
|
|
|
|
//
|
|
|
|
|
// Repair process is broken into 4 phases:
|
|
|
|
|
// (a) Find files
|
|
|
|
|
// (b) Convert logs to tables
|
|
|
|
|
// (c) Extract metadata
|
|
|
|
|
// (d) Write Descriptor
|
|
|
|
|
//
|
|
|
|
|
// (a) Find files
|
|
|
|
|
//
|
|
|
|
|
// The repairer goes through all the files in the directory, and classifies them
|
|
|
|
|
// based on their file name. Any file that cannot be identified by name will be
|
|
|
|
|
// ignored.
|
|
|
|
|
//
|
|
|
|
|
// (b) Convert logs to table
|
|
|
|
|
//
|
|
|
|
|
// Every log file that is active is replayed. All sections of the file where the
|
|
|
|
|
// checksum does not match is skipped over. We intentionally give preference to
|
|
|
|
|
// data consistency.
|
|
|
|
|
//
|
|
|
|
|
// (c) Extract metadata
|
|
|
|
|
//
|
|
|
|
|
// We scan every table to compute
|
|
|
|
|
// (1) smallest/largest for the table
|
|
|
|
|
// (2) largest sequence number in the table
|
|
|
|
|
//
|
|
|
|
|
// If we are unable to scan the file, then we ignore the table.
|
|
|
|
|
//
|
|
|
|
|
// (d) Write Descriptor
|
|
|
|
|
//
|
|
|
|
|
// We generate descriptor contents:
|
|
|
|
|
// - log number is set to zero
|
|
|
|
|
// - next-file-number is set to 1 + largest file number we found
|
|
|
|
|
// - last-sequence-number is set to largest sequence# found across
|
|
|
|
|
// all tables (see 2c)
|
|
|
|
|
// - compaction pointers are cleared
|
|
|
|
|
// - every table file is added at level 0
|
|
|
|
|
//
|
|
|
|
|
// Possible optimization 1:
|
|
|
|
|
// (a) Compute total size and use to pick appropriate max-level M
|
|
|
|
|
// (b) Sort tables by largest sequence# in the table
|
|
|
|
|
// (c) For each table: if it overlaps earlier table, place in level-0,
|
|
|
|
|
// else place in level-M.
|
|
|
|
|
// (d) We can provide options for time consistent recovery and unsafe recovery
|
|
|
|
|
// (ignore checksum failure when applicable)
|
|
|
|
|
// Possible optimization 2:
|
|
|
|
|
// Store per-table metadata (smallest, largest, largest-seq#, ...)
|
|
|
|
|
// in the table's meta section to speed up ScanTable.
|
|
|
|
|