Disaster recovery

Metadata damage and repair

If a file system has inconsistent or missing metadata, it is considered damaged. You may find out about damage from a health message, or in some unfortunate cases from an assertion in a running MDS daemon.
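For example, damage reported through the MDS_DAMAGE health message can be inspected by listing the damage table on the affected rank:

ceph tell mds.<fs_name>:0 damage ls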

Metadata damage can result either from data loss in the underlying RADOS layer (e.g. multiple disk failures that lose all copies of a PG), or from software bugs.

CephFS includes some tools that may be able to recover a damaged file system, but to use them safely requires a solid understanding of CephFS internals. The documentation for these potentially dangerous operations is on a separate page: Advanced: Metadata repair tools.

Data pool damage (files affected by lost data PGs)

If a PG is lost in a data pool, then the file system continues to operate normally, but some parts of some files will simply be missing (reads will return zeros).

Losing a data pool PG may affect many files. Because files are striped across many RADOS objects, identifying which files have been affected by the loss of particular PGs requires a full scan over the data objects in the pool. This type of scan may be useful for identifying which files must be restored from a backup.

Danger

The pg_files command described below does not repair any metadata, so when restoring files in this case you must remove the damaged file and replace it in order to have a fresh inode. Do not overwrite damaged files in place.
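For instance, following the warning above, a restore that removes the damaged file and replaces it from a backup (rather than overwriting it in place) might look like this, assuming a hypothetical client mount at /mnt/cephfs and a backup tree under /backup:

rm /mnt/cephfs/home/bob/file.dat
cp /backup/home/bob/file.dat /mnt/cephfs/home/bob/file.dat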

If you know that objects have been lost from PGs, use the pg_files subcommand to scan for the files that may have been damaged as a result:

cephfs-data-scan pg_files <path> <pg id> [<pg id>...]

For example, if you have lost data from PGs 1.4 and 4.5 and you want to know which files under /home/bob have been damaged:

cephfs-data-scan pg_files /home/bob 1.4 4.5

The output is a list of paths to potentially damaged files, one per line.
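For example, the output might look like this (hypothetical paths):

/home/bob/mydir/file_one
/home/bob/reports/file_two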

Note

This command acts as a normal CephFS client to find all the files in the file system and read their layouts. This means that the MDS must be up and running in order for this command to be usable.
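For example, you can confirm that the file system's MDS daemons are up and active before running the scan:

ceph fs status <fs_name>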

Using first-damage.py
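first-damage.py is a script shipped in the Ceph source tree that scans the directory objects in a file system's metadata pool for damaged dentries and can optionally remove them. The steps below use placeholder names; a consolidated example with hypothetical values follows the list.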

  1. Unmount all clients.

  2. Flush the journal if possible:

    ceph tell mds.<fs_name>:0 flush journal
    
  3. Fail the file system:

    ceph fs fail <fs_name>
    
  4. Recover dentries from the journal. If the MDS flushed the journal successfully, this will be a no-op:

    cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries summary
    
  5. Reset the journal:

    cephfs-journal-tool --rank=<fs_name>:0 journal reset --yes-i-really-mean-it
    
  6. Run first-damage.py to list damaged dentries, where <pool> is the file system's metadata pool:

    python3 first-damage.py --memo run.1 <pool>
    
  7. Optionally, remove the damaged dentries:

    python3 first-damage.py --memo run.2 --remove <pool>
    

    Note

    Use --memo to specify the file in which to record objects that have already been traversed. Using a different memo file keeps the data from separate, independent runs apart.

    This command has the effect of removing a dentry from the snapshot or head (in the current hierarchy). The inode’s linkage will be lost. The inode may however be recoverable in lost+found during a future data-scan recovery.
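Putting the steps together, a hypothetical end-to-end run after all clients have been unmounted (step 1), for a file system named cephfs whose metadata pool is named cephfs_metadata (both names are placeholders; substitute your own), might look like this:

# Steps 2-3: flush the journal and fail the file system
ceph tell mds.cephfs:0 flush journal
ceph fs fail cephfs
# Steps 4-5: recover dentries from the journal, then reset it
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:0 journal reset --yes-i-really-mean-it
# Steps 6-7: list damaged dentries, then remove them
python3 first-damage.py --memo run.1 cephfs_metadata
python3 first-damage.py --memo run.2 --remove cephfs_metadata

Once repairs are complete, the file system can be brought back online by marking it joinable again, for example with ceph fs set cephfs joinable true.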
