Backing up a whole Linux system with GNU tar

Introduction

This article explains how to use GNU tar to create a backup of a Linux based operating system and how to restore it later if needed. It assumes that the filesystems to back up are the traditional Linux ext2 based filesystems, i.e. of type ext2, ext3 or ext4, the widely used xfs or the relatively new btrfs. These filesystems were tested and can be fully restored if the proper arguments are given, as explained later. The described steps should work for other Linux filesystems as well if they are able to store all standard Linux file types and attributes, but that wasn't confirmed at the time of this writing. However, it was successfully tested that GNU tar is able to back up a filesystem of one of the aforementioned types and restore it to a filesystem of another type. Later versions of GNU tar (e.g. version 1.27.1) even support POSIX ACLs, and GNU tar can be used to perform not just full, but incremental backups as well.

GNU tar capabilities

GNU tar is capable of archiving all Linux file types except unix domain sockets. However, a socket file normally exists only while the process that created it is running, unless that process exited abnormally. Hence, this isn't an issue. All other file types including regular files, directories, symbolic links, named pipes and character and block devices are stored. All important information about these file types is contained in the archive: filenames, permissions, owner, group, modification time, symbolic link target and major and minor device numbers. Hard links are stored and extracted properly too if all of them are put into the same archive. GNU tar can also handle sparse files efficiently, but only if it is given the --sparse option during the backup. Later GNU tar versions are able to back up and restore POSIX ACLs if the --acls argument is given to tar both when creating and when extracting the archive. However, the special lost+found directory in the root directory of an ext2 based filesystem is treated as a usual directory and should therefore be excluded from the backup by using the --exclude option.
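
The effect of the --sparse option can be checked on a small scale. The following is only a sketch that assumes GNU tar and the truncate command from GNU coreutils; the function name sparse_demo and the file names are made up for illustration. It creates a 10 MiB file that consists of a single hole and archives it with and without --sparse.

```shell
# Hypothetical demo: compare archive sizes with and without --sparse.
# Usage: sparse_demo DIR   (DIR must be an existing writable directory)
sparse_demo() {
    dir=$1
    # A 10 MiB file that consists of a hole only (no data is written).
    truncate -s 10M "$dir/hole.img"
    # The same file archived without and with sparse file handling.
    tar -cf "$dir/plain.tar" -C "$dir" hole.img
    tar -cf "$dir/sparse.tar" --sparse -C "$dir" hole.img
    # Print the size of both archives in bytes, plain first.
    wc -c < "$dir/plain.tar"
    wc -c < "$dir/sparse.tar"
}
```

On a filesystem that supports sparse files, the second number should be a small fraction of the first one.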

Furthermore, GNU tar can create incremental backups if it is called with the -g switch. It uses ctime (time of the last change of the file status information) to determine whether a file or its status has changed. This is the second best choice after using hashes: if the file content changes, the mtime (modification time) changes and therefore the ctime as well. If only the file status changes (permissions, owner, group etc.), the ctime is updated too. However, if the file is only accessed and the atime (access time) is modified, the ctime remains unaltered. The only drawback of this method is that it is necessary to maintain correct system time. It is not possible to set the ctime arbitrarily by using some command, but if the system time is set back, later modifications may appear to be older than they are and the incremental backup will unfortunately not contain such files. You should also take into account that the -g switch requires an argument denoting the path to the so-called snapshot archive. The snapshot archive contains additional information that is necessary to determine whether a file should be included in the incremental archive or not. More precisely, it records the time of the backup and, for every archived directory, its metadata and the list of files it contained.
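
The difference between mtime and ctime can be observed directly. The following sketch assumes the stat command from GNU coreutils (%Y prints the mtime and %Z the ctime in seconds since the epoch); the function names file_times and ctime_demo are made up for illustration.

```shell
# Hypothetical helper: print mtime and ctime of a file (GNU stat syntax).
file_times() {
    stat -c 'mtime=%Y ctime=%Z' "$1"
}

# Usage: ctime_demo FILE   (FILE is created or overwritten)
ctime_demo() {
    f=$1
    echo data > "$f"
    file_times "$f"      # times right after the content was written
    sleep 1
    chmod 600 "$f"       # changes the file status only, not the content
    file_times "$f"      # mtime is the same, ctime is newer
}
```

The second line of the output should show the same mtime as the first one, but a later ctime.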

Creating full backup archives

If you want to create a consistent operating system backup, you need to back up a snapshot of all partitions where the system is installed. This can be accomplished either by shutting down the system and backing it up (e.g. from a live CD) or by stopping all processes that are writing to the filesystems to back up, remounting them read-only and then performing a consistent backup.

The latter approach doesn't require stopping the system. Moreover, it can be further improved by using a technology that is able to create filesystem snapshots at a particular time. LVM (Logical Volume Management) can be used for this purpose if you have enough free space in the volume group containing the logical volumes with data. The writing processes can then be stopped and the filesystems remounted read-only just for the moment of creating the snapshots. The system can continue running after that and the backup can be performed on the snapshots instead of on the running system.
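
The LVM variant could look roughly like the following sketch. It is not part of the tested procedure: the function names run and snapshot_backup, the snapshot name suffix _snap and all arguments are assumptions. By default the commands are only printed (DRY_RUN=1). The --no-check-device option is added because the snapshot volume may get a different device number each time, which could otherwise confuse later incremental backups.

```shell
# Sketch only: DRY_RUN defaults to 1, so the commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: snapshot_backup VG LV SIZE MOUNTPOINT   (all values are placeholders)
snapshot_backup() {
    vg=$1 lv=$2 size=$3 mnt=$4
    # Create the snapshot; SIZE limits how many changes it can absorb.
    run lvcreate --snapshot --size "$size" --name "${lv}_snap" "/dev/$vg/$lv"
    # Mount the snapshot read-only and archive it instead of the live filesystem.
    run mount -o ro "/dev/$vg/${lv}_snap" "$mnt"
    run tar -czf "$lv.0.tgz" -g "$lv.0.snar" -C "$mnt" --one-file-system --sparse --acls --no-check-device --exclude ./lost+found .
    # Drop the snapshot once the archive has been written.
    run umount "$mnt"
    run lvremove -f "/dev/$vg/${lv}_snap"
}
```

Calling snapshot_backup mg rootfs 1G /mnt/snap prints the five commands for the rootfs volume. An xfs snapshot would probably need the additional nouuid mount option.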

Let's now look at the concrete commands that can be used to back up an operating system snapshot. First, let's remount all filesystems read-only.

  • mount -o remount,ro /boot
  • mount -o remount,ro /
    ...
  • mount -o remount,ro /home

The backup archives of the snapshot can then be created as follows:

  • tar -cvvzf boot.0.tgz -g boot.0.snar -C /boot --one-file-system --sparse --acls --exclude ./lost+found .
  • tar -cvvzf rootfs.0.tgz -g rootfs.0.snar -C / --one-file-system --sparse --acls --exclude ./lost+found .
    ...
  • tar -cvvzf home.0.tgz -g home.0.snar -C /home --one-file-system --sparse --acls --exclude ./lost+found .

The aforementioned --sparse and --acls options ensure that sparse files are stored efficiently as sparse and that POSIX ACLs are included in the archives as well. The --one-file-system option instructs tar not to archive files from other filesystems. And the --exclude option is self-explanatory. Bear in mind that the special lost+found directory of an ext2 based filesystem should always be excluded from the backup because it would be restored as a usual directory. The -g option is not necessary for the full backup itself. However, you must pass it to tar in order to be able to create subsequent incremental backup(s) later. The -g switch defines the filename of the so-called snapshot archive with additional metadata used to identify files whose attributes or content changed since the preceding backup.
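
The repeated commands can be wrapped into a small function. This is only a sketch: the name full_backup is made up, the verbose -vv output is left out and the archives are written to the current directory, which therefore has to be on a filesystem that is still writable.

```shell
# Hypothetical wrapper around the full backup command shown above.
# Usage: full_backup NAME DIR   (creates NAME.0.tgz and NAME.0.snar)
full_backup() {
    name=$1 dir=$2
    # Start from scratch: an existing snapshot archive would turn this
    # run into an incremental backup.
    rm -f "$name.0.snar"
    tar -czf "$name.0.tgz" -g "$name.0.snar" -C "$dir" \
        --one-file-system --sparse --acls --exclude ./lost+found .
}

# Example calls (commented out, to be run as root):
# full_backup boot   /boot
# full_backup rootfs /
# full_backup home   /home
```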

Creating incremental backup archives

Creating an incremental backup is analogous to creating the foregoing full backup. An operating system snapshot should be made in the same way as in the case of the full backup, e.g. by invoking:

  • mount -o remount,ro /boot
  • mount -o remount,ro /
    ...
  • mount -o remount,ro /home

Archiving the filesystems is also similar. The snapshot archive referenced by the -g switch is used to identify and back up modified files. It is also updated so that another incremental backup based on the incremental backup being created can be performed later. Hence, let's first copy all snapshot archives to keep one for each incremental backup level.

  • cp boot.0.snar boot.1.snar
  • cp rootfs.0.snar rootfs.1.snar
    ...
  • cp home.0.snar home.1.snar

The backup can be done after that in the same way as the full backup. Just the archive names are adjusted to refer to the new backup level.

  • tar -cvvzf boot.1.tgz -g boot.1.snar -C /boot --one-file-system --sparse --acls --exclude ./lost+found .
  • tar -cvvzf rootfs.1.tgz -g rootfs.1.snar -C / --one-file-system --sparse --acls --exclude ./lost+found .
    ...
  • tar -cvvzf home.1.tgz -g home.1.snar -C /home --one-file-system --sparse --acls --exclude ./lost+found .

Subsequent incremental backup archives are generated by the same commands. Just the backup level in the archive names must be adjusted again.
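
The copy and tar steps for an arbitrary level can be sketched as one function. The name incremental_backup and its arguments are assumptions made for illustration, and the verbose -vv output is left out.

```shell
# Hypothetical wrapper: create NAME.LEVEL.tgz based on the preceding level.
# Usage: incremental_backup NAME DIR LEVEL   (LEVEL is 1, 2, ...)
incremental_backup() {
    name=$1 dir=$2 level=$3
    prev=$((level - 1))
    # Refuse to continue without the snapshot archive of the preceding level.
    [ -f "$name.$prev.snar" ] || { echo "missing $name.$prev.snar" >&2; return 1; }
    # Work on a copy so that the snapshot archive of every level is preserved.
    cp "$name.$prev.snar" "$name.$level.snar"
    tar -czf "$name.$level.tgz" -g "$name.$level.snar" -C "$dir" \
        --one-file-system --sparse --acls --exclude ./lost+found .
}

# Example calls (commented out, to be run as root):
# incremental_backup rootfs / 1
# incremental_backup rootfs / 2
```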

Restoring the backups

Let's assume that you have your disk prepared for data extraction, i.e. that it contains a valid partition table and that its partitions are formatted with ext2 based or other supported Linux filesystems. Ideally, these are the same as in the original system because you then don't have to modify the configuration of the target system, first of all the file /etc/fstab. The swap partition should be formatted as swap space by mkswap.

Then mount each empty filesystem and extract the archives to its root directory: first the full backup archive and then all incremental backups in the order in which they were created. The following example assumes that the boot filesystem is located on the first primary partition /dev/sda1 and that the other filesystems exist on top of the LVM volume group mg.

  • mount -o acl /dev/mg/rootfs /mnt/restore
  • tar -xvvzf rootfs.0.tgz -g /dev/null -C /mnt/restore/ --numeric-owner --acls
  • tar -xvvzf rootfs.1.tgz -g /dev/null -C /mnt/restore/ --numeric-owner --acls
    ...
  • mount -o acl /dev/sda1 /mnt/restore/boot
  • tar -xvvzf boot.0.tgz -g /dev/null -C /mnt/restore/boot --numeric-owner --acls
  • tar -xvvzf boot.1.tgz -g /dev/null -C /mnt/restore/boot --numeric-owner --acls
    ...
    ...
  • mount -o acl /dev/mg/home /mnt/restore/home
  • tar -xvvzf home.0.tgz -g /dev/null -C /mnt/restore/home --numeric-owner --acls
  • tar -xvvzf home.1.tgz -g /dev/null -C /mnt/restore/home --numeric-owner --acls
    ...

The -g /dev/null argument tells tar to process the archives as incremental ones; the snapshot archive itself is not needed during extraction, so /dev/null is passed instead. The --numeric-owner argument is essential. It makes tar restore the numeric UIDs and GIDs stored in the archive instead of looking up the stored owner and group names. This becomes important when extracting from a different operating system than the final one (e.g. a live CD), because otherwise tar would map the names to the UIDs and GIDs of matching users and groups on the running system. The --acls option can be left out if ACLs aren't used on the restored filesystems. It doesn't conflict with the --numeric-owner option if it is used just during restoration.
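
Extracting all levels of one filesystem in the right order can be sketched as a loop. The function name restore_fs is made up, the verbose -vv output is left out and the archives are expected in the current directory under the names used in this article.

```shell
# Hypothetical wrapper: extract NAME.0.tgz, NAME.1.tgz, ... into TARGET.
# Usage: restore_fs NAME TARGET   (TARGET is the mounted, empty filesystem)
restore_fs() {
    name=$1 target=$2
    level=0
    # Count upwards instead of using a glob so that level 10 is not
    # processed before level 2.
    while [ -f "$name.$level.tgz" ]; do
        tar -xzf "$name.$level.tgz" -g /dev/null -C "$target" \
            --numeric-owner --acls || return 1
        level=$((level + 1))
    done
    # Fail if not even the full backup was found.
    [ "$level" -gt 0 ]
}

# Example calls (commented out, to be run as root after mounting):
# restore_fs rootfs /mnt/restore
# restore_fs boot   /mnt/restore/boot
# restore_fs home   /mnt/restore/home
```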

The disk now contains all data, but the system is not able to boot yet.

Installing the boot loader

This step varies depending on the Linux distribution, boot loader and disk layout used. The following description applies to Debian, which uses the GRUB boot loader, with its filesystems on top of LVM with the only exception of the boot directory that lies on the first primary partition /dev/sda1. However, it can be customized for other cases as well by using other commands, boot loaders (e.g. lilo) or devices.

The basic idea behind the procedure is simple. Boot from a live CD, mount all filesystems into the target directory tree and run chroot to see just this tree. Then update filesystem UUIDs in some configuration files (or possibly other references to the filesystems if you changed their number, types or names), update the initial ramdisks for all kernel versions and configure and install the boot loader itself.

The filesystems were already mounted to /mnt/restore during archive extraction. However, some filesystems generated by the running kernel or daemons still need to be mounted.

  • mount -t proc proc /mnt/restore/proc
  • mount -t sysfs sysfs /mnt/restore/sys
  • mount -t devtmpfs udev /mnt/restore/dev

The root directory of the file/directory tree can be changed after that.

  • chroot /mnt/restore

Don't forget to check whether the filesystem references in /etc/fstab are correct. And if you changed the filesystem types, you must update them as well. This example assumes that all filesystems were recreated in the same way as in the original system and thus just the UUIDs had to be updated, e.g.

  • UUID=... /boot ext3 defaults,errors=remount-ro,nodev,nosuid,noexec 0 2

The UUID of a filesystem can be found by using the blkid command, e.g.:

  • blkid /dev/sda1
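
The lookup can be combined with generating the /etc/fstab entry. The following is only a sketch: the function name fstab_line is made up, and the commented example relies on the -s and -o options of blkid from util-linux to print just the UUID value.

```shell
# Hypothetical helper: print one /etc/fstab entry that refers to a UUID.
# Usage: fstab_line UUID MOUNTPOINT TYPE OPTIONS DUMP PASS
fstab_line() {
    printf 'UUID=%s %s %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example (commented out, needs root and the real device):
# uuid=$(blkid -s UUID -o value /dev/sda1)
# fstab_line "$uuid" /boot ext3 defaults,errors=remount-ro,nodev,nosuid,noexec 0 2
```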

It's also recommended to check that the initrd configuration doesn't refer to an incorrect filesystem. It needs to refer to the swap device for the purpose of resuming from hibernation. In Debian, this is configured in the file /etc/initramfs-tools/conf.d/resume:

  • RESUME=/dev/mg/swap

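Finally, it should be sufficient to update the initrd and GRUB. On BIOS systems, grub-install expects the target disk as its argument, which is /dev/sda in this example.

  • update-initramfs -k all -u
  • grub-install /dev/sda
  • update-grub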

Then try to reboot.

Summary

It's possible to use GNU tar to back up and restore a whole Linux operating system. The backup can be done while the system is running, but you must ensure that you're backing up a consistent snapshot of the operating system.

The described method should work for any Linux distribution because the GNU implementation of the standard tar command should be part of it. Even if it is not, you could back up from a live CD distribution from time to time. GNU tar should be able to store all permanent Linux file types and important attributes and the procedure should thus work for most Linux filesystems. This method was fully tested only on ext2 based filesystems and partially on xfs and btrfs. It was used to restore several different installations of Debian squeeze, wheezy and jessie and also of CentOS 7. No problem occurred after the whole system was restored; all permissions and all important attributes were maintained as explained above.

 

Inserted: 2017-10-28 21:02:28
Last updated: 2017-10-28 21:02:28