NAME

File::CRBackup - Backup files/directories with histories, using cp+rsync

VERSION

version 0.03

SYNOPSIS

In daily-backup script:

#!/usr/bin/perl
use File::CRBackup qw(backup);
use Log::Any::App;
backup(
    source    => '/path/to/mydata',
    target    => '/backup/mydata',
    histories => [-7, 4, 3],         # 7 days, 4 weeks, 3 months
);

Or, just use the provided script:

% crbackup --source /path/to/mydata --target /backup/mydata

DESCRIPTION

This module utilizes two mature, dependable Unix command-line utilities, cp and rsync, to create a filesystem backup system. Some characteristics of this backup system:

  • Supports backup histories and history levels

    For example, you can keep 7 level-1 backup histories (equal to 7 daily histories if you run the backup once a day), 4 level-2 backup histories (equal to 4 weekly histories), and 3 level-3 backup histories (roughly equal to 3 monthly histories). The number of levels and the number of histories per level are customizable.

  • Backups (and histories) are not compressed/archived ("tar"-ed)

    They are just verbatim copies (produced by "cp -a" or "rsync -a") of the source directory. The upside is ease of cherry-picking (taking or restoring individual files from a backup). The downside is the lack of compression and the backup not being a single archive file.

    This is because rsync needs two real directory trees when comparing. Perhaps when rsync supports tar virtual filesystem in the future...

  • Hardlinks are used between backup histories to save disk space

    This way, we can maintain several backup histories without wasting too much space duplicating data when there are not a lot of differences among them.

  • High performance

    Rsync and cp are implemented in C and have been optimized for a long time. rm is also used instead of the pure-Perl File::Path::remove_tree.

  • Unix-specific

    There are ports of cp, rm, and rsync on Windows, but this module hasn't been tested on those platforms.
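
The hardlink point above can be checked with a few lines of core Perl. This is a minimal sketch, assuming a filesystem that supports hardlinks; same_inode() and the file names are hypothetical, and the module itself delegates the linking to "cp -la" rather than calling link() directly:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# True if both paths refer to the same inode on the same device,
# i.e. they are hardlinks to a single copy of the data.
sub same_inode {
    my ($path_a, $path_b) = @_;
    my @sa = stat($path_a) or die "stat $path_a: $!";
    my @sb = stat($path_b) or die "stat $path_b: $!";
    return $sa[0] == $sb[0] && $sa[1] == $sb[1];
}

# Scratch directory standing in for a backup target (hypothetical layout).
my $dir = tempdir(CLEANUP => 1);
for my $sub (qw(hist.old current)) {
    mkdir "$dir/$sub" or die "mkdir $dir/$sub: $!";
}

# One unchanged file in the older history ...
open(my $fh, '>', "$dir/hist.old/data.txt") or die "open: $!";
print {$fh} "unchanged content\n";
close($fh) or die "close: $!";

# ... hardlinked into the newer backup instead of being copied.
link("$dir/hist.old/data.txt", "$dir/current/data.txt") or die "link: $!";

print "shared: ",
    (same_inode("$dir/hist.old/data.txt", "$dir/current/data.txt") ? "yes" : "no"),
    "\n";
```

Because both directory entries point at the same inode, an unchanged file costs one directory entry per history rather than one full copy.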

This module uses the Log::Any logging framework.

HOW IT WORKS

First-time backup

First, we lock the target directory to prevent other backup processes from interfering:

mkdir -p TARGET
flock    TARGET/.lock

Then we copy source to temporary directory:

cp -a    SRC            TARGET/.tmp

If copy finishes successfully, we rename temporary directory to final directory 'current':

rename   TARGET/.tmp    TARGET/current
touch    TARGET/.current.timestamp

If the copy fails midway, TARGET/.tmp will be left behind, and the next backup process will rsync into it instead of starting over (which is more efficient):

rsync    SRC            TARGET/.tmp

Finally, we remove lock:

unlock   TARGET/.lock
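
The first-time flow above could be sketched in Perl roughly as follows. This is an illustration under stated assumptions, not the module's actual code: first_backup() is a hypothetical name, the lock is taken non-blocking, and the rsync options ("-a --delete") are assumptions, since the steps above only show a bare "rsync":

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical sketch of the first-time backup steps.
sub first_backup {
    my ($src, $target) = @_;

    # mkdir -p TARGET; flock TARGET/.lock
    make_path($target);
    open(my $lock, '>>', "$target/.lock") or die "open lock: $!";
    flock($lock, LOCK_EX | LOCK_NB) or die "another backup is running\n";

    my $tmp = "$target/.tmp";
    if (-d $tmp) {
        # An earlier run was interrupted: let rsync finish the copy.
        # The options used here are an assumption.
        system('rsync', '-a', '--delete', "$src/", "$tmp/") == 0
            or die "rsync failed: $?";
    }
    else {
        system('cp', '-a', $src, $tmp) == 0 or die "cp failed: $?";
    }

    # Only a complete copy is ever visible as TARGET/current.
    rename($tmp, "$target/current") or die "rename: $!";
    open(my $ts, '>', "$target/.current.timestamp") or die "touch: $!";
    close($ts);

    flock($lock, LOCK_UN);
    close($lock);
    return "$target/current";
}

# Demo on scratch directories (hypothetical data).
my $src    = tempdir(CLEANUP => 1);
my $target = tempdir(CLEANUP => 1) . '/mydata';
open(my $fh, '>', "$src/a.txt") or die "open: $!";
print {$fh} "hello\n";
close($fh) or die "close: $!";
print first_backup($src, $target), "\n";
```

Copying into TARGET/.tmp and renaming only on success means a reader of TARGET/current never sees a half-finished backup.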

Subsequent backups (after TARGET/current exists)

First, we lock the target directory to prevent other backup processes from interfering:

flock    TARGET/.lock

Then we copy current to temporary directory, using hardlinks when possible:

cp -la   TARGET/current TARGET/.tmp

Then we rsync the source into the temporary directory:

rsync    SRC            TARGET/.tmp

If rsync finishes successfully, we rename target directories:

rename   TARGET/current TARGET/hist.<timestamp>
rename   TARGET/.tmp    TARGET/current
touch    TARGET/.current.timestamp

If rsync fails in the middle, TARGET/.tmp will be lying around and the next backup process will just continue the rsync process.

Finally, we remove lock:

unlock   TARGET/.lock
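
The rename step that follows a successful rsync can be sketched in core Perl as below. rotate_after_sync() is a hypothetical helper, and the format of the timestamp suffix (epoch seconds here) is an assumption; the text above only calls it <timestamp>:

```perl
use strict;
use warnings;

# Hypothetical sketch of the rename step after a successful rsync.
# $stamp is whatever goes after "hist." (epoch seconds is an assumption).
sub rotate_after_sync {
    my ($target, $stamp) = @_;
    my ($cur, $tmp) = ("$target/current", "$target/.tmp");
    my $hist = "$target/hist.$stamp";

    -d $tmp  or  die "$tmp does not exist; nothing to rotate\n";
    -e $hist and die "$hist already exists\n";

    # The previous backup becomes a level-1 history ...
    rename($cur, $hist) or die "rename $cur: $!";
    # ... and the freshly synced tree becomes the current backup.
    rename($tmp, $cur) or die "rename $tmp: $!";

    # Equivalent of "touch TARGET/.current.timestamp".
    my $ts = "$target/.current.timestamp";
    open(my $fh, '>>', $ts) or die "open $ts: $!";
    close($fh);
    utime(undef, undef, $ts) or die "utime $ts: $!";

    return $hist;
}
```

Both renames stay within TARGET, so on a single filesystem each is a cheap metadata operation regardless of how much data the backup holds.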

Maintenance of histories/history levels

TARGET/hist.* are level-1 backup histories. Each backup run will produce a new history:

TARGET/hist.<timestamp1>
TARGET/hist.<timestamp2> # produced by the next backup
TARGET/hist.<timestamp3> # and the next ...
TARGET/hist.<timestamp4> # and so on ...
TARGET/hist.<timestamp5>
...

You can specify the number of histories (or the number of days) to keep at each level. When the number of histories exceeds the limit, the oldest ones are deleted or, if a higher level is specified, one of them is promoted to that level.

For example, with histories being set to [7, 4, 3], after TARGET/hist.<timestamp8> is created, TARGET/hist.<timestamp1> will be promoted to level 2:

rename TARGET/hist.<timestamp1> TARGET/hist2.<timestamp1>

TARGET/hist2.* directories are level-2 backup histories. After a while, they will also accumulate:

TARGET/hist2.<timestamp1>
TARGET/hist2.<timestamp8>
TARGET/hist2.<timestamp15>
TARGET/hist2.<timestamp22>

When TARGET/hist2.<timestamp29> arrives, TARGET/hist2.<timestamp1> will be promoted to level 3: TARGET/hist3.<timestamp1>. After a while, level-3 backup histories too will accumulate:

TARGET/hist3.<timestamp1>
TARGET/hist3.<timestamp29>
TARGET/hist3.<timestamp57>

Finally, TARGET/hist3.<timestamp1> will be deleted after TARGET/hist3.<timestamp85> comes along.
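
The promotion sequence above can be reproduced with a small in-memory simulation. This is a sketch under an assumed rule, not the module's actual logic: a history pushed out of a level is promoted only if the next level is empty or its newest entry is at least one full period older (7 runs for level 2, 7 x 4 = 28 runs for level 3); otherwise it is deleted. Run numbers stand in for timestamps, and add_history() is a hypothetical name:

```perl
use strict;
use warnings;

# $levels->[0] holds level-1 run numbers, $levels->[1] level-2, and so on.
# $limits is the "histories" setting, e.g. [7, 4, 3].
sub add_history {
    my ($levels, $limits, $run) = @_;
    push @{ $levels->[0] }, $run;

    my $period = 1;
    for my $i (0 .. $#$limits) {
        $period *= $limits->[$i];    # 7, then 28, then 84
        while (@{ $levels->[$i] } > $limits->[$i]) {
            my $oldest = shift @{ $levels->[$i] };
            next if $i == $#$limits;    # highest level: just delete
            my $next = $levels->[ $i + 1 ];
            if (!@$next || $oldest - $next->[-1] >= $period) {
                push @$next, $oldest;    # promote to the next level
            }
            # otherwise the history is simply dropped (deleted)
        }
    }
    return $levels;
}

my $limits = [7, 4, 3];
my $levels = [ [], [], [] ];
add_history($levels, $limits, $_) for 1 .. 29;
for my $i (0 .. $#$levels) {
    printf "level %d: %s\n", $i + 1, join(',', @{ $levels->[$i] });
}
```

Under this assumed rule, after 29 runs level 2 holds runs 1, 8, 15, and 22, matching the listing above; running it further yields 1, 29, 57 at level 3, with run 1 dropped once run 85 is promoted.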

FUNCTIONS

None of the functions are exported by default.

HISTORY

The idea for this module came about in 2006 as part of the Spanel hosting control panel project. We needed a daily backup system for shared hosting accounts that supported histories and cherry-picking. Previously we had been using rdiff-backup, a Python-based script. It was not very robust: the script chose to exit on many kinds of non-fatal errors instead of ignoring them and continuing the backup. It was also very slow: on a server with hundreds of accounts and millions of files, the backup process often took 12 hours or more. After evaluating several other solutions, we realized that nothing beats the raw performance of rsync/cp, so we designed a simple backup system based on them.

The first public release of this module was in February 2011.

FAQ

How do I exclude some directories?

Just use rsync's --exclude et al. Pass them to extra_rsync_opts.
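
As an illustration, the exclude options could be assembled like this. The sketch assumes that extra_rsync_opts accepts an array reference of rsync arguments (check the function documentation for the exact type); exclude_opts() is a hypothetical helper and the patterns are examples only:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of patterns into rsync --exclude options.
sub exclude_opts {
    my @patterns = @_;
    return [ map { ('--exclude', $_) } @patterns ];
}

# Assumption: extra_rsync_opts takes an array reference.
my %backup_args = (
    source           => '/path/to/mydata',
    target           => '/backup/mydata',
    extra_rsync_opts => exclude_opts('cache/', '*.tmp'),
);

# With File::CRBackup installed, the arguments would be passed on as:
#   use File::CRBackup qw(backup);
#   backup(%backup_args);
print join(' ', @{ $backup_args{extra_rsync_opts} }), "\n";
```

In rsync, a pattern ending in a slash such as "cache/" matches directories only, while "*.tmp" matches any file or directory with that suffix.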

What is a good backup practice (using CRBackup)?

Just follow the general practice. While this is not a place to discuss backups in general, some of the principles are:

  • back up regularly (e.g. once daily or more often)

  • automate the process (else you'll forget)

  • back up to another disk partition and to another computer

  • verify your backups often (what good are they if they can't be restored?)

  • when appropriate, encrypt your backups

How do I restore backups?

Backups are just verbatim copies of files/directories, so just use whatever filesystem tools you like.
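
For instance, a single file could be pulled back out of a backup with core Perl modules. This is a minimal sketch; restore_file() is a hypothetical helper, and it assumes TARGET/current (or a TARGET/hist.* directory) holds the contents of the source directory directly:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy qw(copy);
use File::Path qw(make_path);

# Hypothetical helper: copy one file from a backup tree back into place.
sub restore_file {
    my ($backup_root, $relpath, $dest_root) = @_;
    my ($from, $to) = ("$backup_root/$relpath", "$dest_root/$relpath");

    -f $from or die "$from not found in backup\n";
    make_path(dirname($to));    # recreate missing parent directories
    copy($from, $to) or die "copy $from -> $to: $!";
    return $to;
}
```

A call might look like restore_file('/backup/mydata/current', 'docs/report.txt', '/path/to/mydata'). File::Copy's copy() does not preserve ownership or timestamps; "cp -a" or "rsync -a" would be the tools to use when those matter.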

How do I do remote backups?

With CRBackup, make the backup locally, then rsync the result to another host over SSH. I believe Snapback2 can connect to remote hosts over SSH directly.

TODO

* Allow ionice etc instead of just nice -n19

SEE ALSO

File::Backup

File::Rotate::Backup

Snapback2, a backup system that uses the same principle as outlined above, created as early as 2004 (or earlier) by Mike Heins. Do check it out. I wish I had found it before reinventing it in 2006 :-)

AUTHOR

Steven Haryanto <stevenharyanto@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2011 by Steven Haryanto.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.