NAME
DataStore::CAS::FS::Importer - Copy files from filesystem into DataStore::CAS::FS.
VERSION
version 0.010100_03
SYNOPSIS
my $cas_fs= DataStore::CAS::FS->new( ... );
# Defaults are reasonable
my $importer= DataStore::CAS::FS::Importer->new();
$importer->import_tree( "/home/user", $cas_fs->path('/') );
$cas_fs->commit();
# Lots of customizability...
$importer= DataStore::CAS::FS::Importer->new(
  dir_format => 'unix', # optimized for storing unix-attrs
  filter => sub { return ($_[0] =~ /^\./)? 0 : 1 }, # exclude hidden files
  die_on_file_error => 0, # store placeholder for files that can't be read
);
DESCRIPTION
The Importer is a utility class which performs the work of scanning directory entries of the real filesystem, storing new files in the CAS, and encoding new directories and storing those in the CAS as well. It has conditional support for the various Perl modules you need to collect all the metadata you care about, and can be subclassed if you need to collect additional metadata.
ATTRIBUTES
dir_format
$class->new( dir_format => 'universal' );
$importer->dir_format( 'unix' );
Read/write. Directory format to use when encoding directories. Defaults to 'universal'.
Directories can be recorded with varying levels of metadata and encoded in a variety of formats which are optimized for various uses. Set this to the format string of your preferred encoder.
The format strings are registered by DirCodec classes when loaded. Built-in formats are 'universal', 'minimal', and 'unix' (more are planned).
Calls to "import_tree" will encode directories in this format. If you wish to re-use the previously encoded directories during an incremental backup, you must use the same dir_format as before. This is because all directories get re-encoded every time, and the ones containing the same metadata will end up with the same digest-hash, and be re-used.
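The reason a changed dir_format defeats re-use can be illustrated with a minimal sketch. The two encoders below are hypothetical stand-ins, not the real DirCodec formats, and SHA-1 is an arbitrary choice of digest for the demonstration; the point is only that identical metadata in an identical format yields identical bytes, and therefore an identical key in a content-addressable store.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha1_hex );

# Hypothetical encoders, NOT the real DirCodec formats: each one serializes
# a list of entries deterministically, keeping a different set of fields.
my %encoder = (
    minimal_demo => sub { join '', map { "$_->{name}\0$_->{ref}\n" } @_ },
    unix_demo    => sub { join '', map { "$_->{name}\0$_->{ref}\0$_->{mode}\n" } @_ },
);

# Encode the entries (sorted by name) in the given format and return the
# digest a content-addressable store could use as the directory's key.
sub dir_digest {
    my ($format, @entries) = @_;
    my $encode = $encoder{$format} or die "Unknown format '$format'";
    my @sorted = sort { $a->{name} cmp $b->{name} } @entries;
    return sha1_hex( $encode->(@sorted) );
}

my @entries = (
    { name => 'b.txt', ref => 'hash-of-b', mode => 0644 },
    { name => 'a.txt', ref => 'hash-of-a', mode => 0600 },
);

# Same format, same metadata: identical digest, so the stored copy is re-used.
print "re-used\n"
    if dir_digest('unix_demo', @entries) eq dir_digest('unix_demo', reverse @entries);

# Different format: different bytes, different digest, nothing is re-used.
print "stored again\n"
    if dir_digest('unix_demo', @entries) ne dir_digest('minimal_demo', @entries);
```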
filter
Read/write. This optional coderef (which may be an object with overloaded function operator) filters out files that you wish to ignore when walking the physical filesystem.
It is passed 3 arguments: the name, the full path, and the results of 'stat' as a blessed arrayref. You are also guaranteed that stat was called on this file immediately before the filter is invoked, so you may also use code like "-d _".
Return 0 to exclude the file. Return 1 to store it. Return -1 to record its metadata (directory entry) but not its content.
$importer->filter( sub {
  my ($name, $path, $stat)= @_;
  return 1 if -d _; # recurse into all directories
  return -1 if $stat->size > 1024*1024; # don't store large files
  return 0 if substr($name,0,1) eq '.'; # exclude hidden files
  return 1;
});
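To try a filter without running an import, it can be called by hand on a scratch directory. The following is a sketch under assumptions: Demo::Stat is a hypothetical stand-in for the blessed stat arrayref (only ->size is provided), filter_verdicts is a helper invented for this example, and the size threshold is lowered so the test files stay small.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );

# Hypothetical stand-in for the blessed stat arrayref; only ->size is needed.
package Demo::Stat { sub size { $_[0][7] } }

my $max_size = 1024;    # assumed threshold for this demo
my $filter = sub {
    my ($name, $path, $stat) = @_;
    return 1  if -d _;                          # recurse into all directories
    return -1 if $stat->size > $max_size;       # metadata only for large files
    return 0  if substr($name, 0, 1) eq '.';    # exclude hidden files
    return 1;
};

# Call the filter on every entry of $dir, right after lstat, so that the
# "_" filehandle inside the filter refers to the entry being tested.
sub filter_verdicts {
    my ($filter, $dir) = @_;
    opendir(my $dh, $dir) or die "opendir($dir): $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    my %verdict;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        my @st = lstat($path) or die "lstat($path): $!";
        $verdict{$name} = $filter->($name, $path, bless(\@st, 'Demo::Stat'));
    }
    return \%verdict;
}

# Build a scratch directory: one subdirectory and three files.
my $dir = tempdir( CLEANUP => 1 );
mkdir(File::Spec->catdir($dir, 'sub')) or die "mkdir: $!";
for my $spec ([ 'small.txt', 10 ], [ '.hidden', 10 ], [ 'big.bin', 2048 ]) {
    open(my $fh, '>', File::Spec->catfile($dir, $spec->[0])) or die "open: $!";
    print {$fh} 'x' x $spec->[1];
    close $fh or die "close: $!";
}

my $v = filter_verdicts($filter, $dir);
printf "%-10s %2d\n", $_, $v->{$_} for sort keys %$v;
```

The order of the tests inside the filter matters: a hidden file larger than the threshold gets -1 here, because the size check runs before the hidden-file check.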
flags
Read/write. This is a hashref of parameters and options for how directories should be scanned and which information is collected. Each member of 'flags' has its own accessor method, but they may be accessed here for easy swapping of entire parameter sets. All flags are read/write, and most are simple booleans.
die_on_dir_error
true: Die if there is any problem reading the contents of a directory.
false: Warn, and encode as a content-less directory.
Default: true
die_on_file_error
true: Die if there is any problem reading the contents of a file.
false: Warn, and encode as a content-less file.
Default: true
die_on_hint_error
true: Die if there is an error looking up the "hint" for an incremental backup.
false: Warn that the hint is unavailable, and just encode the file/directory as if no hint were being used.
Default: false
collect_metadata_ts
Default: true, if available and distinct from mtime.
If true, collect metadata_ts, which is the timestamp of the last change to the file's metadata (ctime, on UNIX).
collect_access_ts
Default: false
If true, collects the attribute unix_atime. This value is not collected by default because it changes frequently, many people don't use it anyway, and the Importer itself is likely to modify it.
collect_unix_perm
Default: true on unix
If true, collects attributes mode, unix_uid, unix_gid, unix_user, and unix_group.
collect_unix_misc
Default: false
If true, collects attributes unix_dev, unix_inode, unix_nlink, unix_blocksize, and unix_blockcount.
collect_acl
Default: false
If true, would collect attribute unix_acl or windows_acl (neither of which is currently implemented, or has even been spec'd out).
collect_ext_attr
Default: false
If true, collects any "extended metadata" available for the file. This is unimplemented and attributes have not been spec'd out yet.
follow_symlink
Default: false
Use stat instead of lstat, so that symlinks are followed rather than recorded. Use this flag at your own risk. It might introduce recursion, and no code has been written yet to detect and prevent this. No symlinks will be recorded as symlinks if this is set.
The interaction of this flag with an incremental backup that contains symlinks (i.e. whether to follow symlinks within the "hint" directory) is unspecified. (I need to spend some time thinking about it before I can decide which makes the most sense.)
cross_mountpoints
Default: false
If true, cross mount points. If false, each mount point is recorded as a content-less directory. Mount points are detected by the device number changing in a call to stat. This is not robust protection against bind-mounts, however. Support for detecting bind-mounts might be added in the future.
reuse_digests
Default: 2
Options: false (off), 1 (size), 2 (size+mtime), 3 (size+ctime)
Many of the import methods accept a $hint parameter. Using digest hints greatly speeds up import operations, at the cost of the certainty of getting an exact copy. The hint is a past result of importing a tree from the filesystem (a path object from DataStore::CAS::FS). If the size (and optionally metadata_ts / modify_ts) of the file has not changed, the digest_hash from the hint will be used instead of re-calculating it.
Make sure you are collecting and storing your criteria in the directories, or none of the hashes can be re-used. Specifically, you need collect_metadata_ts => 1 and dir_format => 'unix' or dir_format => 'universal' to make use of reuse_digests => 3.
utf8_filenames
METHODS
new
my $importer= $class->new( %attributes_and_flags )
The constructor accepts values for any of the official attributes. It also accepts all of the flag names, and will move them into the flags attribute for you.
No arguments are required, and the defaults should work for most people.
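The pattern of accepting flag names at the top level and moving them into a flags hashref can be sketched as follows. Demo::Importer is a hypothetical stand-in written for this illustration, with a small assumed subset of flags; it is not the module's actual implementation.

```perl
use strict;
use warnings;

package Demo::Importer {
    # Hypothetical subset of flag names with defaults, for illustration only.
    my %flag_defaults = (
        die_on_file_error => 1,
        collect_access_ts => 0,
        reuse_digests     => 2,
    );

    sub new {
        my ($class, %args) = @_;
        my $given = delete $args{flags};
        my %flags = ( %flag_defaults, %{ $given || {} } );
        # Move any flag passed as a top-level argument into the flags hashref.
        for my $name (keys %flag_defaults) {
            $flags{$name} = delete $args{$name} if exists $args{$name};
        }
        my $self = { %args, flags => \%flags };
        return bless $self, $class;
    }

    # Read/write accessor for the whole flag set.
    sub flags {
        my $self = shift;
        $self->{flags} = shift if @_;
        return $self->{flags};
    }

    # Generate one read/write accessor per flag.
    for my $name (keys %flag_defaults) {
        no strict 'refs';
        *{ __PACKAGE__ . "::$name" } = sub {
            my $self = shift;
            $self->{flags}{$name} = shift if @_;
            return $self->{flags}{$name};
        };
    }
}

my $importer = Demo::Importer->new( dir_format => 'unix', die_on_file_error => 0 );

# Swap in a whole parameter set for a clean import, then restore the old one.
my $saved = $importer->flags;
my %clean = ( %$saved, reuse_digests => 0 );
$importer->flags(\%clean);
my $during_swap = $importer->reuse_digests;
$importer->flags($saved);
```

Keeping every flag in one hashref is what makes the swap at the end a single assignment instead of one call per flag.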
import_tree
$self->import_tree( $path, $FS_Path_object )
# returns true, or throws an exception
Recursively collect directory entries from the real filesystem at $path and store them at $FS_Path_object (which references an instance of FS, which references an instance of CAS).
This will use the destination path for incremental-backup hints, if that feature is enabled on this Importer. If you want to make a clean import, you should first unlink the destination path, or turn off the "reuse_digests" flag.
import_directory
$digest_hash= $importer->import_directory( $cas, $path, $hint );
Imports a directory from the real filesystem $path into the $cas, optionally using the virtual filesystem path $hint as a cache of previously-calculated digest hashes for files whose metadata matches.
import_directory_entry
$dirEnt= $importer->import_directory_entry($cas, $path);
# Or a little more optimized...
$dirEnt= $importer->import_directory_entry($cas, $path, $ent_name, $stat, $hint);
This method scans a path on the real filesystem, and returns a *complete* DirEnt object, importing file contents and recursing and encoding subdirectories as necessary.
collect_dirent_metadata
$attrHash= $importer->collect_dirent_metadata( $path );
# -or-
$attrHash= $importer->collect_dirent_metadata( $path, $hint, $name, $stat );
This method returns a hashref of attributes about the named file. The only required parameter is $path; however, the others can be given to speed up execution. $path should be in platform-native form. $name will be calculated with File::Spec->splitpath if not provided. $stat should be an arrayref from stat() or lstat(), optionally blessed.
If $hint (a DirEnt) is given, and $path refers to a file with the same metadata (size, mtime) as the $hint, then $hint->ref will be used instead of re-calculating the digest of the file.
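The hint comparison can be sketched roughly as below, following the documented reuse_digests levels. This is an illustration under assumptions, not the module's code: reusable_digest is a name invented here, and the hint is a plain hashref standing in for a DirEnt, with assumed keys size, modify_ts, metadata_ts, and ref.

```perl
use strict;
use warnings;

# Decide whether a hint's digest may be re-used for a file.
# $level follows the documented reuse_digests options:
#   false (off), 1 (size), 2 (size+mtime), 3 (size+ctime).
# $stat is a plain arrayref from stat()/lstat(); $hint is a plain hashref
# standing in for a DirEnt (assumed keys: size, modify_ts, metadata_ts, ref).
sub reusable_digest {
    my ($level, $stat, $hint) = @_;
    return undef unless $level && $hint && defined $hint->{ref};
    return undef unless defined $hint->{size} && $hint->{size} == $stat->[7];
    if ($level == 2) {
        return undef
            unless defined $hint->{modify_ts} && $hint->{modify_ts} == $stat->[9];
    }
    elsif ($level == 3) {
        return undef
            unless defined $hint->{metadata_ts} && $hint->{metadata_ts} == $stat->[10];
    }
    return $hint->{ref};
}

# A fake stat list: only size [7], mtime [9] and ctime [10] matter here.
my @stat = (0) x 13;
@stat[7, 9, 10] = (1234, 1_700_000_000, 1_700_000_500);

# This hint recorded size and mtime, but no metadata_ts.
my %hint = ( size => 1234, modify_ts => 1_700_000_000, ref => 'abc123' );

for my $level (0 .. 3) {
    my $digest = reusable_digest($level, \@stat, \%hint);
    printf "reuse_digests=%d: %s\n", $level, defined $digest ? "re-use $digest" : 're-calculate';
}
```

Level 3 falls back to re-calculating in this run because the hint carries no metadata_ts, which mirrors the advice to collect and store the criteria you intend to compare.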
STAT OBJECTS
The stat arrayrefs that Importer passes to the filter are blessed to give you access to methods like '->mode' and '->mtime', but I'm not using File::stat. "Why??" you ask? Because blessing an arrayref from the regular stat is 3 times as fast, my accessors are twice as fast, and it requires a minuscule amount of code.
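The technique itself is small enough to show in full. The sketch below uses a hypothetical class name, Demo::StatArray, and an accessor set chosen for this example from the field order of Perl's built-in stat; the Importer's own class and method list may differ.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical class name; the accessors index straight into the stat list.
package Demo::StatArray {
    # Field positions follow Perl's built-in stat() list.
    my %index = (
        dev  => 0, ino  => 1, mode  => 2, nlink => 3,  uid     => 4,  gid    => 5,
        rdev => 6, size => 7, atime => 8, mtime => 9,  ctime   => 10,
        blksize => 11, blocks => 12,
    );
    for my $field (keys %index) {
        my $i = $index{$field};
        no strict 'refs';
        *{ __PACKAGE__ . "::$field" } = sub { $_[0][$i] };
    }
}

# lstat a path and bless the resulting list in place; returns undef on failure.
sub stat_object {
    my ($path) = @_;
    my @st = lstat($path) or return undef;
    return bless \@st, 'Demo::StatArray';
}

# Try it on a scratch file of known size.
my ($fh, $file) = tempfile( UNLINK => 1 );
print {$fh} 'hello';
close $fh or die "close: $!";

my $st = stat_object($file) or die "lstat($file): $!";
printf "%d bytes, mode %04o\n", $st->size, $st->mode & 07777;
```

Because the object is still an ordinary arrayref, code that expects the plain stat list can keep indexing it directly, for example $st->[9] for mtime.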
AUTHOR
Michael Conrad <mconrad@intellitree.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by Michael Conrad, and IntelliTree Solutions llc.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.