NAME

Sys::Export::Unix - Export subsets of a UNIX system

SYNOPSIS

use Sys::Export::Unix;
my $exporter= Sys::Export::Unix->new(
  src => '/', dst => '/initrd',
  rewrite_path => {
    'sbin'     => 'bin',
    'usr/bin'  => 'bin',
    'usr/sbin' => 'bin',
    'usr/lib'  => 'lib',
  },
);
$exporter->add('bin/busybox');

DESCRIPTION

This object contains the logic for exporting unix-style systems.

CONSTRUCTORS

new

Sys::Export::Unix->new(\%attributes); # hashref
Sys::Export::Unix->new(%attributes);  # key/value list

Required attributes:

src

The root of the system to export from (often '/', but you must specify this)

dst

The root of the exported system. This directory must exist, and should be empty unless you specify 'on_collision'.

It can also be an object with 'add' and 'finish' methods, which avoids constructing a staging directory at all and does not require root permission to operate.

Options:

rewrite_path

Convenience for calling "rewrite_path" using a hashref of { src => dst } pairs.

rewrite_user

Convenience for calling "rewrite_user" using a hashref of { src => dst } pairs.

rewrite_group

Convenience for calling "rewrite_group" using a hashref of { src => dst } pairs.

src_userdb

An instance of Sys::Export::Unix::UserDB, or constructor parameters for one. The default is to read $src/etc/passwd, or fall back to the getpwnam function of the host. See "USER REMAPPING" for more details.

dst_userdb

An instance of Sys::Export::Unix::UserDB, or constructor parameters for one. If defined, this will trigger name-based translations of all UID/GID values written to the destination filesystem. See "USER REMAPPING" for more details.

tmp

A temporary directory where this module can prepare temporary files. If you are using a filesystem destination, it will default to the same device as the staging directory.

When finish is called, this is set to undef so that instances of File::Temp can clean themselves up.

on_collision

Specifies what to do if there is a name collision in the destination. See attribute "on_collision".

log

This can either be a Log::Any instance, or a string specifying a log level such as "debug" or "trace". The default logging is on STDOUT (level 'info') and simply lists the files being copied and whether they were patched.

ATTRIBUTES

src

The root of the source filesystem. It must be the actual root used by the symlinks and library paths inside this filesystem, or things will break.

src_abs

The abs_path of the root of the source filesystem, always ending with '/'.

src_userdb

An instance of Sys::Export::Unix::UserDB. This attribute is undef until it is needed, unless you specified it to the constructor. See "USER REMAPPING" for details.

dst

The root of the destination filesystem, OR a coderef which receives files which are ready to be recorded. This must be the logical root of your destination filesystem, which will be used when symlinks or library paths refer to '/'. If you want to move files into a subdirectory of the logical destination filesystem, see "rewrite_path". If you provide a coderef, the signature is

sub ($exporter, $file_attrs) { ... }

dst_abs

The abs_path of the root of the destination filesystem, always ending with '/'. This is only defined if dst is not a coderef.

dst_userdb

An instance of Sys::Export::Unix::UserDB. This attribute is undef until it is needed, unless you specified it to the constructor. See "USER REMAPPING" for details.

tmp

The abs_path of a directory to use for temporary staging before renaming into "dst". This must be in the same volume as dst so that rename() can be used to move temporary files into their dst location.

src_path_set

A hashref of all source paths which have been processed, and which destination path they were written as. All paths are logically absolute to their respective roots, but without a leading slash.

dst_path_set

A hashref of all destination paths which have been created (as keys). If the value of the key is defined, it is the source path. If not defined, it means the destination was created without reference to a source path.

dst_uid_used

The set of numeric user IDs which have been written to dst.

dst_gid_used

The set of numeric group IDs which have been written to dst.

path_rewrite_regex

A regex that matches the longest prefix of a source path having a rewrite rule.

on_collision

Specifies what to do if there is a name collision in the destination. The default (undef) causes an exception unless the existing file is identical to the one that would be written.

Setting this to 'overwrite' will unconditionally replace files as it runs. Setting it to 'ignore' will silently ignore collisions and leave the existing file in place. Setting it to a coderef will provide you with the path and content that was about to be written to it:

$exporter->on_collision(sub ($dst_path, $fileinfo) {
  # dst_path is the relative-to-dst-root path about to be written
  # fileinfo is the hash of file attributes passed to ->add
  return $action; # 'ignore' or 'overwrite' or 'ignore_if_same'
});
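As an illustration of the coderef form, here is a minimal sketch of a collision policy written as a standalone sub so it can be tried without an exporter. The rule it applies (overwrite anything under etc/, otherwise fall back to 'ignore_if_same') and the name $policy are assumptions chosen for the example, not something the module prescribes.

```perl
use strict;
use warnings;

# Hypothetical policy: replace files under etc/, and for everything else
# return 'ignore_if_same', one of the action strings listed above.
my $policy = sub {
  my ($dst_path, $fileinfo) = @_;
  return 'overwrite' if $dst_path =~ m{^etc/};
  return 'ignore_if_same';
};

# Exercise the policy by hand with dst-relative paths.
print $policy->('etc/hostname', {}), "\n";   # overwrite
print $policy->('bin/busybox',  {}), "\n";   # ignore_if_same
```

With a real exporter the same coderef would be installed with $exporter->on_collision($policy).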

log

$exporter->log('info');
$exporter->log($logger);

Set the logging output object, or log level for 'print' output.

METHODS

rewrite_path

$exporter->rewrite_path($src_prefix, $dst_prefix);

Add a path rewrite rule which replaces occurrences of $src_prefix with $dst_prefix. Only one rewrite occurs per path; they don't cascade. Path prefixes refer to the logical absolute path within the source root and destination root. You may specify these prefixes with or without the leading implied '/'.

Returns $exporter for chaining.
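The "longest prefix, applied once" behavior can be sketched in plain Perl. This is not the module's implementation; the rule table, the helper name rewrite_once, and the regex construction are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical rule table (source prefix => destination prefix).
my %rules = (
  'usr/bin' => 'bin',
  'usr'     => 'system',
  'bin'     => 'tools',
);

# Put longer prefixes first so the alternation prefers them, and require
# the match to end at a path separator or at the end of the string.
my $alternation = join '|',
  map { quotemeta($_) }
  sort { length($b) <=> length($a) } keys %rules;
my $prefix_re = qr{^($alternation)(?=/|\z)};

sub rewrite_once {
  my ($path) = @_;
  $path =~ s{^/}{};                    # the leading '/' is optional
  $path =~ s/$prefix_re/$rules{$1}/;   # a single substitution, no cascade
  return $path;
}

print rewrite_once('usr/bin/ls'), "\n";     # bin/ls (not tools/ls)
print rewrite_once('/usr/share/doc'), "\n"; # system/share/doc
print rewrite_once('etc/passwd'), "\n";     # etc/passwd
```

Note that 'usr/bin/ls' becomes 'bin/ls' and is not rewritten again by the 'bin' rule, which is what "they don't cascade" means in practice.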

rewrite_user

$exporter->rewrite_user( $src_name_or_uid => $dst_name_or_uid );

If you rewrite from a UID to a UID, this doesn't consider any names, and does an efficient numeric remapping.

If src is a name, this instantiates "src_userdb" if it doesn't exist, and resolves the name (which must exist), then creates a numeric mapping.

If dst is a name, this instantiates "dst_userdb" if it doesn't exist, and resolves the name (which must exist, but gets auto-imported from src_userdb in the default configuration) then creates a numeric mapping.

rewrite_group

$exporter->rewrite_group( $local_name_or_gid => $exported_name_or_gid );

Same semantics as "rewrite_user" but for groups.

add

$exporter->add($src_path, ...);
$exporter->add(\%file_attrs, ...);
$exporter->add([ $name, $mode, $mode_specific_data, \%other_attrs ]);

Add one or more source paths (relative to "src") or full file specifications to the export. This immediately copies the file to the destination, also triggering a copy of any interpreters or libraries it depends on which weren't already added.

Any item with a src_path attribute will be translated according to "rewrite_path", "rewrite_user", and "rewrite_group". This includes generating the 'name' attribute and also rewriting the contents of files and symlinks. If it is missing attributes, they will be filled-in with a call to lstat.

Any item without a src_path is assumed to be already rewritten by the user, and must specify at least attributes name and mode.

The file attributes are:

name            # destination path relative to destination root
src_path        # source path relative to source root, no leading '/'
data            # literal data content of file (must be bytes, not unicode)
data_path       # absolute path of file to load 'data' from, not limited to src dir
dev             # device of origin, as per lstat
dev_major       # major(dev), if you know it and don't know 'dev'
dev_minor       # minor(dev), if you know it and don't know 'dev'
ino             # inode, from stat.  used with 'dev' for hardlink tracking
mode            # permissions and type, as per stat
nlink           # number of hard links
uid             # user id
gid             # group id
rdev            # referenced device, for device nodes
rdev_major      # major(rdev), if you know it and don't know 'rdev'
rdev_minor      # minor(rdev), if you know it and don't know 'rdev'
size            # size, in bytes.  Can be omitted if 'data' is present
mtime           # modification time, as per stat

You can also use the array notation described in "expand_file_stat_array" in Sys::Export. Array notation provides a name attribute rather than a src_path, so those entries do not get rewritten.

Returns $exporter for chaining.
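For items that have no src_path, the mode attribute has to carry both the file type and the permission bits. A minimal sketch of building such specifications with the core Fcntl constants follows; the file names and contents are made up for the example, and whether a parent directory must be added before its children is not stated here, so the ordering below is only a cautious assumption.

```perl
use strict;
use warnings;
use Fcntl ':mode';   # S_IFDIR, S_IFREG, S_ISDIR, S_IMODE, ...

# Hypothetical file specifications: 'name' and 'mode' are the minimum
# for items without a src_path; 'data' supplies literal bytes.
my @specs = (
  { name => 'etc',          mode => S_IFDIR | 0755 },
  { name => 'etc/hostname', mode => S_IFREG | 0644, data => "initrd\n" },
);

# Show how type and permissions are packed into 'mode'.
for my $spec (@specs) {
  printf "%-14s %-4s %04o\n",
    $spec->{name},
    (S_ISDIR($spec->{mode}) ? 'dir' : 'file'),
    S_IMODE($spec->{mode});
}
```

With an exporter in hand, these hashrefs would be passed as $exporter->add(@specs).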

src_find

This is a helper function to build lists of source files. It iterates the "src" tree from a given subdirectory, passing each entry to a coderef filter.

@hashrefs= $exporter->src_find(@paths);
@hashrefs= $exporter->src_find($filter, @paths);
@hashrefs= $exporter->src_find(@paths, $filter);

The filter can be a coderef or Regexp-ref. Any other type is considered a path. The filter function runs in the following environment:

$_

the absolute path of the source file

_

the result of lstat on the absolute path of the source file (allowing file tests like -d or -f or -s without running a new stat() call)

$_[0]

the hashref of stat attributes that will be returned by this function if the filter returns true

The callback should return a boolean of whether to include the file in the result. If it returns false for a directory, the directory will still be traversed. If you want to prune a directory tree from being processed, set $_[0]{prune} to a true value before returning.

For a Regexp-ref, you are matching against the full absolute path within "src". If you want a regex to only apply to the relative path of a file, just write it as a sub like

sub { $_[0]{src_path} =~ /pattern/ }
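The following sketch shows a filter that keeps shared libraries and prunes .git directories, using the three inputs described above. Because src_find itself is not called here, the small try_filter harness is a hand-written assumption about how the filter is invoked ($_ set to the absolute path, an lstat performed first, and a hashref containing src_path passed as $_[0]); the temporary directory and file names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Keep regular files that look like shared libraries; never keep
# directories, and prune any directory named .git.
my $filter = sub {
  my ($attrs) = @_;
  if (-d _) {
    $attrs->{prune} = 1 if $attrs->{src_path} =~ m{(?:^|/)\.git\z};
    return 0;
  }
  return (-f _ && $attrs->{src_path} =~ /\.so(?:\.\d+)*\z/) ? 1 : 0;
};

# Hypothetical harness standing in for src_find's calling convention.
sub try_filter {
  my ($root, $rel) = @_;
  local $_ = "$root/$rel";
  lstat($_) or die "lstat $_: $!";
  my %attrs = (src_path => $rel);
  my $keep = $filter->(\%attrs);
  return ($keep, $attrs{prune} ? 1 : 0);
}

# Build a tiny throwaway tree to run the filter against.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/.git" or die "mkdir: $!";
for my $file ('libfoo.so.1', 'README') {
  open my $fh, '>', "$root/$file" or die "open: $!";
  close $fh;
}

for my $rel ('.git', 'libfoo.so.1', 'README') {
  my ($keep, $prune) = try_filter($root, $rel);
  print "$rel keep=$keep prune=$prune\n";
}
```

With the module, the same coderef would be passed as $exporter->src_find($filter, @paths).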

skip

$exporter->skip(@paths);
$exporter->skip({ src_path => $path, ... });

Inform the exporter that it should *not* perform any actions for the specified source path, presumably because you're handling that one specially in some other way.

You may pass hashrefs generated by "src_find", which will include a src_path field.

finish

Apply any postponed changes to the destination filesystem. For instance, this applies mtimes to directories since writing the contents of the directory would have changed the mtime.

get_dst_for_src

my $dst_path= $exporter->get_dst_for_src($src_path);

Returns the relative destination path for a relative source path, rewritten according to the rewrite rules. If no rewrites exist, this just returns $src_path.

get_dst_uid_gid

($uid, $gid)= $exporter->get_dst_uid_gid($uid, $gid);

Given a source uid and gid, return the destination uid and gid. See "USER REMAPPING" for details.

This is the same routine used after every stat on the source filesystem to compute the uid/gid written to dst.

USER REMAPPING

This module tries to be helpful with rewriting UID/GID from your source filesystem to the destination filesystem, but also stay out of your way if you don't need that feature. In the simplest case, you are building an initrd from an environment with the same user database as your final system image and UID/GID can be copied as-is. In other cases, you might be pulling files from Alpine to be used for an initrd that starts a Debian system, and need to map ownership by name instead of number.

The basic rule is that name-based mapping is enabled or disabled by whether attribute "dst_userdb" is defined or not. If you pass that as an initial constructor attribute, then name-based mapping is enabled from the start. If you request a destination name in a call to "rewrite_user" or "rewrite_group", that call will automatically instantiate dst_userdb. However, you can also perform ID remapping without name databases. If every call to rewrite_user and rewrite_group exclusively uses numbers, then the numeric mapping is handled without triggering dst_userdb to be created.

If name mapping is enabled, then "src_userdb" must also be defined. If you don't initialize it, it will be automatically instantiated from $src/etc/passwd, falling back to the users of the host system via getpwnam etc.

Name Mapping Behavior

Any time a new not-yet-mapped ID is encountered, the exporter checks the src_userdb to find out what name is associated with that ID. If not found, it may import it from getpwnam/getgrnam. If still not found, it dies. Then it checks for any name-based rewrites to determine what name to look for in dst_userdb, defaulting to the same name as in src_userdb. If dst_userdb doesn't have that name yet, the user is copied from src_userdb, but the exporter croaks if the UID/GID would conflict with another entry in dst_userdb. Once the src UID/GID and dst UID/GID are both known, it adds those to the numeric mapping, so further name lookups are not needed for that source ID.
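The lookup sequence described above can be sketched with plain hashes standing in for the two user databases. This is an illustration only: the hash names, the sample users, and map_uid are hypothetical, the getpwnam fallback is left out, and the real Sys::Export::Unix::UserDB objects are not used.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for src_userdb and dst_userdb (name => uid).
my %src_uid_of = (root => 0, nginx => 100, backup => 34, www => 33);
my %dst_uid_of = (root => 0, nginx => 33);
my %src_name_of  = reverse %src_uid_of;   # uid => name
my %name_rewrite = ();                    # optional src name => dst name
my %uid_map;                              # numeric cache, src uid => dst uid

sub map_uid {
  my ($src_uid) = @_;
  # A cached numeric mapping means no further name lookups are needed.
  return $uid_map{$src_uid} if exists $uid_map{$src_uid};

  my $name = $src_name_of{$src_uid}
    // die "no user for source uid $src_uid\n";
  my $dst_name = $name_rewrite{$name} // $name;

  unless (exists $dst_uid_of{$dst_name}) {
    # Copy the user from the source, refusing a conflicting uid.
    my %taken = reverse %dst_uid_of;
    die "uid $src_uid already used by $taken{$src_uid}\n"
      if exists $taken{$src_uid};
    $dst_uid_of{$dst_name} = $src_uid;
  }
  return $uid_map{$src_uid} = $dst_uid_of{$dst_name};
}

print map_uid(100), "\n";   # 33: nginx exists in both, mapped by name
print map_uid(34),  "\n";   # 34: backup copied into the destination
print eval { map_uid(33) } // "conflict", "\n";   # conflict: 33 is nginx
```

The third call fails because the source user www would have to be copied with uid 33, which the destination already assigns to nginx.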

VERSION

version 0.003

AUTHOR

Michael Conrad <mike@nrdvana.net>

COPYRIGHT AND LICENSE

This software is copyright (c) 2025 by Michael Conrad.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.