NAME

File::AtomicWrite - writes files atomically via rename()

SYNOPSIS

use File::AtomicWrite ();

# oneshot: requires filename and all the input data
# (as a filehandle or scalar ref)
File::AtomicWrite->write_file(
    {   file  => 'data.dat',
        input => $filehandle,
    }
);
# how paranoid are you?
File::AtomicWrite->write_file(
    {   file     => '/etc/passwd',
        input    => \$scalarref,
        CHECKSUM => 1,
        min_size => 100,
    }
);

# instance interface: use to stream data or to have
# custom signal handlers
use Digest::SHA1;
my $aw = File::AtomicWrite->new(
    {   file     => 'name',
        min_size => 1,
        ...
    }
);
my $digest   = Digest::SHA1->new;
my $tmp_fh   = $aw->fh;
my $tmp_file = $aw->filename;
print $tmp_fh ...;
$digest->add(...);
$aw->checksum( $digest->hexdigest )->commit;

DESCRIPTION

This module offers atomic file writes via a temporary file created in the same directory (and therefore probably the same partition) as the specified file. After data has been written to the temporary file, the rename system call is used to replace the target file. The module optionally supports various sanity checks (min_size, CHECKSUM) that help ensure the data is written without errors.

Should anything go awry, the module will die or croak. All calls should be wrapped in eval blocks or, better yet, Try::Tiny.

eval { File::AtomicWrite->write_file(...) };
if ($@) { die "uh oh: $@" }

The module attempts to flush and sync the temporary filehandle prior to the rename call. This may cause portability problems. If so, please let the author know. Also notify the author if false positives from the close call are observed.
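The sequence described above (temporary file beside the target, flush, sync, rename) can be illustrated with core modules only. This is a minimal sketch, not the module's actual implementation: the atomic_write_sketch name and the '.tmp.XXXXXXXXXX' template are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp ();
use IO::Handle ();

# Hypothetical helper, not part of File::AtomicWrite: write $data to a
# temporary file beside $file, then rename() it over $file.
sub atomic_write_sketch {
    my ( $file, $data ) = @_;

    # Same directory as the target, so rename() stays on one filesystem.
    my $tmp = File::Temp->new(
        DIR      => dirname($file),
        TEMPLATE => '.tmp.XXXXXXXXXX',    # assumed template for this sketch
        UNLINK   => 0,
    );
    my $tmp_file = $tmp->filename;

    my $ok = eval {
        print {$tmp} $data or die "print failed: $!\n";
        $tmp->flush        or die "flush failed: $!\n";
        $tmp->sync         or die "sync failed: $!\n";
        close $tmp         or die "close failed: $!\n";
        rename $tmp_file, $file or die "rename failed: $!\n";
        1;
    };
    if ( !$ok ) {
        my $err = $@;
        unlink $tmp_file;    # best-effort cleanup of the temporary file
        die $err;
    }
    return 1;
}
```

The sketch omits the sanity checks (min_size, CHECKSUM) and the signal handling that the module adds on top of this basic pattern.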

CLASS METHODS

write_file options hash reference

Requires a hash reference that must contain both the input and file options. Performs the various required steps in a single method call. Only if all checks pass will the input data be moved to the target file via rename. If not, the module will throw an error and attempt to clean up any temporary files created.

See "OPTIONS" for additional settings that can be passed to write_file.

write_file installs local signal handlers for INT, TERM, and __DIE__ to try to clean up any active temporary files if the process is killed or dies. If these are a problem, use the OO interface instead and set up signal handlers as necessary.

safe_level safe_level value

Method to customize the File::Temp module safe_level value. Consult the File::Temp documentation for more information on this option.

Can also be set via the safe_level option.

set_template File::Temp template

Method to customize the default File::Temp template used when creating temporary files. NOTE: if customized, the template must end with a sufficient number of X characters, as otherwise File::Temp will throw an error:

template => "mytmp.X",          # Wrong
template => "mytmp.XXXXXXXXXX", # better

Can also be set via the template option.

new options hash reference

Takes most of the same options as write_file and returns an object. The input option is notably not accepted, on the presumption that other code will use the temporary file or filehandle to write the data. Sanity checks are deferred until the commit method is called. If CHECKSUM verification is wanted, the checksum method must first be called with a suitable argument for that verification to pass.

If a rollback is required, undef the File::AtomicWrite object; the object destructor should then unlink the temporary file. However, should the process receive a TERM, INT, or other signal that causes the script to exit, the temporary file will not be cleaned up. If this is undesirable, a signal handler must be installed:

my $aw = File::AtomicWrite->new({file => 'somefile'});
for my $sig_name (qw/INT TERM/) {
    $SIG{$sig_name} = sub { exit }
}
...

Consult perlipc(1) for more information on signal handling, and the eg/cleanup-test program under this module distribution. A __DIE__ signal handler may also be necessary; consult the die perlfunc documentation for details.

Instances must not be reused; create a new instance instead of calling new again on an existing instance. Reuse may cause undefined behavior or other unexpected problems.

INSTANCE METHODS

fh

Returns the temporary filehandle.

filename

Returns the file name of the temporary file.

checksum SHA1 hexdigest

Takes a single argument that must contain the Digest::SHA1 hexdigest of the data written to the temporary file. Enables the CHECKSUM option.

commit

Call this method once finished with the temporary file. A number of sanity checks (if enabled via the appropriate "OPTIONS") will be performed. If these pass, the temporary file will be renamed to the real filename.

No subsequent use of the instance should be made after calling this method as this would lead to undefined behavior.

OPTIONS

The write_file and new methods accept a number of options, supplied via a hash reference. Mandatory options:

file => filename

A filename in the current working directory, or a path to the file that will (eventually) be created. By default, the temporary file will be written into the parent directory of the file path. This default can be changed by using the tmpdir option.

If the MKPATH option is true, the module will attempt to create any missing directories. If the MKPATH option is false or not set, the module will throw an error should any parent directories of the file not exist.

input => scalar ref or filehandle

Mandatory for the write_file method, illegal for the new method. Scalar reference, or otherwise some filehandle reference that can be looped over via readline. Supplies the data to be written to file.

Optional options:

backup => suffix

Make a backup with this (non-empty) suffix. The backup is always created, even if there was no change. If a previous backup existed, it is deleted first. As usual, an error is thrown on failure.

BINMODE => true or false

If true, binmode is set on the temporary filehandle prior to writing the input data to it. Default is not to set binmode.

binmode_layer => LAYER

Supply a LAYER argument to binmode. Enables BINMODE.

# just binmode (binary data)
...->write_file({ ..., BINMODE => 1 });

# custom binmode layer
...->write_file({ ..., binmode_layer => ':utf8' });

checksum => sha1 hexdigest

If this option exists, and CHECKSUM is true, the module will not create a Digest::SHA1 hexdigest of the data being written out to disk, but instead will rely on the value passed by the caller.

Only for the write_file interface; instead call the checksum method to supply a hexdigest checksum of the data written when using the instance interface; see the "SYNOPSIS" for an example of this.

CHECKSUM => true or false

If true, Digest::SHA1 will be used to checksum the data read back from the disk against the checksum derived from the data written out to the temporary file.

Use the checksum option (or checksum method) to supply a Digest::SHA1 hexdigest checksum. This will spare the module the task of computing the checksum on the data being written.

Only for the write_file interface.
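The read-back comparison described for CHECKSUM can be sketched as follows. This is an illustration under assumptions: it uses the core Digest::SHA module in place of Digest::SHA1 (both produce the same SHA-1 hexdigest), and sha1_matches_file is a hypothetical helper name, not a method of this module.

```perl
use strict;
use warnings;
use Digest::SHA ();

# Hypothetical helper: true if the SHA-1 hexdigest of the bytes now in
# $path equals $expected_hex (the digest of the data that was written).
sub sha1_matches_file {
    my ( $path, $expected_hex ) = @_;

    # new(1) selects SHA-1; the 'b' mode reads the file in binary mode.
    my $on_disk = Digest::SHA->new(1)->addfile( $path, 'b' )->hexdigest;
    return $on_disk eq lc $expected_hex;
}
```

A caller would compute the expected digest while writing, for instance with Digest::SHA::sha1_hex($data), and refuse to rename the temporary file if the comparison fails.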

min_size => size

Specify a minimum size (in bytes) that the data written must exceed. If not, the module throws an error. (It was a process that wrote out a zero-sized /etc/passwd file that prompted the creation of this module.)

MKPATH => true or false

If true, attempt to create the parent directories of file should those directories not exist. If false (or unset), and the parent directory does not exist, the module throws an error. If the directory cannot be created, the module throws an error.

If true, this option will also attempt to create the tmpdir directory, if that option is set.

mode => unix mode

Accepts a Unix mode for chmod to be applied to the file. As usual, an error is thrown on failure. If the mode is a string starting with 0, oct is used to convert it:

my $orig_mode = (stat $source_file)[2] & 07777;
...->write_file({ ..., mode => $orig_mode });

my $mode = '0644';
...->write_file({ ..., mode => $mode });

The module does not change umask, nor is there a means to specify the permissions on directories created if MKPATH is set.
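The oct conversion rule for mode can be illustrated with a small sketch. The normalize_mode name is hypothetical and the module's own check may differ in detail; the point is only that '0644' as a string and 0644 as an octal literal end up as the same number.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the rule described above: a mode given
# as a string with a leading 0 is treated as octal, anything else is
# used as is.
sub normalize_mode {
    my ($mode) = @_;
    return $mode =~ m/^0/ ? oct($mode) : $mode;
}

printf "%04o\n", normalize_mode('0644');    # string, converted via oct
printf "%04o\n", normalize_mode(0644);      # octal literal, already 420
```

Both lines print 0644. Passing the decimal-looking string '644' would not be converted and would set a very different mode, which is why the leading 0 matters.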

mtime => mtime

Accepts an mtime timestamp for utime to be applied to the file. As usual, an error is thrown on failure.

owner => unix ownership string

Accepts similar arguments to chown(1) to be applied via chown to the file. As usual, an error is thrown on failure.

...->write_file({ ..., owner => '0'   });
...->write_file({ ..., owner => '0:0' });
...->write_file({ ..., owner => 'user:somegroup' });
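How a chown(1)-style string might map onto the numeric arguments that Perl's chown expects is sketched below. The parse_owner helper is hypothetical and is not necessarily how the module parses the owner option; it assumes names are resolved with getpwnam and getgrnam, and uses -1 for an ID that should be left unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: turn 'user', 'user:group', or numeric forms into
# the (uid, gid) pair that Perl's chown expects. A value of -1 tells
# chown (on most systems) to leave that ID unchanged.
sub parse_owner {
    my ($owner) = @_;
    my ( $user, $group ) = split /:/, $owner, 2;

    my ( $uid, $gid ) = ( -1, -1 );
    if ( defined $user and length $user ) {
        $uid = $user =~ m/^[0-9]+$/ ? $user : getpwnam($user);
        die "no such user: $user\n" unless defined $uid;
    }
    if ( defined $group and length $group ) {
        $gid = $group =~ m/^[0-9]+$/ ? $group : getgrnam($group);
        die "no such group: $group\n" unless defined $gid;
    }
    return ( $uid, $gid );
}
```

The result could then be applied with chown $uid, $gid, $file, checking the return value for failure.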

safe_level => safe_level value

Optional means to set the File::Temp module safe_level value. Consult the File::Temp documentation for more information on this option.

This value can also be set via the safe_level class method.

template => File::Temp template

Template to supply to File::Temp. Defaults to a reasonable value if unset. NOTE: if customized, the template must end with a sufficient number of X characters, as otherwise File::Temp will throw an error.

Can also be set via the set_template class method.

tmpdir => directory

If set to a directory, the temporary file will be written to this directory instead of to the parent directory of the target file (the default). If the tmpdir is on a different partition than the parent directory for file, or if anything else goes awry, the module will throw an error: rename(2) does not operate across partition boundaries.

This option is advisable when writing files to include directories such as /etc/logrotate.d, as the programs that read include files from these directories may read even a temporary dot file while it is being written. To avoid this (slight but non-zero) risk, use the tmpdir option to write the configuration out in full under a different directory on the same partition.
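Before choosing a tmpdir, a caller can check whether it appears to share a filesystem with the target directory. The following is a rough sketch with a hypothetical same_device helper; it only compares the device numbers reported by stat, which is a hint rather than a guarantee (bind or union mounts can still interfere).

```perl
use strict;
use warnings;

# Hypothetical helper: true if both paths report the same device
# number, which suggests that rename() between them will not fail
# for crossing a filesystem boundary.
sub same_device {
    my ( $path_a, $path_b ) = @_;
    my $dev_a = ( stat $path_a )[0];
    my $dev_b = ( stat $path_b )[0];
    die "cannot stat $path_a or $path_b: $!\n"
      unless defined $dev_a and defined $dev_b;
    return $dev_a == $dev_b;
}
```

For example, same_device('/etc/logrotate.d', '/etc') could be checked before passing '/etc' as tmpdir.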

BUGS

No known bugs (lots of potential issues, though, see below).

Reporting Bugs

http://github.com/thrig/File-AtomicWrite

Known Issues

See perlport for various portability problems possible with the rename call. Consult rename(2) or equivalent for caveats. Note however that rename(2) is used heavily by common programs such as mv(1) and rsync.

File hard links created by ln(1) will be broken by this module, as this module has no way of knowing whether any other files link to the inode of the file being operated on:

% touch afile
% ln afile afilehardlink
% ls -i afile*
3725607 afile         3725607 afilehardlink
% perl -MFile::AtomicWrite -e \
  'File::AtomicWrite->write_file({file =>"afile",input=>\"foo"})'
% ls -i afile*
3725622 afile         3725607 afilehardlink
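A caller concerned about this can at least detect beforehand that the target has additional links. The sketch below uses a hypothetical has_other_links helper and assumes a filesystem that reports link counts via stat; it cannot say which other names link to the inode, only that some do.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $path has more than one hard link,
# meaning a rename-based replacement would leave the other names
# pointing at the old inode and the old contents.
sub has_other_links {
    my ($path) = @_;
    my $nlink = ( stat $path )[3];
    die "cannot stat $path: $!\n" unless defined $nlink;
    return $nlink > 1;
}
```

What to do when the check is true (refuse, warn, or rewrite in place and give up atomicity) is a policy decision left to the caller.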

Union or bind mounts might also be a problem, if a different filesystem actually lies between the temporary and final file locations.

Some filesystems may also require a fsync call on a filehandle of the directory containing the file (see fsync(2) on RHEL, for example), to ensure that the directory data also reaches disk, in addition to the contents of the file. Certain filesystem options may also need to be set, such as data=journal or data=ordered on ext3, so that any crashes or unexpected glitches have less chance of unanticipated problems (such as the file write being ordered after the rename).
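A directory fsync of the kind mentioned above might look like the following from Perl. This is a sketch under assumptions: sync_directory is a hypothetical helper, and it relies on the platform allowing a directory to be opened read-only and passed to fsync, which holds on Linux but is not portable everywhere. It returns 0 rather than dying when that is not possible.

```perl
use strict;
use warnings;
use IO::Handle ();

# Hypothetical helper: try to fsync the directory itself so that the
# directory entry created by rename() also reaches disk. Returns 1 on
# success and 0 if the directory cannot be opened or synced.
sub sync_directory {
    my ($dir) = @_;
    open my $dh, '<', $dir or return 0;
    my $ok = $dh->sync ? 1 : 0;
    close $dh;
    return $ok;
}
```

Such a call would follow a successful rename, using the parent directory of the target file.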

Renames may strip fancy ACL or SELinux contexts.

SEE ALSO

Supporting modules:

Digest::SHA1, File::Basename, File::Path, File::Temp

This isn't easy:

http://danluu.com/file-consistency/

https://homes.cs.washington.edu/~lijl/papers/ferrite-asplos16.pdf

https://unix.stackexchange.com/questions/464382

AUTHOR

thrig - Jeremy Mates (cpan:JMATES) <jmates at cpan.org>

mtime and other features contributed by Stijn De Weirdt.

COPYRIGHT

Copyright (C) 2009-2016,2018 Jeremy Mates

This program is distributed under the (Revised) BSD License: http://www.opensource.org/licenses/BSD-3-Clause