NAME

String::PictureFormat - Functions to format and unformat strings based on a "Picture" format string

AUTHOR

Jim Turner

(c) 2015, Jim Turner, under the same license as Perl 5 itself. All rights reserved.

SYNOPSIS

use String::PictureFormat;

$_ = fmt('@"...-..-...."', 123456789); print "-formatted=$_=\n"; #RETURNS "123-45-6789".

$_ = unfmt('@"...-..-...."', '123-45-6789'); print "-unformatted=$_=\n"; #RETURNS "123456789".

$_ = fmt('@$,12.2>', 123456789); print "-formatted=$_= \n"; #RETURNS " $123,456,789.00".

$_ = fmtsiz('@$,12.2>'); print "-format size=$_= \n"; #RETURNS 18.

$_ = fmt('@$,12.2> CR', -123456789); print "-formatted=$_=\n"; #RETURNS " $123,456,789.00 CR".

$_ = fmt('@$,12.2> CR', 123456789); print "-formatted=$_=\n"; #RETURNS " $123,456,789.00 ".

$_ = fmt('@$,12.2>', -123456789); print "-formatted=$_=\n"; #RETURNS " $-123,456,789.00".

$_ = fmt('@-$,12.2>', -123456789); print "-formatted=$_=\n"; #RETURNS " -$123,456,789.00".

$_ = fmt('@$(,12.2>)', -123456789); print "-formatted=$_=\n"; #RETURNS " $(123,456,789.00)".

$s = fmt('=16<', 'Now is the time for all good men to come to the aid of their country'); print "-s=".join('|',@{$s})."=\n"; #PRINTS "-s=Now is the time  |for all good men |to come to the   |aid of their     |country          =".

sub foo { (my $data = shift) =~ tr/a-z/A-Z/; return $data; } ... $_ = fmt('@foo()', 'Now is the time for all'); print "-formatted=$_=\n"; #RETURNS "NOW IS THE TIME FOR ALL"

$_ = fmt('@tr/aeiou/AEIOU/', 'Now is the time for all'); print "-formatted=$_=\n"; #RETURNS "NOw Is thE tImE fOr All"

DESCRIPTION

String::PictureFormat provides functions to format and unformat character strings according to separate format strings made up of special characters. Typical usage includes left and right justification, centering, floating dollar signs, adding commas to numbers, formatting phone numbers, Social-Security numbers, converting negative numbers to accounting notations, creating text files containing tables of data in fixed-column format, etc.

EXAMPLES

See SYNOPSIS

FORMAT STRINGS

Format strings consist of special characters, explained in detail below. Each format string begins with one of the following characters: "@", "=", or "%". "@" indicates a standard format string that can be any one of several different formats as described in detail below. "=" indicates a format that will "wrap" the text to be formatted into multiple rows. "%" indicates a standard C-language "printf" format string.

"@"-format strings:

The standard format strings that begin with an "@" sign can be in one of the following formats:

1) @"literal-picture-string" or @'literal-picture-string' or @/literal-picture-string/ or @`literal-picture-string`

    This format does a character-by-character conversion of the data. The special characters are:

    "." - return the next character in the data.
    "^" - skip the next character in the data.
    "+" - return all remaining characters in the data.

    Any other character is copied to the output as a literal. A special character can be escaped with "\" to include it as a literal, if needed.

    For example, to convert an integer number to a phone number with area code, one could do:

    my $ph = fmt('@"(...) ...-.+"', '1234567890 x101'); print "-phone# $ph\n"; #-phone# (123) 456-7890 x101

    Or, to format a social security number and return a string of asterisks if it is too long:

    my $ss = fmt('@"...-..-...."', '123456789', {-truncate => 'error'}); print "-ssn: $ss\n"; #-ssn: 123-45-6789

    Now suppose you had part numbers where the third character is a letter and the rest are digits, and you want only the digits. You could do:

    my $partseq = fmt('@"..^.+"', '12N345'); print "-part# $partseq\n"; #-part# 12345
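To make the three special characters concrete, here is a minimal Perl sketch of a literal-picture formatter. The helper name picture_fmt is hypothetical and this is not String::PictureFormat's actual code: it takes only the picture between the quote delimiters (no leading "@" or quotes) and ignores options such as -truncate.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's code. Applies a literal picture:
#   "." copies the next data character, "^" skips one data character,
#   "+" copies everything that remains, "\" makes the next picture
#   character a literal, and any other character is copied as a literal.
sub picture_fmt {
    my ($picture, $data) = @_;
    my @pic = split //, $picture;
    my $pos = 0;                     # current position in $data
    my $out = '';
    for (my $i = 0; $i < @pic; $i++) {
        my $c = $pic[$i];
        if ($c eq '\\' && $i + 1 < @pic) {
            $out .= $pic[++$i];                          # escaped literal
        } elsif ($c eq '.') {
            $out .= substr($data, $pos++, 1) if $pos < length($data);
        } elsif ($c eq '^') {
            $pos++;                                      # skip one character
        } elsif ($c eq '+') {
            $out .= substr($data, $pos) if $pos < length($data);
            $pos = length($data);
        } else {
            $out .= $c;                                  # literal character
        }
    }
    return $out;
}

print 'phone: ', picture_fmt('(...) ...-.+', '1234567890 x101'), "\n";  # phone: (123) 456-7890 x101
print 'part: ',  picture_fmt('..^.+', '12N345'), "\n";                  # part: 12345
```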

2) @justification-string

    This consists of the special characters "<", "|", and ">", with optional numbers preceding them to indicate repetition, an optional decimal point, an optional prefix of "floating" characters, and / or an optional suffix of literal characters. Each of the three characters shown above represents a single character of data to be returned and corresponds to "left-justify", "center", or "right-justify", respectively. For example, the most basic format is:

    my $str = fmt('@>>>>>>>>>', 'Howdy'); print "-formatted=$str=\n"; #-formatted=     Howdy=

    This returns a 10-character string right-justified (note that the "@" sign counts as one of the characters representing the size of the field). This could've also been abbreviated as:

    my $str = fmt('@9>', 'Howdy');

    You can mix and match the three special characters, but the first one determines the justification. The only exception is when a decimal point is provided and the data is numeric. In that case, if ">" is used after the decimal point, trailing decimal places are rounded and removed if necessary to make the string fit; otherwise, if the number will not fit and the "-truncate => 'error'" option is specified, asterisks are returned. The decimal point is explicit, not implied: a number is returned as its actual value, with any excess decimal places removed or zeros added to match the given format. For example:

    fmt('@6.2>', 123.456) will return "    123.46" (ten characters wide, right-justified, with two decimal places). The total width is ten because there are 6 digits left of the decimal point + 2 decimal places + the decimal point + the "@" sign = 10. The full format could also have been given as "@>>>>>>.>>".
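The width arithmetic for '@6.2>' can be reproduced with Perl's core sprintf, whose "*" width and ".*" precision take their values from the argument list. The helper right_decimal is a hypothetical illustration of the rounding and padding, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: total width = "@" + integer digits + "." + decimals,
# then round and right-justify with sprintf.
sub right_decimal {
    my ($num, $int_digits, $decimals) = @_;
    my $width = 1 + $int_digits + 1 + $decimals;
    # %*.*f reads the width and the precision from the argument list.
    return sprintf('%*.*f', $width, $decimals, $num);
}

print '[', right_decimal(123.456, 6, 2), "]\n";   # [    123.46]
```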

    Characters between the "@" sign and the first justification character are considered "floating" characters, and anything after the last one is a literal suffix. The main use for the suffix is to specify negative numbers in accounting format. Here are some examples:

    fmt('@$6.2>', 123.456) will return "    $123.46" (eleven characters wide, with a floating "$" sign). The field width is eleven instead of ten because a position is provided for the floating character.

    Commas are a special floating character, as they will be added to large numbers automatically as needed, if specified. Consider:

    fmt('@$,8.2>', 1234567) will return " $1,234,567.00". Fifteen characters are returned: 9 for the whole number, 1 for the decimal point, 2 decimal places, the "@" sign, the "$" sign, and one for each "," added.
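Comma insertion itself can be sketched with a common Perl idiom that repeatedly splits off the last three digits of the integer part. The helper add_commas is hypothetical; it does not apply the field width or padding that fmt() adds.

```perl
use strict;
use warnings;

# Hypothetical helper: fixed-point formatting plus thousands separators.
sub add_commas {
    my ($num, $decimals) = @_;
    my $s = sprintf('%.*f', $decimals, $num);
    # Each pass inserts one comma, working from the right of the integer part.
    1 while $s =~ s/^(-?\d+)(\d{3})/$1,$2/;
    return $s;
}

print '$', add_commas(1234567, 2), "\n";   # $1,234,567.00
```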

    There are several ways to format negative numbers. The default is to leave the minus sign intact: had the number above been -1234567, the result would have been " $-1,234,567.00". This can be changed to " -$1,234,567.00" by including a "-" sign as a float character before the floating "$" sign, i.e. fmt('@-$,8.2>', -1234567). Note that the string is now sixteen characters long because of the additional float character. Also note that had the number been positive, the "-" would have been omitted automatically from the returned result. You can force a sign to be displayed (either "+" or "-", depending on whether the input data is positive or negative) by using a floating "+" instead of the floating "-".

    If you are formatting numbers for accounting or tax purposes, there are special float and suffix characters for that too. For example:

    fmt('@$,8.2>CR', -123456.7) will return " $123,456.70CR". The "CR" is replaced by spaces if the input data is zero or positive. To get a space between the number and the "CR", simply add a space to the suffix, i.e. "@$,8.2> CR".

    Another common accounting format uses parentheses to indicate negative numbers. This is accomplished by combining the special float character "(" with a suffix that starts with a ")". For example:

    fmt('@($,8.2>)', -123456.7) will return " ($123,456.70)". The parentheses will be replaced by spaces if the number is zero or positive. However, the space in lieu of the "(" may instead be replaced by an extra digit if the number is large and just barely fits. If you want the "$" sign before the parenthesis, simply use "fmt('@$(,8.2>)', -123456.7)" instead. Note that "+" and "-" should not be floated when using parentheses or "CR" notation.
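The three negative-number notations (floating "-", "CR" suffix, and parentheses) can be sketched as follows. The helper accounting and its style names 'minus', 'cr', and 'paren' are hypothetical, and the sketch omits the field-width padding; positive values get blanks where the markers would go, mirroring the behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical helper: format a dollar amount in one of three negative styles.
sub accounting {
    my ($num, $style) = @_;
    my $body = sprintf('%.2f', abs($num));
    1 while $body =~ s/^(\d+)(\d{3})/$1,$2/;   # thousands separators
    my $neg   = $num < 0;
    my $money = '$' . $body;
    if ($style eq 'minus') {                   # floating "-" before the "$"
        my $sign = $neg ? '-' : ' ';
        return $sign . $money;
    }
    if ($style eq 'cr') {                      # " CR" suffix, blanks if positive
        my $suffix = $neg ? ' CR' : '   ';
        return $money . $suffix;
    }
    if ($style eq 'paren') {                   # "(" float with ")" suffix
        return $neg ? "($money)" : " $money ";
    }
    die "unknown style: $style\n";
}

print 'cr: ', accounting(-123456.7, 'cr'), "\n";   # cr: $123,456.70 CR
```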

    Since floating characters (particularly floating commas) and negative numbers can increase the width of the returned value, the width can vary. If you need to create columns of fixed width, an absolute width can be specified (along with the "{-truncate => 'error'}" option). This is given as a number followed by a colon immediately after the "@" sign, for example:

    fmt('@16:($,8.2>)', -123456.7, {-truncate => 'error'})

    This forces the returned value to be either 16 characters, right-justified, or 16 "*" characters. Be careful to anticipate the maximum size of your data, including any floating characters to be added.
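A fixed width combined with "-truncate => 'error'" amounts to "right-justify if it fits, otherwise fill with the bad character". A minimal sketch, with the hypothetical helper fit_or_flag (its optional third argument plays the role of -bad):

```perl
use strict;
use warnings;

# Hypothetical helper: right-justify into $width columns, or return
# $width copies of $bad (default "*") when the text is too long.
sub fit_or_flag {
    my ($text, $width, $bad) = @_;
    $bad = '*' unless defined $bad;
    return $bad x $width if length($text) > $width;
    return sprintf('%*s', $width, $text);
}

print '[', fit_or_flag('($123,456.70)', 16), "]\n";   # [   ($123,456.70)]
```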

3) @^date/time-picture-string[^data-picture-string]^ (Date / Time Conversions):

    This format does a character-by-character conversion of date / time data based on certain substrings of special characters. The list of special character strings is described in Date::Time2fmtstr. If this optional module is not installed, then the following are available:

    yyyy - Year in 4 digits.

    yy, rr - Year in last 2 digits.

    mm - Number of month (2 digits, left padded with a zero if needed), ie. "01" for January.

    dd - Day of month (2 digits, left padded with a zero if needed), ie. "01".

    HH, hh - Hour in 24-hour format, 2 digits, left padded with a zero if needed, ie. 00-23.

    mi - Minute, ie. 00-59.

    ss - Seconds past the minute (2 digits), i.e. 00-59.

    A valid date string will be formatted / unformatted based on the format-string. If Date::Fmtstr2time and Date::Time2fmtstr are installed, the "valid date string" processed by fmt() can be a Perl/Unix time integer, and the output produced by unfmt() will be one. Other valid data strings accepted by fmt() include "yyyymmdd[ hhmmss]", "mm-dd-yyyy [hh:mm:ss]", etc. unfmt() returns "yyyymmdd[ hhmm[ss]]" unless Date::Fmtstr2time is installed, in which case it returns a Perl/Unix time integer. This can be changed by specifying either -outfmt or a data-picture-string. NOTE: It is highly recommended that both of these modules be installed if formatting or unformatting date / time values, as the manual workarounds used do not always produce the desired results.

    Examples:

    fmt('@^mm-dd-yy^', 20150108) will return "01-08-15".

    fmt('@^mm-dd-yy hh:mi^', '01-08-2015 10:25') will return "01-08-15 10:25".

    fmt('@^mm-dd-yy^', '2015/01/08') will return "01-08-15".

    fmt('@^mm-dd-yy^', 1420781025) will return "01-08-15", if Date::Time2fmtstr is installed. (The date shown depends on the local time zone: 1420781025 is 2015/01/08 23:23:45 at UTC-6, but 2015/01/09 in UTC.)

    unfmt('@^mm-dd-yy^', '01-08-15') will return "20150108" unless Date::Fmtstr2time is installed, in which case it will return a Perl/Unix time integer such as 1420696800 (2015/01/08 00:00:00 local time; that particular value corresponds to a UTC-6 time zone).

    unfmt('@^mm-dd-yy^', '01-08-15', {-outfmt => 'yyyymmdd'}) will always return "20150108", if Date::Time2fmtstr is also installed.

    unfmt('@^mm-dd-yy^yyyymmdd^', '01-08-15') works the same way, always returning "20150108", if Date::Time2fmtstr is also installed.

    NOTE: If unfmt() is used with either a data-picture-string or -outfmt, and Date::Time2fmtstr is not installed, then the data-picture-string or -outfmt must be set to "yyyymmdd[hhmm[ss]]" or it will fail.
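When neither Date module is installed, converting between "yyyymmdd" and "mm-dd-yy" is plain string rearrangement. The sketch below shows that with hypothetical helper names, assuming two-digit years fall in 2000-2099. It also uses the core Time::Local module to show why the epoch integers above depend on the time zone: timegm() gives midnight UTC, which is six hours earlier than the 1420696800 (local midnight) quoted above.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helpers. Assumption: two-digit years mean 2000-2099.
sub yyyymmdd_to_mmddyy {
    my ($d) = @_;
    my ($y, $m, $day) = $d =~ /^(\d{4})(\d{2})(\d{2})$/ or return;
    return join('-', $m, $day, substr($y, 2));
}

sub mmddyy_to_yyyymmdd {
    my ($d) = @_;
    my ($m, $day, $yy) = $d =~ /^(\d{2})-(\d{2})-(\d{2})$/ or return;
    return "20$yy$m$day";
}

# Midnight 2015-01-08 in UTC (timegm takes a 0-based month).
my $epoch_utc = timegm(0, 0, 0, 8, 0, 2015);

print 'mm-dd-yy: ', yyyymmdd_to_mmddyy('20150108'), "\n";   # mm-dd-yy: 01-08-15
print 'yyyymmdd: ', mmddyy_to_yyyymmdd('01-08-15'), "\n";   # yyyymmdd: 20150108
print "epoch (UTC): $epoch_utc\n";                          # epoch (UTC): 1420675200
```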

4) Regex substitution:

    This format specifies a Perl "regular expression" substitution to perform on the input data, and returns the result. For example:

    $s = fmt('@s/[aeiou]/\[VOWEL\]/ig;', 'Now is the time for all'); would return: "N[VOWEL]w [VOWEL]s th[VOWEL] t[VOWEL]m[VOWEL] f[VOWEL]r [VOWEL]ll".

    The new string is returned as-is, regardless of length. To truncate it to a fixed maximum length, specify a length constraint (the length followed by a colon, right after the "@" sign). You can also specify the "-truncate => 'error'" option to return a row of "*" of that length if the resulting string is longer, i.e.: $s = fmt('@50:s/[aeiou]/\[VOWEL\]/ig;', 'Now is the time for all', {-truncate => 'error'});

    Perl's Translate (tr) function is also supported, ie:

    $s = fmt('@tr/aeiou/AEIOU/', 'Now is the time for all'); would return "NOw Is thE tImE fOr All".
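Both format strings above correspond directly to ordinary Perl operators. This sketch applies the same substitution and transliteration natively (no String::PictureFormat involved), which can be handy for checking what a given format string should produce.

```perl
use strict;
use warnings;

my $s = 'Now is the time for all';

# Same effect as the format '@s/[aeiou]/\[VOWEL\]/ig;'
(my $vowels = $s) =~ s/[aeiou]/[VOWEL]/ig;
# Same effect as the format '@tr/aeiou/AEIOU/'
(my $upper = $s) =~ tr/aeiou/AEIOU/;

print "$vowels\n";   # N[VOWEL]w [VOWEL]s th[VOWEL] t[VOWEL]m[VOWEL] f[VOWEL]r [VOWEL]ll
print "$upper\n";    # NOw Is thE tImE fOr All
```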

5) User-supplied functions:

    You can write your own custom translate function for full control over the data translation. You can pass any arguments to it that you wish; two special ones are provided for your use: "*" and "#". If you do not pass any parameters to the function, it will be called with "(*,#)". "*" represents the input data string and "#" represents the maximum length to be returned (if not specified, it is zero, which means the returned string may be any length). For example:

    $s = fmt('@foo', 'Now is the time for all'); print "-s=$s=\n"; ... sub foo { my ($data, $maxlength) = @_; print "-max. length=$maxlength= data in=$data=\n"; $data =~ tr/a-z/A-Z/; return $data; }

    This would return "NOW IS THE TIME FOR ALL". This is the same as: $s = fmt('@foo(*,#)', 'Now is the time for all');

    To call a function with just the $data parameter, do:

    $s = fmt('@foo(*)', 'Now is the time for all');

    To specify a maximum length, say "50" do:

    $s = fmt('@50:foo', 'Now is the time for all', {-truncate => 'error'});

    To append a suffix string ("suffix" in the example, not counted in the max. length) do:

    $s = fmt('@foo()suffix', 'Now is the time for all');

    which would return "NOW IS THE TIME FOR ALLsuffix".
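The "*" and "#" placeholders can be understood as a simple argument-substitution step performed before the function is called. The following is a hypothetical sketch of that step, not the module's parser: call_with_placeholders splits the argument list on commas, replaces "*" with the data and "#" with the maximum length, passes other arguments through as literal strings, and treats an empty list as "*,#".

```perl
use strict;
use warnings;

# Hypothetical sketch of resolving "(*,#)"-style argument lists.
sub call_with_placeholders {
    my ($func, $arglist, $data, $maxlen) = @_;
    # No arguments given: behave as if "(*,#)" had been written.
    $arglist = '*,#' if !defined $arglist || $arglist eq '';
    my @args = map { $_ eq '*' ? $data : $_ eq '#' ? $maxlen : $_ }
               split /\s*,\s*/, $arglist;
    return $func->(@args);
}

# The document's example function: upper-case the data.
sub foo {
    my ($data, $maxlength) = @_;
    (my $out = $data) =~ tr/a-z/A-Z/;
    return $out;
}

print 'result: ', call_with_placeholders(\&foo, '', 'Now is the time for all', 0), "\n";
# result: NOW IS THE TIME FOR ALL
```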

"="-format strings:

These specify text "wrapping" for long strings of characters. Data can be wrapped at either character or word boundaries. The default is to wrap by word. Consider:

$s = fmt('=15<', 'Now is the time for all good men to come to the aid of their country'); print "-s=".join('|',@{$s})."=\n";

This will print: "-s=Now is the time |for all good men|to come to the  |aid of their    |country         =". The function returned the data as a reference to an array, each element containing a "row" or "line" of 16 characters of data, broken on the nearest word boundary and left-justified. Each row is right-padded with spaces to bring it to 16 characters (the "=" sign plus the "15" represents a row width of 16 characters). The "|" characters, added by the join() in the print statement, show the boundary between each row/line.

$s = fmt('=15>', 'Now is the time for all good men to come to the aid of their country'); would have returned (right-justified): " Now is the time|for all good men|  to come to the|    aid of their|         country"

$s = fmt('=15|', 'Now is the time for all good men to come to the aid of their country'); would've returned (centered): " Now is the time|for all good men| to come to the | aid of their | country "
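Greedy word wrapping into fixed-width rows can be sketched as below. The helper wrap_words is hypothetical: it returns an array reference like the "=" formats, takes the full row width (16 for "=15<"), does not split words longer than a row, and when centering puts any odd extra space on the left.

```perl
use strict;
use warnings;

# Hypothetical sketch: greedy word wrap into rows of exactly $width
# characters; $just is '<' (left), '>' (right) or '|' (center).
sub wrap_words {
    my ($text, $width, $just) = @_;
    my (@rows, $line);
    for my $word (split ' ', $text) {
        if (defined $line && length($line) + 1 + length($word) <= $width) {
            $line .= " $word";                 # word fits on the current row
        } else {
            push @rows, $line if defined $line;
            $line = $word;                     # start a new row
        }
    }
    push @rows, $line if defined $line;
    for (@rows) {                              # pad each row to $width
        my $pad = $width - length($_);
        if    ($just eq '>') { $_ = (' ' x $pad) . $_ }
        elsif ($just eq '|') {
            my $left = int(($pad + 1) / 2);
            $_ = (' ' x $left) . $_ . (' ' x ($pad - $left));
        }
        else                 { $_ .= ' ' x $pad }
    }
    return \@rows;
}

my $text = 'Now is the time for all good men to come to the aid of their country';
print join('|', @{ wrap_words($text, 16, '<') }), "\n";
```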

To specify simple character wrapping (spaces remain intact), one can add "w" to the format string like so:

$s = fmt('=w14<', 'Now is the time for all good men to come to the aid of their country'); This would return: "Now is the time |for all good men| to come to the |aid of their cou|ntry " NOTE: The change of "15" to "14". This is due to the fact that the "w" adds one to the row "size"!

With "w" (character wrapping), justification is largely meaningless, since each row (except the last) always contains the full number of characters, with spaces kept as-is (no spaces added). However, the last row will be affected if spaces have to be added to fill it out. To get the string represented "properly", it is usually best to use "<" (left-justification).

The default is "word" wrapping, so a format string of "=15<" is the same as "=W14<".
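Character wrapping is simpler: cut the text every N characters, keep spaces as they are, and pad only the final row. A minimal sketch, with the hypothetical helper wrap_chars taking the full row width (16 for "=w14<") and left-justifying the last row:

```perl
use strict;
use warnings;

# Hypothetical sketch: split into $width-character rows, spaces intact,
# then right-pad the final row with spaces.
sub wrap_chars {
    my ($text, $width) = @_;
    my @rows;
    for (my $i = 0; $i < length($text); $i += $width) {
        push @rows, substr($text, $i, $width);
    }
    $rows[-1] .= ' ' x ($width - length($rows[-1])) if @rows;
    return \@rows;
}

my $sample = 'Now is the time for all good men to come to the aid of their country';
print join('|', @{ wrap_chars($sample, 16) }), "\n";
```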

"%" (C-language) format strings:

You can specify a C/Perl language "printf" format string by preceding it with a "%" sign. For example:

fmt('%-12.2d', -1234);

returns "-1234       "

There is the added capability of floating "$" sign and commas. For example:

fmt('%$,12.2f', -1234) returns " $-1,234.00". Note the width is 14 instead of 12 characters, since the two floating characters add to the width of the final results. The "$" sign and "," are the only floating character options.

METHODS

<$scalar> || <@array> = fmt(format-string, data-string [, ops ]);

Returns either a formatted string (scalar) or a list of values. The <format-string> is applied to the <data-string> to convert it to a new format (see the many examples in this documentation). If called in list (array) context, the elements are:

[0] - The string or array reference returned in the scalar context ("wrap" formats return an array reference, and all others return a string).

[1] - The length (integer) of the data formatted - note that this is not always the actual length of the returned data. It represents the maximum "format length", which is the max. no. of characters the format can return. If the format is open-ended, ie. if the last character in a fixed format is "+", or the length is indeterminate, it will return zero. For "wrap" formats, it is the no. of characters in a row. If a max. length specifier is given (ie. "@50:..."), then this value is returned.

[2] - The justification (either "<", "|", ">", or "", if no justification is involved).
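The scalar-versus-list behaviour relies on Perl's wantarray. The sketch below shows the pattern with a hypothetical fmt_sketch that supports only the simple "@N>" right-justify form; it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch: formatted string in scalar context,
# (string, size, justification) in list context.
sub fmt_sketch {
    my ($format, $data) = @_;
    my ($n) = $format =~ /^\@(\d+)>$/
        or die "unsupported format: $format\n";
    my $size = $n + 1;                        # the "@" counts as one column
    my $out  = sprintf('%*s', $size, $data);  # right-justify
    return wantarray ? ($out, $size, '>') : $out;
}

my $scalar = fmt_sketch('@9>', 'Howdy');                # "     Howdy"
my ($str, $size, $just) = fmt_sketch('@9>', 'Howdy');   # ("     Howdy", 10, ">")
print "[$scalar] [$str] $size $just\n";
```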

format-string is the format string (required).

data-string is the data to be formatted (required).

ops is an optional hash-reference representing additional options. The currently valid options are:

    -bad => '<char>' (default '*') - The character to fill the output string if the output string exceeds the specified maximum length and <-truncate> => 'error' is specified.

    -infmt => format-string (default '') - Alternate format to expect the incoming data to be in. If a data-picture-string is given, it overrides this option. If specified in a fmt() call, it causes the input data to be read in this format layout (before being formatted by the format-string). Otherwise (if neither this option nor a data-picture-string is specified), the data can be in any of a variety of layouts that fmt() recognizes. This option is not particularly useful except for some additional error-checking, and generally need not be used.

    NOTE: If this option is specified, and Date::Fmtstr2time is not installed, then it must be set to "yyyymmdd[hhmm[ss]]" or the format will fail.

    -nonnumeric => true | false (default false or 0) - whether or not to ignore "numeric"-specific formatting, ie. adding commas, sign indicators, decimal places, etc. even if the data is "numeric".

    -outfmt => format-string (default '') - Alternate format to return the "unformatted" result in. If a data-picture-string is given, it overrides this option. If specified in an unfmt() call, it causes the result to be formatted according to this format (after being unformatted by the format-string) and returned. Otherwise (if not specified), the result is returned as a Perl / Unix time integer (if Date::Fmtstr2time is installed) or in "yyyymmdd[hhmm[ss]]" format if not.

    NOTE: If this option is specified, and Date::Time2fmtstr is not installed, then it must be set to "yyyymmdd[hhmm[ss]]" or the unformat will fail.

    -sizefixed => true | false (default false or 0) - If true, prevents expansion of certain numeric formats when the number is positive or more than one comma is added. What it actually does is fix the format size to the value returned by fmtsiz() for the specified format-string. This ensures that the format size will be the same regardless of what value is passed to it.

    -suffix => '[yes]' | 'no' (default yes) - If 'no', then any suffix string is ignored (not appended) when formatting and not removed when unformatting. Specifying anything but "no" implies the default of yes.

    -truncate => '[yes]' | 'no' | 'er[ror]' - Whether or not to truncate output data that exceeds the maximum width. The default is 'yes'. Specifying 'no' means return the entire output string regardless of length. 'er', 'err', 'error', etc. means return a row of asterisks (changeable by -bad). If the string does not begin with "no" or "er", it is assumed to be "yes".
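The -truncate rules above (prefix "no", prefix "er", anything else means "yes") and the -bad fill character can be sketched as follows. The helpers are hypothetical; treating a maximum length of zero as "no limit" follows the description of "#" for user-supplied functions, and keeping the leftmost characters when truncating is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented -truncate option values.
sub truncate_mode {
    my ($opt) = @_;
    return 'yes'   unless defined $opt;
    return 'no'    if $opt =~ /^no/;
    return 'error' if $opt =~ /^er/;
    return 'yes';                        # anything else means "yes"
}

sub apply_truncate {
    my ($text, $max, $opt, $bad) = @_;
    $bad = '*' unless defined $bad;      # default -bad character
    # A maximum of zero means "no limit" (indeterminate length).
    return $text if $max <= 0 || length($text) <= $max;
    my $mode = truncate_mode($opt);
    return $text       if $mode eq 'no';
    return $bad x $max if $mode eq 'error';
    return substr($text, 0, $max);       # assumption: keep leftmost characters
}

print 'err: ', apply_truncate('abcdef', 4, 'error'), "\n";   # err: ****
```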

<$scalar> || <@array> = unfmt(format-string, data-string [, ops ]);

For the most part, this is the opposite of the fmt() function. It takes a string and attempts to "undo" the format and return the data as close as possible to what the input data string would've looked like before the <format-string> was applied by assuming that the input <data-string> is the result of having previously had that <format-string> applied to it by fmt(). It is not always possible to exactly undo the format, consider:

my $partseq = fmt('@"..^.+"', '12N345'); my $partno = unfmt('@"..^.+"', $partseq);

would return "12 345", since the original format IGNORED the third character ("N") of the original string. Since that character is unknown, unfmt() interprets "^" as "insert a space character". Careful use of unfmt() can often produce the desired results. For example:

$s = fmt('@$,10.2> CR', '-1234567.89'); print "-s4 formatted=$s=\n"; # $s =" $1,234,567.89 CR" $s = unfmt('@$,10.2> CR', $s); print "-s4 unformatted=$s=\n"; # $s ="-1234567.89" (The original number)
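For the accounting formats, undoing the format mostly means detecting a negative marker and discarding everything except digits and the decimal point. A rough sketch with the hypothetical helper unformat_money; it only recognizes "CR", parentheses, and "-" as negative markers and is not how the module's unfmt() is implemented.

```perl
use strict;
use warnings;

# Hypothetical sketch: recover a signed number from accounting-style text.
sub unformat_money {
    my ($s) = @_;
    my $neg = ($s =~ /CR\s*$/ || $s =~ /\(/ || $s =~ /-/) ? 1 : 0;
    (my $digits = $s) =~ s/[^\d.]//g;   # drop "$", ",", spaces, "CR", parens, sign
    my $sign = $neg ? '-' : '';
    return $sign . $digits;
}

print 'unformatted: ', unformat_money(' $1,234,567.89 CR'), "\n";   # unformatted: -1234567.89
```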

<$integer> = fmtsiz(format-string);

Returns the format "size" represented by the <format-string>, just like the second element of the array returned by fmt() in array context, see above. If a maximum length specifier is given, it returns that. Otherwise, attempts to determine the length of the data string that would be returned by applying the format. For "wrap" formats, this is the length of a single row. For regular expressions and user-supplied functions, it is zero (indeterminate).
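For justification formats, the sizes quoted in this document follow a simple rule: one column for "@", one per floating character, the integer digits, and the decimal point plus decimal places. The hypothetical simple_fmtsiz below encodes that rule (plus the "@N:" override and the spelled-out form) and returns 0 for anything else; it is a sketch checked only against the examples in this document, not the module's fmtsiz().

```perl
use strict;
use warnings;

# Hypothetical sketch of format-size arithmetic for justification formats.
sub simple_fmtsiz {
    my ($format) = @_;
    # "@N:..." gives an absolute width that overrides everything else.
    return $1 if $format =~ /^\@(\d+):/;
    # "@<floats><N>[.<M>]<just>", e.g. '@$,12.2>' = 1 + 2 + 12 + 1 + 2 = 18.
    if ($format =~ /^\@([^\d<|>]*)(\d+)(?:\.(\d+))?[<|>]/) {
        my ($floats, $int, $dec) = ($1, $2, $3);
        return 1 + length($floats) + $int + (defined $dec ? 1 + $dec : 0);
    }
    # Spelled-out form, e.g. '@>>>>>>.>>': every character is one column.
    return length($1) if $format =~ /^(\@[^\d<|>]*[<|>][<|>.]*)/;
    return 0;   # regexes, functions, etc.: indeterminate in this sketch
}

print 'size: ', simple_fmtsiz('@$,12.2>'), "\n";   # size: 18
```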

<$character> = fmtjust(format-string);

Returns a character indicating the justification (if any) represented by the specified <format-string>, just like the third element of the array returned by fmt() in array context, see above. The result can be either "<", ">", "|", or "", if not determinable.

<$integer> = fmtsuffix(format-string, data-string [, ops ]);

Returns the "suffix" string, if any, included in the <format-string>.

KEYWORDS

formatting, picture_clause, strings