NAME
DataStore::CAS::FS::InvalidUTF8 - Wrapper to represent non-utf8 data in a unicode context
VERSION
version 0.0101_00
SYNOPSIS
my $j= JSON->new()->convert_blessed;
DataStore::CAS::FS::InvalidUTF8->add_json_filter($j);
my $x= DataStore::CAS::FS::InvalidUTF8->decode_utf8("\x{FF}");
my $json= $j->encode($x);
my $x2= "".$j->decode($json);
is( $x, $x2 );
ok( !utf8::is_utf8($x2) );
DESCRIPTION
Much like using 'i' (or j) as the square root of -1, InvalidUTF8 allows a value which should have been utf-8, but isn't, to exist alongside the others.
Combining InvalidUTF8 parts to make a valid utf-8 string will automatically decode the utf-8 into the resulting unicode string.
Comparing InvalidUTF8 with a regular perl string will first convert the string to a UTF-8 representation, and then do a byte-wise comparison.
InvalidUTF8 can also safely pass through JSON, if the filter is added to the JSON decoder, and "allow_blessed" is set on the encoder.
METHODS
decode_utf8
$string_or_ref= $class->decode_utf8( $byte_str )
If the $byte_str is valid UTF-8, this method returns the decoded perl unicode string. If not, it returns the string wrapped in an instance of InvalidUTF8.
is_non_unicode
This method returns true, and can be used in tests like
if ($_->can('is_non_unicode')) { ... }
as a way of detecting InvalidUTF8 objects by API rather than class hierarchy.
to_string, '""' operator
Returns the original string.
str_compare, 'cmp' operator
Converts peer to utf-8 bytes, then compares the bytes.
str_concat, '.' operator
Converts the peer to utf-8 bytes, concatenates the bytes, and then re-evaluates whether the result needs to be wrapped in an instance of InvalidUTF8.
add_json_filter
$json_instance= $class->add_json_filter($json_instance);
Applies a filter to the JSON object so that when it encounters
{ "*InvalidUTF8*": "$string" }
it will inflate the string using the FROM_JSON method.
TO_JSON
Called by the JSON module when convert_blessed is enabled. Returns
{ "*InvalidUTF8*" => $original_str }
which can be converted back to a InvalidUTF8 object during decode_json, if the filter is applied.
FROM_JSON
Pass this function to JSON's filter_json_single_key_object with a key of *InvalidUTF8*
to restore the objects that were serialized. It takes care of calling utf8::downgrade to undo the JSON module's unicode conversion.
AUTHOR
Michael Conrad <mconrad@intellitree.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by Michael Conrad, and IntelliTree Solutions llc.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.