NAME

DataStore::CAS::FS::InvalidUTF8 - Wrapper to represent non-utf8 data in a unicode context

VERSION

version 0.010100_03

SYNOPSIS

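use JSON;
use Test::More;
use DataStore::CAS::FS::InvalidUTF8;
# convert_blessed lets the encoder call TO_JSON on InvalidUTF8 objects
my $j= JSON->new()->convert_blessed;
DataStore::CAS::FS::InvalidUTF8->add_json_filter($j);
# "\x{FF}" is not valid UTF-8, so decode_utf8 returns an InvalidUTF8 object
my $x= DataStore::CAS::FS::InvalidUTF8->decode_utf8("\x{FF}");
my $json= $j->encode($x);
my $x2= "".$j->decode($json);
is( $x, $x2 );
ok( !utf8::is_utf8($x2) );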

DESCRIPTION

Much like 'i' (or 'j') lets the square root of -1 exist alongside the real numbers, InvalidUTF8 allows a value which should have been UTF-8, but isn't, to exist alongside properly decoded unicode strings.

Concatenating InvalidUTF8 parts whose combined bytes form valid UTF-8 will automatically decode those bytes into the resulting unicode string.

Comparing InvalidUTF8 with a regular perl string will first convert the string to a UTF-8 representation, and then do a byte-wise comparison.

InvalidUTF8 can also safely pass through JSON, if the filter is added to the JSON decoder, and "convert_blessed" is set on the encoder.

METHODS

decode_utf8

$string_or_ref= $class->decode_utf8( $byte_str )

If the $byte_str is valid UTF-8, this method returns the decoded perl unicode string. If not, it returns the string wrapped in an instance of InvalidUTF8.
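
The "valid or not" decision described above can be sketched with core Perl alone. This is only an illustration of the idea, not the module's own code; try_decode_utf8 is a hypothetical helper that returns a flag instead of wrapping the bytes in an object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): returns the decoded string
# and 1 when the bytes are valid UTF-8, else the untouched bytes and 0.
sub try_decode_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    # utf8::decode works in place and returns false on invalid UTF-8
    return utf8::decode($copy) ? ($copy, 1) : ($bytes, 0);
}

my ($str, $ok)  = try_decode_utf8("\xC3\xA9");  # UTF-8 encoding of U+00E9
my ($raw, $ok2) = try_decode_utf8("\xFF");      # never valid in UTF-8
```

In the first call $str holds a single character; in the second the byte comes back unchanged, which is the case where the real method would return an InvalidUTF8 object.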

is_non_unicode

This method returns true, and can be used in tests like

if ($_->can('is_non_unicode')) { ... }

as a way of detecting InvalidUTF8 objects by API rather than class hierarchy.
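
When the value being tested might be a plain string or an unblessed reference, calling can on it directly may die, so a guarded check is safer. A minimal sketch, assuming a hypothetical stand-in class My::FakeInvalid and a hypothetical helper looks_non_unicode:

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';

# Hypothetical stand-in class; only the is_non_unicode method matters here.
{
    package My::FakeInvalid;
    sub is_non_unicode { 1 }
}

# Returns 1 only for blessed objects whose class provides is_non_unicode.
sub looks_non_unicode {
    my ($value) = @_;
    return (blessed($value) && $value->can('is_non_unicode')) ? 1 : 0;
}

my $obj = bless {}, 'My::FakeInvalid';
my @results = map { looks_non_unicode($_) } ($obj, "plain", "", []);
```

Only the first element of @results is true; the plain strings and the unblessed array reference are rejected without raising an exception.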

to_string, '""' operator

Returns the original string.

str_compare, 'cmp' operator

Converts the peer to utf-8 bytes, then compares the bytes.
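
The comparison rule can be illustrated with core functions. This is a sketch under the assumption that the peer is an ordinary perl character string; compare_bytes_to_string is a hypothetical name, not a method of this class.

```perl
use strict;
use warnings;

# Hypothetical helper: compares raw bytes against a character string by
# first encoding the character string as UTF-8 octets.
sub compare_bytes_to_string {
    my ($raw_bytes, $peer) = @_;
    my $peer_bytes = $peer;
    utf8::encode($peer_bytes);    # characters -> UTF-8 octets, in place
    return $raw_bytes cmp $peer_bytes;
}

# The bytes E2 98 BA are the UTF-8 encoding of U+263A
my $same   = compare_bytes_to_string("\xE2\x98\xBA", "\x{263A}");
my $higher = compare_bytes_to_string("\xFF",         "\x{263A}");
```

Here $same is 0 because the encoded peer matches byte for byte, and $higher is 1 because 0xFF sorts after 0xE2.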

str_concat, '.' operator

Converts the peer to utf-8 bytes, concatenates the bytes, and then re-evaluates whether the result needs to be wrapped in an instance of InvalidUTF8.
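
As an illustration of the re-evaluation step, two fragments that are each invalid on their own can join into valid UTF-8. The sketch below assumes both operands are already byte strings (a character-string peer would first be encoded, for example with utf8::encode); join_and_recheck is a hypothetical helper that returns a flag rather than an object.

```perl
use strict;
use warnings;

# Hypothetical helper: concatenates two byte strings, then reports whether
# the result decodes as UTF-8 (decoded text, 1) or not (raw bytes, 0).
sub join_and_recheck {
    my ($left_bytes, $right_bytes) = @_;
    my $joined  = $left_bytes . $right_bytes;
    my $decoded = $joined;
    return utf8::decode($decoded) ? ($decoded, 1) : ($joined, 0);
}

# "\xC3" and "\xA9" are each invalid alone but together encode U+00E9
my ($text, $valid)        = join_and_recheck("\xC3", "\xA9");
# "\xFF" can never appear in UTF-8, so this result stays raw
my ($still_raw, $invalid) = join_and_recheck("\xFF", "abc");
```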

add_json_filter

$json_instance= $class->add_json_filter($json_instance);

Applies a filter to the JSON object so that when it encounters

{ "*InvalidUTF8*": "$string" }

it will inflate the string using the FROM_JSON method.

TO_JSON

Called by the JSON module when convert_blessed is enabled. Returns

{ "*InvalidUTF8*" => $original_str }

which can be converted back to an InvalidUTF8 object during decoding, if the filter is applied.
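
The encoding side can be tried with the core JSON::PP module, which also supports convert_blessed. The class My::RawBytes below is a hypothetical stand-in that mimics the hash shape described above; it is not this module's implementation.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical stand-in class that keeps raw bytes in a blessed scalar ref.
{
    package My::RawBytes;
    sub new     { my ($class, $bytes) = @_; return bless \$bytes, $class }
    # Called by the encoder when convert_blessed is enabled
    sub TO_JSON { my ($self) = @_; return { '*InvalidUTF8*' => $$self } }
}

my $json = JSON::PP->new->convert_blessed;
my $text = $json->encode([ My::RawBytes->new("\xFF") ]);

# Decoding without any filter yields a plain hash with the single marker key
my $plain = JSON::PP->new->decode($text);
```

Without convert_blessed the encoder would not call TO_JSON at all, which is why that option is needed on the encoding side.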

FROM_JSON

Pass this function to JSON's filter_json_single_key_object with a key of *InvalidUTF8* to restore the objects that were serialized. It takes care of calling utf8::downgrade to undo the JSON module's unicode conversion.
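
A sketch of how such a filter callback might be wired up by hand, using the core JSON::PP module. The callback inflate_raw and the class name My::RawBytes are hypothetical stand-ins for FROM_JSON and InvalidUTF8, used here so the example runs without this distribution installed.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical inflater: receives the value stored under the marker key.
sub inflate_raw {
    my ($str) = @_;
    # The decoder yields a character string; convert it back to plain bytes.
    # utf8::downgrade dies if any character is above 0xFF.
    utf8::downgrade($str);
    return bless \$str, 'My::RawBytes';
}

my $json = JSON::PP->new;
$json->filter_json_single_key_object('*InvalidUTF8*' => \&inflate_raw);

# The single-quoted literal keeps \u00ff as a JSON escape for U+00FF
my $data = $json->decode('[{"*InvalidUTF8*":"\u00ff"}]');
my $obj  = $data->[0];
```

After decoding, $obj is a blessed scalar reference whose content is the single byte 0xFF rather than a one-key hash.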

AUTHOR

Michael Conrad <mconrad@intellitree.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by Michael Conrad, and IntelliTree Solutions llc.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.