NAME
String::Tagged - string buffers with value tags on extents
SYNOPSIS
  use String::Tagged;

  my $st = String::Tagged->new( "An important message" );

  $st->apply_tag( 3, 9, bold => 1 );

  $st->iter_substr_nooverlap(
     sub {
        my ( $substring, %tags ) = @_;

        print $tags{bold} ? "<b>$substring</b>"
                          : $substring;
     }
  );
DESCRIPTION
This module implements an object class, instances of which store a (mutable) string buffer that supports tags. A tag is a name/value pair that applies to some non-empty extent of the underlying string.
The types of tag names ought to be strings, or at least values that are well-behaved as strings, as the names will often be used as the keys in hashes or applied to the eq operator.
The types of tag values are not restricted - any scalar will do. This could be a simple integer or string, ARRAY or HASH reference, or even a CODE reference containing an event handler of some kind.
Tags may be arbitrarily overlapped. Any given offset within the string has, in effect, a set of uniquely named tags. Tags of different names are independent. For tags of the same name, only the latest, shortest tag takes effect.
For example, consider a string with three tags represented here:
  Here is my string with tags
  [-------------------------]    foo => 1
          [-------]              foo => 2
       [---]                     bar => 3
Every character in this string has a tag named foo. The value of this tag is 2 for the words my and string and the space in between, and 1 elsewhere. Additionally, the words is and my and the space between them also have the tag bar with a value of 3.
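The diagram above can be reproduced with a short script. This is a minimal sketch that uses only the methods documented below; the offsets and lengths are counted by hand for this particular string, and the variable names are chosen purely for illustration.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Here is my string with tags" );

# foo => 1 covers the whole string (27 characters)
$st->apply_tag( 0, 27, foo => 1 );
# foo => 2 covers "my string" (offset 8, length 9)
$st->apply_tag( 8, 9, foo => 2 );
# bar => 3 covers "is my" (offset 5, length 5)
$st->apply_tag( 5, 5, bar => 3 );

# Query individual positions; per the description, the shorter
# foo tag takes effect inside "my string"
my $foo_in_my   = $st->get_tag_at( 8, "foo" );
my $foo_in_here = $st->get_tag_at( 0, "foo" );
my $bar_in_is   = $st->get_tag_at( 5, "bar" );

print "foo at offset 8: $foo_in_my\n";
print "foo at offset 0: $foo_in_here\n";
print "bar at offset 5: $bar_in_is\n";
```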
Since String::Tagged does not understand the significance of the tag values, it cannot detect whether two neighbouring tags really contain the same semantic idea. Consider the following string:
  A string with words
  [-------]              type => "message"
           [--------]    type => "message"
This string contains two tags. String::Tagged will treat this as two different tag values as far as iter_tags_nooverlap() is concerned, even though get_tag_at() yields the same value for the type tag at any position in the string.
CONSTRUCTOR
$st = String::Tagged->new( $str )
Returns a new instance of a String::Tagged object. It will contain no tags. If the optional $str argument is supplied, the string buffer will be initialised from this value.
METHODS
$str = $st->str
"$st"
Returns the plain string contained within the object.
This method is also called for stringification, so the String::Tagged object can be used in a plain string interpolation such as

  my $message = String::Tagged->new( "Hello world" );
  print "My message is $message\n";
$str = $st->substr( $start, $len )
Returns a substring of the plain string contained within the object.
$st->apply_tag( $start, $len, $name, $value )
Apply the named tag value to the given extent. The tag will start on the character at the $start index, and continue for the next $len characters.
If $start is given as -1, the tag will be considered to start "before" the actual string. If $len is given as -1, the tag will be considered to end "after" the end of the actual string. These special limits are used by set_substr() when deciding whether to move a tag boundary. The start of any tag that starts "before" the string is never moved, even if more text is inserted at the beginning. Similarly, a tag which ends "after" the end of the string will continue to the end even if more text is appended.
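The effect of the "after" anchor can be illustrated as follows. This is a sketch based on the behaviour described above, not a statement about the implementation; the tag name greeting is invented for the example.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello" );

# A $len of -1 anchors the end of the tag "after" the string
$st->apply_tag( 0, -1, greeting => 1 );

# Text appended later should still be covered by the anchored tag
$st->append( ", world" );

my $last_pos = length( $st->str ) - 1;
my $at_end   = $st->get_tag_at( $last_pos, "greeting" );

print "greeting at final character: ",
      ( defined $at_end ? $at_end : "(not applied)" ), "\n";
```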
$st->unapply_tag( $start, $len, $name )
Unapply the named tag value from the given extent. If the tag extends beyond this extent, then any partial fragment of the tag will be left in the string.
$st->delete_tag( $start, $len, $name )
Delete the named tag within the given extent. Entire tags are removed, even if they extend beyond this extent.
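The difference between unapply_tag() and delete_tag() is easiest to see side by side. The following sketch applies each to the same partial extent of an identical tag; it assumes only the behaviour stated in the two descriptions above.

```perl
use strict;
use warnings;
use String::Tagged;

# Two identical strings, each with "important" (offset 3, length 9) in bold
my $unapplied = String::Tagged->new( "An important message" );
my $deleted   = String::Tagged->new( "An important message" );
$_->apply_tag( 3, 9, bold => 1 ) for $unapplied, $deleted;

# Remove bold from just "impo" (offset 3, length 4)
$unapplied->unapply_tag( 3, 4, "bold" );
# Delete any bold tag touching the same extent
$deleted->delete_tag( 3, 4, "bold" );

# unapply_tag leaves the remaining fragment ("rtant") tagged
my $unapplied_start = $unapplied->get_tag_at( 3, "bold" );
my $unapplied_rest  = $unapplied->get_tag_at( 8, "bold" );
# delete_tag removes the whole tag, including the part outside the extent
my $deleted_rest    = $deleted->get_tag_at( 8, "bold" );

printf "unapply_tag: offset 3 %s, offset 8 %s\n",
   ( defined $unapplied_start ? "tagged" : "untagged" ),
   ( defined $unapplied_rest  ? "tagged" : "untagged" );
printf "delete_tag:  offset 8 %s\n",
   ( defined $deleted_rest ? "tagged" : "untagged" );
```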
$st->iter_tags( $callback, %opts )
Iterate the tags stored in the string. For each tag, the CODE reference in $callback is invoked once.
$callback->( $start, $length, $tagname, $tagvalue )
Options passed in %opts may include:
- start => INT
  Start at the given position; defaults to 0.
- end => INT
  End after the given position; defaults to end of string. This option overrides len.
- len => INT
  End after the given length beyond the start position; defaults to end of string. This option only applies if end is not given.
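A minimal sketch of iter_tags(), first over the whole string and then restricted with the start option. The tag names and the @all and @tail arrays are invented for the example; exactly which tags are reported when a range is given depends on the module, so the script simply prints whatever it receives.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );      # "important"
$st->apply_tag( 13, 7, italic => 1 );   # "message"

# Collect every tag in the string
my @all;
$st->iter_tags( sub {
   my ( $start, $length, $name, $value ) = @_;
   push @all, "$name=$value at offset $start, length $length";
} );

# Collect only the tags reported when iteration starts at offset 13
my @tail;
$st->iter_tags( sub {
   my ( $start, $length, $name, $value ) = @_;
   push @tail, "$name=$value at offset $start, length $length";
}, start => 13 );

print "all:  $_\n" for @all;
print "tail: $_\n" for @tail;
```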
$st->iter_tags_nooverlap( $callback, %opts )
Iterate non-overlapping extents of tags stored in the string. The CODE reference in $callback is invoked for each extent in the string where no tags change. The entire set of tags active in that extent is given to the callback.
$callback->( $start, $length, %tags )
The callback will be invoked over the entire length of the string, including any extents with no tags applied.
Options may be passed in %opts to control the range of the string iterated over, in the same way as the iter_tags() method.
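As a sketch of how the non-overlapping extents cover the string, the following records each extent reported for a string with a single tag. The @extents array and its layout are illustrative only.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );   # "important"

# Record [ start, length, comma-joined tag names ] for each extent
my @extents;
$st->iter_tags_nooverlap( sub {
   my ( $start, $length, %tags ) = @_;
   push @extents, [ $start, $length, join( ",", sort keys %tags ) ];
} );

# Untagged extents are reported too, with an empty tag set
foreach my $e ( @extents ) {
   my ( $start, $length, $names ) = @$e;
   printf "offset %2d length %2d tags: %s\n",
      $start, $length, ( length $names ? $names : "(none)" );
}
```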
$st->iter_substr_nooverlap( $callback, %opts )
Iterate extents of the string in the same way as iter_tags_nooverlap(), but passing the substring of data instead of the start position and length.
$callback->( $substr, %tags )
Options may be passed in %opts to control the range of the string iterated over, in the same way as the iter_tags() method.
@names = $st->tagnames
Returns the set of tag names used in the string, in no particular order.
$tags = $st->get_tags_at( $pos )
Returns a HASH reference of all the tag values active at the given position.
$value = $st->get_tag_at( $pos, $name )
Returns the value of the named tag at the given position, or undef if the tag is not applied there.
$st->set_substr( $start, $len, $newstr )
Modifies an extent of the underlying plain string to that given. The extents of tags in the string are adjusted to cope with the modified region, and the adjustment in length.
Tags entirely before the replaced extent remain unchanged.
Tags entirely within the replaced extent are deleted.
Tags entirely after the replaced extent are moved by the appropriate amount to ensure they still apply to the same characters as before.
Tags that start before and end after the extent remain, and have their lengths suitably adjusted.
Tags that span just the start or end of the extent, but not both, are truncated, removing the part of the tag applied to the modified extent but preserving the part applied outside it.
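One of the rules above, that tags entirely after the replaced extent move with their characters, can be sketched like this. The replacement text is arbitrary and the offsets are counted by hand.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 13, 7, bold => 1 );   # "message"

# Replace "important" (9 characters) with "urgent" (6 characters)
$st->set_substr( 3, 9, "urgent" );

# "message" now begins at offset 10; the tag should have moved with it
my $on_message = $st->get_tag_at( 10, "bold" );

print "string: ", $st->str, "\n";
print "bold at offset 10: ",
      ( defined $on_message ? $on_message : "(not applied)" ), "\n";
```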
$st->insert( $start, $newstr )
Insert the given string at the given position. A shortcut around set_substr().
$st->append( $newstr )
Append to the underlying plain string. A shortcut around set_substr().
$st->append_tagged( $newstr, %tags )
Append to the underlying plain string, and apply the given tags to the newly-inserted extent.
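A short sketch of append_tagged(), building up a string whose final part carries a tag. The tag name colour and its value are invented for the example.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Status: " );

# Append "OK" and tag only the newly appended characters
$st->append_tagged( "OK", colour => "green" );

# "Status: " is 8 characters long, so "OK" starts at offset 8
my $on_new = $st->get_tag_at( 8, "colour" );
my $on_old = $st->get_tag_at( 0, "colour" );

print "string: ", $st->str, "\n";
print "colour on appended text: ", ( defined $on_new ? $on_new : "(none)" ), "\n";
print "colour on original text: ", ( defined $on_old ? $on_old : "(none)" ), "\n";
```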
$ret = $st->debug_sprintf
Returns a representation of the string data and all the tags, suitable for debug printing or other similar use. This is a format such as is given in the DESCRIPTION section above.
The output will consist of a number of lines, the first containing the plain underlying string, then one line per tag. The line shows the extent of the tag given by [---] markers, or a | in the special case of a tag covering only a single character. Special markings of < and > indicate tags which are "before" or "after" anchored.
For example:
  Hello, world
  [---]         word       => 1
 <[----------]> everywhere => 1
        |       space      => 1
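A string with tags like those in the example above might be built as follows. This is a sketch: the offsets assume the word tag covers "Hello", and the exact spacing and ordering of the debug output are left to the module.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello, world" );

$st->apply_tag( 0, 5, word => 1 );          # "Hello"
$st->apply_tag( -1, -1, everywhere => 1 );  # anchored "before" and "after"
$st->apply_tag( 6, 1, space => 1 );         # the single space character

# Capture the debug representation and print it
my $debug = $st->debug_sprintf;
print $debug;
```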
TODO
- There are likely variations on the rules for set_substr() that could equally apply to some uses of tagged strings. Consider whether the behaviour of modification is chosen per-method, per-tag, or per-string.
- Ways in which the application might want to merge neighbouring tag values that happen to be equal. Consider the case in the description. Maybe a method like:
  $st->merge_tags( $cmp_func )
  $equal = $cmp_func->( $name, $value_a, $value_b )
  To merge two neighbouring tags of the same name if the $cmp_func returns true.
- Consider if a String::Tagged::Extent object needs to be created. Could compress both of the iter_*_nooverlap() methods into one, if it was passed an object which had start(), end(), len() and substr() methods.
AUTHOR
Paul Evans <leonerd@leonerd.org.uk>