NAME

Parse::StringTokenizer - Extract fields from strings

SYNOPSIS

use Parse::StringTokenizer;
my $tokenizer = Parse::StringTokenizer->new();
my $str = 'a b c';
die unless 'a' eq $tokenizer->shift($str);
die unless 'b c' eq $str;

EXPORTS

Nothing is exported by default.

DEPENDENCIES

This module requires these other modules and libraries:

Error::Programatic
Perl::Module

DESCRIPTION

Similar in spirit to the C library function strtok.

The default delimiter is whitespace (regex \s).

The default quotes, which protect the delimiter, are: '"

PUBLIC INTERFACE

new

Constructor

options:

-delim => $delimiter      # Token separator
-quotes => $quotes        # Quote characters which protect the
                          # delimiter, and will be ignored
-contained => $chars      # Character pairs which protect the delimiter
-keywords => \@keywords   # Words which will not be split
-preserve => 1            # Preserve quotes which surround each field

By default, the following is considered three tokens:

one 'and a two' "and a three"
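
The three-token result above can be reproduced with a small standalone sketch. This is not the module's implementation: sketch_unpack is a hypothetical helper, and it assumes a whitespace delimiter, the two default quote characters, and that surrounding quotes are dropped from each field.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::StringTokenizer.
# Splits on whitespace, treats '...' and "..." as single fields,
# and drops the surrounding quotes.
sub sketch_unpack {
  my ($str) = @_;
  my @fields;
  # \G with /gc continues from where the previous match ended.
  while ($str =~ /\G\s*(?:'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)"|(\S+))/gc) {
    push @fields, defined $1 ? $1 : defined $2 ? $2 : $3;
  }
  return @fields;
}

my @fields = sketch_unpack(q{one 'and a two' "and a three"});
print scalar(@fields), "\n";   # prints: 3
print "$_\n" for @fields;      # one / and a two / and a three
```

Escaped characters inside quotes are kept verbatim here, backslash included.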

split

Unpack the string and return non-empty fields

unpack

Split a string into fields (opposite of pack)

pack

Join fields together as a string (opposite of unpack)

-preserve

TODO: When packing, a field is double-quoted if it contains the
primary delimiter.  If the field is already quoted, do not double-quote it.
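
The quoting rule described in this TODO might look roughly like the following. It is only a sketch under stated assumptions: sketch_pack is a hypothetical function, the delimiter is assumed to be whitespace, and fields are joined with a single space.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's pack method.
# A field containing whitespace is wrapped in double quotes unless it
# already starts and ends with the same quote character.
# Embedded double quotes are not escaped in this sketch.
sub sketch_pack {
  my @fields = @_;
  my @out;
  for my $field (@fields) {
    my $already_quoted = $field =~ /\A(['"]).*\1\z/s;
    if ($field =~ /\s/ && !$already_quoted) {
      push @out, qq{"$field"};
    } else {
      push @out, $field;
    }
  }
  return join ' ', @out;
}

print sketch_pack('one', 'and a two', q{'and a three'}), "\n";
# prints: one "and a two" 'and a three'
```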

shift

Return the first field and trim the string

pop

Return the last field and trim the string

push

Push a field on to the string

Returns the new field count.
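
As the SYNOPSIS shows, shift changes the caller's string in place. The sketch below illustrates that behavior for shift, pop and push with hypothetical stand-in functions. It assumes plain whitespace splitting with no quote handling, and the use of $_[0] aliasing is an assumption about mechanism, not a description of the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, not the module's methods. They split on plain
# whitespace (no quote handling) and rewrite the caller's string via $_[0].
sub sketch_shift {
  my @fields = split ' ', $_[0];
  my $first = shift @fields;
  $_[0] = join ' ', @fields;
  return $first;
}

sub sketch_pop {
  my @fields = split ' ', $_[0];
  my $last = pop @fields;
  $_[0] = join ' ', @fields;
  return $last;
}

sub sketch_push {
  my @fields = split ' ', $_[0];
  push @fields, $_[1];
  $_[0] = join ' ', @fields;
  return scalar @fields;   # new field count
}

my $str = 'a b c';
print sketch_shift($str), "\n";       # prints: a
print "$str\n";                       # prints: b c
print sketch_pop($str), "\n";         # prints: c
print sketch_push($str, 'd'), "\n";   # prints: 2
print "$str\n";                       # prints: b d
```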

PACKAGE INTERNALS

_compile

Compile the regular expression

Quotes and contained segments can have escaped characters, like:

'don\'t break me'

Right now, it comes back as

don\'t break me

TODO: unescape it so it becomes

don't break me

The difficulty is that you DO NOT want to unescape the backslashes here:

"my quotes \'are\' different"

This means reworking the RE so that the leading and trailing quotes are NOT removed, i.e., they remain part of the segment. A subsequent step would then be added to `unpack`, which decides what to do based on the presence of leading and trailing quote characters.
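
One possible shape for that subsequent step is sketched below. It is not present in the module: sketch_unquote is a hypothetical function that assumes the segment still carries its surrounding quotes, and it unescapes only the quote character that enclosed the segment.

```perl
use strict;
use warnings;

# Hypothetical post-processing step, not present in the module.
# Strips the surrounding quotes and unescapes only the matching quote
# character; every other backslash sequence is left alone.
sub sketch_unquote {
  my ($segment) = @_;
  if ($segment =~ /\A(['"])(.*)\1\z/s) {
    my ($quote, $inner) = ($1, $2);
    my $escaped = quotemeta("\\" . $quote);   # matches backslash + quote
    $inner =~ s/$escaped/$quote/g;
    return $inner;
  }
  return $segment;   # unquoted segments pass through untouched
}

print sketch_unquote(q{'don\'t break me'}), "\n";
# prints: don't break me
print sketch_unquote(q{"my quotes \'are\' different"}), "\n";
# prints: my quotes \'are\' different
```

An escaped backslash directly before a quote is not treated specially here; handling it would need a fuller escape parser.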

AUTHORS

Ryan Gies <ryangies@cpan.org>

COPYRIGHT

Copyright (C) 2014-2016 by Ryan Gies. All rights reserved.
Copyright (C) 2006-2013 by Livesite Networks, LLC. All rights reserved.
Copyright (C) 2000-2005 by Ryan Gies. All rights reserved.
Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, 
this list of conditions and the following disclaimer.
* The origin of this software must not be misrepresented; you must not 
claim that you wrote the original software. If you use this software in a 
product, an acknowledgment in the product documentation would be 
appreciated but is not required.
* Altered source versions must be plainly marked as such, and must not be 
misrepresented as being the original software.
* The name of the author may not be used to endorse or promote products 
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED 
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO 
EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT 
OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING 
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY 
OF SUCH DAMAGE.
To the best of our knowledge, no patented algorithms have been used. However, we
do not have the resources to carry out a patent search, and therefore cannot 
give any guarantee of the above statement.