NAME

RTF::Tokenizer - Tokenize RTF

SYNOPSIS

use RTF::Tokenizer;

sub entity_handler {
  # Build an HTML numeric entity; the trailing ';' terminates it
  return "&#" . hex($_[0]) . ";";
}

my $object = RTF::Tokenizer->new($line);
#my $object = RTF::Tokenizer->new($line, \&entity_handler);

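while (1) {
  my ($type, $value, $extra) = $object->get_token;
  # Guard against undefined fields so the print does not warn
  print join(', ', map { defined $_ ? $_ : '' } $type, $value, $extra), "\n";
  last if $type eq 'eof';  # leave the loop once the input is exhausted
}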

METHODS

new

Creates an instance. The first argument is a string of RTF and is required; the second is an optional subroutine reference that is called whenever an entity is found. By default an entity is converted into the character it represents, but a custom handler can emit something else, such as HTML entities (as in the example above). The handler receives the entity's hex value as its argument.
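The two handler styles described above can be sketched as standalone subroutines. This is only an illustration: the names html_entity_handler and character_handler are hypothetical, and it assumes the handler receives the hex digits as a string (for example "e9"), which is what the hex() call in the SYNOPSIS suggests.

```perl
use strict;
use warnings;

# Assumption: the handler is called with the entity's hex digits as a
# string, e.g. "e9" for the RTF sequence \'e9.

# Emit an HTML numeric character reference, e.g. "e9" -> "&#233;".
sub html_entity_handler {
    my ($hex) = @_;
    return '&#' . hex($hex) . ';';
}

# Roughly what the documented default does: return the character itself.
sub character_handler {
    my ($hex) = @_;
    return chr(hex($hex));
}

print html_entity_handler('e9'), "\n";  # prints "&#233;"
print character_handler('41'), "\n";    # prints "A"
```

Either subroutine could then be passed by reference as the second argument to new, in the same way the SYNOPSIS passes \&entity_handler.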

get_token

Returns a list containing the token type (one of control, text, group or eof), the token data and, for a control word, the integer value associated with it (if there is one).
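As a rough sketch of how the returned list might be handled, the helper below formats one (type, value, extra) triple. The helper name describe_token is hypothetical, and the sample triples are hand-written stand-ins for successive get_token calls, not output captured from the module.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one (type, value, extra) triple into a string.
sub describe_token {
    my ($type, $value, $extra) = @_;
    return 'end of input' if $type eq 'eof';
    if ($type eq 'control') {
        # The third element only matters when the control word has a value.
        my $has_arg = defined $extra && length $extra;
        return $has_arg ? "control word $value with value $extra"
                        : "control word $value";
    }
    return "$type: $value";
}

# Illustrative triples standing in for successive get_token calls.
my @tokens = (
    [ 'control', 'fs', 24 ],
    [ 'control', 'b' ],
    [ 'text',    'Hello' ],
    [ 'eof' ],
);

for my $token (@tokens) {
    print describe_token(@$token), "\n";
}
```

Checking the third element for definedness before using it avoids warnings for control words that carry no integer value.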

AUTHOR

Peter Sergeant <pete@clueball.com>

COPYRIGHT

Copyright 2002 Peter Sergeant.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.