NAME

Lingua::TreeTagger::Token - Representing a token tagged by TreeTagger.

VERSION

This documentation refers to Lingua::TreeTagger::Token version 0.01.

SYNOPSIS

use Lingua::TreeTagger;

# Create a Tagger object.
my $tagger = Lingua::TreeTagger->new(
    'language' => 'english',
);

# Tag some text and get a new TaggedText object.
my $tagged_text = $tagger->tag_file( 'path/to/some/file.txt' );

# A TaggedText object is essentially a sequence of Lingua::TreeTagger::Token
# objects.
foreach my $token ( @{ $tagged_text->sequence() } ) {

    # A token may contain a single SGML tag...
    if ( $token->is_SGML_tag() ) {
        print 'An SGML tag: ', $token->tag, "\n";
    }

    # ... or a part-of-speech tag.
    else {
        print 'A part-of-speech tag: ', $token->tag, "\n";
        
        # In the latter case, the token may also have attributes specifying
        # the original string...
        if ( defined $token->original() ) {
            print '  token: ', $token->original(), "\n";
        }

        # ... or the corresponding lemma.
        if ( defined $token->lemma() ) {
            print '  lemma: ', $token->lemma(), "\n";
        }
    }
}

DESCRIPTION

This module is part of the Lingua::TreeTagger distribution. It defines a class for representing a unit in the output of TreeTagger in an object-oriented way. Such a unit consists in either (i) exactly one part-of-speech tag and possibly a token and a lemma (tab-delimited) or (ii) an SGML tag. See also Lingua::TreeTagger and Lingua::TreeTagger:TaggedText.

METHODS

new()

Creates a new Token object. This is normally called by a Lingua::TreeTagger::TaggedText object rather than directly by the user. It requires two parameters:

tag

A string containing either a part-of-speech tag or an SGML tag.

is_SGML_tag

1 if the value of the tag attribute is to be interpreted as an SGML tag, 0 otherwise.

If the <is_SGML_tag> attribute is set to 0, the constructor may take two additional optional parameters:

original

A string containing the original token to which the part-of-speech tag has been attributed.

lemma

A string containing the lemma of the original token.

ACCESSORS

tag()

Read-only accessor for the 'tag' attribute of a token (either a TreeTagger part-of-speech tag or an SGML tag).

is_SGML_tag()

Read-only accessor for the 'is_SGML_tag' attribute of a token (value is 1 if the tag is an SGML tag and 0 otherwise).

original()

Read-only accessor for the 'original' attribute of a token, i.e. the original word token to which a given part-of-speech tag was assigned. Available only if the value of 'is_SGML_tag' is 0.

lemma()

Read-only accessor for the 'lemma' attribute of a token, i.e. the base form of the original word token to which a given part-of-speech tag was assigned. Available only if the value of 'is_SGML_tag' is 0.

DIAGNOSTICS

An SGML tag cannot have a 'original' attribute

This exception is raised by the class constructor when a new Token object is simultaneously specified as being an SGML tag and having a 'original' attribute.

An SGML tag cannot have a 'lemma' attribute

This exception is raised by the class constructor when a new Token object is simultaneously specified as being an SGML tag and having a 'lemma' attribute.

DEPENDENCIES

This module is part of the Lingua::TreeTagger distribution. It is not intended to be used as an independent module.

It requires module Moose and was developed using version 1.09. Please report incompatibilities with earlier versions to the author.

BUGS AND LIMITATIONS

There are no known bugs in this module.

Please report problems to Aris Xanthos (aris.xanthos@unil.ch)

Patches are welcome.

AUTHOR

Aris Xanthos (aris.xanthos@unil.ch)

LICENSE AND COPYRIGHT

Copyright (c) 2010 Aris Xanthos (aris.xanthos@unil.ch).

This program is released under the GPL license (see http://www.gnu.org/licenses/gpl.html).

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO

Lingua::TreeTagger, Lingua::TreeTagger::TaggedText