NAME
NLP::GATE::AnnotationSet - A class for representing GATE-like annotation sets
VERSION
Version 0.2
SYNOPSIS
use NLP::GATE::AnnotationSet;
my $annset = NLP::GATE::AnnotationSet->new();
$annset->add($annotation);
$newannset = $annset->get($type[,$featuremap]);
$arrayref = $annset->getAsArrayRef();
$ann = $annset->getByIndex($n);
$size = $annset->size();
DESCRIPTION
This is a simple class representing an annotation set for documents in the format used by the GATE software (http://gate.ac.uk/).
An annotation set can contain any number of NLP::GATE::Annotation objects. Currently, nothing prevents the same annotation from being added more than once.
Annotation sets behave a bit like arrays in that each annotation can be addressed by an index and each set always contains a known number of annotations.
TODO: use the offset indices in method getByOffset()
METHODS
new()
Create a new annotation set. The name of the annotation set is not a property of the set; instead, each set is associated with a name when it is stored in an NLP::GATE::Document object using the setAnnotationSet() method.
add($annotation)
Add an annotation object to the annotation set.
getByIndex($n)
Return the annotation for index $n or signal an error.
get($type[,$featureset[,$matchtype]])
Return a new annotation set containing all the annotations from this set that match the given type, and if specified, all the feature/value pairs given in the $featureset hash map reference. If no annotations match, an empty annotation set will be returned.
The parameter $matchtype specifies how features are matched: "exact" will do an exact string comparison, "nocase" will compare after converting both strings to lower case using Perl's lc function, and "regexp" will interpret the string given in the parameter as a regular expression. Default is "exact".
If a feature is specified in the featureset, it MUST occur in the feature set of the annotation AND its value must match according to the selected $matchtype.
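The matching rules above can be illustrated with a minimal, self-contained sketch. The helpers feature_value_matches and features_match are hypothetical and not part of NLP::GATE; features are modelled as plain hash references, and the "regexp" mode is assumed to be an unanchored match, which the documentation does not specify.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NLP::GATE): compare one feature value
# against the requested value using the given match type.
sub feature_value_matches {
    my ($actual, $wanted, $matchtype) = @_;
    $matchtype = "exact" unless defined $matchtype;    # documented default
    return 0 unless defined $actual;
    return $actual eq $wanted ? 1 : 0         if $matchtype eq "exact";
    return lc($actual) eq lc($wanted) ? 1 : 0 if $matchtype eq "nocase";
    # Assumption: the pattern is applied unanchored.
    return $actual =~ /$wanted/ ? 1 : 0       if $matchtype eq "regexp";
    die "unknown match type: $matchtype";
}

# Every requested feature must be present on the annotation AND match.
sub features_match {
    my ($features, $wanted, $matchtype) = @_;
    for my $name (keys %$wanted) {
        return 0 unless exists $features->{$name};
        return 0 unless feature_value_matches(
            $features->{$name}, $wanted->{$name}, $matchtype);
    }
    return 1;
}

my %features = (category => "NNP", string => "London");
print features_match(\%features, { category => "nnp" }, "nocase")
    ? "match\n" : "no match\n";    # prints "match"
```

With "exact" instead of "nocase" the same call would not match, because "NNP" and "nnp" differ in case.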
The annotations in the new set will be the same as in the original set, so changing the annotation objects will change them in both sets!
getByOffset($from,$to,$type,$featureset,$featurematchtype,$rangematchtype)
Return all the annotations that match the given offset range, optionally filtering in addition by type and features. This method requires an offset range and in addition filters annotations as the get method does.
If one of the parameters is undef, any value is accepted for that parameter when matching.
The parameter $featurematchtype specifies how features are matched: "exact" will do an exact string comparison, "nocase" will compare after converting both strings to lower case using Perl's lc function, and "regexp" will interpret the string given in the parameter as a regular expression. Default is "exact".
The $rangematchtype argument specifies how offsets are compared, if they are specified (case does not matter):
"COVER" - any annotation with a from offset less than or equal to $from and a to offset greater than or equal to $to, that is, annotations that contain this range
"EXACT" - any annotation with from and to offsets exactly as specified, that is, annotations that are co-extensive with this range. This is the default.
"WITHIN" - any annotation that lies fully within the range
"OVERLAP" - any annotation that overlaps with the given range
For example, to find annotations that fully contain the text from offset 12 to offset 17, use getByOffset(12,17,undef,undef,undef,"cover").
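The four range match types can be sketched as simple offset comparisons. The function range_matches below is a hypothetical illustration, not the module's code. In particular, treating "OVERLAP" as sharing at least one character position (so ranges that merely touch do not overlap) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: does an annotation spanning [$afrom, $ato]
# match the requested range [$from, $to] under the given match type?
sub range_matches {
    my ($afrom, $ato, $from, $to, $rangematchtype) = @_;
    # Case does not matter; "exact" is the documented default.
    my $mode = lc(defined $rangematchtype ? $rangematchtype : "exact");
    my $ok;
    if    ($mode eq "cover")   { $ok = $afrom <= $from && $ato >= $to; }
    elsif ($mode eq "exact")   { $ok = $afrom == $from && $ato == $to; }
    elsif ($mode eq "within")  { $ok = $afrom >= $from && $ato <= $to; }
    # Assumption: ranges that only touch at a boundary do not overlap.
    elsif ($mode eq "overlap") { $ok = $afrom < $to && $ato > $from; }
    else                       { die "unknown range match type: $mode"; }
    return $ok ? 1 : 0;
}

# An annotation from 10 to 20 contains the range 12..17 ...
print range_matches(10, 20, 12, 17, "cover"), "\n";    # prints 1
# ... but is not co-extensive with it.
print range_matches(10, 20, 12, 17, "exact"), "\n";    # prints 0
```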
getAsArrayRef()
Return an array reference whose elements are the Annotation objects in this set.
getAsArray()
Return an array whose elements are the Annotation objects in this set.
size()
Return the number of annotations in the set.
getTypes()
Return an array of all distinct annotation types in the set.
NOTE: this will currently go through all annotations in the set and collect the types. No caching of type names is done in this function or during creation of the set.
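A full pass that collects distinct type names might look like the following sketch. The function distinct_types is hypothetical, annotations are modelled as plain hash references with a type key rather than NLP::GATE::Annotation objects, and the first-seen ordering of the result is an assumption, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: walk all annotations once and return each type
# name the first time it is seen. Nothing is cached between calls.
sub distinct_types {
    my ($annotations) = @_;
    my %seen;
    return grep { !$seen{$_}++ } map { $_->{type} } @$annotations;
}

my @annotations = (
    { type => "Token" },
    { type => "Person" },
    { type => "Token" },
);
my @types = distinct_types(\@annotations);
print "@types\n";    # prints "Token Person"
```

Because every call scans the whole set, callers that need the type list repeatedly may prefer to store the result themselves.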
indexByOffsetFrom()
Creates an index for the set that will speed up the retrieval of annotations by offset or offset interval. Unlike in GATE, this is not called automatically but must be explicitly requested before doing the retrieval.
If an index already exists, it is discarded and a new index is built.
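To illustrate the idea of an offset index, here is a minimal sketch that maps each start offset to the positions of the annotations beginning there. The function build_from_index and the from key on plain hash references are hypothetical; the module's internal index layout is not documented here and may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: build a fresh index from start offset to the list
# of array positions of annotations that begin at that offset. Calling it
# again simply produces a new index, discarding nothing but the old hash.
sub build_from_index {
    my ($annotations) = @_;
    my %index;
    for my $i (0 .. $#$annotations) {
        push @{ $index{ $annotations->[$i]{from} } }, $i;
    }
    return \%index;
}

my @annotations = (
    { from => 0,  to => 5,  type => "Token" },
    { from => 6,  to => 11, type => "Token" },
    { from => 0,  to => 11, type => "Sentence" },
);
my $index = build_from_index(\@annotations);
# Look up all annotations starting at offset 0 without scanning the set.
print join(",", @{ $index->{0} }), "\n";    # prints "0,2"
```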
AUTHOR
Johann Petrak, <firstname.lastname-at-jpetrak-dot-com>
BUGS
Please report any bugs or feature requests to bug-gate-document at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=NLP::GATE. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc NLP::GATE
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
RT: CPAN's request tracker
Search CPAN