NAME

Text::TEI::Collate - a collation program for variant manuscript texts

SYNOPSIS

use Text::TEI::Collate;
my $aligner = Text::TEI::Collate->new( 'language' => 'Armenian' );

# Read from strings.
my @manuscripts;
foreach my $str ( @strings_to_collate ) {
  push( @manuscripts, $aligner->read_source( $str ) );
}
$aligner->align( @manuscripts );

# Read from files.  Also works for XML::LibXML::Document objects.
@manuscripts = ();
foreach my $xml_file ( @TEI_files_to_collate ) {
  push( @manuscripts, $aligner->read_source( $xml_file ) );
}
$aligner->align( @manuscripts );

# Read from a JSON input.
@manuscripts = $aligner->read_source( $JSON_string );
$aligner->align( @manuscripts );

DESCRIPTION

Text::TEI::Collate is a collation program for multiple (transcribed) manuscript copies of a known text. It provides an object-oriented interface, mostly for the convenience of the author and so that settings can be held globally on the object.

The object is the alignment engine, or "aligner". The methods that a user will care about are "read_source" and "align", as well as the various output methods; the other methods in this file are public in case a user needs a subset of this package's functionality.

An aligner takes two or more texts; the texts can be strings, filenames, or XML::LibXML::Document objects. It returns two or more Manuscript objects -- one for each text input -- in which identical and similar words are lined up with each other, via empty-string padding.
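
To make the idea of empty-string padding concrete, here is a small illustration in plain Perl. It does not call the module: the two word lists are hypothetical data written by hand, and plain strings stand in for the word objects that a real Manuscript would hold. The helper name alignment_columns is likewise an assumption made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical aligned word lists for two witnesses, A and B.
# Witness A lacks the word "brown", so its row is padded with ''
# to keep every later word in the same column as its counterpart.
my %aligned = (
    A => [ 'the', 'quick', '',      'fox' ],
    B => [ 'the', 'quick', 'brown', 'fox' ],
);

# Turn the rows into columns, one arrayref per aligned position.
sub alignment_columns {
    my (%rows) = @_;
    my @sigla  = sort keys %rows;
    my $width  = scalar @{ $rows{ $sigla[0] } };
    my @columns;
    for my $i ( 0 .. $width - 1 ) {
        push @columns, [ map { $rows{$_}[$i] } @sigla ];
    }
    return @columns;
}

# Print one line per aligned position, showing padding as '-'.
for my $col ( alignment_columns(%aligned) ) {
    print join( ' | ', map { $_ eq '' ? '-' : $_ } @$col ), "\n";
}
```

Because every row has the same length after padding, position N in one witness can be compared directly with position N in any other.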

Please see the documentation for Text::TEI::Collate::Manuscript and Text::TEI::Collate::Word for more information about the manuscript and word objects.

METHODS

new

Creates a new aligner object. Takes a hash of options; the available options are listed below.

debuglevel - Default 0. The higher the number (between 0 and 3), the more debugging output is produced.
title - Display title for the collation output results, should those results need a display title (e.g. TEI or JSON output).
language - Specify the language module we should use from those available in Text::TEI::Collate::Lang. Default is 'Default'.
fuzziness - The maximum allowable word distance for an approximate match, expressed as a percentage (word distance divided by word length). To increase the tolerance for short words, pass a hashref with the keys 'val', 'short', and 'shortval' instead; a word counts as short when its length is at or below the value of 'short'.
binmode - If STDERR should be using something other than UTF-8, you can set it here. You are probably in for a world of hurt anyway though.
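
The fuzziness option is easier to follow with a worked example. The sketch below is one possible reading of the description above, not the module's own code. It assumes a plain Levenshtein edit distance, assumes that "word length" means the length of the longer word, and assumes that 'shortval' is the percentage applied to short words while 'val' applies to all others. The function names edit_distance and is_fuzzy_match are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Plain Levenshtein edit distance, computed row by row.
# Assumption: the module may use a different distance measure.
sub edit_distance {
    my ( $s, $t ) = @_;
    my @s    = split //, $s;
    my @t    = split //, $t;
    my @prev = ( 0 .. scalar @t );
    for my $i ( 1 .. scalar @s ) {
        my @curr = ($i);
        for my $j ( 1 .. scalar @t ) {
            my $cost = $s[ $i - 1 ] eq $t[ $j - 1 ] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,              # deletion
                $curr[ $j - 1 ] + 1,        # insertion
                $prev[ $j - 1 ] + $cost,    # substitution
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Hypothetical helper: true if the two words are within the allowed
# percentage. $fuzziness is a number, or a hashref with the keys
# 'val', 'short', and 'shortval'.
sub is_fuzzy_match {
    my ( $w1, $w2, $fuzziness ) = @_;
    my $len = max( length($w1), length($w2) );   # assumed meaning of "word length"
    return 1 if $len == 0;
    my $limit = $fuzziness;
    if ( ref($fuzziness) eq 'HASH' ) {
        $limit = $len <= $fuzziness->{short}
            ? $fuzziness->{shortval}
            : $fuzziness->{val};
    }
    my $percent = 100 * edit_distance( $w1, $w2 ) / $len;
    return $percent <= $limit ? 1 : 0;
}

my $plain  = 20;
my $tiered = { val => 20, short => 4, shortval => 50 };
printf "%s/%s plain=%d tiered=%d\n", @$_,
    is_fuzzy_match( @$_, $plain ), is_fuzzy_match( @$_, $tiered )
    for ( [ 'color', 'colour' ], [ 'cat', 'cot' ] );
```

Under these assumptions, a one-letter difference in a three-letter word is a distance of about 33 percent, which a flat limit of 20 rejects and a 'shortval' of 50 accepts. That is the kind of case the hashref form is meant to handle.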
