Changes for version 0.6
- Addition of installation tests
- Addition of examples for French
- Integration of the MIG corrections (Node, Testified terms)
- Correction in the forbidden structure management (split action)
- Add NEXT.pm to the list of prerequisite modules
- UTF-8 is used as the charset for both the configuration files and the texts to process
- Integration of attested terms (in a bracketed format, which can be produced by the bootstrap option)
- Option "bootstrap" to generate output in a bracketed format. This output can be used as an attested resource on another corpus
- Add a defined test in the function appendInclude (but there still seems to be a bug in tree building).
- Monolexical term occurrences are identified as MNP when the monolexical option is set
- Addition of a config set for French based on the POS tagger Flemm (which uses the Multex tagset) and modification of the read method in Corpus.pm (addition of the parameter language) to normalize the input (correction of the output of Flemm)
- Correction in the input normalization for Flemm
- Each printing function can now print results to stdout
- Replace the tag ':' with 'COLUMN'
- Addition of some chunking frontiers, such as '('
- Only the tag is taken into account for sentence_boundary detection
- Add a Weights section to the DTD and to the XML rendering
- Correction: TF-IDF based ranking method is a DDW ranking
- Add ROOT as a reference to the term in which the current term is nested
- Addition of several term weighting and selection measures (C-Value and variations, iLong, ilnc, iLong, and term autonomy)
- Add the status of the term candidate (0: not a term, 1: term). By default, term candidates are terms
- Add an option for the term output style to print only the term candidates having the status of term
- Colors of the terms in the HTML output can be parametrized (option file, options PARSED_COLOR and UNPARSED_COLOR)
- New function for printing only the list of candidate terms, without the XML header
- Addition of the option XML-corpus-raw, rendering the corpus in XML format with terms. Documents and sentences are identified in the XML files.
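
As an illustration of the C-Value measure listed above, here is a minimal sketch of its usual textbook formulation: a candidate is weighted by log2 of its length in words times its frequency, and a candidate nested in longer candidates has the mean frequency of those longer candidates subtracted first. The function name c_value and the input structure (a hash from candidate term to frequency) are assumptions made for this example. They are not part of the YaTeA API, and the measure as implemented in the distribution may differ in its details.

```perl
use strict;
use warnings;

# Hypothetical helper (not a YaTeA function): textbook C-Value.
# $freq is a hash reference mapping a candidate term (words separated
# by single spaces) to its frequency in the corpus.
sub c_value {
    my ($freq) = @_;
    my %score;
    for my $term (keys %$freq) {
        my @words  = split ' ', $term;
        my $length = scalar @words;

        # Longer candidates that contain $term as a contiguous word sequence.
        my @containers = grep {
            $_ ne $term && index(" $_ ", " $term ") >= 0
        } keys %$freq;

        my $f = $freq->{$term};
        if (@containers) {
            my $sum = 0;
            $sum += $freq->{$_} for @containers;
            # Discount the occurrences explained by the longer candidates.
            $f -= $sum / scalar(@containers);
        }

        # log2(length); note that this is 0 for single-word candidates.
        $score{$term} = (log($length) / log(2)) * $f;
    }
    return \%score;
}

# Usage with made-up frequencies.
my $scores = c_value({
    'soft contact lens' => 2,
    'hard contact lens' => 4,
    'contact lens'      => 10,
});
for my $term (sort { $scores->{$b} <=> $scores->{$a} } keys %$scores) {
    printf "%s\t%.2f\n", $term, $scores->{$term};
}
```

With these made-up frequencies, "contact lens" is nested in two longer candidates whose mean frequency is 3, so its score is log2(2) * (10 - 3) = 7. Because log2(1) is 0, this plain formulation gives every monolexical candidate a score of 0, which is one reason variations of the measure exist.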
Documentation
Perl script for extracting terms from a corpus of texts and providing a syntactic analysis in a head-modifier representation.
Modules
Perl extension for extracting terms from a corpus and providing a syntactic analysis in a head-modifier format.
Perl extension for annotation marks
Perl extension for the set of chunking data
Perl extension for a subset of chunking data.
Perl extension for ???
Perl extension for words of input document
Perl extension for document set
Perl extension for edge between nodes
Perl extension for managing information related to a configuration file.
Perl extension for managing the directory containing the configuration file set given a language.
Perl extension for the forbidden structures.
Perl extension for forbidden structures in any position of a chunk.
Perl extension for managing the annotation marks for the forbidden structures
Perl extension for managing the forbidden structures.
Perl extension for forbidden structures at the start or end position of a chunk.
Perl extension for ???
Perl extension for internal nodes
Perl extension for island of reliability
Perl extension for set of reliability islands
Perl extension for lexicon of the corpus.
Perl extension for representing word
Perl extension for the linguistic item of the forbidden structures
Perl extension for managing a message in the term extractor
Perl extension for message set
Perl extension for monolexical phrases
Perl extension for the monolexical term candidate
Perl extension for monolexical testified terms
Perl extension for monolexical word
Perl extension for ???
Perl extension for ???
Perl extension for multi-word testified terms
Perl extension for ???
Perl extension for ???
Perl extension for ???
Perl extension for the phrase occurrences
Perl extension for option of the term extraction process
Perl extension for handling option set in YaTeA
Perl extension for parsing pattern
Perl extension for parsing the file containing the parsing patterns (based on Parse::Yapp)
Perl extension for recording parsing patterns
Perl extension for managing the set of the parsing patterns
Perl extension for the leaf node of a syntactic pattern tree
Perl extension for phrases corresponding to the parsed terms
Perl extension for ???
Perl extension for the root node of the syntactic tree of a term
Perl extension for sentence
Perl extension for the sentence set
Perl extension for managing the set of Part-of-Speech tags and inflected forms that can be accepted in the terms.
Perl extension for Term Candidate
Perl extension for leaf node of term tree
Perl extension for Testified Term
Perl extension for marks of testified terms
Perl extension for the parser of testified term file (based on Parse::Yapp)
Perl extension for ???
Perl extension for ???
Perl extension for a trigger.
Perl extension for managing the trigger set
Perl extension for managing word of the corpus and related information
Perl extension for managing word occurrence
Perl extension for managing characters which cannot be used in an XML document
Examples
- examples/load_config
- examples/sampleEN.ttg
- examples/sampleFR-flemm.ttg
- examples/sampleFR.ttg
- examples/testified_terms.txt
- examples/yatea-fullexample
- examples/yatea-fullexample-tft
- examples/yatea-fullexampleFR
- examples/yatea-fullexampleFR-Flemm
- examples/yatea-test-FR.rc
- examples/yatea-test-Flemm.rc
- examples/yatea-test-tft.rc
- examples/yatea-test.rc