NAME
wsd.pl - disambiguate words
SYNOPSIS
wsd.pl --context FILE [--scheme SCHEME] [--type MEASURE] [--config FILE] [--compounds FILE] [--stoplist FILE] [--window INT] [--contextScore NUM] [--pairScore NUM] [--outfile FILE] [--boundary] [--trace INT] [--silent] | --help | --version
DESCRIPTION
Disambiguates each word in the context file using the specified relatedness measure (or WordNet::Similarity::lesk if none is specified).
OPTIONS
N.B., the = sign between the option name and the option parameter is optional.
- --context=FILE
The input file containing the text to be disambiguated. This "option" is required.
- --scheme=SCHEME
The disambiguation scheme to use. Valid values are "normal" and "sense1". The default is "normal". The "sense1" scheme guesses that the correct sense of each word is its first sense in WordNet, because WordNet ranks the senses of a word by frequency: the first sense is more likely than the second, the second is more likely than the third, and so on.
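As an illustration of the "sense1" scheme, the following Python sketch picks the first listed sense of a word. It is not the code used by wsd.pl; the function name `first_sense`, the toy inventory, and the `word#pos#sense` labels are assumptions made for the example.

```python
def first_sense(word, sense_inventory):
    """Return the first listed sense of `word`, or None if the word is unknown.

    `sense_inventory` is a hypothetical stand-in for WordNet: a dict that maps
    a word to its senses, ordered from most to least frequent.
    """
    senses = sense_inventory.get(word, [])
    return senses[0] if senses else None


# Toy inventory invented for this example.
inventory = {
    "bank": ["bank#n#1", "bank#n#2"],
    "run": ["run#v#1"],
}

print(first_sense("bank", inventory))
```

Because this baseline never looks at the surrounding words, it needs no relatedness measure and no context window.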
- --measure=MEASURE
The relatedness measure to be used. The default is WordNet::Similarity::lesk.
- --config=FILE
The name of a configuration file for the specified relatedness measure.
- --compounds=FILE
A file containing compound words.
- --stoplist=FILE
A file containing regular expressions (as understood by Perl). Any word matching one of the regular expressions in the file is removed. Each regular expression must be on its own line, and any trailing whitespace is ignored.
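The stoplist filtering described above can be sketched in Python as follows. This is only an approximation: it uses Python's `re` module rather than Perl's regular expressions, it assumes each line holds a bare pattern without delimiters, and it treats "matching" as an unanchored search. The names `load_stoplist` and `remove_stopwords` are hypothetical.

```python
import re


def load_stoplist(lines):
    """Compile one regular expression per line, ignoring trailing whitespace."""
    patterns = []
    for line in lines:
        pattern = line.rstrip()  # drops the newline and any trailing blanks
        if pattern:              # assumption: empty lines are skipped
            patterns.append(re.compile(pattern))
    return patterns


def remove_stopwords(words, patterns):
    """Keep only the words that match none of the stoplist patterns."""
    return [w for w in words if not any(p.search(w) for p in patterns)]


# Lines as they might be read from a stoplist file (invented for the example).
stoplist_lines = ["^the$  \n", "^of$\n", "^\\d+$\n"]
patterns = load_stoplist(stoplist_lines)

words = ["the", "bank", "of", "the", "river", "42"]
print(remove_stopwords(words, patterns))
```

Anchoring each pattern with `^` and `$` keeps a short pattern such as `of` from also removing longer words that merely contain it.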
- --window=INTEGER
The window size used in the disambiguation algorithm. The default is 3.
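The text does not say exactly how the window is laid out around the target word. The Python sketch below assumes one possible reading: the window size counts the target word itself plus an equal number of words on each side, so the default of 3 gives one word to the left and one to the right. The function name `context_window` is hypothetical.

```python
def context_window(words, target_index, window=3):
    """Return the slice of `words` centered on the target word.

    Assumption: `window` is the total number of words, target included.
    The window is truncated at the start and end of the word list.
    """
    half = (window - 1) // 2
    start = max(0, target_index - half)
    end = min(len(words), target_index + half + 1)
    return words[start:end]


sentence = ["the", "bank", "of", "the", "river"]
print(context_window(sentence, 1))        # target "bank", default window
print(context_window(sentence, 4, 5))     # target "river", wider window
```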
- --contextScore=REAL
If no sense of the target word achieves this minimum score, no winner is chosen (i.e., it is assumed that there is no best sense or that none of the senses is sufficiently related to the surrounding context). The default is zero.
- --pairScore=REAL
The minimum pairwise score between a sense of the target word and the best sense of a context word that will be used in computing the overall score for that sense of the target word. Setting this to be greater than zero (but not too large) will reduce noise. The default is zero.
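To show how the two thresholds might interact, here is a Python sketch of one plausible scoring loop. It is not the algorithm as implemented in wsd.pl. It assumes that the score of a target sense is the sum, over the context words, of its best pairwise score with any sense of that context word, and that both thresholds are inclusive. The function names, the toy relatedness table, and all scores are invented.

```python
def score_senses(target_senses, context_senses, relatedness, pair_score=0.0):
    """Score each target sense against the best sense of every context word."""
    scores = {}
    for sense in target_senses:
        total = 0.0
        for senses in context_senses:  # one list of senses per context word
            best = max((relatedness(sense, other) for other in senses),
                       default=0.0)
            if best >= pair_score:     # pairs below the threshold are ignored
                total += best
        scores[sense] = total
    return scores


def pick_winner(scores, context_score=0.0):
    """Return the best-scoring sense, or None if it misses the minimum."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= context_score else None


# Toy relatedness values, invented for the example.
TOY = {
    ("bank#n#1", "river#n#1"): 0.9,
    ("bank#n#1", "money#n#1"): 0.05,
    ("bank#n#2", "river#n#1"): 0.1,
    ("bank#n#2", "money#n#1"): 0.8,
}


def toy_relatedness(a, b):
    return TOY.get((a, b), 0.0)


scores = score_senses(
    ["bank#n#1", "bank#n#2"],
    [["river#n#1"], ["money#n#1"]],
    toy_relatedness,
    pair_score=0.2,
)
print(scores)
print(pick_winner(scores, context_score=0.5))
print(pick_winner(scores, context_score=1.0))
```

In this toy run the pairwise threshold of 0.2 discards the weak pairs (0.05 and 0.1), and raising the context threshold to 1.0 leaves the target word without a winning sense.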
- --outfile=FILE
The name of a file to which output should be sent.
- --boundary
Automatically detect sentence boundaries. By default, if the input text is POS tagged, the input file is assumed to have one sentence per line. If the text is not POS tagged, sentence boundary detection is done. This option overrides that default behavior: use --boundary to force sentence boundary detection, or negate the option (--no-boundary) to prevent it.
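The option therefore has three states: given, negated, or absent. A minimal Python sketch of that decision follows, with `None` standing for "option not given". The function name `should_detect_boundaries` is hypothetical, and how wsd.pl decides whether the text is POS tagged is not covered here.

```python
def should_detect_boundaries(pos_tagged, boundary=None):
    """Decide whether sentence boundary detection should run.

    boundary=True  models --boundary, boundary=False models --no-boundary,
    and boundary=None models the option being absent.
    """
    if boundary is not None:
        return boundary      # an explicit option always wins
    return not pos_tagged    # default: detect only for untagged text


print(should_detect_boundaries(pos_tagged=True))
print(should_detect_boundaries(pos_tagged=True, boundary=True))
```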
- --trace=INT
Turn tracing on/off. A value of zero turns tracing off, a non-zero value turns tracing on. The different trace levels can be added together to see the combined traces. The trace levels are:
- 1: Show the context window for each pass through the algorithm.
- 2: Display winning score for each pass.
- 4: Display the scores for all senses for each pass (overrides 2).
- 8: Display traces from the semantic relatedness module.
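Since trace levels can be added together, they behave like bit flags. The Python sketch below assumes the four levels are 1, 2, 4, and 8, in the order listed above, and decodes a combined value. The names `TRACE_LEVELS` and `active_traces` are hypothetical and not part of wsd.pl.

```python
# Assumed level values: powers of two, so that sums are unambiguous.
TRACE_LEVELS = {
    1: "context window",
    2: "winning score",
    4: "all sense scores",
    8: "relatedness module traces",
}


def active_traces(level):
    """List the traces enabled by a combined --trace value."""
    active = [name for bit, name in TRACE_LEVELS.items() if level & bit]
    if level & 4:
        # Level 4 already shows every score, so it overrides level 2.
        active = [name for name in active if name != "winning score"]
    return active


print(active_traces(5))   # 1 + 4
print(active_traces(6))   # 2 + 4
```

For example, `--trace 5` would combine the context window trace (1) with the scores for all senses (4).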
- --silent
Silent mode. No information about progress, etc. is printed; only the final output is shown.
AUTHORS
Jason Michelizzi, <jmichelizzi at users.sourceforge.net>
Ted Pedersen, <tpederse at d.umn.edu>
BUGS
None known.
COPYRIGHT
Copyright (C) 2004 Jason Michelizzi and Ted Pedersen
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.