NAME
rawtextFreq.pl - Perl program for finding the frequencies of words in raw text files
SYNOPSIS
rawtextFreq.pl --compfile=COMPFILE --outfile=OUTFILE [--stopfile=STOPFILE] {--stdin | --infile=FILE [--infile=FILE ...]} [--wnpath=WNPATH] [--resnik] [--smooth=SCHEME] | --help | --version
OPTIONS
--compfile=filename
The name of a file containing the compound words (collocations) in
WordNet
--outfile=filename
The name of a file to which output should be written
--stopfile=filename
A file containing a list of stop words, which are excluded from
the frequency counts. A sample file can be downloaded from
http://www.d.umn.edu/~tpederse/Group01/WordNet/words.txt
--wnpath=path
Location of the WordNet data files (e.g.,
/usr/local/WordNet-2.0/dict)
--resnik
Use Resnik (1995) frequency counting
--smooth=SCHEME
Smoothing should be used on the computed probabilities. SCHEME can
only be ADD1 at this time
--help
Show a help message
--version
Display version information
--stdin
Read the text to be used for counting word frequencies from
standard input.
--infile=PATTERN
The name of a raw text file to be used to count word frequencies.
The value can be a filename, a directory name, or a pattern (as
understood by Perl's glob() function). If the value is a directory
name, then all the files in that directory and its subdirectories will
be used.
If you are looking for some interesting files to use, check out
Project Gutenberg: <http://www.gutenberg.org>.
This option may be given more than once (if more than one file
should be used).
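The way an --infile value is described above (a file, a directory searched recursively, or a glob() pattern) can be illustrated with a short sketch. This is not the code used by rawtextFreq.pl; the helper name expand_infile is hypothetical, and the sketch assumes that only plain files are wanted and that paths contain no whitespace (Perl's glob() splits its argument on whitespace).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not part of rawtextFreq.pl): expand one --infile
# value into a sorted list of plain files.
sub expand_infile {
    my ($pattern) = @_;
    my @files;

    # glob() returns the pattern itself when it has no wildcards,
    # so plain filenames and directory names pass through unchanged.
    for my $path (glob $pattern) {
        if (-d $path) {
            # Directory: collect every plain file beneath it, recursively.
            # With no_chdir, $_ holds the full path during the traversal.
            find({ wanted   => sub { push @files, $File::Find::name if -f $_ },
                   no_chdir => 1 }, $path);
        }
        elsif (-f $path) {
            push @files, $path;
        }
    }

    my @sorted = sort @files;
    return @sorted;
}

# Treat each command-line argument as one --infile value.
my @inputs = map { expand_infile($_) } @ARGV;
print "$_\n" for @inputs;
```

Saved under a name of your choosing and run with a directory name or a quoted pattern such as 'texts/*.txt' as arguments, the sketch prints one resolved file per line.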
AUTHORS
Siddharth Patwardhan, University of Utah, Salt Lake City
sidd @ cs.utah.edu
Ted Pedersen, University of Minnesota, Duluth
tpederse @ d.umn.edu
Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
banerjee+ @ cs.cmu.edu
Jason Michelizzi, University of Minnesota, Duluth
mich0212 @ d.umn.edu
BUGS
None.
COPYRIGHT AND LICENSE
Copyright (c) 2004, Siddharth Patwardhan, Ted Pedersen, Satanjeev Banerjee, and Jason Michelizzi
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
Free Software Foundation, Inc.
59 Temple Place - Suite 330
Boston, MA 02111-1307, USA