config.pod - listing of configuration options


Description of all known configuration options.


The following is a list of options supported by the measures of semantic relatedness. This is intended to serve as a "master list" of options so that descriptions can be copied from here and pasted into the documentation for specific modules.


This option is supported by all measures.

The value of this parameter specifies the level of tracing. The value must be an integer equal to 0, 1, or 2. If the value is omitted, then the default value, 0, is used. A value of 0 switches tracing off. A value of 1 or 2 switches tracing on. The difference between levels 1 and 2 depends on the measure being used.

For vector and lesk, a value of 1 displays as traces only the gloss overlaps found. A value of 2 displays as traces all the text being compared.

For the res, lin, jcn, wup, lch, path, and hso measures, a trace of level 1 means the synsets are represented as word#pos#sense strings, while for level 2, the synsets are represented as word#pos#offset strings.
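The two synset representations can be illustrated with a minimal Python sketch. The function name format_synset, the sample sense number, and the sample offset are hypothetical and are not taken from the package; only the word#pos#sense and word#pos#offset layouts come from the description above.

```python
def format_synset(word, pos, sense, offset, trace_level):
    """Render a synset in the form the given trace level is described as using.

    Level 1 gives word#pos#sense, level 2 gives word#pos#offset.
    """
    if trace_level == 1:
        return f"{word}#{pos}#{sense}"
    if trace_level == 2:
        return f"{word}#{pos}#{offset}"
    raise ValueError("synsets only appear in traces at level 1 or 2")


# Illustrative values only; the offset is made up for this example.
print(format_synset("dog", "n", 1, 1234567, 1))  # dog#n#1
print(format_synset("dog", "n", 1, 1234567, 2))  # dog#n#1234567
```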


This option is supported by all measures.

The value of this parameter specifies whether or not caching of the relatedness values should be performed. This value is an integer equal to 0 or 1. If the value is omitted, then the default value, 1, is used. A value of 0 switches caching 'off', and a value of 1 switches caching 'on'.


This option is supported by all measures.

The value of this parameter indicates the size of the cache used for storing computed relatedness values. The specified value must be a non-negative integer. If the value is omitted, then the default value, 5,000, is used. Setting maxCacheSize to zero has the same effect as setting cache to zero, but setting cache to zero is likely to be more efficient. Caching and tracing at the same time can result in excessive memory usage because the trace strings are also cached. If you intend to perform a large number of relatedness queries, then you might want to turn tracing off.
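The interaction between cache and maxCacheSize might be sketched as follows. This is a rough Python illustration, not the package's implementation: the class name RelatednessCache is hypothetical, and the oldest-first eviction policy is an assumption, since the text above does not say what happens when the cache is full.

```python
from collections import OrderedDict


class RelatednessCache:
    """Bounded store for relatedness values (illustrative only)."""

    def __init__(self, cache=1, max_cache_size=5000):
        if cache not in (0, 1):
            raise ValueError("cache must be 0 or 1")
        if not isinstance(max_cache_size, int) or max_cache_size < 0:
            raise ValueError("maxCacheSize must be a non-negative integer")
        # A size of zero behaves like cache=0: nothing is ever stored.
        self.enabled = cache == 1 and max_cache_size > 0
        self.max_size = max_cache_size
        self._store = OrderedDict()

    def get(self, key):
        """Return the cached value for key, or None if it is not cached."""
        return self._store.get(key)

    def put(self, key, value):
        """Store a value, evicting the oldest entry when the cache is full."""
        if not self.enabled:
            return
        if key not in self._store and len(self._store) >= self.max_size:
            # Assumption: oldest-first eviction; the real policy may differ.
            self._store.popitem(last=False)
        self._store[key] = value

    def __len__(self):
        return len(self._store)


small = RelatednessCache(max_cache_size=2)
for pair, score in [(("a", "b"), 0.1), (("a", "c"), 0.2), (("b", "c"), 0.3)]:
    small.put(pair, score)
print(len(small))  # 2
```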


This option is supported by the res, lin, jcn, wup, path, and lch measures.

The value of this parameter indicates whether or not a unique root node should be used. In WordNet, there is no unique root node for the noun and verb taxonomies. If this parameter is set to 1, then the measures that support it (wup, path, lch, res, lin, and jcn) will "fake" a unique root node. If the value is set to 0, then no unique root node will be used. If the value is omitted, then the default value, 1, is used.
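The effect of a faked root can be seen in a toy example. The Python sketch below is only an illustration under simplifying assumptions: each concept has a single parent, the hierarchy and the label "*Root*" are made up, and the helper names are hypothetical. It shows that two concepts in separate hierarchies are connected only when the artificial root is added.

```python
FAKE_ROOT = "*Root*"  # hypothetical label for the artificial root


def path_to_top(node, parent_of, root_node=1):
    """Return the chain from node up to the top of its hierarchy."""
    chain = [node]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    if root_node:
        # Join every hierarchy under one artificial root.
        chain.append(FAKE_ROOT)
    return chain


def edge_distance(a, b, parent_of, root_node=1):
    """Count edges between a and b through their lowest shared ancestor.

    Returns None when the two concepts share no ancestor.
    """
    chain_a = path_to_top(a, parent_of, root_node)
    chain_b = path_to_top(b, parent_of, root_node)
    for steps_a, ancestor in enumerate(chain_a):
        if ancestor in chain_b:
            return steps_a + chain_b.index(ancestor)
    return None


# Two separate toy hierarchies, topped by "move" and "think".
parent_of = {"walk": "move", "run": "move", "ponder": "think"}

print(edge_distance("walk", "run", parent_of))                  # 2
print(edge_distance("walk", "ponder", parent_of, root_node=1))  # 4
print(edge_distance("walk", "ponder", parent_of, root_node=0))  # None
```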


This option is supported by the res, lin, and jcn measures.

The value for this parameter should be a string that specifies the path of an information content file containing the frequency of occurrence of every WordNet concept in a large corpus. A number of utility programs are included in this distribution that can be used to generate an infocontent file (see utils.pod). If no path is specified, then the default infocontent file is used, which was generated from SemCor using the sense-tags.
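As background, information content is conventionally defined as the negative log of a concept's probability, estimated from corpus frequencies. The Python sketch below illustrates that textbook definition only. It does not read the infocontent file format, the function name and the counts are hypothetical, and the measures in this package may smooth or normalize the counts differently.

```python
import math


def information_content(concept, frequency, total):
    """Return -log(p), where p = frequency[concept] / total.

    Returns None for concepts with no observed occurrences, because
    the logarithm is undefined there in this simple sketch.
    """
    count = frequency.get(concept, 0)
    if count <= 0 or total <= 0:
        return None
    return -math.log(count / total)


# Made-up counts: rarer concepts carry more information.
frequency = {"entity": 1000, "animal": 100, "dog": 10}
for concept in ("entity", "animal", "dog", "unicorn"):
    print(concept, information_content(concept, frequency, total=1000))
```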


This option is supported only by the lch measure.

The value for this parameter should be a string that specifies the location of a taxonomy depths file (as generated by If no path is specified, then the default file is used, which was generated when the Similarity package was installed.


This option is supported only by the wup measure.

The value for this parameter should be a string that specifies the location of a synset depths file (as generated by If no path is specified, then the default file is used, which was generated when the Similarity package was installed.


This option is supported only by the lesk and vector measures.

The value of this parameter is the path to a file that contains a list of WordNet relations. The path may be either an absolute path or a relative path.

The vector module combines the glosses of synsets related to the target synsets by these relations and forms the gloss-vector from this combined gloss.

The lesk module combines glosses of synsets related to the target synsets by these relations and then searches for overlaps in these "super-glosses."

WARNING: the format of the relation file is different for the vector and lesk measures. The documentation for each measure describes the format of its relation file. See WordNet::Similarity::vector(3pm) and WordNet::Similarity::lesk(3pm).


This option is supported only by the lesk and vector measures.

The value of this parameter is the path of a file containing a list of stop words that should be ignored in the glosses. The path may be either an absolute path or a relative path.
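Stop-word removal itself is simple to sketch. The Python example below assumes one stop word per line and whitespace tokenization. Both are assumptions made for illustration, since the stop file format is not described here, and the helper names are hypothetical.

```python
def load_stop_words(lines):
    """Build a stop list from lines holding one word each (assumed format)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(gloss, stop_words):
    """Return the words of a gloss that are not in the stop list."""
    return [word for word in gloss.lower().split() if word not in stop_words]


# Lines as they might be read from a stop file; blank lines are skipped.
stop_words = load_stop_words(["the\n", "of\n", "\n", "a\n"])
print(remove_stop_words("The tail of a dog", stop_words))  # ['tail', 'dog']
```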


This option is supported only by the lesk and vector measures.

The value of this parameter indicates whether or not stemming should be performed. The value must be an integer equal to 0 or 1. If the value is omitted, then the default value, 0, is used. A value of 1 switches stemming 'on', and a value of 0 switches stemming 'off'. When stemming is enabled, all the words of the glosses are stemmed before their vectors are created for the vector measure or their overlaps are compared for the lesk measure.


This option is supported only by the lesk measure.

The value of this parameter indicates whether or not normalization of scores is performed. The value must be an integer equal to 0 or 1. If the value is omitted, then the default value, 0, is used. A value of 1 switches normalization 'on', and a value of 0 switches normalization 'off'. When normalization is enabled, the score obtained by counting the gloss overlaps is normalized by the size of the glosses. The details are described in Banerjee and Pedersen (2002).


This option is supported only by the vector measure.

The value of this parameter is the path of a file containing a list of compound words in WordNet. This path may be either an absolute path or a relative path. If the value is omitted, then the default behavior (no compound recognition) is used.


This option is supported only by the vector measure.

The value of this parameter is the path to a Berkeley DB file containing word vectors, i.e. co-occurrence vectors for all the words in the WordNet glosses. The value of this parameter may not be omitted, and the vector measure will not run without a DB file being specified in a configuration file.


This option is supported only by the random measure.

The value of this option is the maximum random number that will be generated. The value of this option must be a positive floating-point number. The default value is 1.0. All random numbers generated will be in the range [0, maxrand).
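The range can be illustrated with a short Python sketch. The function name random_relatedness is hypothetical and the code does not reflect how the random measure is implemented; it only shows one way to draw values from [0, maxrand) and to reject a non-positive maximum.

```python
import random


def random_relatedness(maxrand=1.0, rng=random):
    """Return a pseudo-random value in the range [0, maxrand)."""
    if not maxrand > 0:
        raise ValueError("maxrand must be a positive number")
    # rng.random() lies in [0.0, 1.0), so the product stays below maxrand
    # (barring floating-point rounding at the extreme upper end).
    return rng.random() * maxrand


rng = random.Random(42)  # seeded so that repeated runs give the same values
values = [random_relatedness(2.5, rng) for _ in range(5)]
print(all(0 <= v < 2.5 for v in values))  # True
```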


Siddharth Patwardhan, University of Utah, Salt Lake City
sidd at

Ted Pedersen, University of Minnesota Duluth
tpederse at

Jason Michelizzi, University of Minnesota Duluth
mich0212 at




intro.pod, WordNet::Similarity(3pm)


Copyright (C) 2003 Siddharth Patwardhan, Ted Pedersen, and Jason Michelizzi

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Note: a copy of the GNU Free Documentation License is available on the web at and is included in this distribution as FDL.txt.