NAME
umls-similarity.pl - This program returns a semantic similarity score between two concepts.
SYNOPSIS
This is a utility that takes as input either two terms (DEFAULT) or two CUIs and returns the similarity between the two.
USAGE
Usage: umls-similarity.pl [OPTIONS] [CUI1|TERM1] [CUI2|TERM2]
INPUT
[CUI1|TERM1] [CUI2|TERM2]
The input is two terms or two CUIs associated with concepts in the UMLS.
Optional Arguments:
--infile FILE
A file containing pairs of concepts or terms in the following format:
term1<>term2
or
cui1<>cui2
or
cui1<>term2
or
term1<>cui2
If the --matrix option is chosen, the file is instead just a list of CUIs: cui1 cui2 cui3 ...
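As an illustration of the --infile formats described above, here is a minimal Python sketch that reads both layouts. It is not part of umls-similarity.pl. The function names are hypothetical, and the rule used to tell a CUI from a term ("C" followed by seven digits) is an assumption about how CUIs are normally written, not something the program is documented to do.

```python
import re

# Assumption: a CUI is the letter "C" followed by seven digits.
CUI_PATTERN = re.compile(r"^C\d{7}$")


def parse_pair_line(line):
    """Split 'item1<>item2' into two (kind, value) tuples.

    kind is 'cui' when the item looks like a CUI, otherwise 'term'.
    """
    left, sep, right = line.strip().partition("<>")
    left, right = left.strip(), right.strip()
    if not sep or not left or not right:
        raise ValueError(f"expected 'item1<>item2', got: {line!r}")
    return tuple(
        ("cui" if CUI_PATTERN.match(item) else "term", item)
        for item in (left, right)
    )


def parse_matrix_lines(lines):
    """Collect whitespace-separated CUIs for the --matrix style input."""
    return [token for line in lines for token in line.split()]
```

Because the split is on the `<>` separator only, multi-word terms such as "blood pressure" stay intact.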
--matrix
This option returns a matrix of similarity scores given a file containing a list of CUIs. The file is passed using the --infile option.
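The idea behind --matrix can be sketched in a few lines of Python: every CUI in the list is scored against every other CUI. This is only an illustration. The `similarity` argument is a hypothetical callback standing in for whichever measure is selected, and the layout the program actually prints is not shown here.

```python
def similarity_matrix(cuis, similarity):
    """Return a square list of lists where entry [i][j] scores cuis[i] against cuis[j]."""
    return [[similarity(a, b) for b in cuis] for a in cuis]


def toy_similarity(a, b):
    """Placeholder measure (an assumption for the sketch): 1.0 if identical, else 0.5."""
    return 1.0 if a == b else 0.5
```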
--username STRING
Username is required to access the UMLS database on MySQL.
--password STRING
Password is required to access the UMLS database on MySQL.
--hostname STRING
Hostname where MySQL is located. DEFAULT: localhost
--database STRING
Database containing the UMLS. DEFAULT: umls
--measure MEASURE
Use the MEASURE module to calculate the semantic similarity. The available measures are:
1. Leacock and Chodorow (1998), referred to as lch
2. Wu and Palmer (1994), referred to as wup
3. The basic path measure, referred to as path
4. Rada et al. (1989), referred to as cdist
5. Nguyen and Al-Mubaid (2006), referred to as nam
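To give a feel for what the path, lch and wup measures compute, the following Python sketch applies their commonly cited formulations to a tiny made-up hierarchy. It is an assumption-laden illustration, not the UMLS::Similarity code: the taxonomy is invented, depth and path length are counted in nodes, and the modules themselves may count or normalize differently.

```python
import math

# Invented toy hierarchy (child -> parent); not taken from the UMLS.
PARENT = {
    "entity": None,
    "body_part": "entity",
    "limb": "body_part",
    "arm": "limb",
    "leg": "limb",
    "organ": "body_part",
    "heart": "organ",
}


def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)


def path_length(a, b):
    """Number of nodes on the shortest path between a and b, inclusive."""
    c = lcs(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c)) + 1


MAX_DEPTH = max(depth(n) for n in PARENT)


def sim_path(a, b):
    """path: reciprocal of the shortest path length."""
    return 1.0 / path_length(a, b)


def sim_lch(a, b):
    """lch: negative log of path length scaled by twice the taxonomy depth."""
    return -math.log(path_length(a, b) / (2.0 * MAX_DEPTH))


def sim_wup(a, b):
    """wup: twice the depth of the LCS over the sum of the two depths."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))
```

In this toy hierarchy "arm" and "leg" share the parent "limb", so they score higher on all three measures than "arm" and "heart", whose only shared ancestor is the more general "body_part".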
--precision N
Displays values up to N decimal places.
--info
Displays information about the concept if it doesn't exist in the source.
--dbfile FILE
This is the Berkeley DB file that contains the vector information to use with the vector measure. This is required if you specify vector with the --measure option.
--allsenses
This option prints out all possible CUI pairs and their semantic similarity scores if one of the inputs is a term that maps to more than one CUI. Without this option, only the most similar pair of CUIs is returned.
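The difference between the two behaviours can be sketched in Python as follows. This is a hypothetical illustration, not the program's code: the candidate CUIs and their scores are made-up placeholders, and `similarity` stands in for the selected measure.

```python
from itertools import product


def score_pairs(cuis1, cuis2, similarity):
    """Score every CUI pair across the two candidate lists (the --allsenses view)."""
    return [(a, b, similarity(a, b)) for a, b in product(cuis1, cuis2)]


def most_similar(cuis1, cuis2, similarity):
    """Keep only the highest-scoring pair (the behaviour without --allsenses)."""
    return max(score_pairs(cuis1, cuis2, similarity), key=lambda item: item[2])


# Made-up placeholder CUIs and scores, for illustration only.
TOY_SCORES = {
    ("C0000001", "C0000003"): 0.25,
    ("C0000002", "C0000003"): 0.5,
}


def toy_similarity(a, b):
    """Look up the placeholder score for a pair."""
    return TOY_SCORES[(a, b)]
```

Here a term mapping to two CUIs is compared against a single CUI: `score_pairs` lists both candidate pairs, while `most_similar` keeps only the pair with the higher score.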
--forcerun
This option will bypass any command prompts such as asking if you would like to continue with the index creation.
--debug
Sets the UMLS-Interface debug flag on for testing.
--verbose
This option will print out the table information to the config file that you specified.
--help
Displays a quick summary of program options.
--version
Displays the version information.
OUTPUT
The semantic similarity score between the two concepts (or terms).
SYSTEM REQUIREMENTS
Perl (version 5.8.5 or better) - http://www.perl.org
UMLS::Interface - http://search.cpan.org/dist/UMLS-Interface
UMLS::Similarity - http://search.cpan.org/dist/UMLS-Similarity
CONTACT US
If you have any trouble installing or using UMLS-Similarity,
please contact us via the users mailing list:
umls-similarity@yahoogroups.com
You can join this group by going to:
http://tech.groups.yahoo.com/group/umls-similarity/
You may also contact us directly if you prefer:
Bridget T. McInnes: bthomson at cs.umn.edu
Ted Pedersen : tpederse at d.umn.edu
AUTHOR
Bridget T. McInnes, University of Minnesota
COPYRIGHT
Copyright (c) 2007-2009,
Bridget T. McInnes, University of Minnesota
bthomson at cs.umn.edu
Ted Pedersen, University of Minnesota Duluth
tpederse at d.umn.edu
Siddharth Patwardhan, University of Utah, Salt Lake City
sidd@cs.utah.edu
Serguei Pakhomov, University of Minnesota Twin Cities
pakh0002@umn.edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.