NAME

umls-similarity.pl - This program returns a semantic similarity score between two concepts.

SYNOPSIS

This is a utility that takes as input either two terms (DEFAULT) or two CUIs and returns the similarity between the two.

USAGE

Usage: umls-similarity.pl [OPTIONS] [CUI1|TERM1] [CUI2|TERM2]

INPUT

[CUI1|TERM1] [CUI2|TERM2]

The input is two terms or two CUIs associated with concepts in the UMLS.
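
As a rough illustration of the usage line above, the following Python sketch assembles a command line from options documented in the sections below. The helper name build_command and the example terms are hypothetical; only the flags themselves come from this documentation. The sketch only builds the argument list and does not run the program.

```python
def build_command(input1, input2, measure=None, precision=None, config=None):
    """Build an argument list for umls-similarity.pl.

    input1 and input2 are terms or CUIs. The optional arguments map to the
    --measure, --precision and --config options described in this document.
    This helper is a hypothetical convenience, not part of UMLS-Similarity.
    """
    cmd = ["umls-similarity.pl"]
    if config is not None:
        cmd += ["--config", config]
    if measure is not None:
        cmd += ["--measure", measure]
    if precision is not None:
        cmd += ["--precision", str(precision)]
    # The two inputs come last, after all options, as in the usage line.
    cmd += [input1, input2]
    return cmd


if __name__ == "__main__":
    # Hypothetical example: two terms, the lch measure, four decimal places.
    print(" ".join(build_command("heart", "lung", measure="lch", precision=4)))
```

On a machine where UMLS-Similarity and its database are installed, the resulting list could be handed to Python's subprocess.run to invoke the program.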

General Options:

--config FILE

This is the configuration file. The format of the configuration file is as follows:

SAB :: <include|exclude> <source1, source2, ... sourceN>

REL :: <include|exclude> <relation1, relation2, ... relationN>

For example, if we wanted to use the MSH vocabulary with only the RB/RN relations, the configuration file would be:

SAB :: include MSH
REL :: include RB, RN

or

SAB :: include MSH
REL :: exclude PAR, CHD

If you go to the configuration file directory, there will be example configuration files for the different runs that you have performed.
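
The configuration format above (one SAB line and one REL line) can also be generated programmatically. The following is a minimal Python sketch under that assumption; the function name format_config and its parameters are hypothetical and are not part of UMLS-Similarity.

```python
def format_config(sab_sources, relations, sab_mode="include", rel_mode="include"):
    """Return configuration text with one SAB line and one REL line.

    sab_sources and relations are lists of names such as ["MSH"] or
    ["RB", "RN"]. Each mode must be "include" or "exclude".
    """
    for mode in (sab_mode, rel_mode):
        if mode not in ("include", "exclude"):
            raise ValueError("mode must be 'include' or 'exclude'")
    sab_line = "SAB :: {} {}".format(sab_mode, ", ".join(sab_sources))
    rel_line = "REL :: {} {}".format(rel_mode, ", ".join(relations))
    return sab_line + "\n" + rel_line + "\n"


if __name__ == "__main__":
    # Reproduces the first example above: MSH with only the RB/RN relations.
    print(format_config(["MSH"], ["RB", "RN"]), end="")
```

The returned text can be written to a file and passed to the program with the --config option.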

--realtime

With this option, the program does not create a database of the path information for all of the concepts in the set of sources and relations specified in the config file; instead, it obtains the path information for just the input concepts.

--forcerun

This option will bypass any command prompts such as asking if you would like to continue with the index creation.

--measure MEASURE

Use the MEASURE module to calculate the semantic similarity. The available measures are:

  1. Leacock and Chodorow (1998), referred to as lch
  2. Wu and Palmer (1994), referred to as wup
  3. The basic path measure, referred to as path
  4. Rada et al. (1989), referred to as cdist
  5. Nguyen and Al-Mubaid (2006), referred to as nam
  6. Resnik (1996), referred to as res
  7. Lin (1998), referred to as lin
  8. Jiang and Conrath (1997), referred to as jcn
  9. The vector measure, referred to as vector

--precision N

Displays values up to N decimal places.

--allsenses

This option prints all possible CUI pairs and their semantic similarity scores when one of the inputs is a term that maps to more than one CUI. Without this option, only the most similar pair of CUIs is returned.

--help

Displays a quick summary of program options.

--version

Displays the version information.

Input Options:

--infile FILE

A file containing pairs of concepts or terms in the following format:

term1<>term2 

or 

cui1<>cui2

or 

cui1<>term2

or 

term1<>cui2

If the --matrix option is chosen, the file is instead just a list of CUIs: cui1 cui2 cui3 ...
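
To make the pair format concrete, here is a minimal Python sketch that reads lines of the form item1<>item2 and reports whether each item looks like a CUI. The function names are hypothetical, and the CUI check (the letter C followed by seven digits) is only a convenience for this example; how the program itself tells terms from CUIs is not specified in this document. The CUIs shown are placeholders.

```python
import re

# A UMLS CUI is written as 'C' followed by seven digits.
CUI_PATTERN = re.compile(r"C[0-9]{7}")


def is_cui(item):
    """Return True if item has the shape of a UMLS CUI."""
    return CUI_PATTERN.fullmatch(item) is not None


def parse_pairs(text):
    """Parse 'item1<>item2' lines into a list of (item1, item2) tuples.

    Blank lines are skipped; a line without both items raises ValueError.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        left, sep, right = line.partition("<>")
        if not sep or not left.strip() or not right.strip():
            raise ValueError("expected 'item1<>item2', got: %r" % line)
        pairs.append((left.strip(), right.strip()))
    return pairs


if __name__ == "__main__":
    sample = "heart<>lung\nC0018787<>lung\n"
    for first, second in parse_pairs(sample):
        print(first, is_cui(first), second, is_cui(second))
```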

--matrix

This option returns a matrix of similarity scores given a file containing a list of CUIs. The file is passed using the --infile option.

Debug Options:

--debug

Turns on the UMLS-Interface debug flag for testing.

--info

Displays information about the concept if it doesn't exist in the source.

--verbose

This option will print out the table information to the config file that you specified.

Database Options:

--username STRING

A username is required to access the UMLS database in MySQL.

--password STRING

A password is required to access the UMLS database in MySQL.

--hostname STRING

Hostname where MySQL is located. DEFAULT: localhost

--database STRING

Database containing the UMLS. DEFAULT: umls

IC Measure Options:

--icpropagation FILE

FILE containing the propagation counts of the CUIs. This file must be in the following format:

CUI<>ic

where

  ic  = the information content of the CUI
  CUI = the concept's UMLS CUI

See example in samples/ directory called icpropagation.

--icfrequency FILE

FILE containing frequency counts of CUIs. This file must be in the following format:

CUI<>freq

where

  freq = the frequency of the concept
  CUI  = the concept's UMLS CUI

See example in samples/ directory called icfrequency.
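
As an illustration of the CUI<>freq format, the following Python sketch counts occurrences in a list of CUIs and renders one line per CUI. The function name format_icfrequency is hypothetical, the CUIs are placeholders, and the sketch assumes the frequencies are plain occurrence counts; the icfrequency file in the samples/ directory remains the reference for the expected layout.

```python
from collections import Counter


def format_icfrequency(cuis):
    """Render frequency counts as 'CUI<>freq' lines, sorted by CUI.

    cuis is an iterable of CUI strings, one entry per occurrence.
    """
    counts = Counter(cuis)
    return "".join("{}<>{}\n".format(cui, counts[cui]) for cui in sorted(counts))


if __name__ == "__main__":
    # Placeholder occurrences: one CUI seen twice, another seen once.
    print(format_icfrequency(["C0018787", "C0024109", "C0018787"]), end="")
```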

Vector Measure Options:

--vectormatrix FILE

This is the matrix file that contains the vector information to use with the vector measure. This is required if you specify vector with the --measure option.

This file is generated by the vector-input.pl program. An example of this file can be found in the samples/ directory and is called matrix.

--vectorindex FILE

This is the index file that contains the vector information to use with the vector measure. This is required if you specify vector with the --measure option.

This file is generated by the vector-input.pl program. An example of this file can be found in the samples/ directory and is called index.

--debugfile FILE

This prints the vector information to file, FILE, for debugging purposes.

--dictfile FILE

This is a dictionary file for the vector measure. It contains the 'definitions' of a concept (or term) which would be used rather than the definitions from the UMLS.

The format of this file is:

CUI <definition>
CUI <definition>
TERM <definition>
TERM <definition>

If using TERM, the term is mapped to concepts in the UMLS and the term's definition is used as their definition. If more than one term in the dictfile maps to a concept, all of the definitions are used.

Keep in mind that when using this file, if one of the CUIs for which you are obtaining the similarity does not exist in the file, its vector will be empty, which will lead to strange similarity scores.

An example of this file can be found in the samples/ directory and is called dictfile.

--defraw

This is a flag for the vector measures. The definitions used are normally 'cleaned'. If the --defraw flag is set, they will not be cleaned.

--stoplist FILE

A file containing a list of words to be excluded from the features in the vector method. The format required is one stopword per line. For example:

the
a
and
for

...

OUTPUT

umls-similarity.pl prints the semantic similarity score between the two input terms or CUIs. With the --allsenses option, it prints the score for every possible pair of CUIs; with the --matrix option, it returns a matrix of similarity scores for the CUIs listed in the input file.

SYSTEM REQUIREMENTS

  • Perl (version 5.8.5 or better) - http://www.perl.org

  • UMLS::Interface - http://search.cpan.org/dist/UMLS-Interface

  • UMLS::Similarity - http://search.cpan.org/dist/UMLS-Similarity

CONTACT US

If you have any trouble installing and using UMLS-Similarity, 
please contact us via the users mailing list :
  
    umls-similarity@yahoogroups.com
   
You can join this group by going to:
  
    http://tech.groups.yahoo.com/group/umls-similarity/
   
You may also contact us directly if you prefer :
  
    Bridget T. McInnes: bthomson at cs.umn.edu 

    Ted Pedersen : tpederse at d.umn.edu

AUTHOR

Bridget T. McInnes, University of Minnesota

COPYRIGHT

Copyright (c) 2007-2009,

Bridget T. McInnes, University of Minnesota
bthomson at cs.umn.edu
   
Ted Pedersen, University of Minnesota Duluth
tpederse at d.umn.edu


Siddharth Patwardhan, University of Utah, Salt Lake City
sidd@cs.utah.edu

Serguei Pakhomov, University of Minnesota Twin Cities
pakh0002@umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.