SYNOPSIS
    This package consists of Perl modules, along with supporting Perl
    programs, that implement the semantic similarity and relatedness
    measures described by Leacock & Chodorow (1998), Wu & Palmer (1994),
    Nguyen & Al-Mubaid (2006), Rada et al. (1989), Jiang & Conrath (1997),
    Resnik (1995), Lin (1998), Banerjee & Pedersen (2002), and Patwardhan
    & Pedersen (2006), as well as a simple path-based measure.

    UMLS::Similarity requires the UMLS::Interface module to access 
    the Unified Medical Language System (UMLS) in order to determine 
    the similarity between two UMLS concepts.

    The Perl modules are designed as objects with methods that take as
    input two concepts from the UMLS. The semantic relatedness of these 
    concepts is returned by these methods. A quantitative measure of 
    the degree to which the two concepts are related has wide ranging 
    applications in numerous areas, such as word sense disambiguation, 
    information retrieval, etc. For example, in order to determine which 
    sense of a given word is being used in a particular context, the sense 
    having the highest relatedness with its context word senses is most 
    likely to be the sense being used. Similarly, in information retrieval,
    retrieving documents that contain highly related concepts is more
    likely to yield high precision and recall.

    The following sections describe the organization of this software
    package and how to use it, with the aim of clarifying the usage of
    the modules and the supporting utilities.

SEMANTIC RELATEDNESS
      We observe that humans find it extremely easy to say if two words are
      related and if one word is more related to a given word than another.
      For example, if we come across two words -- 'car' and 'bicycle', we know
      they are related as both are means of transport. Also, we easily observe
      that 'bicycle' is more related to 'car' than 'fork' is. But is there
      some way to assign a quantitative value to this relatedness? Some ideas
      have been put forth by researchers to quantify the concept of
      relatedness of words, with encouraging results.

      A number of different measures of relatedness have been implemented in
      this software package. These include a simple edge-counting
      approach. The measures require the UMLS-Interface package, which
      defines UMLS concepts and some basic relationships between them.

CONTENTS
      All the modules that will be installed in the Perl system directory are
      present in the lib/ directory tree of the package. These include the
      semantic relatedness modules:

        UMLS/Similarity/lch.pm
        UMLS/Similarity/path.pm
        UMLS/Similarity/wup.pm
        UMLS/Similarity/nam.pm
        UMLS/Similarity/cdist.pm
        UMLS/Similarity/res.pm
        UMLS/Similarity/lin.pm
        UMLS/Similarity/jcn.pm
        UMLS/Similarity/random.pm
        UMLS/Similarity/vector.pm (beta)
        UMLS/Similarity/lesk.pm (beta)

      Once installed in the Perl system directory, all these modules can be
      used directly by Perl programs.

      The package contains a utils/ directory that contains Perl utility
      programs. These utilities use the modules or provide some supporting
      functionality.

        umls-similarity.pl         -- returns the semantic similarity of two 
                                      terms or UMLS CUIs given a specified 
                                      measure (and view of the UMLS).

        spearman.pl                -- calculates the Spearman Rank 
                                      Correlation between two files

        vector-input.pl            -- creates the matrix and index files 
                                      required for the vector measure

        SignificanceTesting.r      -- R script to calculate the correlation 
                                      between a gold standard and the results 
                                      obtained using the measures in the 
                                      umls-similarity.pl program

        sim2r.pl                   -- converts umls-similarity.pl output to
                                      a format that can be read by the R script

        create-icfrequency.pl      -- creates the frequency file required for
                                      information content measures

        create-icpropagation.pl    -- creates the probability file required for
                                      information content measures

INSTALL
      To install these modules run:

        perl Makefile.PL
        make
        make test
        make install

      This will install the modules in the standard locations. You will
      most likely need root privileges to install in standard system
      directories. To install in a non-standard directory, specify a prefix
      during the 'perl Makefile.PL' stage as:

        perl Makefile.PL PREFIX=/home
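
      When installing under a non-standard prefix, Perl programs will only
      find the modules if their directory is on PERL5LIB (or is passed with
      'perl -I'). The subdirectory that 'make install' uses under the prefix
      depends on the local Perl configuration, so the sketch below searches
      for it rather than assuming a path. The helper name find_umls_lib and
      the choice of path.pm as the file to look for are illustrative
      assumptions, not part of this package.

```shell
# Hypothetical helper (not shipped with UMLS-Similarity): print the
# directory under a prefix that contains UMLS/Similarity/path.pm.
find_umls_lib() {
    find "$1" -type f -path '*/UMLS/Similarity/path.pm' 2>/dev/null |
        head -n 1 |
        sed 's|/UMLS/Similarity/path\.pm$||'
}

# Example use after 'make install' with PREFIX=/home (left commented out
# so that pasting this block has no side effects):
#   libdir=$(find_umls_lib /home)
#   [ -n "$libdir" ] && export PERL5LIB="$libdir${PERL5LIB:+:$PERL5LIB}"
```

      If the helper prints nothing, the modules were not found under that
      prefix and PERL5LIB is left unchanged.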

      It is possible to modify other parameters during installation. The
      details of these can be found in the ExtUtils::MakeMaker
      documentation. However, we strongly recommend leaving the other
      parameters unchanged unless you know what you are doing.

      To conduct an extensive test of the package, set the
      UMLS_SIMILARITY_RUN_ALL environment variable before running
      'make test'. This will also run the long tests:

      1. path-long.t
      2. ic-long.t
      3. relatedness-long.t

      To set the environment variable in the C shell:

        setenv UMLS_SIMILARITY_RUN_ALL 1

      and in bash shell:

        export UMLS_SIMILARITY_RUN_ALL=1
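
      Both commands above leave the variable set for the rest of the shell
      session. In sh-compatible shells such as bash, the variable can
      instead be set for a single command only. The wrapper below is a
      minimal sketch; the function name run_long_tests is hypothetical and
      not part of this package.

```shell
# Hypothetical wrapper: run the given command with
# UMLS_SIMILARITY_RUN_ALL set only for that command.
run_long_tests() {
    UMLS_SIMILARITY_RUN_ALL=1 "$@"
}

# From the unpacked distribution directory, after 'perl Makefile.PL'
# and 'make':
#   run_long_tests make test
```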

SOFTWARE COPYRIGHT AND LICENSE
      Copyright (C) 2004-2010 Bridget T McInnes, Siddharth Patwardhan, 
      Serguei Pakhomov and Ted Pedersen

      This suite of programs is free software; you can redistribute it and/or
      modify it under the terms of the GNU General Public License as published
      by the Free Software Foundation; either version 2 of the License, or (at
      your option) any later version.

      This program is distributed in the hope that it will be useful, but
      WITHOUT ANY WARRANTY; without even the implied warranty of
      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
      General Public License for more details.

      You should have received a copy of the GNU General Public License along
      with this program; if not, write to the Free Software Foundation, Inc.,
      59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

      Note: The text of the GNU General Public License is provided in the file
      'GPL.txt' that you should have received with this distribution.

REFERENCING
      If you write a paper that has used UMLS-Similarity in some way, we'd
      certainly be grateful if you sent us a copy and referenced UMLS-Similarity.
      We have a published paper that provides a suitable reference:

      @inproceedings{McInnesPP09,
         title={{UMLS-Interface and UMLS-Similarity : Open Source 
                 Software for Measuring Paths and Semantic Similarity}}, 
         author={McInnes, B.T. and Pedersen, T. and Pakhomov, S.V.}, 
         booktitle={Proceedings of the American Medical Informatics 
                    Association (AMIA) Symposium},
         year={2009}, 
         month={November}, 
         address={San Francisco, CA}
      }

      This paper is also available at
      <http://www-users.cs.umn.edu/~bthomson/publications/pubs.html>
      or
      <http://www.d.umn.edu/~tpederse/Pubs/amia09.pdf>

REFERENCES
      1   Wu Z. and Palmer M. 1994. Verb Semantics and Lexical Selection. In
          Proceedings of the 32nd Annual Meeting of the Association for
          Computational Linguistics.  Las Cruces, New Mexico.

      2   Resnik P. 1995. Using information content to evaluate semantic
          similarity. In Proceedings of the 14th International Joint
          Conference on Artificial Intelligence, pages 448-453, Montreal.

      3   Jiang J. and Conrath D. 1997. Semantic similarity based on corpus
          statistics and lexical taxonomy. In Proceedings of International
          Conference on Research in Computational Linguistics, Taiwan.

      4   Fellbaum C., editor. 1998. WordNet: An electronic lexical
          database. MIT Press.

      5   Leacock C. and Chodorow M. 1998. Combining local context and WordNet
          similarity for word sense identification. In Fellbaum 1998, pp.
          265-283.

      6   Lin D. 1998. An information-theoretic definition of similarity. In
          Proceedings of the 15th International Conference on Machine
          Learning, Madison, WI.

      7   Hirst G. and St-Onge D. 1998. Lexical Chains as representations of
          context for the detection and correction of malapropisms. In
          Fellbaum 1998, pp. 305-332.

      8   Schütze H. 1998. Automatic Word Sense Discrimination. Computational
          Linguistics, 24(1):97-123.

      9   Resnik P. 1999. Semantic Similarity in a Taxonomy: An Information-
          Based Measure and its Applications to Problems of Ambiguity in
          Natural Language. Journal of Artificial Intelligence Research, 11,
          95-130.

      10  Budanitsky A. and Hirst G. 2001. Semantic distance in WordNet: An
          experimental, application-oriented evaluation of five measures. In
          Workshop on WordNet and Other Lexical Resources, Second meeting of
          the North American Chapter of the Association for Computational
          Linguistics. Pittsburgh, PA.

      11  Banerjee S. and Pedersen T. 2002. An Adapted Lesk Algorithm for Word
          Sense Disambiguation Using WordNet. In Proceedings of the Fourth
          International Conference on Computational Linguistics and
          Intelligent Text Processing (CICLING-02). Mexico City.

      12  Patwardhan S., Banerjee S. and Pedersen T. 2002. Using Semantic
          Relatedness for Word Sense Disambiguation. In Proceedings of the
          Fourth International Conference on Intelligent Text Processing and
          Computational Linguistics, Mexico City.

      13  Banerjee S. 2002. Adapting the Lesk algorithm for word sense
          disambiguation to WordNet. Master's thesis, University of
          Minnesota, Duluth.

      14  Patwardhan S. 2003. Incorporating dictionary and corpus information
          into a vector measure of semantic relatedness. Master's thesis,
          University of Minnesota, Duluth.

      15  Patwardhan S. and Pedersen T. 2006. Using WordNet Based Context
          Vectors to Estimate the Semantic Relatedness of Concepts. In
          Proceedings of the EACL 2006 Workshop Making Sense of Sense -
          Bringing Computational Linguistics and Psycholinguistics Together,
          pp. 1-8, April 4, 2006, Trento, Italy.

      16  Rada R., Mili H., Bicknell E. and Blettner M. 1989. Development
          and application of a metric on semantic nets. IEEE Transactions
          on Systems, Man, and Cybernetics, volume 19, pages 17-30.

      17  Nguyen H.A. and Al-Mubaid H. 2006. New ontology based semantic
          similarity measure for the biomedical domain. In Proceedings of
          the IEEE International Conference on Granular Computing, pages
          623-628.

SEE ALSO
  <http://search.cpan.org/dist/UMLS-Interface>

  <http://search.cpan.org/dist/UMLS-Similarity>

CONTACT US
  If you have any trouble installing and using UMLS-Similarity, please
  contact us via the users mailing list:

  umls-similarity@yahoogroups.com

  You can join this group by going to:

  <http://tech.groups.yahoo.com/group/umls-similarity/>

  You may also contact us directly if you prefer:

    Bridget T. McInnes: bthomson at cs.umn.edu
    Ted Pedersen      : tpederse at d.umn.edu

AUTHORS
   Bridget T McInnes, University of Minnesota Twin Cities
   bthomson at cs.umn.edu

   Siddharth Patwardhan, University of Utah
   sidd at cs.utah.edu

   Serguei Pakhomov, University of Minnesota Twin Cities
   pakh002 at umn.edu

   Ted Pedersen, University of Minnesota Duluth
   tpederse at d.umn.edu

   Ying Liu, University of Minnesota
   liux0395 at umn.edu

DOCUMENTATION COPYRIGHT AND LICENSE
  Copyright (C) 2003-2010 Bridget T. McInnes, Siddharth Patwardhan,
  Serguei Pakhomov and Ted Pedersen.

  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2 or
  any later version published by the Free Software Foundation; with no
  Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

  Note: a copy of the GNU Free Documentation License is available on the
  web at:

  <http://www.gnu.org/copyleft/fdl.html>

  and is included in this distribution as FDL.txt.
