NAME
NNexus::Classification
- Disambiguation logic for NNexus concept harvests
SYNOPSIS
$concepts_refined = disambiguate($concept_harvest, %options);
$similarity_score = msc_similarity($category1, $category2);
DESCRIPTION
NNexus::Classification contains disambiguation and clustering algorithms for determining a subset of "relevant" concept candidates from a given concept harvest. Relevance is determined heuristically.
The current algorithm considers two facets of "relevance":
1. Relevant candidates come from empirically similar domains of knowledge.
To this end, a similarity metric has been extracted from more than 3 million mathematical reviews in Zentralblatt MATH, each annotated with categories from the Mathematics Subject Classification (MSC).
2. Technical terms are more likely to be relevant. Consequently:
- The more words in a candidate, the more likely that it is a term
- The more characters in a candidate, the more likely that it is a term
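The second facet can be illustrated with a minimal sketch. The helpers term_weight and rank_candidates below are hypothetical names introduced only for this example; they are not part of NNexus::Classification, and the module's actual scoring may combine these signals differently. The sketch assumes candidates are plain strings and simply orders them by word count, breaking ties by character count.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NNexus::Classification):
# returns [word_count, character_count] for a candidate phrase.
sub term_weight {
  my ($candidate) = @_;
  my @words = split ' ', $candidate;    # split on runs of whitespace
  return [scalar(@words), length($candidate)];
}

# Hypothetical helper: order candidates so that those with more words
# come first; ties are broken by the number of characters.
sub rank_candidates {
  my (@candidates) = @_;
  return map { $_->[0] }
    sort { $b->[1][0] <=> $a->[1][0] || $b->[1][1] <=> $a->[1][1] }
    map { [$_, term_weight($_)] } @candidates;
}

my @ranked = rank_candidates(
  'group', 'abelian group', 'ring', 'finitely generated abelian group');
print "$_\n" for @ranked;
```

Under these assumptions, multi-word phrases such as "abelian group" are ranked ahead of single words such as "ring".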
METHODS
$concepts_refined = disambiguate($concept_harvest,%options);
Disambiguates a concept harvest, as returned by NNexus::Discover, following the algorithm outlined in the DESCRIPTION section.
Currently the only accepted option is a boolean value for "verbosity".
$similarity_score = msc_similarity($category1,$category2);
Retrieves the ZBL (Zentralblatt MATH) similarity score of two MSC categories, given in the standard MSC naming scheme (e.g. 00-XX, 15Axx, 15B33).
Note that the similarity metric currently covers only the top-level MSC categories.
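Because the metric covers only top-level categories, a caller may want to see which top-level class a given code belongs to. The following is a minimal sketch with a hypothetical helper, msc_top_level, which is not part of this module. It assumes the top-level class is the leading two digits of the code; whether msc_similarity performs such a reduction internally is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NNexus::Classification):
# returns the two-digit top-level MSC class of a category code,
# or nothing if the code does not start with two digits.
sub msc_top_level {
  my ($category) = @_;
  return unless defined $category && $category =~ /^\s*(\d{2})/;
  return $1;
}

# The three example codes from the naming scheme above.
for my $code (qw(00-XX 15Axx 15B33)) {
  printf "%s -> %s\n", $code, msc_top_level($code);
}
```

Under this assumption, 15Axx and 15B33 both reduce to the top-level class 15.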
AUTHOR
Deyan Ginev <d.ginev@jacobs-university.de>
COPYRIGHT
Research software, produced as part of work done by
the KWARC group at Jacobs University Bremen.
Released under the MIT License (MIT)