NAME

Lucy::Docs::IRTheory - Crash course in information retrieval.

ABSTRACT

Just enough Information Retrieval theory to find your way around Apache Lucy.

Terminology

Lucy uses some terminology from the field of information retrieval which may be unfamiliar to many users. "Document" and "term" mean pretty much what you'd expect them to, but others such as "posting" and "inverted index" need a formal introduction:

document - An atomic unit of retrieval.
term - An attribute which describes a document.
posting - One term indexing one document.
term list - The complete list of terms which describe a document.
posting list - The complete list of documents which a term indexes.
inverted index - A data structure which maps from terms to documents.
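
The definitions above can be made concrete with a small sketch. It is written in Python purely for illustration and does not use Lucy's API; the three-document collection, the `tokenize` helper, and the variable names are all assumptions invented for this example.

```python
from collections import defaultdict

# Hypothetical three-document collection; IDs and texts are made up.
docs = {
    1: "skate park rules",
    2: "city park map",
    3: "park bench",
}

def tokenize(text):
    # Naive whitespace split; a real analyzer would normalize far more.
    return text.lower().split()

# term list: document ID -> the terms which describe that document
term_lists = {doc_id: sorted(set(tokenize(text))) for doc_id, text in docs.items()}

# inverted index: term -> posting list (the document IDs that term indexes)
inverted_index = defaultdict(list)
for doc_id in sorted(docs):
    for term in term_lists[doc_id]:
        # Each append records one posting: one term indexing one document.
        inverted_index[term].append(doc_id)

print(term_lists[1])            # ['park', 'rules', 'skate']
print(inverted_index["park"])   # [1, 2, 3]
print(inverted_index["skate"])  # [1]
```

Looking up a term in `inverted_index` answers "which documents contain this term?" without scanning every document, which is the point of inverting the document-to-terms mapping.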

Since Lucy is a practical implementation of IR theory, it loads these abstract, distilled definitions down with useful traits. For instance, a "posting" in its most rarefied form is simply a term-document pairing; in Lucy, the class Lucy::Index::Posting::MatchPosting fills this role. However, by associating additional information with a posting, such as the number of times the term occurs in the document, we can turn it into a ScorePosting. That makes it possible to rank documents by relevance rather than merely list the documents which happen to match, in no particular order.
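
To illustrate the difference between a bare posting and one that carries extra information, here is a minimal Python sketch. The `MatchOnlyPosting` and `FreqPosting` classes are hypothetical stand-ins invented for this example, not Lucy's MatchPosting or ScorePosting, and ranking by raw term count is a deliberately crude placeholder for real relevance scoring.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class MatchOnlyPosting:
    # Hypothetical: a bare term-document pairing, nothing more.
    term: str
    doc_id: int

@dataclass(frozen=True)
class FreqPosting(MatchOnlyPosting):
    # Hypothetical: also records how often the term occurs in the document.
    freq: int = 1

# Made-up documents for the example.
docs = {1: "park bench", 2: "park skate park park"}

def posting_list(term):
    # Build one FreqPosting per document that contains the term.
    result = []
    for doc_id, text in docs.items():
        freq = Counter(text.lower().split())[term]
        if freq:
            result.append(FreqPosting(term, doc_id, freq))
    return result

matches = posting_list("park")
# With only term-document pairs we can list matches but not order them;
# the stored frequency gives us something to sort by.
ranked = sorted(matches, key=lambda p: p.freq, reverse=True)

print([p.doc_id for p in matches])  # [1, 2]
print([p.doc_id for p in ranked])   # [2, 1]
```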

TF/IDF ranking algorithm

Lucy uses a variant of the well-established "Term Frequency / Inverse Document Frequency" weighting scheme. A thorough treatment of TF/IDF is too ambitious for our present purposes, but in a nutshell, it means that...

in a search for "skate park", documents which score well for the comparatively rare term "skate" will rank higher than documents which score well for the more common term "park".
a 10-word text which has one occurrence each of both "skate" and "park" will rank higher than a 1000-word text which also contains one occurrence of each.
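
Both effects can be reproduced with a toy scorer. The Python sketch below uses a textbook-style weighting (term count divided by document length, multiplied by log(1 + N/df)) chosen for illustration; it is not necessarily the variant Lucy computes. The five-document collection, the `build` helper, and the "filler" padding word are assumptions made up for the example.

```python
import math
from collections import Counter

def build(words, length, filler="filler"):
    # Pad a document with a filler word up to the requested length.
    return words + [filler] * (length - len(words))

# Hypothetical collection: "park" appears in four documents, "skate" in three.
docs = {
    "short": build(["skate", "park"], 10),
    "long": build(["skate", "park"], 1000),
    "skate_only": build(["skate"], 10),
    "park_only": build(["park"], 10),
    "park_too": build(["park"], 10),
}

def idf(term):
    # Inverse document frequency: rarer terms get a larger weight.
    df = sum(1 for words in docs.values() if term in words)
    return math.log(1 + len(docs) / df) if df else 0.0

def score(query, doc_id):
    # Length-normalized term frequency times IDF, summed over query terms.
    words = docs[doc_id]
    counts = Counter(words)
    return sum(counts[t] / len(words) * idf(t) for t in query.split())

query = "skate park"
# The rare term outweighs the common one at equal length and frequency.
print(score(query, "skate_only") > score(query, "park_only"))  # True
# The 10-word text beats the 1000-word text with the same term counts.
print(score(query, "short") > score(query, "long"))            # True
```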

A web search for "tf idf" will turn up many excellent explanations of the algorithm.
