NAME
Plucene::Simple - An interface to Plucene
SYNOPSIS
    use Plucene::Simple;

    # create an index
    my $plucy = Plucene::Simple->open($index_path);

    # add to the index
    $plucy->add(
        $id1 => { $field => $term1 },
        $id2 => { $field => $term2 },
    );

    # or ...
    $plucy->index_document($id => $data);

    # search an existing index
    my $plucy = Plucene::Simple->open($index_path);
    my @results = $plucy->search($search_string);

    # optimize the index
    $plucy->optimize;

    # remove something from the index
    $plucy->delete_document($id);

    # is something in the index?
    if ($plucy->indexed($id)) { ... }
DESCRIPTION
This provides a simple interface to Plucene. Plucene is large and multi-featured, and it is expected that users will subclass it and tie the pieces together to suit their own needs. Plucene::Simple is, therefore, just one way to use Plucene. It is not expected to do exactly what *you* want, but you can always use it as an example of how to build your own interface.
INDEXING
open
You make a new Plucene::Simple object like so:
    my $plucy = Plucene::Simple->open($index_path);
If this index doesn't exist, it will be created for you; otherwise you will be adding to an existing one.
Then you can add your documents to the index:
add
Every document must be indexed with a unique key (which will be returned from searches).
A document can be made up of many fields, which can be added as a hashref:
    $plucy->add($key, \%data);

    $plucy->add(
        chap1 => {
            title  => "Moby-Dick",
            author => "Herman Melville",
            text   => "Call me Ishmael ...",
        },
        chap2 => {
            title  => "Boo-Hoo",
            author => "Lydia Lee",
            text   => "...",
        }
    );
index_document
Alternatively, if you do not want to index lots of metadata, but rather just simple text, you can use the index_document() method.
    $plucy->index_document($key, $data);

    $plucy->index_document(chap1 => 'Call me Ishmael ...');
delete_document
    $plucy->delete_document($id);
optimize
    $plucy->optimize;
Plucene is set up to perform insertions quickly. After a batch of inserts, it is good to optimize() the index for better search speed.
SEARCHING
search
    my @ids = $plucy->search('ishmael');
    # ("chap1", ...)
This will return the IDs of each document matching the search term.
If you have indexed your documents with fields, you can also search with the field name as a prefix:
    my @ids = $plucy->search("author:lee");
    # ("chap2" ...)
    my @results = $plucy->search($search_string);
This will search the index with the given query, and return a list of document ids.
Searches can be much more powerful than this - see Plucene for further details.
search_during
    my @results = $plucy->search_during($search_string, $date1, $date2);

    my @results = $plucy->search_during("to:Fred", "2001-01-01" => "2003-12-31");
If your documents were given an ISO 'date' field when indexing, search_during() will restrict the results to all documents between the specified dates. Any document without a 'date' field will be ignored.
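As a minimal sketch of preparing such dates, assuming the YYYY-MM-DD form used in the example above is the intended ISO format: the code below formats an epoch time as an ISO date string and checks a date range with plain string comparison, which works because zero-padded ISO dates sort chronologically. The helpers iso_date and in_range are hypothetical names for illustration only. They are not part of Plucene::Simple, and this is not a description of how search_during() is implemented.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch time as a YYYY-MM-DD string (UTC).
sub iso_date {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d', gmtime($epoch));
}

# Hypothetical helper: true if $date lies within [$from, $to] inclusive.
# Zero-padded ISO dates compare chronologically as plain strings.
sub in_range {
    my ($date, $from, $to) = @_;
    return ($date ge $from && $date le $to) ? 1 : 0;
}

my $date = iso_date(1_000_000_000);    # a moment in September 2001 (UTC)
print "$date is in range\n" if in_range($date, '2001-01-01', '2003-12-31');
```

A string produced this way could be stored as the 'date' field in the hashref passed to add(), so that the document is eligible for search_during().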
indexed
    if ($plucy->indexed($id)) { ... }
This returns true if there is a document with the given ID in the index.
COPYRIGHT
Copyright (C) 2003-2004 Kasei Limited