NAME
Dezi::Lucy::Indexer - Dezi::App Apache Lucy indexer
SYNOPSIS
    use Dezi::Lucy::Indexer;
    my $indexer = Dezi::Lucy::Indexer->new(
        config               => Dezi::Indexer::Config->new(),
        invindex             => Dezi::Lucy::InvIndex->new(),
        highlightable_fields => 0,
    );
DESCRIPTION
Dezi::Lucy::Indexer is an indexer class that builds Apache Lucy indexes from documents parsed by SWISH::3.
CONSTANTS
All the SWISH::3 constants are imported into this namespace, including:
- SWISH_DOC_PROP_MAP
- SWISH_INDEX_STEMMER_LANG
- SWISH_INDEX_NAME
- SWISH_INDEX_FORMAT
METHODS
Only new and overridden methods are documented here. See the Dezi::Indexer documentation for inherited methods.
BUILD
Implements basic object set up. Called internally by new().
In addition to the attributes documented in Dezi::Indexer, this class implements the following attributes:
- highlightable_fields
Value should be 0 or 1. Default is 0. Passed directly to the constructor for Lucy::Plan::FullTextType objects as the value for the highlightable option.
swish3_handler( swish3_data )
Called by the SWISH::3::handler() function for every document being indexed.
finish
Calls commit() on the internal Lucy::Index::Indexer object, writes the swish.xml header file, and calls the superclass finish() method.
get_lucy
Returns the internal Lucy::Index::Indexer object.
abort
Sets the internal Lucy::Index::Indexer to undef, which should release any locks on the index. Also flags the Dezi::Lucy::Indexer object as stale.
MetaNames and PropertyNames
Some implementation notes about MetaNames and PropertyNames. See also http://dezi.org/2014/07/18/metanames-and-propertynames/.
A field defined as a MetaName, a PropertyName, or both can be searched.
Fields are matched against tag names in your XML/HTML documents. See also the TagAlias, UndefinedMetaTags, UndefinedXMLAttributes, and XMLClassAttributes directives.
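As a minimal sketch of how fields might be declared, assuming the Swish-e style configuration file syntax (a directive name followed by a space-separated list of field names), the fragment below defines three fields. The field names author, summary and sku are hypothetical and stand for tags such as <author>, <summary> and <sku> in the source documents.

```
# Hypothetical field names, each matched against a tag of the same name.
# author  : both a MetaName and a PropertyName
# summary : MetaName only
# sku     : PropertyName only
MetaNames      author summary
PropertyNames  author sku
```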
You can alias field names with MetaNamesAlias and PropertyNamesAlias.
MetaNames are tokenized, case-insensitive and (optionally, with FuzzyIndexingMode) stemmed.
PropertyNames are stored, case-sensitive strings.
If a field is defined as both a MetaName and PropertyName, then it will be tokenized.
If a field is defined only as a MetaName, it will be parsed but not stored. That means you can search on the field, but trying to retrieve the field's value from the results will cause a fatal error.
If a field is defined only as a PropertyName, it will be parsed and stored, but it will not be tokenized. That means the field's contents are stored without being split up into words.
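The rules above can be summarized as a small decision table. The following is only an illustrative sketch: field_behavior() is a hypothetical helper, not part of Dezi or SWISH::3, and it assumes that a field defined as both a MetaName and a PropertyName is stored as well as tokenized.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Dezi): summarize how a field behaves
# given whether it is declared as a MetaName and/or a PropertyName.
sub field_behavior {
    my ( $is_meta, $is_prop ) = @_;
    return {
        searchable => ( $is_meta || $is_prop ) ? 1 : 0,  # either kind can be searched
        tokenized  => $is_meta ? 1 : 0,                  # MetaNames are split into words
        stored     => $is_prop ? 1 : 0,                  # PropertyNames can be retrieved
    };
}

# Print one line per combination described in the text.
for my $case (
    [ 1, 1, 'both' ],
    [ 1, 0, 'MetaName only' ],
    [ 0, 1, 'PropertyName only' ],
    )
{
    my ( $meta, $prop, $label ) = @$case;
    my $info = field_behavior( $meta, $prop );
    printf "%-18s searchable=%d tokenized=%d stored=%d\n",
        $label, @{$info}{qw( searchable tokenized stored )};
}
```

A field with stored=0 is the case where retrieving its value from a result fails, so checking this table before requesting fields from results can help avoid the fatal error.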
You can control the parsing and storage of PropertyName-only fields with the following additional directives:
- PropertyNamesCompareCase: case-sensitive search
- PropertyNamesIgnoreCase: case-insensitive search (default)
- PropertyNamesNoStripChars: preserve whitespace
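A hedged example of how these directives might appear in a configuration file, again assuming the Swish-e style syntax of a directive followed by field names. The fields sku, category and rawtext are hypothetical PropertyName-only fields.

```
# Hypothetical PropertyName-only fields
PropertyNames              sku category rawtext

# sku values are compared with case preserved
PropertyNamesCompareCase   sku

# category values are compared without regard to case (the default)
PropertyNamesIgnoreCase    category

# rawtext keeps its original whitespace
PropertyNamesNoStripChars  rawtext
```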
There are two default MetaNames defined: swishdefault and swishtitle.
There are two default PropertyNames defined: swishtitle and swishdescription.
The libswish3 XML and HTML parsers will automatically treat a <title> tag as swishtitle. Likewise, they will treat a <body> tag as swishdescription.
Things get complicated quickly when defining fields. Experiment with small test cases to arrive at the configuration that works best with your application.
AUTHOR
Peter Karman, <karpet@dezi.org>
BUGS
Please report any bugs or feature requests to bug-dezi-app at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Dezi-App. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
    perldoc Dezi::App
You can also look for information at:
- Website
- IRC: #dezisearch at freenode
- Mailing list
- RT: CPAN's request tracker
- AnnoCPAN: Annotated CPAN documentation
- CPAN Ratings
- Search CPAN
COPYRIGHT AND LICENSE
Copyright 2014 by Peter Karman
This library is free software; you can redistribute it and/or modify it under the terms of the GPL v2 or later.
SEE ALSO
http://dezi.org/, http://swish-e.org/, http://lucy.apache.org/