NAME

Dezi::Lucy::Indexer - Dezi::App Apache Lucy indexer

SYNOPSIS

use Dezi::Lucy::Indexer;
my $indexer = Dezi::Lucy::Indexer->new(
   config               => Dezi::Indexer::Config->new(),
   invindex             => Dezi::Lucy::InvIndex->new(),
   highlightable_fields => 0,
);

DESCRIPTION

Dezi::Lucy::Indexer is an indexer class built on Apache Lucy and SWISH::3.

CONSTANTS

All the SWISH::3 constants are imported into this namespace, including:

SWISH_DOC_PROP_MAP
SWISH_INDEX_STEMMER_LANG
SWISH_INDEX_NAME
SWISH_INDEX_FORMAT

METHODS

Only new and overridden methods are documented here. See the Dezi::Indexer documentation for inherited methods.

BUILD

Implements basic object set up. Called internally by new().

In addition to the attributes documented in Dezi::Indexer, this class implements the following attributes:

highlightable_fields

Value should be 0 or 1. Default is 0. Passed directly to the constructor for Lucy::Plan::FullTextType objects as the value for the highlightable option.

swish3_handler( swish3_data )

Called by the SWISH::3::handler() function for every document being indexed.

finish

Calls commit() on the internal Lucy::Index::Indexer object, writes the swish.xml header file, and calls the superclass finish() method.

get_lucy

Returns the internal Lucy::Index::Indexer object.

abort

Sets the internal Lucy::Index::Indexer to undef, which should release any locks on the index. Also flags the Dezi::Lucy::Indexer object as stale.
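One way to combine finish() and abort() is to commit on success and release the index on failure. The following is a minimal sketch, not part of Dezi: index_or_abort() is a hypothetical helper, and it assumes only that the object passed in provides the finish() and abort() methods documented above. The $work callback stands in for whatever code feeds documents to the indexer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Dezi): run $work against an indexer,
# then call finish() on success or abort() on failure.
sub index_or_abort {
    my ( $indexer, $work ) = @_;

    # eval returns 1 only if both the callback and finish() survive.
    my $ok = eval {
        $work->($indexer);
        $indexer->finish;
        1;
    };

    if ( !$ok ) {
        # Save the error first, since abort() could overwrite $@.
        my $err = $@ || 'unknown indexing error';
        $indexer->abort;    # should release any locks on the index
        die $err;           # re-throw so the caller still sees the failure
    }

    return 1;
}
```

With an object built as in the SYNOPSIS, a call might look like index_or_abort( $indexer, sub { ... } ), where the body of the callback is up to the application.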

MetaNames and PropertyNames

Some implementation notes about MetaNames and PropertyNames. See also http://dezi.org/2014/07/18/metanames-and-propertynames/.

  • A field defined as a MetaName, a PropertyName, or both can be searched.

  • Fields are matched against tag names in your XML/HTML documents. See also the TagAlias, UndefinedMetaTags, UndefinedXMLAttributes, and XMLClassAttributes directives.

  • You can alias field names with MetaNamesAlias and PropertyNamesAlias.

  • MetaNames are tokenized, case-insensitive, and (optionally, with FuzzyIndexingMode) stemmed.

  • PropertyNames are stored, case-sensitive strings.

  • If a field is defined as both a MetaName and PropertyName, then it will be tokenized.

  • If a field is defined only as a MetaName, it will be parsed but not stored. That means you can search on the field, but trying to retrieve the field's value from the results will cause a fatal error.

  • If a field is defined only as a PropertyName, it will be parsed and stored, but it will not be tokenized. That means the field's contents are stored without being split up into words.

  • You can control the parsing and storage of PropertyName-only fields with the following additional directives:

    PropertyNamesCompareCase

    case sensitive search

    PropertyNamesIgnoreCase

    case insensitive search (default)

    PropertyNamesNoStripChars

    preserve whitespace

  • There are two default MetaNames defined: swishdefault and swishtitle.

  • There are two default PropertyNames defined: swishtitle and swishdescription.

  • The libswish3 XML and HTML parsers will automatically treat a <title> tag as swishtitle. Likewise, they will treat a <body> tag as swishdescription.

  • Things get complicated quickly when defining fields. Experiment with small test cases to arrive at the configuration that works best with your application.
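The rules in the list above for MetaName-only, PropertyName-only, and combined fields can be summarized as a small lookup. This is an illustrative sketch only: field_behavior() and its MetaName/PropertyName arguments are hypothetical, it does not call any Dezi or Lucy code, and it ignores the PropertyNames* directives and stemming.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Dezi): restate the notes above for a
# field, given whether it is declared as a MetaName and/or a PropertyName.
sub field_behavior {
    my (%declared) = @_;
    my $is_meta = $declared{MetaName}     ? 1 : 0;
    my $is_prop = $declared{PropertyName} ? 1 : 0;

    # A field declared as neither is outside the scope of these notes.
    return unless $is_meta || $is_prop;

    return {
        searchable => 1,           # MetaName, PropertyName, or both
        tokenized  => $is_meta,    # MetaNames are split into words
        stored     => $is_prop,    # only PropertyNames can be read back from results
    };
}

my @cases = (
    [ 'MetaName only',     MetaName     => 1 ],
    [ 'PropertyName only', PropertyName => 1 ],
    [ 'both',              MetaName     => 1, PropertyName => 1 ],
);

for my $case (@cases) {
    my ( $label, %declared ) = @$case;
    my $behavior = field_behavior(%declared);
    printf "%-18s tokenized=%d stored=%d\n",
        $label, @{$behavior}{qw(tokenized stored)};
}
```

A table like this can be handy when writing the small test cases recommended above: declare a field one way, index a sample document, and check that searching and retrieval behave as the corresponding row predicts.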

AUTHOR

Peter Karman, <karpet@dezi.org>

BUGS

Please report any bugs or feature requests to bug-dezi-app at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Dezi-App. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Dezi::App

COPYRIGHT AND LICENSE

Copyright 2014 by Peter Karman

This library is free software; you can redistribute it and/or modify it under the terms of the GPL v2 or later.

SEE ALSO

http://dezi.org/, http://swish-e.org/, http://lucy.apache.org/