NAME

Search::Tools::QueryParser - convert string queries into objects

SYNOPSIS

use Search::Tools::QueryParser;
my $qparser = Search::Tools::QueryParser->new(
       
       # regex to define a query term (word)
           term_re        => qr/\w+(?:'\w+)*/,
       
       # or assemble a definition from the following
           word_characters     => q/\w\'\-/,
           ignore_first_char   => q/\+\-/,
           ignore_last_char    => q/\+\-/,
           term_min_length     => 1,
           
       # words to ignore
           stopwords           => [qw( the )],
           
       # query operators
           and_word            => q(and),
           or_word             => q(or),
           not_word            => q(not),
           phrase_delim        => q("),
           treat_uris_like_phrases => 1,
           ignore_fields       => [qw( site )],
           wildcard            => quotemeta(q(*)),
                       
       # language-specific settings
           stemmer             => \&your_stemmer_here,
           charset             => 'iso-8859-1',
           lang                => 'en_US',
           locale              => 'en_US.iso-8859-1',

       # development help
           debug               => 0,
   );
   
my $query    = $qparser->parse(q(the quick color:brown "fox jumped"));
my $terms    = $query->terms; # ['quick', 'brown', '"fox jumped"']

# a Search::Tools::RegEx object
my $regexp   = $query->regexp_for($terms->[0]); 

# the Search::Query::Dialect tree()
my $tree     = $query->tree;

print "$query\n";  # the quick color:brown "fox jumped"
print $query->str . "\n"; # same thing

DESCRIPTION

Search::Tools::QueryParser turns search queries into objects that can be applied for highlighting, spelling, and extracting matching snippets from source documents.

METHODS

new( %opts )

The new() method instantiates a QueryParser object. With the exception of parse(), all the following methods can be passed as key/value pairs in new().

BUILD

Called internally by new().

parse( query )

The parse() method parses query and returns a Search::Tools::Query object.

query must be a scalar string.

NOTE: All queries are converted to UTF-8. See the charset param.
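
The conversion can be pictured with the core Encode module. This is only a sketch of the general idea, assuming the query arrives as iso-8859-1 bytes. It does not show how QueryParser performs the conversion internally, and the helper name to_utf8_bytes is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of Search::Tools.
sub to_utf8_bytes {
    my ($octets, $charset) = @_;
    my $chars = decode($charset, $octets);   # bytes -> Perl characters
    return encode('UTF-8', $chars);          # characters -> UTF-8 bytes
}

my $latin1 = "caf\xe9";                      # "cafe" with e-acute, as iso-8859-1
my $utf8   = to_utf8_bytes($latin1, 'iso-8859-1');
printf "%d bytes in, %d bytes out\n", length($latin1), length($utf8);
```

The single iso-8859-1 byte 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9, which is why the charset param needs to describe the incoming query correctly.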

stemmer

The stemmer function is used to find the root 'stem' of a word. There are many stemming algorithms available, including many on CPAN. The stemmer function should expect to receive two parameters: the QueryParser object and the word to be stemmed. It should return exactly one value: the stemmed word.

Example stemmer function:

use Lingua::Stem;
my $stemmer = Lingua::Stem->new;

sub mystemfunc {
    my ($parser, $word) = @_;
    return $stemmer->stem($word)->[0];
}

# and pass to the new() method:

my $qparser = Search::Tools::QueryParser->new(stemmer => \&mystemfunc);
    

stopwords

A list of common words that should be ignored in parsing out keyword terms. May be either a string that will be split on whitespace, or an array ref.

NOTE: If a phrase contains a stopword, the phrase is tokenized into words on whitespace and the stopwords are then removed.
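
The effect on a phrase can be sketched in plain Perl. The helper strip_stopwords below is hypothetical and is not the module's implementation; the lowercase comparison is an assumption made for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Tools.
sub strip_stopwords {
    my ($phrase, $stopwords) = @_;

    # stopwords may be an array ref or a whitespace-delimited string
    my @stop = ref($stopwords) eq 'ARRAY' ? @$stopwords : split ' ', $stopwords;
    my %is_stop = map { ( lc($_) => 1 ) } @stop;

    # tokenize the phrase on whitespace, then drop the stopwords
    return grep { !$is_stop{ lc($_) } } split ' ', $phrase;
}

my @words = strip_stopwords('the quick brown fox', 'the a an');
print join(' ', @words), "\n";
```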

end_bound

get_defaults

html_phrase_bound

phrase_delim

plain_phrase_bound

start_bound

tag_re

term_re
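
As the SYNOPSIS notes, term_re is the regex that defines a query term (word). A minimal sketch of what the SYNOPSIS value matches when applied on its own to a string; the helper extract_terms is hypothetical and skips everything else the parser does (operators, fields, stopwords).

```perl
use strict;
use warnings;

# term_re value taken from the SYNOPSIS
my $term_re = qr/\w+(?:'\w+)*/;

# Hypothetical helper, not part of Search::Tools.
sub extract_terms {
    my ($string) = @_;
    return $string =~ /($term_re)/g;
}

print join('|', extract_terms(q(don't stop the fox))), "\n";
```

Note that the apostrophe only counts as part of a term when word characters follow it, so "don't" stays whole.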

term_min_length

whitespace

word_characters

ignore_first_char

String of characters to strip from the beginning of all words.

ignore_last_char

String of characters to strip from the end of all words.
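
A minimal sketch of the stripping that ignore_first_char and ignore_last_char describe, using the values from the SYNOPSIS. It assumes each value is interpolated into a regex character class and that a run of such characters is removed; the helper trim_term is hypothetical.

```perl
use strict;
use warnings;

my $ignore_first_char = q/\+\-/;   # values from the SYNOPSIS
my $ignore_last_char  = q/\+\-/;

# Hypothetical helper, not part of Search::Tools.
sub trim_term {
    my ($word) = @_;
    $word =~ s/\A[$ignore_first_char]+//;   # strip from the beginning
    $word =~ s/[$ignore_last_char]+\z//;    # strip from the end
    return $word;
}

print trim_term('+quick-'), "\n";
```

Characters in the middle of a word, such as the hyphen in a-b, are left alone.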

ignore_case

If true, all queries are run through Perl's built-in lc() function before parsing. The default is 1 (true). Set to 0 (false) to preserve case.

ignore_fields

Value may be a hash ref or an array ref of field names to ignore in query parsing. Example:

ignore_fields => [qw( site )]

would parse the query:

site:foo.bar AND baz   # terms = baz
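
The effect can be imitated with a toy filter. This is a sketch only: a naive whitespace tokenizer stands in for the real query parser, and the helper visible_terms is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Tools.
sub visible_terms {
    my ($query, $ignore) = @_;

    # accept either a hash ref or an array ref of field names
    my %skip = ref($ignore) eq 'HASH'
        ? %$ignore
        : map { ( $_ => 1 ) } @$ignore;

    my @terms;
    for my $token (split ' ', $query) {
        next if $token =~ /\A(?:and|or|not)\z/i;     # drop operators
        my ($field, $value) = $token =~ /\A(\w+):(.+)\z/;
        next if defined $field && $skip{$field};     # drop ignored fields
        push @terms, defined $value ? $value : $token;
    }
    return @terms;
}

print join(', ', visible_terms('site:foo.bar AND baz', ['site'])), "\n";
```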

default_field

Set the default field to be used in parsing the query, if no field is specified. The default is the empty string (the Search::Query::Parser default).

treat_uris_like_phrases

Boolean. The default is true (1).

If set to true, queries like foo@bar.com will be treated like a single phrase "foo bar com" instead of being split into three separate terms.
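
A rough sketch of that transformation in plain Perl. The helper uri_to_phrase is hypothetical, and the set of punctuation that marks a term as URI-like is an assumption, not the module's actual rule.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Tools.
sub uri_to_phrase {
    my ($term) = @_;

    # assumption: these characters mark a term as URI-like
    return $term unless $term =~ m{[\@./:]};

    # split on non-word characters and rejoin as a quoted phrase
    my @parts = grep { length } split /\W+/, $term;
    return '"' . join(' ', @parts) . '"';
}

print uri_to_phrase('foo@bar.com'), "\n";
```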

and_word

Default: and|near\d*
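
The default reads as a regex alternation. A minimal sketch of which tokens it would cover, assuming the value is matched against a whole token and case-insensitively (both are assumptions, not confirmed by this documentation):

```perl
use strict;
use warnings;

my $and_word = q(and|near\d*);            # default value documented above
my $is_and   = qr/\A(?:$and_word)\z/i;    # anchoring and /i are assumptions

for my $token (qw( and AND near near5 nearby )) {
    printf "%-7s %s\n", $token, ( $token =~ $is_and ? 'operator' : 'term' );
}
```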

or_word

Default: or

not_word

Default: not

wildcard

Default: *

locale

Set a locale explicitly. If not set, the locale is inherited from the LC_CTYPE environment variable.

LC_CTYPE

Function imported by the locale pragma. Documented only to satisfy POD tests.

lang

Base language. If not set, extracted from locale or defaults to en_US.

charset

Base charset used for converting queries to UTF-8. If not set, extracted from locale or defaults to iso-8859-1.
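
The fallback chain described for lang and charset can be sketched as follows. It assumes a locale string of the form lang.charset, as in the SYNOPSIS; the helper lang_and_charset is hypothetical and the module's own parsing may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Tools.
sub lang_and_charset {
    my ($locale) = @_;

    # split "en_US.iso-8859-1" into its two halves
    my ($lang, $charset) = split /\./, ( $locale // '' ), 2;

    # fall back to the documented defaults when a half is missing
    return ( $lang || 'en_US', $charset || 'iso-8859-1' );
}

my ($lang, $charset) = lang_and_charset('en_US.iso-8859-1');
print "lang=$lang charset=$charset\n";
```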

query_class

The default is Search::Tools::Query but you can set your own to subclass the Query object.

query_dialect

The default is Search::Query::Dialect::Native but you can set your own. See the Search::Query::Dialect documentation.

LIMITATIONS

The special HTML characters &, < and > can cause problems in regular expressions applied to markup, so they are ignored when creating regular expressions, even if you include them in word_characters in new().

AUTHOR

Peter Karman <karman@cpan.org>

BUGS

Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Search::Tools

COPYRIGHT

Copyright 2009 by Peter Karman.

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Search::Query::Parser