NAME
Search::Tools::QueryParser - convert string queries into objects
SYNOPSIS
use Search::Tools::QueryParser;
my $qparser = Search::Tools::QueryParser->new(
# regex to define a query term (word)
term_re => qr/\w+(?:'\w+)*/,
# or assemble a definition from the following
word_characters => q/\w\'\-/,
ignore_first_char => q/\+\-/,
ignore_last_char => q/\+\-/,
term_min_length => 1,
# words to ignore
stopwords => [qw( the )],
# query operators
and_word => q(and),
or_word => q(or),
not_word => q(not),
phrase_delim => q("),
treat_uris_like_phrases => 1,
ignore_fields => [qw( site )],
wildcard => quotemeta(q(*)),
# language-specific settings
stemmer => \&your_stemmer_here,
charset => 'iso-8859-1',
lang => 'en_US',
locale => 'en_US.iso-8859-1',
# development help
debug => 0,
);
my $query = $qparser->parse(q(the quick color:brown "fox jumped"));
my $terms = $query->terms; # ['quick', 'brown', '"fox jumped"']
# a Search::Tools::RegEx object
my $regexp = $query->regexp_for($terms->[0]);
# the Search::Query::Dialect tree()
my $tree = $query->tree;
print "$query\n"; # the quick color:brown "fox jumped"
print $query->str . "\n"; # same thing
DESCRIPTION
Search::Tools::QueryParser turns search queries into objects that can be applied for highlighting, spelling, and extracting matching snippets from source documents.
METHODS
new( %opts )
The new() method instantiates a QueryParser object. With the exception of parse(), all the following methods can be passed as key/value pairs in new().
BUILD
Called internally by new().
parse( query )
The parse() method parses query and returns a Search::Tools::Query object.
query must be a scalar string.
NOTE: All queries are converted to UTF-8. See the charset param.
stemmer
The stemmer function is used to find the root 'stem' of a word. There are many stemming algorithms available, including many on CPAN. The stemmer function should expect to receive two parameters: the QueryParser object and the word to be stemmed. It should return exactly one value: the stemmed word.
Example stemmer function:
use Lingua::Stem;
my $stemmer = Lingua::Stem->new;
sub mystemfunc {
    my ($parser, $word) = @_;
    return $stemmer->stem($word)->[0];
}
# and pass to the new() method:
my $qparser = Search::Tools::QueryParser->new(stemmer => \&mystemfunc);
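If Lingua::Stem is not available, the same calling convention can be tried out with a toy function. The following is a minimal sketch only: naive_stemmer is a hypothetical name, and its suffix stripping is deliberately crude and is not a real stemming algorithm.

```perl
use strict;
use warnings;

# Hypothetical toy stemmer, for illustration only.
# It follows the documented convention: receives the QueryParser object
# and the word, and returns exactly one value (the stem).
sub naive_stemmer {
    my ( $parser, $word ) = @_;    # $parser is unused in this sketch

    # Strip one of a few common English suffixes from the end of the word.
    ( my $stem = $word ) =~ s/(?:ing|ed|s)$//;

    # Never return an empty string; fall back to the original word.
    return length($stem) ? $stem : $word;
}

print naive_stemmer( undef, 'jumped' ), "\n";    # jump
print naive_stemmer( undef, 'foxes' ),  "\n";    # foxe
```

A function like this would be passed to new() the same way as above, as stemmer => \&naive_stemmer.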
stopwords
A list of common words that should be ignored in parsing out keyword terms. May be either a string that will be split on whitespace, or an array ref.
NOTE: If a stopword is contained in a phrase, then the phrase will be tokenized into words based on whitespace, then the stopwords removed.
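The two accepted forms and the phrase behavior described in the NOTE can be pictured with a small standalone sketch. The helpers stopword_lookup and strip_stopwords are hypothetical names used only for illustration; this is not the module's own code, and lowercasing the words is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: accept either documented form (a whitespace-separated
# string or an array ref) and return a hash ref for fast lookups.
sub stopword_lookup {
    my ($stopwords) = @_;
    my @list = ref($stopwords) eq 'ARRAY'
        ? @$stopwords
        : split( ' ', $stopwords );
    my %lookup = map { ( lc($_) => 1 ) } @list;
    return \%lookup;
}

# Hypothetical helper: tokenize a phrase on whitespace, then drop stopwords.
sub strip_stopwords {
    my ( $phrase, $lookup ) = @_;
    return grep { !$lookup->{ lc $_ } } split( ' ', $phrase );
}

my $lookup = stopword_lookup('the a an');
my @words  = strip_stopwords( 'the quick fox', $lookup );
print join( ' ', @words ), "\n";    # quick fox
```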
end_bound
get_defaults
html_phrase_bound
phrase_delim
plain_phrase_bound
start_bound
tag_re
term_re
term_min_length
whitespace
word_characters
ignore_first_char
String of characters to strip from the beginning of all words.
ignore_last_char
String of characters to strip from the end of all words.
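As a rough illustration of what stripping leading and trailing characters means, the sketch below places the SYNOPSIS values inside a regex character class. strip_ignored is a hypothetical helper, and stripping repeated characters (the + quantifier) is an assumption rather than documented behavior.

```perl
use strict;
use warnings;

# Values copied from the SYNOPSIS. They are already escaped so that they can
# be interpolated into a character class.
my $ignore_first_char = q/\+\-/;
my $ignore_last_char  = q/\+\-/;

# Hypothetical helper, not the module's implementation.
sub strip_ignored {
    my ($word) = @_;
    $word =~ s/^[$ignore_first_char]+//;    # leading + or -
    $word =~ s/[$ignore_last_char]+$//;     # trailing + or -
    return $word;
}

print strip_ignored('+quick'),  "\n";    # quick
print strip_ignored('-brown-'), "\n";    # brown
print strip_ignored('foo-bar'), "\n";    # foo-bar (inner characters untouched)
```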
ignore_case
All queries are run through Perl's built-in lc() function before parsing. The default is 1 (true). Set to 0 (false) to preserve case.
ignore_fields
Value may be a hash or array ref of field names to ignore in query parsing. Example:
ignore_fields => [qw( site )]
would parse the query:
site:foo.bar AND baz # terms = baz
default_field
Set the default field to be used in parsing the query, if no field is specified. The default is the empty string (the Search::Query::Parser default).
treat_uris_like_phrases
Boolean. The default is 1 (true).
If set to true, queries like foo@bar.com will be treated like a single phrase "foo bar com" instead of being split into three separate terms.
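The effect on the example query can be pictured with a short standalone sketch. uri_to_phrase is a hypothetical helper, and using \w+ as the word pattern is an assumption; the parser's actual tokenizing depends on term_re and word_characters.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the word tokens out of a URI-like string and
# join them into the text of a single phrase.
sub uri_to_phrase {
    my ($uri) = @_;
    my @words = $uri =~ /\w+/g;
    return join ' ', @words;
}

print uri_to_phrase('foo@bar.com'), "\n";    # foo bar com
```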
and_word
Default: and|near\d*
or_word
Default: or
not_word
Default: not
wildcard
Default: *
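The operator defaults above are regular expression fragments, which is why and_word can also match near, near3 and so on. The sketch below shows one way such fragments could be tested against a whole token. operator_for is a hypothetical helper, and both the anchoring and the case-insensitive match are assumptions made for illustration.

```perl
use strict;
use warnings;

# Defaults as documented above.
my %operator = (
    'and' => q(and|near\d*),
    'or'  => q(or),
    'not' => q(not),
);

# Hypothetical classifier: returns the operator name for a token, or
# nothing if the token is an ordinary word.
sub operator_for {
    my ($token) = @_;
    for my $name (qw( and or not )) {
        return $name if $token =~ /^(?:$operator{$name})$/i;
    }
    return;
}

for my $token (qw( and near5 or not nearby )) {
    my $name = operator_for($token);
    printf "%-6s => %s\n", $token, defined $name ? $name : '(term)';
}
```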
locale
Set a locale explicitly. If not set, the locale is inherited from the LC_CTYPE environment variable.
LC_CTYPE
Imported function by locale pragma. Documented only to satisfy pod tests.
lang
Base language. If not set, extracted from locale or defaults to en_US.
charset
Base charset used for converting queries to UTF-8. If not set, extracted from locale or defaults to iso-8859-1.
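The relationship between locale, lang and charset can be sketched as follows. lang_and_charset is a hypothetical helper that assumes a locale string of the form lang.charset, and the conversion uses the core Encode module purely for illustration; the module itself may convert queries differently.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper: split "lang.charset" and fall back to the
# documented defaults when a part is missing.
sub lang_and_charset {
    my ($locale) = @_;
    $locale = '' unless defined $locale;
    my ( $lang, $charset ) = split /\./, $locale, 2;
    $lang    = 'en_US'      unless defined($lang)    && length($lang);
    $charset = 'iso-8859-1' unless defined($charset) && length($charset);
    return ( $lang, $charset );
}

my ( $lang, $charset ) = lang_and_charset('en_US.iso-8859-1');

# "caf\xe9" is the Latin-1 byte sequence for "cafe" with an acute accent.
my $chars = decode( $charset, "caf\xe9" );    # bytes -> Perl characters
my $utf8  = encode( 'UTF-8', $chars );        # characters -> UTF-8 bytes

printf "%s %s %d chars %d bytes\n", $lang, $charset, length($chars), length($utf8);
```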
query_class
The default is Search::Tools::Query but you can set your own to subclass the Query object.
query_dialect
The default is Search::Query::Dialect::Native but you can set your own. See the Search::Query::Dialect documentation.
LIMITATIONS
The special HTML chars &, < and > can pose problems in regexps against markup, so they are ignored in creating regular expressions if you include them in word_characters in new().
AUTHOR
Peter Karman <karman@cpan.org>
BUGS
Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Search::Tools
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
COPYRIGHT
Copyright 2009 by Peter Karman.
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
Search::Query::Parser