NAME

Search::Tools::RegExp - build regular expressions from search queries

SYNOPSIS

my $regexp = Search::Tools::RegExp->new();

my $kw = $regexp->build('the quick brown fox');

for my $w ($kw->keywords)
{
   my $r = $kw->re( $w );
   
   # the word itself
   printf("the word is %s\n", $r->word);
   
   # is it flagged as a phrase?
   print "the word is a phrase\n" if $r->phrase;
   
   # each of these is a regular expression
   print $r->plain;
   print $r->html;
}

DESCRIPTION

Build regular expressions for a string of text.

All text is converted to UTF-8 automatically, if it isn't already, via the Search::Tools::Keywords module.

VARIABLES

The following package variables are defined:

UTF8Char: Regexp defining a valid UTF-8 word character. Default \w.
WordChar: Default word_characters regexp. Defaults to UTF8Char plus ', . and -.
IgnFirst: Default ignore_first_char regexp. Defaults to ' and -.
IgnLast: Default ignore_last_char regexp. Defaults to ', . and -.
PhraseDelim: Phrase delimiter character. Default is double-quote '"'.
Wildcard: Character to use as a wildcard. Default is asterisk '*'.
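
The interplay of the word-character and ignore-first/ignore-last defaults can be sketched in plain Perl. This is only one plausible reading of the descriptions above, not the module's own code: the three patterns are hand-written approximations of WordChar, IgnFirst and IgnLast, and extract_words() is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumed approximations of the documented defaults (not copied from the module).
my $word_char    = qr/[\w'.\-]/;   # \w plus ' . and -
my $ignore_first = qr/['\-]/;      # ' and -
my $ignore_last  = qr/['.\-]/;     # ' . and -

# Hypothetical helper: collect runs of word characters, then trim
# ignorable characters from the start and end of each run.
sub extract_words {
    my ($text) = @_;
    my @tokens = $text =~ /($word_char+)/g;
    my @words;
    for my $tok (@tokens) {
        $tok =~ s/^$ignore_first+//;
        $tok =~ s/$ignore_last+$//;
        push @words, $tok if length $tok;
    }
    return @words;
}

print join('|', extract_words(q{'tis the well-known U.S.A. -- isn't it.})), "\n";
# prints: tis|the|well-known|U.S.A|isn't|it
```

Under these assumptions, punctuation inside a word (well-known, isn't) survives, while the same characters at a word boundary are dropped.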

METHODS

new

Create a new object. The following parameters are also accessors:

kw: A Search::Tools::Keywords object, if you want to pass in one instead of having one made for you.
wildcard: The wildcard character. Default is $Wildcard.
word_characters: Regexp for what characters constitute a 'word'. Default is $WordChar.
ignore_first_char: Default is $IgnFirst.
ignore_last_char: Default is $IgnLast.
stemmer: Stemming code ref passed through to the default Search::Tools::Keywords object.
phrase_delim: Phrase delimiter. Defaults to $PhraseDelim.
stopwords: Words to be ignored.
debug: Turn on helpful info on stderr.
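
To illustrate what the wildcard and word_characters parameters control, here is a minimal sketch of expanding a wildcard into a pattern. It is an assumption about the general technique, not the module's implementation: keyword_to_re() is a hypothetical helper, and the character class passed in merely mirrors the documented default.

```perl
use strict;
use warnings;

# Hypothetical helper: quote the literal parts of a keyword and let each
# wildcard stand for zero or more word characters.
sub keyword_to_re {
    my ($keyword, $wildcard, $word_class) = @_;
    my @parts   = map { quotemeta($_) } split /\Q$wildcard\E/, $keyword, -1;
    my $pattern = join "$word_class*", @parts;
    return qr/^$pattern$/i;
}

# Assumed stand-in for the default word characters: \w plus ' . and -
my $re = keyword_to_re('quick*', '*', q{[\w'.\-]});

for my $candidate (qw(quick quickly Quicker quack)) {
    printf "%-8s %s\n", $candidate, $candidate =~ $re ? 'matches' : 'no match';
}
```

Passing a different wildcard character or a narrower character class changes which candidates match, which is the kind of tuning these parameters are meant to allow.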

isHTML( str )

Returns true if str contains anything that looks like HTML markup:

< > or &[#\w]+;

This is a naive check but useful for internal purposes.
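
The check described above can be approximated in a few lines. This is a sketch built from the pattern shown in this section, not the module's source; looks_like_html() is a hypothetical stand-in for isHTML().

```perl
use strict;
use warnings;

# Hypothetical stand-in for isHTML(): true if the string contains a literal
# < or >, or something entity-shaped such as &amp; or &#169;
sub looks_like_html {
    my ($str) = @_;
    return $str =~ /[<>]|&[#\w]+;/ ? 1 : 0;
}

for my $s ('plain text', '<b>bold</b>', 'AT&amp;T', 'AT&T', '3 > 2') {
    printf "%-14s %s\n", $s, looks_like_html($s) ? 'looks like HTML' : 'plain';
}
```

The last input shows why the check is called naive: a bare > in ordinary text is enough to be flagged.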

build( str )

Returns a Search::Tools::RegExp::Keywords object.
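
Before any regular expression is built, the query string has to be broken into keywords and phrases. The sketch below shows one simple way this could be done with a phrase delimiter and a stopword list. It is an illustration under stated assumptions, not the parser used by Search::Tools::Keywords: split_query() and the stopword list are hypothetical.

```perl
use strict;
use warnings;

my $phrase_delim = '"';                     # mirrors the documented default
my %stopwords    = map { $_ => 1 } qw(the); # hypothetical stopword list

# Hypothetical helper: delimited runs become phrases, everything else is
# split on whitespace, and stopwords are dropped.
sub split_query {
    my ($query) = @_;
    my $d = quotemeta $phrase_delim;
    my @keywords;
    while ($query =~ /$d([^$d]+)$d|(\S+)/g) {
        if (defined $1) {
            push @keywords, { word => $1, phrase => 1 };
        }
        elsif (!$stopwords{ lc $2 }) {
            push @keywords, { word => $2, phrase => 0 };
        }
    }
    return @keywords;
}

for my $kw (split_query('the "quick brown" fox')) {
    printf "%s%s\n", $kw->{word}, $kw->{phrase} ? ' (phrase)' : '';
}
```

Each resulting entry corresponds loosely to what the SYNOPSIS reads back through word and phrase.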

BUGS and LIMITATIONS

The special HTML characters &, < and > can cause problems in regular expressions matched against markup, so they are ignored even if you include them in word_characters when calling new().

AUTHOR

Peter Karman perl@peknet.com

Thanks to Atomic Learning www.atomiclearning.com for sponsoring the development of this module.
