NAME

Text::Search - Perl module for quickly searching directories for a given text.

Version 0.90

SYNOPSIS

use Text::Search;

Simple search:

    my $term = 'foo AND bar';

    my $search  = Text::Search->new();
    my @results = $search->Find($term);

    foreach (@results) { print "Found $term in $_->{'FILENAME'} $_->{'OCCURENCES'} times.\n" }

Regex search:

    my @terms = ('(foo.*)', '(bar)');

    my $search  = Text::Search->new(RegExSearch => '1');
    my @results = $search->Find(@terms);

    foreach (@results) { print "Found $_->{'FILENAME'} with $_->{'OCCURENCES'} matches.\n" }

DESCRIPTION

Text::Search takes a directory and a search term and recursively searches that directory for every occurrence of the term. Features include file-name (extension) filtering, a binary filter (binary files are not searched), and both simple and regular-expression search expressions.

Results are returned as an array of hashes, sorted in descending order of occurrences.
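The module's internals are not shown in this document. As a rough sketch of the behaviour described above (recursive walk, file-name filter, binary-file skip, per-file match count, descending sort), the hypothetical `search_tree` helper below uses only core Perl: File::Find for the walk and the `-B` file test for binary detection. It is an illustration under those assumptions, not Text::Search's own implementation.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper, not part of Text::Search: walks $root, skips
# binary files, keeps files whose name matches $filter, counts matches
# of $pattern, and returns hashrefs sorted by match count, highest first.
sub search_tree {
    my ($root, $pattern, $filter, $recursive) = @_;
    $filter    = qr/./ unless defined $filter;   # default: every file
    $recursive = 1     unless defined $recursive;
    my @hits;

    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            if (-d $path) {
                # With recursion off, do not descend below $root.
                $File::Find::prune = 1 if !$recursive && $path ne $root;
                return;
            }
            return unless -f $path && !-B $path;   # plain text files only
            (my $name = $path) =~ s{.*/}{};        # file name without path
            return unless $name =~ $filter;

            open my $fh, '<', $path or return;
            my $text = do { local $/; <$fh> };     # slurp whole file
            close $fh;

            # List assignment in scalar context yields the match count.
            my $count = () = $text =~ /$pattern/g;
            push @hits, { FULLNAME => $path, OCCURENCES => $count } if $count;
        },
    }, $root);

    return sort { $b->{OCCURENCES} <=> $a->{OCCURENCES} } @hits;
}
```

For example, `my @hits = search_tree('/usr/home/mike', qr/foo/, qr/\.html?$/);` would list matching HTML files with the most matches first, so `$hits[0]` plays the same role as `$results[0]` in the documentation.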

CONSTRUCTORS

Text::Search->new()

$search = Text::Search->new([RegExSearch => '1'], [DocumentRoot => '/usr/home/mike'], [FileFilter => '(^.*\.html?$)'] );

Prepares a search; the search itself runs when you call $search->Find().

RegExSearch = Set to 1 for a regular-expression search, or 0 for a simple search (the default).

DocumentRoot = Directory to start the search from; the search is recursive. Default is /usr.

WebRoot = For website use. Only needs to be set if your DocumentRoot differs from your WebRoot.

FileFilter = Regular expression a file name must match to be searched; non-matching files are skipped. Default is all files.

Recursive = Set to 0 to turn off recursive searching. Default is 1.

Highlight = Set to 1 to turn on highlighting of matched words. Useful for bolding matched text on websites. Default is 0.

HighlightBegin = Markup inserted before each match. Default is '<b>'.

HighlightEnd = Markup inserted after each match. Default is '</b>'.
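To illustrate what Highlight, HighlightBegin and HighlightEnd describe, here is a minimal sketch of wrapping every match in begin/end markup. The `highlight` sub is hypothetical and not part of Text::Search; it only mirrors the documented defaults '<b>' and '</b>'.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps every match of $pattern in $text with
# $begin/$end markup, defaulting to the documented '<b>' and '</b>'.
sub highlight {
    my ($text, $pattern, $begin, $end) = @_;
    $begin = '<b>'  unless defined $begin;
    $end   = '</b>' unless defined $end;
    $text =~ s/($pattern)/$begin$1$end/g;
    return $text;
}

print highlight('foo and bar and foo', qr/foo/), "\n";
# prints: <b>foo</b> and bar and <b>foo</b>
```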

$search->Find()

    @results = $search->Find('blowfish OR foo AND bar');
    @results = $search->Find('(blowfish)|(foo)', '(bar)');

Executes a search for the given terms.

Results are returned as an array of hash references, which can be accessed like so:

    foreach (@results) { print "$_->{'FILENAME'} : $_->{'FILEKSIZE'} : $_->{'LAST_MODIFIED'} : $_->{'OCCURENCES'}\n" }

or, since the best match comes first:

    print "Most likely target is $results[0]{'FILENAME'}\n";

The following keys are available in the returned hash:

FULLNAME = Full path and file name of the file (e.g. /home/doug/readme.txt).

FILENAME = Name of the file (e.g. readme.txt).

FILEPATH = Path to the file (e.g. /home/doug).

FILESIZE = Size of file in bytes.

FILEKSIZE = Size of file in kilobytes.

LAST_MODIFIED_EPOCH = Time the file was last modified, in seconds since the epoch.

LAST_MODIFIED = Date and time the file was last modified, in long format.

OCCURENCES = Number of times the search pattern was matched.

URL = For website use: the path and file name of the file relative to the given DocumentRoot.

SNIPPET = A snippet of text containing the first matched term.

TITLE = Pulled from the <TITLE> tag of an HTML page.

META_TITLE = Pulled from the <meta name="title"> tag.

META_DESCRIPTION = Pulled from the <meta name="description"> tag.

META_KEYWORDS = Pulled from the <meta name="keywords"> tag.
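The file-metadata keys above can all be derived from Perl's built-in `stat` and File::Basename. The sketch below builds that part of a result hash; the `file_info` name is hypothetical, and the exact formats Text::Search uses (rounding of FILEKSIZE, layout of LAST_MODIFIED) are assumptions here: it uses one decimal place and `scalar localtime`.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: builds the file-metadata part of a result hash.
# FILEKSIZE rounding and the LAST_MODIFIED format are assumptions.
sub file_info {
    my ($fullname) = @_;
    my @st = stat $fullname or return;          # undef if file is missing
    my ($size, $mtime) = @st[7, 9];             # size in bytes, mtime epoch
    my ($name, $path) = fileparse($fullname);   # ('readme.txt', '/home/doug/')
    $path =~ s{/\z}{} unless $path eq '/';      # drop trailing slash
    return {
        FULLNAME            => $fullname,
        FILENAME            => $name,
        FILEPATH            => $path,
        FILESIZE            => $size,
        FILEKSIZE           => sprintf('%.1f', $size / 1024),
        LAST_MODIFIED_EPOCH => $mtime,
        LAST_MODIFIED       => scalar localtime $mtime,
    };
}
```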

EXAMPLES

Simple script usage.

    use Text::Search;

    my $search = Text::Search->new(DocumentRoot => '/usr/home/bill/ebooks');
    print "Searching:\n\n";
    my @results = $search->Find('romeo AND juliette');
    foreach (@results) { print "Found it $_->{'OCCURENCES'} times in $_->{'FILENAME'}\n" }

Web-based application.

This is an easy way to add a search engine to any HTML site.

    require "cgi-lib.pl";
    use Text::Search;

    &ReadParse(*input);

    my $search = Text::Search->new(DocumentRoot => $ENV{'DOCUMENT_ROOT'}, FileFilter => '(^.*\.html$)');
    my @results = $search->Find($input{'User_Search'});

    print "Content-type: text/html\n\n";
    print "Sorry, I couldn't find your request!" if (scalar @results == 0);

    foreach (@results) { print "<a href=\"$_->{'URL'}\">$_->{'URL'}</a><br>Relevancy: $_->{'OCCURENCES'}<br>Last Updated: $_->{'LAST_MODIFIED'}<br><hr>\n" }

BUGS

Be careful when using simple search. It will attempt to quote any illegal characters, but for security's sake, do your own checks before passing user input into a simple search.
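One way to do your own checks is a whitelist: accept only the characters a simple query needs and reject everything else rather than trying to escape it. The `safe_query` sub below is a hypothetical sketch of that idea; the allowed character set and the 100-character limit are assumptions to adjust for your site.

```perl
use strict;
use warnings;

# Hypothetical pre-check for user-supplied simple-search queries:
# trims whitespace, caps the length, and allows only letters, digits,
# spaces and a few punctuation marks. Returns undef if rejected.
sub safe_query {
    my ($q) = @_;
    return unless defined $q;
    $q =~ s/^\s+|\s+$//g;                        # trim surrounding whitespace
    return if $q eq '' || length($q) > 100;      # assumed length limit
    return unless $q =~ /\A[A-Za-z0-9 ._'-]+\z/; # assumed whitelist
    return $q;
}
```

In the web example, `$input{'User_Search'}` would be passed through such a check first, and `Find` called only when it returns a defined value.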

If you have trouble with the HTML extraction (i.e. title and meta tags), check the syntax of the HTML document. Text::Search tries to be forgiving, but it expects tags like these:

<title>This is my title</title>

<meta name="description" content="This is my wonderful website">
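To see why tag syntax matters, here is a minimal regex-based sketch that pulls TITLE and META_* values out of tags shaped like the ones above. The `html_info` sub is hypothetical and is not how Text::Search is documented to work; like any regex approach, it handles only simple, well-formed tags (for example, `name` must come before `content`), and a real HTML parser is more robust.

```perl
use strict;
use warnings;

# Hypothetical regex extractor for TITLE and META_TITLE,
# META_DESCRIPTION, META_KEYWORDS. Expects <meta name="..." content="...">
# with name before content; other layouts are not matched.
sub html_info {
    my ($html) = @_;
    my %info;
    ($info{TITLE}) = $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is;
    for my $name (qw(title description keywords)) {
        ($info{ 'META_' . uc($name) }) =
            $html =~ m{<meta\s+name\s*=\s*["']?\Q$name\E["']?\s+content\s*=\s*"([^"]*)"}i;
    }
    return \%info;
}
```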

DISCLAIMER

This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

COPYRIGHT

Copyright (c) 2001 Mike Miller. All rights reserved.

LICENSE

This program is free software: you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Mike Miller <mrmike@2bit.net>
