NAME

Search::Indexer::Incremental::MD5 - Incrementally index your files

SYNOPSIS

  use Carp ;
  use File::Find::Rule ;
  use Search::Indexer::Incremental::MD5::Indexer ;
  use Search::Indexer::Incremental::MD5::Searcher ;
  
  use Readonly ;
  Readonly my $DEFAULT_MAX_FILE_SIZE_INDEXING_THRESHOLD => 300 << 10 ; # 300KB
  
  my $indexer 
	= Search::Indexer::Incremental::MD5::Indexer->new
		(
		USE_POSITIONS => 1, 
		INDEX_DIRECTORY => 'text_index', 
		get_perl_word_regex_and_stop_words(),
		) ;
  
  # the /x flag lets the alternatives be separated by spaces for readability
  my @files = File::Find::Rule
		->file()
		->name( '*.pm', '*.pod' )
		->size( "<=$DEFAULT_MAX_FILE_SIZE_INDEXING_THRESHOLD" )
		->not_name(qr[auto | unicore | DateTime/TimeZone | DateTime/Locale]x)
		->in('.') ;
  
  # add_files takes named arguments; FILES is an array reference
  $indexer->add_files(FILES => \@files) ;
  $indexer->add_files(FILES => \@more_files) ;
  $indexer = undef ;
  
  my $search_string = 'find_me' ;
  my $searcher = 
	eval 
	{
	Search::Indexer::Incremental::MD5::Searcher->new
		(
		USE_POSITIONS => 1, 
		INDEX_DIRECTORY => 'text_index', 
		get_perl_word_regex_and_stop_words(),
		)
	} or croak "No full text index found! $@\n" ;
  
  my $results = $searcher->search(SEARCH_STRING => $search_string) ;
  
  # sort in decreasing score order
  my @indexes = map { $_->[0] }
		    reverse
		        sort { $a->[1] <=> $b->[1] }
			    map { [$_, $results->[$_]{SCORE}] }
			        0 .. $#$results ;
  
  for (@indexes)
	{
	print "$results->[$_]{PATH} [$results->[$_]{SCORE}].\n" ;
	}
	
  $searcher = undef ;
  

DESCRIPTION

This module implements an incremental text indexer and searcher based on Search::Indexer.

DOCUMENTATION

Given a list of files, this module will allow you to create an indexed text database that you can later query for matches. You can also use the siim command line application installed with this module.

SUBROUTINES/METHODS

delete_indexing_databases($index_directory)

Removes all the index databases from the passed directory.

Arguments

  • $index_directory - location of the index databases

Returns - Nothing

Exceptions - Can't remove index databases.

get_file_MD5($file)

Returns the MD5 of the $file argument.

Arguments

  • $file - The location of the file to compute an MD5 for

Returns - A string containing the file's MD5

Exceptions - Fails if the file can't be opened
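
The behaviour described above can be sketched with the core Digest::MD5 module. This is a minimal illustration, not the module's implementation: the name file_md5_sketch is hypothetical, and returning the digest as a hexadecimal string is an assumption.

```perl
use strict ;
use warnings ;

use Digest::MD5 ;

# Hypothetical stand-in for get_file_MD5, written for illustration only.
sub file_md5_sketch
{
my ($file) = @_ ;

# die when the file can't be opened, mirroring the documented exception
open(my $fh, '<', $file) or die "Can't open '$file': $!\n" ;
binmode($fh) ; # hash the raw bytes so the digest does not depend on the platform

# assumption: the digest is returned as a 32 character hexadecimal string
my $md5 = Digest::MD5->new->addfile($fh)->hexdigest ;
close($fh) ;

return $md5 ;
}
```

Calling file_md5_sketch('lib/Some/Module.pm') on an unchanged file returns the same string every time, which is what makes the digest usable as a change detector.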

new( %named_arguments)

Create a Search::Indexer::Incremental::MD5::Indexer object.

  my $indexer = Search::Indexer::Incremental::MD5::Indexer->new(%named_arguments) ;

Arguments - %named_arguments, such as USE_POSITIONS and INDEX_DIRECTORY shown in the SYNOPSIS

Returns - A Search::Indexer::Incremental::MD5::Indexer object

Exceptions -

  • Incomplete argument list

  • Error creating index directory

  • Error creating index metadata database

  • Error creating a Search::Indexer object

add_files(%named_arguments)

Adds the contents of the files passed as arguments to the index database. Files already indexed are checked and re-indexed only if their content has changed.

Arguments - %named_arguments

  • FILES - Array reference - a list of files to add to the index

  • DONE_ONE_FILE_CALLBACK - sub reference - called every time a file is handled, with the arguments:

      • $file_name - the name of the file re-indexed

      • $file_info - Hash reference

          • STATE - Boolean -

            0 - up to date, no re-indexing necessary
            1 - file content changed since last index, re-indexed

          • TIME - Float - re-indexing time

Returns - Hash reference keyed on the file name; each value is a hash reference containing

  • STATE - Boolean -

    0 - up to date, no re-indexing necessary
    1 - file content changed since last index, re-indexed

  • TIME - Float - re-indexing time
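
The incremental behaviour and the shape of the returned data can be illustrated with core modules only. The sketch below is not the module's code: add_files_sketch and the $known_md5 hash (file name to MD5 recorded at the previous run) are hypothetical, and treating a file never seen before as "changed" is an assumption.

```perl
use strict ;
use warnings ;

use Digest::MD5 qw(md5_hex) ;
use Time::HiRes qw(time) ;

# Hypothetical illustration of the documented behaviour, not the module's code.
# $known_md5 maps file names to the MD5 recorded when they were last indexed.
sub add_files_sketch
{
my ($known_md5, %arguments) = @_ ;

my $files = $arguments{FILES} || [] ;
my $callback = $arguments{DONE_ONE_FILE_CALLBACK} ;
my %file_information ;

for my $file_name (@{$files})
	{
	my $start = time() ;

	open(my $fh, '<:raw', $file_name) or die "Can't open '$file_name': $!\n" ;
	my $contents = do { local $/ ; <$fh> } ;
	close($fh) ;
	$contents = '' unless defined $contents ;

	my $md5 = md5_hex($contents) ;

	# assumption: a file that was never indexed counts as changed
	my $changed = (! exists $known_md5->{$file_name} || $known_md5->{$file_name} ne $md5) ? 1 : 0 ;

	# a real indexer would re-index $contents here when $changed is true
	$known_md5->{$file_name} = $md5 if $changed ;

	my $file_info = {STATE => $changed, TIME => time() - $start} ;
	$file_information{$file_name} = $file_info ;

	$callback->($file_name, $file_info) if $callback ;
	}

return \%file_information ;
}
```

Running the sketch twice on the same unmodified file reports STATE 1 the first time and STATE 0 the second time, which is the pattern a DONE_ONE_FILE_CALLBACK can use to report progress.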

Exceptions

new( %named_arguments)

Create a Search::Indexer::Incremental::MD5::Searcher object.

  my $searcher = Search::Indexer::Incremental::MD5::Searcher->new(%named_arguments) ;

Arguments - %named_arguments, such as USE_POSITIONS and INDEX_DIRECTORY shown in the SYNOPSIS

Returns - A Search::Indexer::Incremental::MD5::Searcher object

Exceptions -

  • Incomplete argument list

  • Error creating index directory

  • Error opening index metadata database

  • Error creating a Search::Indexer object

search(%named_arguments)

Searches for the SEARCH_STRING argument in the index database.

Arguments - %named_arguments

  • SEARCH_STRING - Query string, see Search::Indexer

Returns - Array reference - each entry is a hash reference containing

  • SCORE - the score obtained by the file when applying the query

  • PATH - the path to the file

  • MD5 - the file MD5 when the indexing was done
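
Because each entry carries its own SCORE, the results can be ordered with a single sort on that key. The sketch below uses made-up sample data in the documented shape; the paths, scores, and placeholder MD5 strings are hypothetical and stand in for what a real search would return.

```perl
use strict ;
use warnings ;

# Made-up sample data in the documented shape; real entries come from search().
my $results =
	[
	{SCORE => 3, PATH => 'lib/A.pm', MD5 => '0' x 32},
	{SCORE => 7, PATH => 'lib/B.pm', MD5 => '1' x 32},
	{SCORE => 1, PATH => 'lib/C.pod', MD5 => '2' x 32},
	] ;

# highest score first
my @by_score = sort { $b->{SCORE} <=> $a->{SCORE} } @{$results} ;

for my $result (@by_score)
	{
	print "$result->{PATH} [$result->{SCORE}]\n" ;
	}
```

Sorting the entries directly avoids the index bookkeeping and keeps PATH, SCORE, and MD5 together for each match.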

BUGS AND LIMITATIONS

None so far.

AUTHOR

Nadim ibn hamouda el Khemir
CPAN ID: NH
mailto: nadim@cpan.org

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SUPPORT

You can find documentation for this module with the perldoc command.

  perldoc Search::Indexer::Incremental::MD5

You can also look for information at:

SEE ALSO