NAME
Apache::Wyrd::Site::IndexBot - Sample 'bot for forcing index builds
SYNOPSIS
Sample Implementation:
package BASENAME::IndexBot;
use strict;
use base qw(Apache::Wyrd::Site::MySQLIndexBot BASENAME::Wyrd);
use BASENAME::Index;

sub params {
    my ($self) = @_;
    my $params = {
        basefile        => $self->dbl->req->document_root . '/var/indexbot',
        server_hostname => $self->dbl->req->server->server_hostname,
        document_root   => $self->dbl->req->document_root,
        fastindex       => $self->_flags->fastindex || 0,
        purge           => $self->_flags->purge || 0,
        realclean       => $self->_flags->realclean || 0,
    };
    return $params;
}

sub _work {
    my ($self) = @_;
    my $index = BASENAME::Index->new;
    $index->delete_index if ($self->{'purge'});
    $self->index_site($index);
}
Sample Usage:
<BASENAME::IndexBot refresh="20" expire="40" flags="reverse, purge">
  <BASENAME::Template name="meta">$:meta</BASENAME::Template>
  <H1>Rebuilding the Index</H1>
  <H2>$:status</H2>
  $:view
</BASENAME::IndexBot>
DESCRIPTION
The IndexBot is an Apache::Wyrd::Bot object that causes a site to be completely indexed and purges any remaining deleted documents from the index. It does so by reading the names of existing files from the document root down, purging index entries for files that are no longer found in that file tree, and generating HTTP requests for all the pages that are found.
As these pages are "Indexable Pages", they update their own entries in the index when loaded by the server in answer to the HTTP request.
It should be used in a webmaster-protected section of the site for two reasons: 1. providing public access to the indexing bot invites a denial-of-service attack, since indexing is very resource-intensive, and 2. the Apache::Wyrd::Site::IndexBot "borrows" the webmaster's authorization cookie in order to be granted full access to the site.
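The crawl described above can be pictured with a minimal, self-contained sketch. It is not the module's implementation: the helpers site_paths and plan_crawl, the host www.example.com, and the list standing in for the index contents are all assumptions made for illustration, and no HTTP requests are actually sent.

```perl
use strict;
use warnings;
use File::Find ();
use File::Temp ();

# Hypothetical helper: list every file under $root as a site-relative path.
sub site_paths {
    my ($root) = @_;
    my @paths;
    File::Find::find(
        {
            no_chdir => 1,
            wanted   => sub {
                return unless -f $File::Find::name;
                (my $rel = $File::Find::name) =~ s/^\Q$root\E//;
                push @paths, $rel;
            },
        },
        $root
    );
    my @sorted = sort @paths;
    return @sorted;
}

# Hypothetical helper: given the paths the index knows about and the paths
# found on disk, decide what to purge and which URLs to request.
sub plan_crawl {
    my ($hostname, $indexed, $found) = @_;
    my %on_disk = map { $_ => 1 } @$found;
    my @purge   = grep { !$on_disk{$_} } sort @$indexed;
    my @request = map { "http://$hostname$_" } @$found;
    return (\@purge, \@request);
}

# Build a throwaway document root to demonstrate.
my $root = File::Temp::tempdir(CLEANUP => 1);
mkdir "$root/docs" or die "Cannot create $root/docs: $!";
for my $file ('/index.html', '/docs/a.html') {
    open my $fh, '>', "$root$file" or die "Cannot write $root$file: $!";
    print {$fh} "<html></html>\n";
    close $fh;
}

my @found   = site_paths($root);
my @indexed = ('/index.html', '/old.html');    # stand-in for index contents
my ($purge, $request) = plan_crawl('www.example.com', \@indexed, \@found);

print "purge:   $_\n" for @$purge;
print "request: $_\n" for @$request;
```

In this sketch /old.html is listed for purging because it no longer exists on disk, while every file that does exist yields one URL to request.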
HTML ATTRIBUTES
FLAGS
- purge
  Clear the entire index beforehand. For a first-time build or after a major change to a site, this tends to speed up the process by eliminating the need to detect and purge stale data.
- fastindex
  Only purge missing documents and index documents that have changed or have been added since the last build.
- reverse
  Per Apache::Wyrd::Bot. Show the bot output log in reverse, with the newest events at the top.
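To make the difference between purge and fastindex concrete, here is a small sketch of the selection logic they imply. The helper select_for_indexing and the use of modification times are assumptions for illustration; this documentation does not say how the module itself detects changed documents.

```perl
use strict;
use warnings;

# Hypothetical helper: choose which documents to (re)index.
# %$on_disk maps path => modification time (epoch seconds);
# %$indexed maps path => time the index last recorded that document.
sub select_for_indexing {
    my ($on_disk, $indexed, $flags) = @_;

    # purge: forget everything the index knew, so every document looks new.
    $indexed = {} if $flags->{purge};

    # Entries whose files have disappeared from disk.
    my @missing = grep { !exists $on_disk->{$_} } sort keys %$indexed;

    my @todo;
    for my $path (sort keys %$on_disk) {
        my $known = $indexed->{$path};
        if ($flags->{fastindex} && defined $known && $known >= $on_disk->{$path}) {
            next;    # unchanged since the last build
        }
        push @todo, $path;
    }
    return { purge_missing => \@missing, index => \@todo };
}

my %on_disk = ('/a.html' => 100, '/b.html' => 300, '/c.html' => 250);
my %indexed = ('/a.html' => 200, '/b.html' => 200, '/gone.html' => 200);

my $fast = select_for_indexing(\%on_disk, \%indexed, { fastindex => 1 });
my $full = select_for_indexing(\%on_disk, \%indexed, {});
print "fast:  @{ $fast->{index} }\n";
print "full:  @{ $full->{index} }\n";
print "stale: @{ $fast->{purge_missing} }\n";
```

With fastindex set, only the changed /b.html and the new /c.html are selected; without it, all three documents on disk are. In both cases /gone.html is reported as missing.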
PERL METHODS
(format: (returns) name (arguments after self))
- (void) _work (void)
  Per Apache::Wyrd::Bot. Each site must provide a _work method to the Bot that obtains a reference to the site's index object and passes that index as the argument to the index_site method.
- (void) index_site (Index Object Ref)
  Performs the indexing.
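The contract between _work and index_site can be exercised without Apache using stand-in classes. Everything named My::* below is hypothetical and exists only for this sketch: the mock index_site merely records that it was called, whereas the real method performs the crawl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a site's index class (BASENAME::Index in the
# synopsis); it only records which calls were made.
package My::MockIndex;
sub new {
    my ($class) = @_;
    my $self = { calls => [] };
    return bless $self, $class;
}
sub delete_index {
    my ($self) = @_;
    push @{ $self->{calls} }, 'delete_index';
    return;
}

# Hypothetical stand-in for the bot base class.
package My::MockBot;
sub new {
    my ($class, %params) = @_;
    my $self = { %params };
    return bless $self, $class;
}
sub index_site {
    my ($self, $index) = @_;
    push @{ $index->{calls} }, 'index_site';
    return;
}

# The site-specific part: build the index, optionally clear it, hand it over.
package My::IndexBot;
our @ISA = ('My::MockBot');
sub _work {
    my ($self) = @_;
    my $index = My::MockIndex->new;
    $index->delete_index if $self->{purge};
    $self->index_site($index);
    return $index;    # returned only so the demonstration can inspect it
}

package main;

my $with_purge    = My::IndexBot->new(purge => 1)->_work;
my $without_purge = My::IndexBot->new(purge => 0)->_work;
print "purge on:  @{ $with_purge->{calls} }\n";
print "purge off: @{ $without_purge->{calls} }\n";
```

The point of the sketch is the ordering: any clearing of the index happens inside _work, before the index is handed to index_site.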
BUGS/CAVEATS
Other bugs/caveats per Apache::Wyrd::Bot. Also reserves the methods index_site and purge_missing.
AUTHOR
Barry King <wyrd@nospam.wyrdwright.com>
SEE ALSO
- Apache::Wyrd
  General-purpose HTML-embeddable perl object
- Apache::Wyrd::Bot
  Server-launched, monitored processes.
- Apache::Wyrd::Page
  Construct and track a page of an integrated site
LICENSE
Copyright 2002-2007 Wyrdwright, Inc. and licensed under the GNU GPL.
See LICENSE under the documentation for Apache::Wyrd.