NAME

sitemapper - script for generating site maps

SYNOPSIS

sitemapper [ -verbose ] [ -help ] [ -doc ] [ -depth <depth> ] [ -proxy <proxy URL> ] [ -authen <none|prompt|apache> ] [ -access <access config file> ] [ -format <html|text|js> ] [ -abstract <no. chars> ] [ -title <page title> ] -site <base URL>

DESCRIPTION

sitemapper generates site maps for a given site. It traverses a site from the base URL given by the "-site" option (see OPTIONS) and generates an HTML page consisting of a bulleted list which reflects the structure of the site.

The structure reflects the distance of the listed pages from the home page; i.e. the first-level bullets are pages accessible directly from the home page, the next level are pages accessible from those pages, and so on. As a result, a page that is also linked from a page "higher" up may appear in the "wrong place" in the tree, rather than where it "belongs".
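The level-by-distance structure described above can be illustrated with a breadth-first traversal. The following is only a sketch in Python (sitemapper itself is a Perl script, and its actual traversal code is not shown here); the function name levels_by_distance, the in-memory links table, and the max_depth parameter (standing in for "-depth") are all hypothetical.

```python
from collections import deque


def levels_by_distance(links, home, max_depth=None):
    """Return {page: distance from home}, found by breadth-first search."""
    distance = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        # Pages already at the depth limit are listed but not expanded.
        if max_depth is not None and distance[page] >= max_depth:
            continue
        for target in links.get(page, []):
            # The first visit to a page is along a shortest path, so a page
            # linked from several places is recorded only once.
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/"],
    "/c": ["/d"],
}

print(levels_by_distance(links, "/"))
print(levels_by_distance(links, "/", max_depth=1))
```

In this example "/c" is linked from both "/a" and "/b" but is recorded once, at distance 2, under whichever parent reaches it first. This is the "wrong place" effect noted above.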

The "-format" option (see OPTIONS) can be used to select an alternative format for the site map. Currently the choices are html (as described above, the default), js, which uses Jef Pearlman's (jef@mit.edu) javascript Tree class to display the site map as a collapsible tree, and text (plain text).

OPTIONS

-depth

Option to specify the depth of the site map generated. If not specified, generates a site map of unlimited depth.

-site

Option to specify a base URL to generate a site map for.

-proxy

Specify an HTTP proxy to use.

-format

Option for specifying the format for the site map. Possible values are:

html

Plain old HTML bulleted list.

js

A collapsible DHTML tree, generated using Jef Pearlman's (jef@mit.edu) javascript Tree class.

text

Plain text.
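To make the difference between the html and text formats concrete, here is a rough Python sketch that renders one nested page tree both ways. It is not sitemapper's own output code: the tuple layout (title, url, children), the function names, and the exact markup and indentation are assumptions chosen for illustration.

```python
from html import escape


def render_html(nodes):
    """Render a list of (title, url, children) tuples as nested <ul> lists."""
    if not nodes:
        return ""
    items = "".join(
        '<li><a href="{}">{}</a>{}</li>'.format(
            escape(url, quote=True), escape(title), render_html(children)
        )
        for title, url, children in nodes
    )
    return "<ul>" + items + "</ul>"


def render_text(nodes, indent=0):
    """Render the same tree as indented plain-text lines."""
    lines = []
    for title, url, children in nodes:
        lines.append("  " * indent + "{} ({})".format(title, url))
        lines.extend(render_text(children, indent + 1))
    return lines


# Hypothetical two-level site tree.
tree = [("About", "/about", [("Team", "/about/team", [])])]

print(render_html(tree))
print("\n".join(render_text(tree)))
```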

-abstract <no. chars>

Automatically extract an abstract to display with the title. This will be truncated at the specified number of characters.
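The truncation step can be sketched as follows in Python. How sitemapper extracts the abstract text from a page is not described here, so the sketch assumes the text has already been extracted. The function name truncate_abstract and the whitespace collapsing are assumptions, and whether the script appends an ellipsis is not known.

```python
def truncate_abstract(text, max_chars):
    """Collapse runs of whitespace, then cut the text to max_chars characters."""
    collapsed = " ".join(text.split())
    return collapsed[:max_chars]


print(truncate_abstract("  Hello   world, this is a page. ", 11))
```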

-title

Option to specify a page title for the site map.

-authen

Option to specify behaviour for Unauthorized HTTP responses (401). Possible values are:

none

Treat unauthorized pages as inaccessible.

prompt

Prompt for a username / password to be typed for each unauthorized page.

apache

Use an Apache-style config file to describe the access control for the site. The config file is given by the "-access" option.

-access

Name of the Apache-style access control config file, used when the "-authen" option is set to apache. An example, access.conf, is included with this distribution, along with an example password file, htpasswd. Note that htpasswd contains unencrypted passwords.

-help

Display a short help message to standard output, with a brief description of purpose, and supported command-line switches.

-doc

Display the full documentation for the script, generated from the embedded pod format doc.

-verbose

Print out the current version number.

ENVIRONMENT

sitemapper makes use of the $http_proxy environment variable, if it is set.
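One way a script might combine the "-proxy" option with $http_proxy is sketched below in Python. The precedence shown (command-line value first, environment second) is an assumption, not documented behaviour of sitemapper, and resolve_proxy is a hypothetical helper.

```python
import os


def resolve_proxy(cli_proxy=None, environ=None):
    """Return the proxy URL to use, or None if no proxy is configured."""
    env = os.environ if environ is None else environ
    # Assumed precedence: an explicit -proxy value wins over $http_proxy.
    return cli_proxy or env.get("http_proxy")


print(resolve_proxy(None, {"http_proxy": "http://proxy.example:8080"}))
```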

SEE ALSO

Getopt::Long, IO::File, LWP::UserAgent, HTML::LinkExtor, URI::URL, Pod::Usage, MD5, Date::Format, Jef Pearlman's javascript Tree class (http://developer.netscape.com/docs/examples/dynhtml/tree.html)

BUGS

Should use WWW::Robot to do the site traversal.

The javascript sitemap has only been tested on Netscape 4.05.

AUTHOR

Ave Wrigley <wrigley@cre.canon.co.uk> Web Group, Canon Research Centre Europe

COPYRIGHT

Copyright (c) 1998 Canon Research Centre Europe. All rights reserved.

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.