NAME

sitemapper.pl - script for generating site maps

SYNOPSIS

sitemapper.pl 
    [ -verbose <debug level> ] 
    [ -help ] 
    [ -doc ]
    [ -version ]
    [ -depth <depth> ] 
    [ -proxy <proxy URL> ] 
    [ -[no]envproxy ] 
    [ -agent <agent> ]
    [ -authen ] 
    [ -format <html|text|js|xml> ] 
    [ -summary <no. chars> ] 
    [ -title <page title> ] 
    [ -email <e-mail address> ]
    [ -gui ]
    -url <root URL> 

DESCRIPTION

sitemapper.pl generates a site map for a given site. It traverses the site starting from the root URL given with the -url option and, by default, generates an HTML page consisting of a bulleted list that reflects the structure of the site.

The structure reflects the distance of each listed page from the home page; i.e. the first-level bullets are pages accessible directly from the home page, the next level are pages accessible from those pages, and so on. As a consequence, a page that is also linked from a page "higher up" the tree is listed at that higher level, which may not be where it logically "belongs".
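The levelling described above can be illustrated with a minimal sketch. It assumes a breadth-first walk, which is a guess at how the traversal behaves rather than something this documentation states, and it uses a hypothetical in-memory link graph (%links) and a hypothetical helper (page_depths) instead of fetching real pages.

```perl
use strict;
use warnings;

# Hypothetical link graph: each page maps to the pages it links to.
my %links = (
    '/'           => [ '/news', '/about' ],
    '/news'       => [ '/news/today', '/about' ],
    '/about'      => [ '/contact' ],
    '/news/today' => [ '/contact' ],
    '/contact'    => [],
);

# Return a hash ref of page => distance from $root, walking breadth-first.
# $max_depth is optional; when given, pages at that depth are not expanded.
sub page_depths {
    my ( $links, $root, $max_depth ) = @_;
    my %depth = ( $root => 0 );
    my @queue = ($root);
    while ( defined( my $page = shift @queue ) ) {
        next if defined $max_depth && $depth{$page} >= $max_depth;
        for my $child ( @{ $links->{$page} || [] } ) {
            next if exists $depth{$child};    # first (shortest) route wins
            $depth{$child} = $depth{$page} + 1;
            push @queue, $child;
        }
    }
    return \%depth;
}

# Print the pages as an indented outline, one indent step per level.
my $depth = page_depths( \%links, '/' );
for my $page ( sort { $depth->{$a} <=> $depth->{$b} || $a cmp $b } keys %$depth ) {
    print '  ' x $depth->{$page}, $page, "\n";
}
```

In this sketch /contact is linked from both /about and /news/today, but it is recorded at the shallower level (under a first-level page), which is the "wrong place" effect described above. Passing a third argument to page_depths limits how deep the walk goes, in the spirit of the -depth option.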

The -format option can be used to specify an alternative format for the site map. Currently the options are html (as described above; this is the default), js, which uses Jef Pearlman's (jef@mit.edu) Javascript Tree class to display the site map as a collapsible tree, text (plain text), and xml (an XML graph of the linkage between pages).

OPTIONS

-depth <depth>

Option to specify the depth of the site map generated. If not specified, a site map of unlimited depth is generated.

-email <e-mail address>

Option to specify the e-mail address which is reported by the robot to the site it gets pages from.

-url <root URL>

Option to specify a root URL to generate a site map for.

-proxy <proxy URL>

Specify an HTTP proxy to use.

-[no]envproxy

If -envproxy is set, the proxy specified by the $http_proxy environment variable will be used (this is the default behaviour). Use -noenvproxy to suppress this. -proxy takes precedence over -envproxy.
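The precedence between -proxy, -envproxy and $http_proxy can be sketched as follows. This is only an illustration of the rule as stated above, not the script's actual code; the helper name choose_proxy and the example proxy URLs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: decide which proxy URL (if any) to use.
# Arguments mirror the command-line options:
#   proxy    => URL given with -proxy (optional)
#   envproxy => 1 for -envproxy (the default), 0 for -noenvproxy
sub choose_proxy {
    my (%opt) = @_;

    # An explicit -proxy always wins.
    return $opt{proxy} if defined $opt{proxy} && length $opt{proxy};

    # -envproxy is on unless it was explicitly switched off.
    my $envproxy = exists $opt{envproxy} ? $opt{envproxy} : 1;
    return $ENV{http_proxy} if $envproxy && defined $ENV{http_proxy};

    return;    # no proxy
}

{
    # Example values only; these hosts are placeholders.
    local $ENV{http_proxy} = 'http://env-proxy.example:3128';

    my $explicit = choose_proxy( proxy => 'http://cli-proxy.example:8080' );
    my $from_env = choose_proxy();
    my $none     = choose_proxy( envproxy => 0 );

    print "with -proxy:      $explicit\n";
    print "default:          $from_env\n";
    print "with -noenvproxy: ", ( defined $none ? $none : '(no proxy)' ), "\n";
}
```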

-agent <agent>

Allows the user to specify an agent for the robot to identify itself as (e.g. 'Mozilla/4.5'). This can be necessary for sites that use browser sniffing to decide which content to serve.

-format <formatting option>

Option for specifying the format of the site map. Possible values are:

html

Plain old HTML bulleted list.

js

A collapsible DHTML tree, generated using Jef Pearlman's (jef@mit.edu) Javascript Tree class.

text

Plain text.

xml

An XML graph of linkage between pages.

-summary <no. chars>

Automatically extract a summary of each page to display with its title. The summary is truncated at the specified number of characters.
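As a rough illustration of the truncation, the sketch below cuts a summary string at a given number of characters. How the script actually extracts the summary text is not described here; the helper name truncate_summary and the whitespace clean-up step are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: tidy whitespace, then keep at most $max_chars characters.
sub truncate_summary {
    my ( $text, $max_chars ) = @_;
    $text =~ s/\s+/ /g;      # collapse runs of whitespace (assumed clean-up)
    $text =~ s/^ | $//g;     # trim a leading or trailing space
    return length($text) > $max_chars
        ? substr( $text, 0, $max_chars )
        : $text;
}

# Example: the equivalent of "-summary 11" applied to some extracted text.
print truncate_summary( "  Hello   world, this is a test ", 11 ), "\n";
```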

-title <page title>

Option to specify a page title for the site map.

-authen

Option to use LWP::AuthenAgent to get HTML pages. This allows the user to enter a username and password for pages that are access controlled.

-gui

Use a Tk GUI to run sitemapper.

-help

Display a short help message to standard output, with a brief description of purpose, and supported command-line switches.

-doc

Display the full documentation for the script, generated from its embedded POD.

-version

Print out the current version number.

-verbose <debug level>

Turn on verbose error messages.

ENVIRONMENT

sitemapper.pl makes use of the $http_proxy environment variable, if it is set.

PREREQUISITES

Date::Format
HTML::Entities
Getopt::Long
IO::File
LWP::AuthenAgent
LWP::UserAgent
Pod::Usage
URI::URL
WWW::Sitemap

OSNAMES

hpux 10 PA-RISC1.1 
linux 2.2.1 ppc-linux 
linux 2.2.2 i686-linux 
MSWin32 4.0 MSWin32-x86 
sunos 4.1.4 sun4-sunos 
sunos 5.6 sun4-solaris

SEE ALSO

Jef Pearlman's Javascript Tree class (http://developer.netscape.com/docs/examples/dynhtml/tree.html)

BUGS

The Javascript sitemap has only been tested on Netscape 4.05.

AUTHOR

Ave Wrigley <Ave.Wrigley@itn.co.uk>

COPYRIGHT

Copyright (c) 1998 Canon Research Centre Europe. All rights reserved.

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SCRIPT CATEGORIES

Web