NAME
sitemapper - script for generating site maps
SYNOPSIS
sitemapper [ -verbose ] [ -help ] [ -doc ] [ -authen <none|cl> ] [ -depth <depth> ] [ -proxy <proxy URL> ] [ -format <ul|dl|js> ] [ -title <page title> ] -site <base URL>
DESCRIPTION
sitemapper generates a site map for a given site. It traverses the site from the base URL given with the "-site" option (see OPTIONS) and generates an HTML page consisting of a bulleted list that reflects the structure of the site.
The structure reflects each page's distance from the home page: the first-level bullets are pages accessible directly from the home page, the next level are pages accessible from those pages, and so on. Consequently, a page that is linked from a page "higher" up the tree may appear in the "wrong place", rather than where it logically "belongs".
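The placement rule described above corresponds to a breadth-first traversal in which each page is filed under the first page found to link to it. The following Python sketch illustrates the idea on an in-memory link table. It is not the script's actual code (sitemapper is a Perl script that fetches pages over HTTP); the `links` table, the `build_tree` function and the `max_depth` parameter are hypothetical names used only for illustration.

```python
from collections import deque

def build_tree(links, home, max_depth=None):
    """Return {page: [children]}, where each page is listed under the
    first page (in breadth-first order) that links to it."""
    tree = {home: []}
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        if max_depth is not None and depth[page] >= max_depth:
            continue  # do not follow links below the requested depth
        for target in links.get(page, []):
            if target in depth:
                continue  # already placed at an equal or shallower level
            depth[target] = depth[page] + 1
            tree[target] = []
            tree[page].append(target)
            queue.append(target)
    return tree

# Hypothetical site: /news.html is linked from both the home page and
# /products/, so it is filed directly under the home page.
links = {
    "/": ["/products/", "/news.html"],
    "/products/": ["/products/a.html", "/news.html"],
}
tree = build_tree(links, "/")
```

In this example /news.html appears as a first-level bullet even though it might be thought of as belonging under /products/, which is the "wrong place" effect mentioned above.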
The "-format" option (see OPTIONS) can be used to specify alternative formats for the site map. Currently the options are ul (as described above; the default); js, which uses Jef Pearlman's (jef@mit.edu) javascript Tree class to display the site map as a collapsible tree; and dl, which also uses a bulleted list, but has a definition list with the page title and the relative URL for each page in each bullet.
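As a rough illustration of the default ul format, the Python sketch below turns a page tree into a nested HTML bulleted list. The exact markup that sitemapper emits is not documented here, so the tags, the `render_ul` function and the sample `tree` and `titles` tables are assumptions made for the example.

```python
from html import escape

def render_ul(tree, page, titles):
    """Render the children of `page` as a nested HTML bulleted list."""
    children = tree.get(page, [])
    if not children:
        return ""
    items = []
    for child in children:
        # Fall back to the URL when no page title is known.
        label = escape(titles.get(child, child))
        link = '<a href="%s">%s</a>' % (escape(child, quote=True), label)
        # Each bullet contains the link followed by its own sub-list.
        items.append("<li>%s%s</li>" % (link, render_ul(tree, child, titles)))
    return "<ul>%s</ul>" % "".join(items)

# Hypothetical page tree and titles.
tree = {
    "/": ["/a.html", "/b/"],
    "/a.html": [],
    "/b/": ["/b/c.html"],
    "/b/c.html": [],
}
titles = {"/a.html": "About", "/b/": "Products", "/b/c.html": "Widgets & Co"}
html_map = render_ul(tree, "/", titles)
```

The recursion mirrors the nesting of the site map: a page with no children contributes only its own bullet, while a page with children contributes a bullet that contains a further list.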
OPTIONS
- -depth
-
Option to specify the depth of the site map generated. If not specified, sitemapper generates a site map of unlimited depth.
- -site
-
Option to specify a base URL to generate a site map for.
- -proxy
-
Specify an HTTP proxy to use.
- -format
-
Option for specifying the format for the site map. Possible values are:
- ul
-
Plain old HTML bulleted list.
- dl
-
Plain old HTML, with each bulleted list item as a definition list, containing the page title, and the relative URL path.
- js
-
A collapsible DHTML tree, generated using Jef Pearlman's (jef@mit.edu) javascript Tree class.
- -title
-
Option to specify a page title for the site map.
- -authen
-
Option to specify behaviour for Unauthorized HTTP responses (401). Possible values are:
- prompt
-
Prompt for a username / password to be typed for each unauthorized page.
- none
-
Treat unauthorized pages as inaccessible.
- -help
-
Display a short help message to standard output, with a brief description of purpose, and supported command-line switches.
- -doc
-
Display the full documentation for the script, generated from the embedded pod format doc.
- -verbose
-
Print out the current version number.
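To summarise how the switches listed above fit together, here is a minimal Python sketch of an equivalent command-line parser. It only models the options as documented (the script itself is written in Perl and, according to SEE ALSO, uses Getopt::Long); the `make_parser` function is hypothetical, the -authen values follow this OPTIONS list, and leaving -authen unset by default is an assumption, since no default is documented.

```python
import argparse

def make_parser():
    # add_help=False so that "-help" can be declared as an ordinary switch.
    parser = argparse.ArgumentParser(prog="sitemapper", add_help=False)
    parser.add_argument("-verbose", action="store_true")
    parser.add_argument("-help", action="store_true")
    parser.add_argument("-doc", action="store_true")
    # Assumption: no documented default, so the value stays None if omitted.
    parser.add_argument("-authen", choices=["prompt", "none"], default=None)
    # None stands for "unlimited depth", as described for -depth.
    parser.add_argument("-depth", type=int, default=None)
    parser.add_argument("-proxy", default=None)
    # ul is documented as the default format.
    parser.add_argument("-format", choices=["ul", "dl", "js"], default="ul")
    parser.add_argument("-title", default=None)
    # -site is the only switch shown without brackets in the SYNOPSIS.
    parser.add_argument("-site", required=True)
    return parser

# Hypothetical invocation: map two levels of a site as a definition list.
args = make_parser().parse_args(
    ["-site", "http://www.example.com/", "-depth", "2", "-format", "dl"]
)
```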
ENVIRONMENT
sitemapper makes use of the $http_proxy environment variable, if it is set.
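The interaction between the "-proxy" option and $http_proxy is not spelled out here. The Python sketch below shows one plausible rule, assuming that a proxy given on the command line takes precedence over the environment variable; that precedence, and the `resolve_proxy` function itself, are assumptions rather than documented behaviour.

```python
import os

def resolve_proxy(cli_proxy=None, environ=None):
    """Pick the proxy URL to use, or None for a direct connection."""
    environ = os.environ if environ is None else environ
    if cli_proxy:
        # Assumption: an explicit -proxy value wins over the environment.
        return cli_proxy
    # Otherwise fall back to $http_proxy, if it is set and non-empty.
    return environ.get("http_proxy") or None

# Hypothetical proxy URLs, passed in explicitly instead of read from os.environ.
chosen = resolve_proxy(None, {"http_proxy": "http://proxy.example.com:3128/"})
```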
SEE ALSO
Getopt::Long, IO::File, LWP::UserAgent, HTML::LinkExtor, URI::URL, Pod::Usage, MD5, Date::Format, Jef Pearlman's javascript Tree class (http://developer.netscape.com/docs/examples/dynhtml/tree.html)
BUGS
Should use WWW::Robot to do the site traversal.
The javascript sitemap has only been tested on Netscape 4.05.
AUTHOR
Ave Wrigley <wrigley@cre.canon.co.uk> Web Group, Canon Research Centre Europe
COPYRIGHT
Copyright (c) 1998 Canon Research Centre Europe. All rights reserved.
This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.