NAME

HTML::GenToc - Generate a Table of Contents (ToC) for HTML documents.

SYNOPSIS

  use HTML::GenToc;

  # create a new object
  my $toc = new HTML::GenToc();

  my $toc = new HTML::GenToc(["--title", "Table of Contents",
                              "--toc", $my_toc_file,
                              "--tocmap", $my_tocmap_file,
    ]);

  my $toc = new HTML::GenToc(\@ARGV);

  # add further arguments
  $toc->args(["--toc_tag", "BODY",
              "--toc_tag_replace", 0,
    ]);

  # generate anchors for a file
  $toc->generate_anchors(["--file", $html_file,
                          "--nooverwrite"
    ]);

  # generate a ToC from a file
  $toc->generate_toc(["--file", $html_file,
                      "--footer", $footer_file,
                      "--header", $header_file
    ]);

DESCRIPTION

HTML::GenToc allows you to specify "significant elements" that will be hyperlinked to in a "Table of Contents" (ToC) for a given set of HTML documents.

Basically, the ToC generated is a multi-level list containing links to the significant elements. HTML::GenToc inserts each link into the ToC at the level the user specified for that significant element.

Example:

If H1s are specified as level 1, then they appear in the first-level list of the ToC. If H2s are specified as level 2, then they appear in a second-level list in the ToC.

See "ToC Map File" on how to tell HTML::GenToc what are the significant elements and at what level they should occur in the ToC.

See "Config File" on how to tell HTML::GenToc not only what are the significant elements and their levels, but all options you want to use as defaults.

There are two phases to the ToC generation. The first phase is to put suitable anchors into the HTML documents, and the second phase is to generate the ToC from HTML documents which have anchors in them for the ToC to link to.

For more information on controlling the contents of the created ToC, see "Formatting the ToC".

HTML::GenToc also supports the ability to incorporate the ToC into the HTML document itself via the -inline option. See "Inlining the ToC" for more information.

In order for HTML::GenToc to support linking to significant elements, HTML::GenToc inserts anchors into the significant elements. One can use HTML::GenToc as a filter, outputting the result to another file, or one can overwrite the original file, with the original backed up with a suffix (default: "org") appended to the filename.

A Note about Arguments

Because this is a subclass of AppConfig, one can use all the power of AppConfig for defining and parsing options/arguments.

All arguments can be set when the object is created, and further options can be set on any method (though some may not make sense). Methods expect a reference to an array (which will then be processed as if it were a command-line, which makes this very easy to use from scripts).

Options can start with '--' or '-'. If it is a yes/no option, that is the only part of the option (and such an option can be prefaced with "no" to negate it). If the option takes a value, then the list must be ("--option", "value").

Order does matter. For options which are yes/no options, a later argument overrides an earlier one. For arguments which are single values, a later value replaces an earlier one. For arguments which are cumulative, a later argument is added on to the list. For such arguments, if you want to clear the old value and start afresh, give it the special value of CLEAR.
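
The ordering rules above can be sketched in a few lines of Perl. This is only an illustration of the described semantics, not the module's own code: the %KIND table and the kind_of and apply_args helpers are hypothetical names made up for this sketch, and the real option definitions come from AppConfig.

```perl
use strict;
use warnings;

# Hypothetical option table for this sketch only.
my %KIND = (
    quiet => 'flag',    # yes/no option
    bak   => 'scalar',  # single value
    file  => 'list',    # cumulative list
);

sub kind_of {
    my ($name) = @_;
    return exists $KIND{$name} ? $KIND{$name} : '';
}

# Apply an argument list to $state in order, so later arguments win.
sub apply_args {
    my ($state, $args) = @_;
    my @queue = @$args;
    while (@queue) {
        my $opt = shift @queue;
        (my $name = $opt) =~ s/^--?//;          # accept '--opt' or '-opt'
        my $negated = $name =~ /^no(.+)$/ ? $1 : '';

        if (kind_of($name) eq 'flag') {
            $state->{$name} = 1;
        }
        elsif (kind_of($negated) eq 'flag') {
            $state->{$negated} = 0;             # "no" prefix negates a flag
        }
        elsif (kind_of($name) eq 'scalar') {
            $state->{$name} = shift @queue;     # later value replaces earlier
        }
        elsif (kind_of($name) eq 'list') {
            my $value = shift @queue;
            if ($value eq 'CLEAR') { $state->{$name} = [] }
            else                   { push @{ $state->{$name} }, $value }
        }
        else {
            die "unknown option: $opt\n";
        }
    }
    return $state;
}

my $state = apply_args({}, [
    '--file', 'a.html', '--file', 'b.html',
    '--bak', 'org', '--bak', 'bak',
    '--quiet', '--noquiet',
]);
print join(',', @{ $state->{file} }), " $state->{bak} $state->{quiet}\n";
```

Calling apply_args again on the same state with ("--file", "CLEAR", "--file", "c.html") would leave only c.html in the file list, which mirrors the CLEAR behaviour described above.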

Methods

  • new

    $toc = new HTML::GenToc();
    
    $toc = new HTML::GenToc(\@args);
    
    $toc = new HTML::GenToc(["--config", $my_config_file,
        ]);

    Creates a new HTML::GenToc object. Optionally takes one argument, a reference to an array of arguments, which will be used in invocations of other methods.

    Common Arguments:

    The following arguments apply to both the generate_anchors and generate_toc methods.

    • --bak string

      If the input files are being overwritten (--overwrite is on), copy each original file to "filename.string". If the value is empty, no backup file is written. (def:org)

    • --config file

      A file containing options, which is read in, and the options from the file are treated as if they were in the argument list at the point at which the --config option was. See "Config File" for more information.

    • --debug

      Enable verbose debugging output. Used for debugging this module; in other words, don't bother. (def:off)

    • --file file

      Input file. This is a cumulative list argument. If you want to process more than one file, just add another --file file to the list of arguments. If you want to process a different file, you need to CLEAR this argument before you call a particular method. (def:undefined)

    • --infile file

      (same as --file)

    • --overwrite

      Overwrite the input file with the output. If this is in effect, --outfile and --toc_file are ignored. Used in generate_anchors for creating the anchors "in place" and in generate_toc if the --inline option is in effect. (def:off)

    • --quiet

      Suppress informative messages.

    • --toc_after tag=suffix

      For defining significant elements. The tag is the HTML tag which marks the start of the element. The suffix is what is required to be appended to the Table of Contents entry generated for that tag. This is a cumulative hash argument; if you wish to clear it, give --toc_after CLEAR to do so. (def: undefined)

    • --toc_before tag=prefix

      For defining significant elements. The tag is the HTML tag which marks the start of the element. The prefix is what is required to be prepended to the Table of Contents entry generated for that tag. This is a cumulative hash argument; if you wish to clear it, give --toc_before CLEAR to do so. (def: undefined)

    • --toc_end tag=endtag

      For defining significant elements. The tag is the HTML tag which marks the start of the element. The endtag is the HTML tag which marks the end of the element. When matching in the input file, case is ignored (but make sure that all your tag options referring to the same tag are exactly the same!). This is a cumulative hash argument; if you wish to clear the default, give --toc_end CLEAR to do so. (def: H1=/H1 H2=/H2)

    • --toc_entry tag=level

      For defining significant elements. The tag is the HTML tag which marks the start of the element. The level is what level the tag is considered to be. The value of level must be numeric, and non-zero. If the value is negative, consecutive entries represented by the significant_element will be separated by the value set by the --entrysep option. This is a cumulative hash argument; if you wish to clear the default, give --toc_entry CLEAR to do so. (def: H1=1 H2=2)

    • --tocmap file

      ToC map file defining significant elements. This is read in immediately, and overrides any previous toc_entry, toc_end, toc_before and toc_after options. However, they can be cleared and/or added to by later options. See "ToC Map File" for further information.

  • generate_anchors

    $toc->generate_anchors(["--outfile", "index2.html",
        ]);

    Generates anchors for the significant elements in the HTML documents. Optionally takes one argument, a reference to an array of arguments, which will be used to influence this method's behaviour (and if arguments have already been set earlier, they also will be taken into account).

    Arguments:

    These arguments apply only to this method, but see above for common arguments.

    • --outfile file

      File to write the output to. This is where the modified be-anchored HTML output goes to. Note that it doesn't make sense to use this option if you are processing more than one file. If you give '-' as the filename, then output will go to STDOUT. (def: STDOUT)

    • --useorg

      Use pre-existing backup files as the input source; that is, files of the form infile.bak (see --infile and --bak).

  • generate_toc

    $toc->generate_toc(\@args);

    Generates a Table of Contents (ToC) for the significant elements in the HTML documents. Optionally takes one argument, a reference to an array of arguments, which will be used to influence this method's behaviour (and if arguments have already been set earlier, they also will be taken into account).

    Arguments:

    These arguments apply only to this method, but see above for common arguments.

    • --entrysep string

      Separator string for non-<li> item entries. (def: ", ")

    • --footer file

      File containing footer text for ToC.

    • --header file

      File containing header text for ToC.

    • --inline

      Put ToC in document at a given point. See "Inlining the ToC" for more information.

    • --ol

      Use an ordered list for level 1 ToC entries.

    • --textonly

      Use only text content in significant elements.

    • --title string

      Title for ToC page (if not using --header or --inline or --toc_only) (def: "Table of Contents")

    • --toc_file file / --toc file

      File to write the output to. This is where the ToC goes. If you give '-' as the filename, then output will go to STDOUT. (def: STDOUT)

    • --toc_label string

      HTML text that labels the ToC. Always used. (def: "<H1>Table of Contents</H1>")

    • --toc_tag string

      If a ToC is to be included inline, this is the pattern which is used to match the tag where the ToC should be put. This can be a start-tag, an end-tag or a comment, but the < should be left out; that is, if you want the ToC to be placed after the BODY tag, then give "BODY". If you want a special comment tag to mark where the ToC should go, then include the comment marks, for example: "!--toc--" (def:BODY)

    • --toc_tag_replace

      In conjunction with --toc_tag, this is a flag to say whether the given tag should be replaced, or if the ToC should be put after the tag. (def:false)

    • --toc_only / --notoc_only

      Output only the Table of Contents, that is, the Table of Contents plus the toc_label. If there is a --header or a --footer, these will also be output. If --toc_only is false (i.e. --notoc_only is set) then if there is no --header, and --inline is not true, then a suitable HTML page header will be output, and if there is no --footer and --inline is not true, then an HTML page footer will be output. (def:--notoc_only)

    • --toclabel string

      (same as --toc_label)

  • args

    $toc->args(\@args);
    
    $toc->args(["--file", "CLEAR"]);

    Updates the current arguments/options of the HTML::GenToc object. Takes a reference to an array of arguments, which will be used in invocations of other methods.

Config File

The Config file is a way of specifying default options (including specifying significant elements) in a file instead of having to do it when you call a particular method.

The file may contain blank lines and comments (prefixed by '#') which are ignored. Continuation lines may be marked by ending the line with a '\'.

# this is a comment
toc_label = <h1>Table of Wonderful and Inexplicably Joyous \
Things You Want To Know About</h1>

Options that are simple flags and do not expect an argument can be specified without any value. They will be set with the value 1, with any value explicitly specified (except "0" and "off") being ignored. The option may also be specified with a "no" prefix to implicitly set the variable to 0.

quiet                                 # on (1)
quiet = 1                             # on (1)
quiet = 0                             # off (0)
quiet off                             # off (0)
quiet on                              # on (1)
quiet mumble                          # on (1)
noquiet                               # off (0)
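
The flag rules listed above can be mimicked with a small parser. This is a rough sketch of the described behaviour only; parse_flag_line is a hypothetical helper, not part of HTML::GenToc or AppConfig, and it assumes every line it sees refers to a flag option.

```perl
use strict;
use warnings;

# Interpret one config-file line for a flag option and return
# (name, value), or an empty list for blank and comment-only lines.
sub parse_flag_line {
    my ($line) = @_;
    $line =~ s/\s*#.*$//;          # drop trailing comment
    $line =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
    my ($name, $value) = $line =~ /^(\w+)(?:\s*=\s*|\s+)?(.*)$/;
    return unless defined $name;

    # A bare "no" prefix turns the flag off.
    return ($1, 0) if $value eq '' && $name =~ /^no(\w+)$/;

    # Only "0" and "off" switch the flag off; anything else is on.
    return ($name, ($value eq '0' || $value eq 'off') ? 0 : 1);
}

for my $line ('quiet', 'quiet = 0', 'quiet off', 'quiet mumble', 'noquiet') {
    my ($name, $value) = parse_flag_line($line);
    print "$line => $name=$value\n";
}
```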

Options that expect an argument (but are not cumulative) will be set to whatever follows the variable name, up to the end of the current line. An equals sign may be inserted between the option and value for clarity.

bak = org
bak   bak

Each subsequent re-definition of the option value overwrites the previous value. From the above example, the value of the backup suffix would now be "bak".

Some options are simple cumulative options, with each subsequent definition of the option adding to the list of previously set values for that option.

file = index.html
file = about.html

If you want to clear the list and start again, give the CLEAR option.

file = CLEAR

Some options are "hash" cumulative options, building up a hash of key=value pairs. Each subsequent definition creates a new key and value in the hash array of that option.

toc_entry H1=1
toc_entry H2=2
toc_end H1=/H1
toc_end H2=/H2
toc_before H1=<STRONG>
toc_after H1=</STRONG>

This is probably the most useful part, because one can use this to define the significant elements, and other defaults all in one file, rather than having a separate tocmap file.

If you want to clear the hash and start again, give the CLEAR option.

toc_before CLEAR
toc_after CLEAR

The '-' prefix can be used to reset a variable to its default value and the '+' prefix can be used to set it to 1.

-quiet
+debug

Option values may contain references to other options, environment variables and/or users' home directories.

tocmap = ~/.tocmap	# expand '~' to home directory

quiet = ${TOC_QUIET}   # expand TOC_QUIET environment variable

The configuration file may have options arranged in blocks. A block header, consisting of the block name in square brackets, introduces a configuration block. The block name and an underscore are then prefixed to the names of all options subsequently referenced in that block. The block continues until the next block definition or to the end of the current file.

[toc]
entry H1=1              # toc_entry H1=1
entry H2=2              # toc_entry H2=2
end H1=/H1              # toc_end H1=/H1
end H2=/H2              # toc_end H2=/H2

See AppConfig for more information.
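
To make the block-prefix rule concrete, here is a minimal sketch that expands block headers into full option names. It only illustrates the naming rule described above; expand_blocks is a hypothetical helper written for this example, and AppConfig's own parser handles many more cases.

```perl
use strict;
use warnings;

# Rewrite config lines so that options inside a [block] carry the
# "block_" prefix. Comments and blank lines are dropped.
sub expand_blocks {
    my (@lines) = @_;
    my $prefix = '';
    my @expanded;
    for my $line (@lines) {
        (my $text = $line) =~ s/\s*#.*$//;   # strip trailing comment
        $text =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
        next if $text eq '';
        if ($text =~ /^\[(\w+)\]$/) {        # block header, e.g. [toc]
            $prefix = $1 . '_';
            next;
        }
        push @expanded, $prefix . $text;
    }
    return @expanded;
}

print "$_\n" for expand_blocks(
    'quiet',
    '[toc]',
    'entry H1=1',
    'end H1=/H1',
);
```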

ToC Map File

For backwards compatibility with htmltoc, this method of specifying significant elements for the ToC is retained, but see also "Config File" for an alternative method.

The ToC map file allows you to tell HTML::GenToc what significant elements to include in the ToC, what level they should appear in the ToC, and any text to include before and/or after the ToC entry. The format of the map file is as follows:

significant_element:level:sig_element_end:before_text,after_text
significant_element:level:sig_element_end:before_text,after_text
...

Each line of the map file contains a series of fields separated by the `:' character. The definition of each field is as follows:

  • significant_element

    The tag name of the significant element. Example values are H1, H2, H5. This field is case-insensitive.

  • level

    What level the significant element occupies in the ToC. This value must be numeric, and non-zero. If the value is negative, consecutive entries represented by the significant_element will be separated by the value set by the --entrysep option.

  • sig_element_end (Optional)

    The tag name that signifies the termination of the significant_element.

    Example: The DT tag is a marker in HTML and not a container. However, one can index DT sections of a definition list by using the value DD in the sig_element_end field (this does assume that each DT has a DD following it).

    If the sig_element_end is empty, then the corresponding end tag of the specified significant_element is used. Example: If H1 is the significant_element, then HTML::GenToc looks for a "</H1>" for terminating the significant_element.

    Caution: the sig_element_end value should not contain the `<' and `>' tag delimiters. If you want the sig_element_end to be the end tag of an element other than the significant_element, then use "/element_name".

    The sig_element_end field is case-insensitive.

  • before_text,after_text (Optional)

    This is literal text that will be inserted before and/or after the ToC entry for the given significant_element. The before_text is separated from the after_text by the `,' character (which implies a comma cannot be contained in the before/after text). See examples following for the use of this field.

In the map file, the first two fields MUST be specified.
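
The field layout described above can be illustrated with a small line parser. This is a sketch under stated assumptions, not the parser HTML::GenToc uses: parse_tocmap_line is a hypothetical name, '#' is assumed to start a comment anywhere on the line, and tag names are normalised to upper case simply as one way of honouring the case-insensitivity of the fields.

```perl
use strict;
use warnings;

# Parse one ToC map line into a hash reference. Returns nothing for
# blank and comment-only lines; dies if the level is missing or invalid.
sub parse_tocmap_line {
    my ($line) = @_;
    $line =~ s/\s*#.*$//;          # strip trailing comment (assumption)
    $line =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
    return if $line eq '';

    my ($tag, $level, $end, $text) = split /:/, $line, 4;
    die "level must be a non-zero integer: $line\n"
        unless defined $level && $level =~ /^-?\d+$/ && $level != 0;

    # An empty end field means "use the element's own end tag".
    $end = "/$tag" unless defined $end && $end ne '';

    # before_text and after_text are separated by the first comma.
    my ($before, $after) = defined $text ? split(/,/, $text, 2) : ();
    $before = '' unless defined $before;
    $after  = '' unless defined $after;

    return {
        tag    => uc $tag,
        level  => $level + 0,
        end    => uc $end,
        before => $before,
        after  => $after,
    };
}

my $entry = parse_tocmap_line('H1:1::<STRONG>,</STRONG>   # bold entries');
print join(' | ', map { "$_=$entry->{$_}" } qw(tag level end before after)), "\n";
```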

Following are a few examples to help illustrate how a ToC map file works.

EXAMPLE 1

The following map file reflects the default mapping HTML::GenToc uses if no map file is explicitly specified:

# Default mapping for HTML::GenToc
# Comments can be inserted in the map file via the '#' character
H1:1 # H1 are level 1 ToC entries
H2:2 # H2 are level 2 ToC entries

EXAMPLE 2

The following map file makes use of the before/after text fields:

# A ToC map file that adds some formatting
H1:1::<STRONG>,</STRONG>      # Make level 1 ToC entries <STRONG>
H2:2::<EM>,</EM>              # Make level 2 entries <EM>
H3:3                          # Make level 3 entries as is

EXAMPLE 3

The following map file tries to index definition terms:

# A ToC map file that can work for Glossary type documents
H1:1
H2:2
DT:3:DD:<EM>,</EM>   # Assumes document has a DD for each DT, otherwise ToC
                     # will get entries with a lot of text.

Formatting the ToC

The ToC Map File gives you control over how the ToC entries look, but HTML::GenToc has other options to affect the final appearance of the ToC file created.

With the -header option, HTML::GenToc will prepend the contents of the file before the generated ToC. This allows you to have introductory text, or any other text, before the ToC.

Note:

If you use the -header option, make sure the file specified contains the opening HTML tag, the HEAD element (containing the TITLE element), and the opening BODY tag. However, these tags/elements should not be in the header file if the -inline option is used. See "Inlining the ToC" for information on what the header file should contain for inlining the ToC.

With the --toc_label option, HTML::GenToc will prepend the contents of the given string before the generated ToC (but after any text taken from a --header file).

With the -footer option, HTML::GenToc will append the contents of the file after the generated ToC.

Note:

If you use the -footer option, make sure the file includes the closing BODY and HTML tags (unless, of course, you are using the --inline option).

HTML::GenToc will add the appropriate HTML markup if either the -header or -footer option is not specified, to ensure that a valid HTML document is created for the ToC.

If you do not want or need to deal with header and footer files, HTML::GenToc lets you specify the title of the ToC file with the -title option, and a heading, or label, to put before the list of ToC entries with the -toclabel option. Both options have default values; see "Methods" for more information on each option.

If you do not want HTML::GenToc to supply HTML page tags, and just want the ToC itself, then specify the --toc_only option. If there are no --header or --footer files, then this will simply output the contents of --toc_label and the ToC itself.

Inlining the ToC

HTML::GenToc supports the ability to incorporate the ToC directly into an HTML document via the -inline option. Inlining will be done on the first file in the list of files processed, and will only be done if that file contains an opening tag matching the --toc_tag value.

If --overwrite is true, then the first file in the list will be overwritten, with the generated ToC inserted at the appropriate spot. Otherwise a modified version of the first file is output to either STDOUT or to the output file defined by the --toc_file option.

The options --toc_tag and --toc_tag_replace are used to determine where and how the ToC is inserted into the output.

Example 1

# this is the default
toc_tag = BODY
toc_tag_replace = off

This will put the generated ToC after the BODY tag of the first file. If the --header option is specified, then the contents of the specified file are inserted after the BODY tag. If the --toc_label option is not empty, then HTML::GenToc inserts the text specified by the --toc_label option. Then the ToC is inserted, and finally, if the --footer option is specified, it inserts the footer. Then the rest of the input file follows as it was before.

Example 2

toc_tag = !--toc--
toc_tag_replace = on

This will put the generated ToC at the first comment of the form <!--toc-->, and that comment will be replaced by the ToC (in the order --header --toc_label ToC --footer) followed by the rest of the input file.
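
The difference between the two examples can be sketched with a plain regular-expression substitution. This is an illustration only and not how HTML::GenToc performs inlining: inline_toc is a hypothetical helper, and the --toc_tag value is treated here as a literal tag name rather than as a pattern.

```perl
use strict;
use warnings;

# Insert $toc at the first tag matching $toc_tag. With $replace true the
# tag itself is replaced; otherwise the ToC is placed right after it.
sub inline_toc {
    my ($html, $toc, $toc_tag, $replace) = @_;
    # Match "<" + tag name + any attributes + ">", ignoring case.
    my $pattern = qr/<\Q$toc_tag\E(?!\w)[^>]*>/i;
    if ($replace) {
        $html =~ s/$pattern/$toc/;
    }
    else {
        $html =~ s/($pattern)/$1$toc/;
    }
    return $html;
}

my $doc = '<html><body><!--toc--><h1>Intro</h1></body></html>';
my $toc = '<ul><li>Intro</li></ul>';

print inline_toc($doc, $toc, 'BODY', 0), "\n";      # ToC after <body>
print inline_toc($doc, $toc, '!--toc--', 1), "\n";  # ToC replaces comment
```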

Note:

The header file should not contain the beginning HTML tag and HEAD element since the HTML file being processed should already contain these tags/elements.

EXAMPLE

A simple script to process HTML documents.

#! /usr/bin/perl -w
require 5.005_03;
use HTML::GenToc;

my $toc = HTML::GenToc->new(\@ARGV);
$toc->generate_anchors();
$toc->generate_toc();

NOTES

  • One cannot use the literal string "CLEAR" as an actual value for the cumulative arguments, since it is reserved for clearing them.

  • HTML::GenToc is smart enough to detect anchors inside significant elements. If the anchor defines the NAME attribute, HTML::GenToc uses the value. Else, it adds its own NAME attribute to the anchor.

  • The TITLE element is treated specially if specified in the ToC map file. It is illegal to insert anchors (A) into TITLE elements. Therefore, HTML::GenToc will actually link to the filename itself instead of the TITLE element of the document.

  • HTML::GenToc will ignore a significant element if it does not contain any non-whitespace characters. A warning message is generated if such a condition exists.

LIMITATIONS

  • HTML::GenToc is not very efficient (memory and speed), and can be extremely slow for large documents.

  • Invalid markup will be generated if a significant element is contained inside of an anchor. For example:

    <A NAME="foo"><H1>The FOO command</H1></A>

    will be converted to (if H1 is a significant element),

    <A NAME="foo"><H1><A NAME="xtocidX">The</A> FOO command</H1></A>

    which is illegal since anchors cannot be nested.

    It is better style to put anchor statements within the element to be anchored. For example, the following is preferred:

    <H1><A NAME="foo">The FOO command</A></H1>

    HTML::GenToc will detect the "foo" NAME and use it.

  • NAME attributes without quotes are not recognized.

BUGS

Tell me about them.

PREREQUISITES

HTML::GenToc requires Perl 5.005_03 or later.

It also requires HTML::SimpleParse, AppConfig and Pod::Usage.

EXPORT

None by default.

SEE ALSO

perl(1) htmltoc(1) AppConfig HTML::SimpleParse

AUTHOR

Kathryn Andersen, rubykat@katspace.com, http://www.katspace.com; based on htmltoc by Earl Hood, ehood@medusa.acs.uci.edu

COPYRIGHT

Copyright (C) 1994-1997 Earl Hood, ehood@medusa.acs.uci.edu
Copyright (C) 2002 Kathryn Andersen, rubykat@katspace.com

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
