NAME
html2text.pl - script for generating formatted text from HTML
SYNOPSIS
html2text.pl <filename>
cat <filename> | html2text.pl
DESCRIPTION
html2text.pl generates simple formatted text from HTML. It uses HTML::Element to traverse an HTML tree built by HTML::TreeBuilder, and formats the output text using Text::Format. It is very simple at the moment, and handles the following:
- Headings
All headings are underlined; <H1>s are double underlined. Headings are numbered hierarchically, from each heading's level and the levels of the headings that precede it.
- Paragraphs
Paragraph text is formatted with the paragraphs method of Text::Format.
- Lists
List items are indented by 4 spaces and preceded by an asterisk.
- Definition Lists
<DT>s are indented by 4 spaces; <DD>s are indented by 8 spaces.
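The heading rules above can be sketched in core Perl as follows. This is a minimal illustration, not code taken from html2text.pl: the subroutine format_heading and the @count array are hypothetical, and the sketch assumes that "double underlined" means a row of = characters (other levels get a row of -) and that section numbers are joined with dots.

```perl
use strict;
use warnings;

# Hypothetical helper, not from html2text.pl.
# $count[N] holds the running counter for heading level N (1..6).
my @count = (0) x 7;

sub format_heading {
    my ($level, $text) = @_;
    $count[$level]++;
    # A new heading resets the counters of all deeper levels.
    $count[$_] = 0 for $level + 1 .. 6;
    # Build a number such as "1.2" from the counters of levels 1..$level.
    my $number = join '.', @count[1 .. $level];
    my $title  = "$number $text";
    # Assumption: <H1> is underlined with '=', other levels with '-'.
    my $rule = ($level == 1 ? '=' : '-') x length($title);
    return "$title\n$rule\n";
}

print format_heading(1, 'Introduction');
print format_heading(2, 'Background');
print format_heading(2, 'Scope');
print format_heading(1, 'Usage');
```

Keeping one counter per level and clearing the deeper ones whenever a heading is seen is what lets the number depend on both the current level and the headings before it; the second <H1> above is numbered 2 and any <H2> after it starts again at 2.1.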
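The list and definition-list indentation can be sketched the same way. Here the core module Text::Wrap stands in for Text::Format so the example needs nothing from CPAN; format_list_item and format_definition are hypothetical names, and placing the asterisk at the four-space indent (with wrapped lines aligned under the item text) is an assumption about the output layout.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Text::Wrap stands in for Text::Format; the names below are hypothetical.
$Text::Wrap::columns  = 40;
# Keep leading spaces as spaces (by default Text::Wrap turns 8 spaces into a tab).
$Text::Wrap::unexpand = 0;

# <LI>: indent 4 spaces, then an asterisk; continuation lines hang
# under the first character of the item text.
sub format_list_item {
    my ($text) = @_;
    return wrap('    * ', '      ', $text) . "\n";
}

# <DT> is indented 4 spaces, <DD> is indented 8 spaces.
sub format_definition {
    my ($term, $definition) = @_;
    return wrap('    ', '    ', $term) . "\n"
         . wrap('        ', '        ', $definition) . "\n";
}

print format_list_item('Headings are underlined and numbered.');
print format_definition('TreeBuilder', 'Builds the HTML tree that is traversed.');
```

Passing separate first-line and continuation prefixes to the wrapper is what produces the hanging indent, so long items and definitions stay aligned when they wrap.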
PREREQUISITES
Text::Format
HTML::TreeBuilder
OSNAMES
sunos 5.6 sun4-solaris
AUTHOR
Ave Wrigley <wrigley@cre.canon.co.uk> Web Group, Canon Research Centre Europe
COPYRIGHT
Copyright (c) 1998 Canon Research Centre Europe. All rights reserved.
This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SCRIPT CATEGORIES
HTML