=head1 NAME

C<urhtml_fmt> - Reformat HTML, indented according to structure

=head1 SYNOPSIS

    urhtml_fmt [uri|file]

=head1 EXAMPLE

    urhtml_fmt http://perl.org

=head1 DESCRIPTION

Given the URI or the name of a file,
writes it to C<STDOUT>
reformatted and
indented according to the HTML structure.
Missing start and end tags are supplied and
comments added to indicate this.
Text inside
C<< <pre> >> elements 
is not altered.

L<urhtml_fmt> tries to parse everything that is actually out there on the Web.
In fact,
L<urhtml_fmt> will assume any file fed to it was intended as HTML,
and will produce its best guess of the author's intent.

L<urhtml_fmt> supplies missing start and end tags.
L<urhtml_fmt>'s parser is extremely liberal in what it accepts.
When its liberalization of the standards is not sufficient to make
a document into valid HTML,
L<urhtml_fmt>
will pick characters to treat as noise or "cruft".
The parser ignores cruft in determining
the structure of the document.

When
L<urhtml_fmt> adds
a missing start tag,
it precedes the new start tag with a comment.
When
L<urhtml_fmt> adds
a missing end tag,
it follows the new end tag with a comment.
When L<urhtml_fmt> classifies characters
as "cruft",
it adds a comment to that effect before the "cruft".

C<pre> elements receive special treatment.
The contents of 
C<pre> elements are not reformatted.
When missing tags or cruft occur inside a C<pre> element,
the comments to that effect are placed 
before the C<< <pre> >> start tag.

The argument to L<urhtml_score> can be either as a URI or a file
name.  If it starts with alphanumerics followed by a colon, it is treated
as a URI.  Otherwise it is treated as file name.

=head1 SAMPLE OUTPUT

Given this input:

    <title>Test page<tr>x<head attr="I am cruft"><p>Final graf

L<urhtml_fmt> returns

    <!-- Following start tag is replacement for a missing one -->
    <html>
      <!-- Following start tag is replacement for a missing one -->
      <head>
        <title>
          Test page
        </title>
        <!-- Preceding end tag is replacement for a missing one -->
      </head>
      <!-- Preceding end tag is replacement for a missing one -->
      <!-- Following start tag is replacement for a missing one -->
      <body>
        <!-- Following start tag is replacement for a missing one -->
        <table>
          <!-- Following start tag is replacement for a missing one -->
          <tbody>
            <tr>
              <!-- Following start tag is replacement for a missing one -->
              <td>
                x
                <!-- Next line is cruft -->
                <head attr="I am cruft">
                <p>
                  Final graf
                </p>
                <!-- Preceding end tag is replacement for a missing one -->
              </td>
              <!-- Preceding end tag is replacement for a missing one -->
            </tr>
            <!-- Preceding end tag is replacement for a missing one -->
          </tbody>
          <!-- Preceding end tag is replacement for a missing one -->
        </table>
        <!-- Preceding end tag is replacement for a missing one -->
      </body>
      <!-- Preceding end tag is replacement for a missing one -->
    </html>
    <!-- Preceding end tag is replacement for a missing one -->

=head1 PURPOSE

This program is a demo of a demo.
It purpose is to show how easy it is to write applications which look
at the structure of web pages using L<Marpa::UrHTML>.
And the purpose of L<Marpa::UrHTML>
is to demonstrate the power of its parse engine,
L<Marpa>.
L<Marpa::UrHTML> was written in a few days,
and its logic 
is a straightforward,
natural expression of the structure of HTML.

=head1 AUTHOR

Jeffrey Kegler

=head1 BUGS

Please report any bugs or feature requests to
C<bug-parse-marpa at rt.cpan.org>, or through the web interface at
L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Marpa>.
I will be notified, and then you'll automatically be notified of progress on
your bug as I make changes.

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Marpa
    
You can also look for information at:

=over 4

=item * AnnoCPAN: Annotated CPAN documentation

L<http://annocpan.org/dist/Marpa>

=item * CPAN Ratings

L<http://cpanratings.perl.org/d/Marpa>

=item * RT: CPAN's request tracker

L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Marpa>

=item * Search CPAN

L<http://search.cpan.org/dist/Marpa>

=back

=head1 ACKNOWLEDGMENTS

The starting template for this code was
HTML::TokeParser, by Gisle Aas.

=head1 LICENSE AND COPYRIGHT

Copyright 2007-2009 Jeffrey Kegler, all rights reserved.

This program is free software; you can redistribute
it and/or modify it under the same terms as Perl 5.10.0.

=cut