NAME

urhtml_fmt - Reformat HTML, indented according to structure

SYNOPSIS

urhtml_fmt [uri|file]

EXAMPLE

urhtml_fmt http://perl.org

DESCRIPTION

Given the URI or the name of a file, writes it to STDOUT reformatted and indented according to the HTML structure. Missing start and end tags are supplied and comments added to indicate this. Text inside <pre> elements is not altered.

urhtml_fmt tries to parse everything that is actually out there on the Web. In fact, urhtml_fmt will assume any file fed to it was intended as HTML, and will produce its best guess of the author's intent.

urhtml_fmt supplies missing start and end tags. urhtml_fmt's parser is extremely liberal in what it accepts. When its liberalization of the standards is not sufficient to make a document into valid HTML, urhtml_fmt will pick characters to treat as noise or "cruft". The parser ignores cruft in determining the structure of the document.

When urhtml_fmt adds a missing start tag, it precedes the new start tag with a comment. When urhtml_fmt adds a missing end tag, it follows the new end tag with a comment. When urhtml_fmt classifies characters as "cruft", it adds a comment to that effect before the "cruft".

pre elements receive special treatment. The contents of pre elements are not reformatted. When missing tags or cruft occur inside a pre element, the comments to that effect are placed before the <pre> start tag.

The argument to urhtml_fmt can be either a URI or a file name. If it starts with alphanumerics followed by a colon, it is treated as a URI. Otherwise it is treated as a file name.
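The rule above can be sketched in a few lines of Perl. This is only an illustration of the rule as stated, not code taken from urhtml_fmt itself; the helper name classify_argument is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, written for illustration only.
# An argument that begins with one or more alphanumerics followed
# by a colon is treated as a URI; anything else is a file name.
sub classify_argument {
    my ($arg) = @_;
    return $arg =~ m{\A [[:alnum:]]+ :}xms ? 'uri' : 'file';
}

for my $arg ( 'http://perl.org', 'index.html', './a:b.html' ) {
    print "$arg is treated as a ", classify_argument($arg), "\n";
}
```

Under this reading of the rule, a relative path such as ./a:b.html is still a file name, because the colon is not preceded solely by alphanumerics from the start of the argument. A bare name that itself begins with alphanumerics and a colon would be taken as a URI.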

SAMPLE OUTPUT

Given this input:

<title>Test page<tr>x<head attr="I am cruft"><p>Final graf

urhtml_fmt returns

<!-- Following start tag is replacement for a missing one -->
<html>
  <!-- Following start tag is replacement for a missing one -->
  <head>
    <title>
      Test page
    </title>
    <!-- Preceding end tag is replacement for a missing one -->
  </head>
  <!-- Preceding end tag is replacement for a missing one -->
  <!-- Following start tag is replacement for a missing one -->
  <body>
    <!-- Following start tag is replacement for a missing one -->
    <table>
      <!-- Following start tag is replacement for a missing one -->
      <tbody>
        <tr>
          <!-- Following start tag is replacement for a missing one -->
          <td>
            x
            <!-- Next line is cruft -->
            <head attr="I am cruft">
            <p>
              Final graf
            </p>
            <!-- Preceding end tag is replacement for a missing one -->
          </td>
          <!-- Preceding end tag is replacement for a missing one -->
        </tr>
        <!-- Preceding end tag is replacement for a missing one -->
      </tbody>
      <!-- Preceding end tag is replacement for a missing one -->
    </table>
    <!-- Preceding end tag is replacement for a missing one -->
  </body>
  <!-- Preceding end tag is replacement for a missing one -->
</html>
<!-- Preceding end tag is replacement for a missing one -->

PURPOSE

This program is a demo of a demo. Its purpose is to show how easy it is to write applications which look at the structure of web pages using Marpa::UrHTML. And the purpose of Marpa::UrHTML is to demonstrate the power of its parse engine, Marpa. Marpa::UrHTML was written in a few days, and its logic is a straightforward, natural expression of the structure of HTML.

AUTHOR

Jeffrey Kegler

BUGS

Please report any bugs or feature requests to bug-parse-marpa at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Marpa. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Marpa

You can also look for information at:

AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Marpa

CPAN Ratings

http://cpanratings.perl.org/d/Marpa

RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Marpa

Search CPAN

http://search.cpan.org/dist/Marpa

ACKNOWLEDGMENTS

The starting template for this code was HTML::TokeParser, by Gisle Aas.

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0.
