NAME

HTML::Lint - check for HTML errors in a string or file

VERSION

Version 1.28

$Header: /cvsroot/html-lint/html-lint/lib/HTML/Lint.pm,v 1.63 2004/01/27 23:15:34 petdance Exp $

SYNOPSIS

    use HTML::Lint;

    my $lint = HTML::Lint->new;
    $lint->only_types( HTML::Lint::Error::STRUCTURE );

    $lint->parse( $data );
    $lint->parse_file( $filename );

    my $error_count = $lint->errors;

    foreach my $error ( $lint->errors ) {
        print $error->as_string, "\n";
    }

HTML::Lint also comes with a wrapper program called weblint that handles linting from the command line:

    $ weblint http://www.cnn.com/
    http://www.cnn.com/ (395:83) <IMG> tag has no HEIGHT and WIDTH attributes.
    http://www.cnn.com/ (395:83) <IMG> does not have ALT text defined
    http://www.cnn.com/ (396:217) Unknown element <nobr>
    http://www.cnn.com/ (396:241) </nobr> with no opening <nobr>
    http://www.cnn.com/ (842:7) target attribute in <a> is repeated

Finally, there is Apache::HTML::Lint, which passes any mod_perl-generated code through HTML::Lint and dumps the results into your Apache error_log:

    [Mon Jun  3 14:03:31 2002] [warn] /foo.pl (1:45) </p> with no opening <p>
    [Mon Jun  3 14:03:31 2002] [warn] /foo.pl (1:49) Unknown element <gronk>
    [Mon Jun  3 14:03:31 2002] [warn] /foo.pl (1:56) Unknown attribute "x" for tag <table>

EXPORTS

None. It's all object-based.

METHODS

HTML::Lint is based on the HTML::Parser module. Any method call that works with HTML::Parser will work in HTML::Lint. However, you'll probably only want to use the parse() or parse_file() methods.
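Because the parsing interface comes from HTML::Parser, a document can also be fed in pieces. The following is a minimal sketch, assuming the chunked parse() and eof() behavior that HTML::Parser documents; the sample markup and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Lint;

my $lint = HTML::Lint->new;

# Hypothetical document, split at an arbitrary point to mimic chunked input.
my @chunks = (
    '<html><head><title>Demo</title></head>',
    '<body><p>Hi</b></body></html>',
);

$lint->parse( $_ ) for @chunks;
$lint->eof;    # HTML::Parser method: signals that no more input is coming

print $_->as_string, "\n" for $lint->errors;
```

The stray </b> should be reported as a closing tag with no opening tag; the exact set of other messages may vary between HTML::Lint versions.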

new()

Create an HTML::Lint object, which inherits from HTML::Parser. You may pass the types of errors you want to check for in the only_types parameter.

    my $lint = HTML::Lint->new( only_types => HTML::Lint::Error::STRUCTURE );

If you want more than one, you must pass an arrayref:

    my $lint = HTML::Lint->new(
        only_types => [HTML::Lint::Error::STRUCTURE, HTML::Lint::Error::FLUFF] );

only_types( $type1[, $type2...] )

Restricts the object to reporting only errors of the given types.

    $lint->only_types( HTML::Lint::Error::STRUCTURE );

Calling this without parameters makes the object return all possible errors.

The error types are STRUCTURE, HELPER and FLUFF. See HTML::Lint::Error for details on these types.
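To see what the filter does, one option is to lint the same text twice and compare the counts. This is only a sketch: the markup is hypothetical, and which type each message falls under is defined in HTML::Lint::Error, so the actual numbers depend on the installed version.

```perl
use strict;
use warnings;
use HTML::Lint;

# Hypothetical markup with a stray closing tag and an <img> lacking attributes.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>Hi</b><img src="x.gif"></body></html>';

# Lint the same text twice: once unfiltered, once restricted to STRUCTURE.
my $all = HTML::Lint->new;
$all->parse( $html );
$all->eof;

my $structural = HTML::Lint->new;
$structural->only_types( HTML::Lint::Error::STRUCTURE );
$structural->parse( $html );
$structural->eof;

my $all_count        = $all->errors;
my $structural_count = $structural->errors;
print "all types: $all_count, STRUCTURE only: $structural_count\n";
```

The filtered count can never exceed the unfiltered one, since only_types() only narrows what is kept.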

errors()

In list context, errors returns all of the errors found in the parsed text. Each error is an object of the type HTML::Lint::Error.

In scalar context, it returns the number of errors found.
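The two calling contexts can be sketched as follows; the input string is a made-up fragment chosen only because it contains an obvious mistake.

```perl
use strict;
use warnings;
use HTML::Lint;

my $lint = HTML::Lint->new;
$lint->parse( '<p>An unopened closing tag</b>' );    # hypothetical input
$lint->eof;

my $count  = $lint->errors;    # scalar context: number of errors
my @errors = $lint->errors;    # list context: HTML::Lint::Error objects

print "$count error(s)\n";
print "  ", $_->as_string, "\n" for @errors;
```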

clear_errors()

Clears the list of errors, in case you want to print and clear, print and clear.
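A print-and-clear loop might look like the sketch below. The fragments are hypothetical, and @per_fragment is just a helper array introduced here to record how many errors each fragment produced.

```perl
use strict;
use warnings;
use HTML::Lint;

my $lint = HTML::Lint->new;

# Hypothetical fragments, each linted and reported separately.
my @fragments = ( '<p>first</b>', '<p>second</i>' );
my @per_fragment;

for my $fragment ( @fragments ) {
    $lint->parse( $fragment );
    $lint->eof;

    print $_->as_string, "\n" for $lint->errors;
    push @per_fragment, scalar $lint->errors;

    $lint->clear_errors;    # start the next fragment with an empty list
}
```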

gripe( $errcode, [$key1=>$val1, ...] )

Adds an error message, in the form of an HTML::Lint::Error object, to the list of error messages for the current object. The file, line and column are automatically passed to the HTML::Lint::Error constructor, along with any other key/value pairs you supply.

For example:

    $lint->gripe( 'attr-repeated', tag => $tag, attr => $attr );

Usually, the user of the object won't call this directly, but just in case, here you go.

newfile( $filename )

Call newfile() whenever you switch to another file in a batch of linting. Otherwise, the object thinks everything is from the same file. Note that the list of errors is NOT cleared.
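A batch run can be sketched like this. To keep the example self-contained it pairs hypothetical file names with in-memory strings and uses parse() rather than parse_file(); it assumes newfile() accepts any label, whether or not a file of that name exists.

```perl
use strict;
use warnings;
use HTML::Lint;

my $lint = HTML::Lint->new;

# Hypothetical batch: file names paired with in-memory contents so the
# sketch runs without touching the disk.
my @batch = (
    [ 'first.html'  => '<p>one</b>' ],
    [ 'second.html' => '<p>two</i>' ],
);

for my $entry ( @batch ) {
    my ( $name, $html ) = @$entry;
    $lint->newfile( $name );    # label the errors that follow
    $lint->parse( $html );
    $lint->eof;
}

# Errors accumulate across files because newfile() does not clear them.
my @messages = map { $_->as_string } $lint->errors;
print "$_\n" for @messages;
```

Each message begins with the name passed to newfile(), so errors from different files stay distinguishable in the combined list.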

file()

Returns the current file being linted.

line()

Returns the current line in the file.

column()

Returns the current column in the file.

BUGS, WISHES AND CORRESPONDENCE

Please feel free to email me at andy@petdance.com. I'm glad to help as best I can, and I'm always interested in bugs, suggestions and patches.

Please report any bugs or feature requests to <bug-html-lint@rt.cpan.org>, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

TODO

  • Check for attributes that require values

    For instance, BGCOLOR should be BGCOLOR="something", but if it's just BGCOLOR, that's a problem. (Plus, a bare BGCOLOR crashes IE on OS X.)

  • Add link checking

  • Handle obsolete tags

  • Anything like <BR> or <P> inside of <A>

  • <TABLE>s that have no rows.

  • Form fields that aren't in a FORM

  • Check for valid entities, and that they end with semicolons

  • DIVs with nothing in them.

  • HEIGHT= that have percents in them.

  • Check for goofy stuff like:

    <b><li></b><b>Hello Reader - Spanish Level 1 (K-3)</b>

LICENSE

Copyright 2003 Andy Lester, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Please note that these modules are not products of or supported by the employers of the various contributors to the code.

AUTHOR

Andy Lester, <andy@petdance.com>