NAME
checksite - Check the contents of a website
SYNOPSIS
$ checksite [options] -p <name> uri
OPTIONS
- Results
    --prefix|-p <name>       The prefix (dir) of this check [mandatory]
    --dir|-d <dir>           The target directory
- Persistence
    --[no]save               Save validation results
    --load                   Load the validation results
- (X)HTML validation
    --nohtml                 Skip (X)HTML validation
    --html_validator <uri>   Base uri for the W3C (X)HTML validator
    --html_upload            Validate (X)HTML by uploading
    --html_uri               Validate (X)HTML by sending the uri
    --xmllint                Validate by using the xmllint program
- CSS validation
    --nocss                  Skip CSS validation
    --css_validator <uri>    Base uri for the W3C CSS validator
    --css_upload             Validate CSS by uploading
    --css_uri                Validate CSS by sending the uri
- Exclusion
    --disallow <path>        Add Disallow: rules to robots.txt (multiple)
    --nostrictrules          Do not impose /robots.txt on the validator for "local" url's
- General
    --lang|-l <lang>         Set language(s) for Accept-Language: header
    --ua_class <Module>      Set a new UserAgent class (child of WWW::Mechanize)
    -v                       Increase verbosity (multiple)
    --help|-h                This message
See WWW::CheckSite::Manual for more information.
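As an illustration of how the options above combine, here is a minimal sketch of a link-and-image check that skips both validators. The prefix, target directory, excluded paths and site URI are hypothetical placeholders, and the RUN variable is only a convenience of this sketch, not part of checksite.

```shell
#!/bin/sh
# Hypothetical values: substitute your own prefix, directory and site.
PREFIX="example"
REPORT_DIR="./reports"
SITE_URI="http://www.example.com/"

# Build the argument list using only options listed in OPTIONS.
set -- \
    --prefix "$PREFIX" \
    --dir "$REPORT_DIR" \
    --nohtml --nocss \
    --disallow /private/ \
    --disallow /cgi-bin/ \
    --lang en \
    -v \
    "$SITE_URI"

# Show the command line that would be run.
CMD="checksite $*"
echo "$CMD"

# Only spider the site when explicitly asked to (RUN=1).
if [ "${RUN:-0}" = 1 ]; then
    checksite "$@"
fi
```

Running the script with RUN=1 in the environment starts the actual check; without it, the script only prints the assembled command line.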
DESCRIPTION
This program will spider the specified url and check the availability of the links, images and stylesheets on each page.
INCOMPATIBLE CHANGE AS OF 0.020: Pages and stylesheets are NO LONGER validated with the validators available at http://validator.w3.org and http://jigsaw.w3.org. These validators do not allow robots! The W3C HTML validator is now widely available and easy to install, so I advise you to run your own. The W3C CSS validator takes more work, but I have managed to get it working as well with Jigsaw.
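If you do run your own validators, pointing checksite at them might look like the sketch below. Both validator URIs are assumptions about a local installation, not defaults of checksite, and the RUN variable is a convenience of this sketch only.

```shell
#!/bin/sh
# Hypothetical locations of self-hosted validators; adjust to your setup.
HTML_VALIDATOR="http://localhost/w3c-validator/check"
CSS_VALIDATOR="http://localhost:8080/css-validator/validator"
SITE_URI="http://www.example.com/"

# Validate both (X)HTML and CSS by uploading the fetched content.
set -- \
    --prefix example \
    --html_validator "$HTML_VALIDATOR" --html_upload \
    --css_validator "$CSS_VALIDATOR" --css_upload \
    "$SITE_URI"

# Show the command line that would be run.
CMD="checksite $*"
echo "$CMD"

# Only spider the site when explicitly asked to (RUN=1).
if [ "${RUN:-0}" = 1 ]; then
    checksite "$@"
fi
```

The upload variants are used here on the assumption that a local validator may not be able to reach the site itself; --html_uri and --css_uri are the documented alternatives when it can.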
When all pages are checked, two reports in HTML format are generated. The full.html report contains all the information for all pages; the summ.html report contains only the pages with errors, together with those errors.
Metrics for a spidered page
Each page fetched by the spider will have these metrics:
status, status_tx
The HTTP return code and a verbal explanation of that code.
title
The contents of the <title></title> tag.
ct
The MIME type returned by the HTTP server for the document.
links
A list of <a href=>, <area href=> and <frame src=> URIs found on the page, each with its HTTP return code. Each HTML tag is also checked for link text or an ALT/TITLE attribute.
link_cnt, links_ok
The number of links found and the number of links that are ok.
images
A list of <img src=> and <input type=image> URIs found on the page, each with its HTTP return code and MIME type. Each HTML tag is also checked for the existence of the ALT attribute.
image_cnt, images_ok
The number of images found and the number of images that are ok.
styles
A list of <link rel=stylesheet type=text/css> URIs found on the page, each with its HTTP return code, MIME type and CSS validation result.
style_cnt, styles_ok
The number of stylesheets found and the number of stylesheets that are ok.
valid
The HTML validation result.
FILES
checksite supports Config::Auto. This means that the directories searched by Config::Auto are checked for a configuration file named checksiteconfig, checksite.config, checksiterc or .checksiterc.
SEE ALSO
AUTHOR
Abe Timmerman, <abeltje@cpan.org>
BUGS
Please report any bugs or feature requests to bug-WWW-CheckSite@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
COPYRIGHT & LICENSE
Copyright MMV-MMVII Abe Timmerman, All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.