NAME

checksite - Check the contents of a website

SYNOPSIS

$ checksite [options] -p <name> uri

OPTIONS

Results

--prefix|-p <name>        The prefix (dir) of this check [mandatory]
--dir|-d <dir>            The target directory

Persistence

--[no]save                Save validation results
--load                    Load the validation results

(X)HTML validation

--nohtml                  Skip (X)HTML validation
--html_validator <uri>    Base uri for the W3C (X)HTML validator
--html_upload             Validate (X)HTML by uploading
--html_uri                Validate (X)HTML by sending the uri
--xmllint                 Validate by using the xmllint program

CSS validation

--nocss                   Skip CSS validation
--css_validator <uri>     Base uri for the W3C CSS validator
--css_upload              Validate CSS by uploading
--css_uri                 Validate CSS by sending the uri

Exclusion

--disallow <path>         Add Disallow: rules to robots.txt (multiple)

--nostrictrules           Do not impose /robots.txt on the validator
                          for "local" URLs

General

--lang|-l <lang>          Set language(s) for Accept-Language: header

--ua_class <Module>       Set a new UserAgent class
                          (child of WWW::Mechanize)

-v                        Increase verbosity (multiple)
--help|-h                 This message

See WWW::CheckSite::Manual for more information.
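The --disallow option adds robots.txt Disallow: rules. To show which paths such rules exclude, the sketch below assumes the classic robots.txt semantics, where a rule matches every path that starts with its value. The is_disallowed function is hypothetical and not part of checksite, which applies the rules itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of checksite): succeed (exit 0) when the
# first argument starts with any of the remaining Disallow: values.
# This assumes classic robots.txt prefix matching.
is_disallowed() {
    path_to_check=$1
    shift
    for rule in "$@"; do
        # An empty Disallow: value allows everything, so skip it.
        [ -z "$rule" ] && continue
        case $path_to_check in
            "$rule"*) return 0 ;;
        esac
    done
    return 1
}

# Which paths would `--disallow /cgi-bin --disallow /private` cover?
for p in /index.html /cgi-bin/search /private/notes.html /private-old/a.html; do
    if is_disallowed "$p" /cgi-bin /private; then
        echo "skip  $p"
    else
        echo "check $p"
    fi
done
```

Under these assumed semantics, /private also matches /private-old/a.html. Ending the value with a slash (--disallow /private/) limits the rule to that directory.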

DESCRIPTION

This program spiders the specified URL and checks the availability of the links, images and stylesheets on each page.

INCOMPATIBLE CHANGE AS OF 0.020: Pages and stylesheets are NO LONGER validated with the validators available at http://validator.w3.org and http://jigsaw.w3.org, because these validators do not allow robots! The W3C (X)HTML validator is now widely available and easy to install, so I advise you to run your own. The W3C CSS validator takes more work, but I have managed to get it running as well with Jigsaw.

When all pages have been checked, two reports in HTML format are generated. The full.html report contains all the information for all pages; the summ.html report contains only the pages with errors, together with those errors.

Metrics for a spidered page

Each page fetched by the spider will have these metrics:

status, status_tx

The HTTP return code and a textual explanation of that code.
title

The contents of the <title></title> tag.
ct

The MIME type returned by the HTTP-server for the document.
links

A list of the <a href=>, <area href=> and <frame src=> URIs found on the page, with their HTTP return codes. Each HTML tag is also checked for link text or an ALT/TITLE attribute.
link_cnt, links_ok

The number of links found and the number of links that are ok.
images

A list of the <img src=> and <input type=image> URIs found on the page, with their HTTP return codes and MIME types. Each HTML tag is also checked for the existence of the ALT attribute.
image_cnt, images_ok

The number of images found and the number of images that are ok.
styles

A list of the <link rel=stylesheet type=text/css> URIs found on the page, with their HTTP return codes, MIME types and CSS validation results.
style_cnt, styles_ok

The number of stylesheets found and the number of stylesheets that are ok.
valid

The HTML-validation result.

FILES

checksite supports Config::Auto. This means that each of the following directories is searched for a file named checksiteconfig, checksite.config, checksiterc or .checksiterc:

current directory
bin directory (where the script is installed)
$HOME
/etc/
/usr/local/etc/
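To find out which configuration file would be picked up, the sketch below tries the file names and directories listed above and prints the first match. It is an approximation under stated assumptions. The function name find_checksite_config and the CHECKSITE_BIN variable, which stands in for the directory checksite is installed in, are hypothetical. The loop order, in which each file name is tried in every directory before moving on to the next name, is also an assumption. Config::Auto itself determines the real precedence.

```shell
#!/bin/sh
# Hypothetical helper (not part of checksite): print the first configuration
# file found, or return 1 if there is none.
# Assumed order: each file name in turn, looked up in every directory
# before moving on to the next name.
find_checksite_config() {
    # CHECKSITE_BIN (an assumption) stands in for the directory checksite is
    # installed in; default to the location of `checksite` on $PATH, else ".".
    bindir=${CHECKSITE_BIN:-$(dirname "$(command -v checksite || echo ./checksite)")}
    for name in checksiteconfig checksite.config checksiterc .checksiterc; do
        for dir in . "$bindir" "$HOME" /etc /usr/local/etc; do
            if [ -f "$dir/$name" ]; then
                echo "$dir/$name"
                return 0
            fi
        done
    done
    return 1
}

find_checksite_config || echo "no checksite configuration file found"
```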

AUTHOR

Abe Timmerman, <abeltje@cpan.org>

BUGS

Please report any bugs or feature requests to bug-WWW-CheckSite@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

To install WWW::CheckSite, copy and paste the appropriate command into your terminal.

cpanm

cpanm WWW::CheckSite

CPAN shell

perl -MCPAN -e shell
install WWW::CheckSite

For more information on module installation, please visit the detailed CPAN module installation guide.
