NAME
WWW::CheckSite::Manual - A description of the metrics used in this package
SYNOPSIS
This document contains a description of modules and tools in this suite.
DESCRIPTION
Kwalitee
The idea behind this package is to provide an analysis of items contained in a web-site. We use the word kwalitee because it looks and sounds like quality but just isn't. The metrics used to assess kwalitee only give an indication of the technical state a web-site is in, and do not reflect on the user experience or quality of that web-site.
At the heart of the package is the spider that fetches all the pages referred to within the web-site. For each page that is fetched a number of things are checked. Here is an explanation of the kwalitee metrics:
- * return status
-
The most basic check for a web-page is to see if it can be fetched. The HTTP return-status should be 200 OK.
SCORE: 0 for return status other than 200; 1 for return status 200
- * title
-
The next check is to see if the <title></title> tag-pair has content.
SCORE: 0 for no content; 1 for content
- * valid
-
The next check is to see if the (X)HTML in the page validates. The default behaviour is to use the validator available on http://validator.w3.org
SCORE: 0 for not valid, 1 for valid or validation disabled
- * links
-
The next check is to see if the web-page does not contain "dead links".
All hyperlinks (<a href=>, <area href=>) are checked with an HTTP HEAD request to see if they can be "followed". URLs that have the same origin as the primary url will also be put on the "to-fetch-list" of the spider.
MAX SCORE: 1 (do not count urls excluded by robot-rules/exclude pattern)
- * images
-
The next check is to see if the web-page does not contain "dead images".
All images (<img src=>, <input type=image>) are checked with an HTTP HEAD request to see if they exist on the server. If the Image::Info module is available, the image is fetched from the server and a basic sanity test on the image is done.
MAX SCORE: 1 (do not count images excluded by robot-rules/exclude pattern)
- * styles
-
The next check is to see if the web-page does not contain "dead style references".
All styles referenced in <link rel=stylesheet type=text/css> are fetched and, if validation is switched on, they will be sent to the css-validator at: http://jigsaw.w3.org/validator
TODO: Extract inline styles, and send them off for validation.
MAX SCORE: 1
- kwalitee
-
Every individual page can have a maximum of 6 kwalitee points that lead to a kwalitee of 1.00. For the complete web-site the mean of the page scores is taken and presented as a fraction of 1.
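The arithmetic described above can be sketched as follows. This is only an illustration, not the module's own code: the metric names, the hash layout and the two subroutines are assumptions made for this example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical names for the six metrics described above.
my @METRICS = qw(status title valid links images styles);

# Kwalitee of one page: points earned divided by the 6 available points.
sub page_kwalitee {
    my ($scores) = @_;
    my $points = sum( map { $scores->{$_} // 0 } @METRICS );
    return $points / @METRICS;
}

# Kwalitee of the site: the mean of the page scores.
sub site_kwalitee {
    my (@pages) = @_;
    return 0 unless @pages;
    return sum( map { page_kwalitee($_) } @pages ) / @pages;
}

# Made-up results for two pages; the second has no title and a dead link.
my @pages = (
    { status => 1, title => 1, valid => 1, links => 1, images => 1, styles => 1 },
    { status => 1, title => 0, valid => 1, links => 0, images => 1, styles => 1 },
);
printf "site kwalitee: %.2f\n", site_kwalitee(@pages);
```

With these made-up numbers the first page scores 6/6 and the second 4/6, so the site kwalitee is the mean of 1.00 and 0.67.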
checksite
This script is a wrapper around WWW::CheckSite that supports some command-line options to tweak the behaviour of the module.
Here is an explanation of these options:
- [--uri|-u] <uri> (mandatory unless --load)
-
This specifies the uri to be spidered. The --uri option-qualifier is optional. --uri can be abbreviated to -u.
- --prefix|-p <prefix> (mandatory)
-
This option specifies a prefix that will be used as a subdirectory name which is used to store the saved spider data and the reports. --prefix can be abbreviated to -p.
The subdirectory is created in the current directory, or in the directory specified with the --dir option. The data stored as a result of the --save option will be in this subdirectory with the name <prefix>.wcs
- --dir|-d <directory>
-
This option specifies the base directory for storing the data. --dir can be abbreviated to -d.
- --save or --nosave
-
This option specifies that the spider data should be saved. The default behaviour is to save the data; if you do not want that, use --nosave. The saved data can later be used to regenerate the reports with the --load option. The data is stored as <directory>/<prefix>/<prefix>.wcs with Storable::nstore(). --[no]save cannot be abbreviated.
See also: WWW::CheckSite Report-Templates
- --load
-
This option specifies that you want to load the results of a previous run and not do an actual run of the programme. This option is useful to regenerate the reports. --load cannot be abbreviated.
See also: WWW::CheckSite Report-Templates
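As a rough illustration of the save/load round trip, the sketch below writes a data structure with Storable::nstore() and reads it back with Storable::retrieve(). The structure stored here is made up for the example; the real layout of the .wcs file is not described in this manual. A temporary directory stands in for <directory>/<prefix>.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# A temporary directory stands in for <directory>/<prefix>.
my $dir    = tempdir( CLEANUP => 1 );
my $prefix = 'mysite';
my $file   = File::Spec->catfile( $dir, "$prefix.wcs" );

# Hypothetical spider data; the real structure is an assumption here.
my $data = {
    'http://www.mysite.org/' => { status => 200, title => 'Home' },
};

# Save in network byte order, so the file is portable between platforms.
nstore( $data, $file );

# Load the data again, as a later run with --load would need to do.
my $loaded = retrieve($file);
print $loaded->{'http://www.mysite.org/'}{title}, "\n";
```

Because nstore() writes in network byte order, a file saved on one machine can be loaded on another architecture.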
- --html or --nohtml
-
This option specifies if (X)HTML-validation should be done. The default behaviour is to validate by_upload (see --html_upload). If you do not want the validation, use the --nohtml option. --[no]html cannot be abbreviated.
See also: checksite --html_uri, --html_upload, --xmllint and --html_validator
- --html_validator <w3c-validator-uri>
-
As of version 0.20, the (X)HTML-validator at W3C is no longer used as the validator for (X)HTML as they do not allow robots!
The default w3c-validator-uri is now http://localhost/w3c-validator/. It is strongly advised to run your own copy of the W3C validator. --html_validator cannot be abbreviated.
The W3C (X)HTML-validator is widely available and runs smoothly on most systems with Apache and Perl running. See http://validator.w3.org/source/ for more information.
- --html_uri
-
This option sets the validation method to use the uri interface (unless --nohtml is specified). You can optionally specify an alternative (X)HTML-validator site with --html_validator. --html_uri cannot be abbreviated.
- --html_upload
-
This option sets the validation method to use the upload interface (unless --nohtml is specified). All the content to be validated is saved as a local file (using File::Temp). --html_upload cannot be abbreviated.
- --xmllint <path/to/xmllint>
-
This option specifies that the validation of (X)HTML should be done with the xmllint(1) program (unless --nohtml is specified). You can optionally specify the full path to your xmllint program. --xmllint cannot be abbreviated.
- --css or --nocss
-
This option specifies if CSS-validation should be done. The default behaviour is to validate by_upload (see --css_upload). If you do not want the validation, use the --nocss option. --[no]css cannot be abbreviated.
See also: checksite --css_uri, --css_upload and --css_validator
- --css_validator <css-validator-uri>
-
As of version 0.20, the CSS-validator at W3C is no longer used as the validator for CSS as they do not allow robots!
The default css-validator-uri is now http://localhost/css-validator/. It is strongly advised to run your own copy of the W3C validator. --css_validator cannot be abbreviated.
The W3C CSS-validator is available and runs under Jigsaw on most systems with a working Java JDK. See http://www.w3.org/Jigsaw/#Getting for more information on the Jigsaw applet server, and http://jigsaw.w3.org/css-validator/DOWNLOAD.html for more information on the W3C CSS-validator.
- --css_uri
-
This option sets the validation method to use the uri interface (unless --nocss is specified). You can optionally specify an alternative CSS-validator site with --css_validator. --css_uri cannot be abbreviated.
- --css_upload
-
This option sets the validation method to use the upload interface (unless --nocss is specified). All the content to be validated is saved as a local file (using File::Temp). --css_upload cannot be abbreviated.
- --lang|-l <accept-language>
-
This option can be used to force a web-server to return web-pages in the specified language (if applicable). The accept-language argument can be a simple two letter language code as specified in ISO 639, or a complete Accept-language: field as described in section 14.4 of RFC 2616.
NOTE: My Apache config says:
    # Note 3: In the case of 'ltz' we violate the RFC by using a three
    # char specifier. There is 'work in progress' to fix this and get
    # the reference data for rfc1766 cleaned up.
So there may be more weird stuff out there, but since you are supposed to be using this on your own web-sites only, you should know about that!
--lang can be abbreviated to -l.
- --ua_class <ua_class>
-
This option can be used to override the default user-agent class WWW::Mechanize. The new user-agent class could be a WWW::Mechanize descendant that caters for your special needs:
    package BA_Mech;
    # This package sets credentials for basic authentication
    use base 'WWW::Mechanize';
    sub get_basic_credentials { ( 'abeltje', '********' ) }
    1;
and call checksite like:
    checksite -p mysite --ua_class BA_Mech http://www.mysite.org
- --verbose|-v (multiple)
-
Each --verbose option increases the verbosity. When $v==1 you will see the messages from WWW::CheckSite, and when $v==2 you will also see the messages from WWW::CheckSite::Validator and WWW::CheckSite::Spider.
- configuration file
-
The checksite program supports Config::Auto. This means you can specify any of the commandline arguments as options (without the prefixing dashes) in a file.
The files searched are (and in this order):
- ./checksiteconfig
- ./checksite.config
- ./checksiterc
- ./.checksiterc
- <bindir>/checksiteconfig
- <bindir>/checksite.config
- <bindir>/checksiterc
- <bindir>/.checksiterc
- $HOME/checksiteconfig
- $HOME/checksite.config
- $HOME/checksiterc
- $HOME/.checksiterc
- /etc/checksiteconfig
- /etc/checksite.config
- /etc/checksiterc
- /etc/.checksiterc
- /usr/local/etc/checksiteconfig
- /usr/local/etc/checksite.config
- /usr/local/etc/checksiterc
- /usr/local/etc/.checksiterc
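The search order above amounts to five directories, each tried with four file names. The sketch below only reproduces that documented order; it is not how Config::Auto is implemented. The subroutine name and the example values for <bindir> and $HOME are assumptions, and the last directory is read as /usr/local/etc.

```perl
use strict;
use warnings;

# Build the candidate list in the documented order: for each directory,
# try the four file names before moving on to the next directory.
sub candidate_config_files {
    my ( $bindir, $home ) = @_;
    my @dirs  = ( '.', $bindir, $home, '/etc', '/usr/local/etc' );
    my @names = qw( checksiteconfig checksite.config checksiterc .checksiterc );
    return map { my $d = $_; map { $d . '/' . $_ } @names } @dirs;
}

# Hypothetical <bindir>; $HOME is taken from the environment if it is set.
my @candidates =
    candidate_config_files( '/usr/local/bin', $ENV{HOME} // '/home/user' );

# The first candidate that exists as a plain file would be the one used.
my ($first) = grep { -f } @candidates;
print defined $first
    ? "would read $first\n"
    : "no configuration file found\n";
```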
WWW::CheckSite
The WWW::CheckSite module uses the WWW::CheckSite::Validator module to get information about a website and assess its kwalitee. The findings are presented in two html reports, one with all the information and one with just the "errors".
The reports are created with the use of templates. The module caters for two template systems: Template (TT2) and HTML::Template. The template-toolkit templates are preferred if both modules are installed.
Your own report templates
The report templates have the base names: wcsfullrpt.EXT and wcssummrpt.EXT, where EXT eq 'tt' for template-toolkit and EXT eq 'tmpl' for HTML::Template.
First the current directory is searched, then the directory where checksite is installed and finally the directory where the WWW::CheckSite module is installed (and where the default templates are). If you put your own templates in one of the first two directories, they will override the default templates.
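The "first match wins" lookup described above could look roughly like this. The subroutine find_template() and the directory names are hypothetical and serve only to illustrate the search order; the module's own lookup code may differ.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first existing template file, searching @dirs in order.
# Returns undef (in scalar context) when no directory has the file.
sub find_template {
    my ( $basename, $ext, @dirs ) = @_;
    for my $dir (@dirs) {
        my $path = File::Spec->catfile( $dir, "$basename.$ext" );
        return $path if -f $path;
    }
    return;
}

# Hypothetical locations: current directory, the directory holding the
# checksite script, and the directory of the installed module.
my @search = ( '.', '/usr/local/bin', '/usr/local/lib/perl5/WWW/CheckSite' );

my $template = find_template( 'wcsfullrpt', 'tt', @search );
print defined $template
    ? "using $template\n"
    : "no wcsfullrpt.tt found\n";
```

A template placed in an earlier directory hides one with the same name in a later directory, which is what makes the override work.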
Saving and loading validation data
Saving the validation data can help you develop your own templates.
AUTHOR
Abe Timmerman, <abeltje@cpan.org>
$Id: Manual.pod 675 2007-05-28 21:58:52Z abeltje $
COPYRIGHT & LICENSE
Copyright MMV-MMVII Abe Timmerman, All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.