NAME
Test::Pod::LinkCheck::Lite - Test POD links
SYNOPSIS
use Test::More 0.88; # for done_testing();
use Test::Pod::LinkCheck::Lite;
my $t = Test::Pod::LinkCheck::Lite->new();
$t->all_pod_files_ok();
done_testing;
DESCRIPTION
This Perl module tests POD links. A given file generates one failure for each broken link found. If no broken links are found, one passing test is generated. This all means that there is no way to know how many tests will be generated, and you will need to use Test::More's done_testing() (or something equivalent) at the end of your test.
By its nature this module should be used only for author testing. The problem with using it in an installation test is that the validity of links external to the distribution being tested varies with things like operating system type and version, Perl version, installed Perl modules and their versions, and the Internet at large. Caveat user.
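One way to keep the check out of installation testing is to guard it in the test file itself. The sketch below assumes the common (but not universal) AUTHOR_TESTING environment variable convention; that variable is an assumption of this example, not something the module itself consults.

```perl
use strict;
use warnings;

use Test::More 0.88;    # for done_testing()

SKIP: {
    # AUTHOR_TESTING is a widely used convention, assumed here.
    $ENV{AUTHOR_TESTING}
        or skip 'Author test; set AUTHOR_TESTING to run', 1;

    # Skip rather than fail if the module is not installed.
    eval { require Test::Pod::LinkCheck::Lite; 1 }
        or skip 'Test::Pod::LinkCheck::Lite is not installed', 1;

    # With no arguments the contents of blib/ are tested.
    Test::Pod::LinkCheck::Lite->new()->all_pod_files_ok();
}

done_testing;
```

Run without AUTHOR_TESTING set, this records a single skipped test and does no link checking at all.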
This module should probably be considered alpha-quality code at this point. It checks most of my modest corpus (correctly, I hope), but beyond that deponent sayeth not.
One thing perlpod is silent on (at least, I could not find anything about it) is how (or even whether) to normalize links and section names. Maybe I looked in the wrong place?
Anyhow, because Meta CPAN has been observed to link
L<SOME
SECTION>
to =head1 SOME SECTION, this module normalizes both link and section names by removing leading and trailing white space, and replacing embedded white space with a single space. Yes, I know that Meta CPAN's observed handling of POD is far from being definitive.
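The normalization described above can be sketched in a few lines. The helper name normalize_name is hypothetical and this is not the module's actual code; it only illustrates the stated rule, reading "embedded white space" as any run of white space characters.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module's public interface.
sub normalize_name {
    my ( $name ) = @_;
    $name =~ s/ \A \s+ //smx;   # remove leading white space
    $name =~ s/ \s+ \z //smx;   # remove trailing white space
    $name =~ s/ \s+ / /smxg;    # collapse each embedded run to one space
    return $name;
}

# A link target broken across lines compares equal to the heading text.
my $link    = normalize_name( "SOME\n  SECTION " );
my $heading = normalize_name( 'SOME SECTION' );
print $link eq $heading ? "match\n" : "no match\n";
```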
This module started its life as a low-dependency version of Test::Pod::LinkCheck. Significant differences from that module include:
- Minimal use of the shell
  This module shells out only to check man links.
- Unchecked links are explicitly skipped
  That is, a skipped test is generated for each. Note that Test::Pod::LinkCheck appears to fail the link in at least some such cases.
- URL links are checked
  This seemed to be an easy enough addition.
- Dependencies are minimized
  Given at least Perl 5.13.9, the only non-core module used is B::Keywords.
POD links come in the following flavors:
man
These links are of the form L<manpage (section)>. They will only be checked if the man attribute is true, and can only be successfully checked if the man command actually displays man pages, and man -w can be executed.
url
These links are of the form L<http://...> (or https: or whatever). They will only be checked if the check_url attribute is true, and can only be successfully checked if Perl has access to the specified URL.
NOTE that https: links can only be checked if IO::Socket::SSL version 1.42 (at least) and Net::SSLeay version 1.49 (at least) are installed. These are NOT prerequisites of Test::Pod::LinkCheck::Lite because they are not in core, and I am trying to keep non-core dependencies to a minimum. If these modules are not present an attempt to check an https: link will result in a skipped test. In addition, a diagnostic will be issued for the first https: link skipped by the test object.
pod (internal)
These links are of the form L<text|/section>. They are checked using the parse tree in which the link was found.
pod (external)
This is pretty much everything else. There are a number of cases, and the only way to distinguish them is to run through them.
- Perl built-ins
  These links are of the form L<text|builtin> or L<builtin>, and are checked against the lists in B::Keywords.
- Installed modules and pod files
  These are resolved to a file using Pod::Perldoc. If a section was specified, the file is parsed to determine whether the section name is valid.
- Uninstalled modules
  These are checked against modules/02packages.details.txt.gz, provided that (or some reasonable facsimile) can be found. Currently we can look for this information in the following places:
  - File Metadata in the directory used by the CPAN client;
  - Website https://cpanmetadb.plackperl.org/, a.k.a. the CPAN Meta DB.
  If more than one of these is configured (by default they all are), we look in the newest one.
  Sections can not be checked. If a link to a valid (but uninstalled) module has a section, a skipped test is generated.
The ::Lite refers to the fact that a real effort has been made to reduce non-core dependencies. Under Perl 5.14 and up, the only known non-core dependency is B::Keywords.
An effort has also been made to minimize the spawning of system commands.
METHODS
This class supports the following public methods:
new
my $t = Test::Pod::LinkCheck::Lite->new();
This static method instantiates an object. Optional arguments are passed as name/value pairs.
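As a sketch of the name/value calling convention, the following passes three of the arguments documented below. The particular values are only an illustration, and the module is loaded inside an eval so that the example degrades gracefully where it is not installed.

```perl
use strict;
use warnings;

# Argument names are taken from the list of supported arguments;
# the values chosen here are purely illustrative.
my %arg = (
    check_url         => 0,     # do not check url links
    man               => 0,     # do not check man links
    require_installed => 1,     # links to uninstalled modules fail
);

if ( eval { require Test::Pod::LinkCheck::Lite; 1 } ) {
    my $t = Test::Pod::LinkCheck::Lite->new( %arg );
    # Display the resulting attribute values, indented two spaces.
    print $t->configuration( '  ' );
} else {
    print "Test::Pod::LinkCheck::Lite is not installed\n";
}
```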
The following arguments are supported:
- agent
  This argument is the user agent string to use for web access.
  Note that this probably should have been called something more verbose like user_agent_string, but I was influenced by the name used by HTTP::Tiny, and did not anticipate the need for the interface to be able to specify the actual user agent.
  The default is undef, which specifies whatever the actual user agent's agent() method returns.
- allow_man_spaces
  This Boolean argument is set true to allow internal spaces in a 'man' link. Note that such links can not be checked under some operating systems (e.g. FreeBSD) because the man (1) program splits its arguments on spaces.
  The default is false.
- cache_url_response
  This Boolean argument is set true to cache the responses from URL links. This means each URL is queried only once, no matter how many times it appears.
  This is an in-memory cache, and persists only for the life of the Test::Pod::LinkCheck::Lite object.
  The default is true.
- check_external_sections
  This Boolean argument is true if the sections of links outside the current Pod are to be checked. If it is false, such sections are not checked, and the link is considered valid if the external Pod exists at all.
  The default is true.
- check_url
  This Boolean argument is true if url links are to be checked, and false if not.
  The default is true.
- ignore_url
  This argument specifies one or more URLs to ignore when checking url links. It can be specified as:
  - A Regexp object
    Any URL that matches this Regexp is ignored.
  - undef
    No URLs are ignored.
  - a scalar
    This URL is ignored.
  - a SCALAR reference
    The URL referred to is ignored.
  - a HASH reference
    The URL is ignored if the hash contains a true value for the URL.
  - a CODE reference
    The code is called with the URL to ignore in the topic variable (a.k.a. $_). The URL is ignored if the code returns a true value.
  - an ARRAY reference
    The array can contain any legal ignore specification, and any URL that matches any value in the array is ignored. Nested arrays are flattened.
  The default is [].
  Note that the order in which the individual checks are made is undefined. OK, the implementation is deterministic, but the order of evaluation is an implementation detail that the author reserves the right to change without warning.
- man
  This Boolean argument is true if man links are to be checked, and false if not.
  The default is false (with a diagnostic) if $^O is 'DOS' or 'MSWin32'. Under any other operating system the default is the value of IPC::Cmd::can_run( 'man' ). If this returns false a diagnostic is generated, and man links are not checked.
  In case you're wondering: the Windows testing was done under ReactOS, and that appears to come with a MAN.EXE which (at least under 0.4.11) causes can_run() to return true, but which does, as far as I can tell, nothing useful.
- module_index
  This argument specifies a list of module indices to consult, as either a comma-delimited string or an array reference. Even if specified a given index will only be used if it is actually available for use. If more than one index is found, the most-recently-updated index will be used. Possible indices are:
  - cpan
    Use the module index found in the CPAN working directory.
  - cpan_meta_db
    Use the CPAN Meta database. Because this is an on-line index it is considered to be current, but its as-of time is offset to favor local indices.
  By default all indices are considered.
- prohibit_redirect
  Added in version 0.004.
  This argument controls whether redirects are allowed in the resolution of a URL link.
  If a code reference is specified, it is called whenever a URL link is successfully resolved. The arguments are the Test::Pod::LinkCheck::Lite object, the HTTP::Tiny response hash, and the URL from the link. The code returns true to declare the link in error, false to allow it, or a code reference to defer the decision to that code. This latter is provided because I found the case where I wanted to do a little pre-processing and then defer to ALLOW_REDIRECT_TO_INDEX, but could not find a clean way to use a manifest constant in a goto.
  Any other value is interpreted as a Boolean. If the argument is true, any redirect is an error. If false, redirects are allowed.
  This argument is ignored unless check_url is true.
  The default is false, for historical reasons.
- require_installed
  This Boolean argument is true to disable the uninstalled module checks. This means links to modules not installed on the system will fail, even if the module exists.
  By default this is false.
- skip_server_errors
  Added in version 0.002.
  This Boolean argument is true to generate skips rather than failures if an attempt to check a URL link fails with a server error (status 5xx).
  By default this is true; it can be made false by passing value 0 or ''.
  The default represents a change in the default behaviour from version 0.001, which failed a URL link if the check returned a server error. The logic (if any) in changing the default behaviour is that 5xx errors can represent actual server problems rather than errors in the link being checked, so changing the default behaviour eliminates possible false positives.
- user_agent
  Added in version 0.011.
  This argument is either a class name or an object. Either way, it must be a subclass of HTTP::Tiny.
  If a class name is passed, the class must already be loaded. An object of that class will be instantiated by calling its new() method -- with the agent argument if that was specified,
agent
This method returns the value of the 'agent' attribute.
all_pod_files_ok
$t->all_pod_files_ok();
This method takes as its arguments the names of one or more files, and tests any such that are deemed to be Perl files. Directories are recursed into.
Perl files are considered to be all text files whose names end in .pod, .pm, or .PL, plus any text files with a shebang line containing 'perl'. File name suffixes are case-sensitive except for .PL.
If no arguments are specified, the contents of blib/ are tested. This is the recommended usage.
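The file selection rule above might be approximated as follows. The helper looks_like_perl_file is hypothetical, and the module's own logic may differ in detail (for example in how it decides that a file is text); this is only a reading of the documented rule.

```perl
use strict;
use warnings;

# Hypothetical helper; an approximation of the documented rule only.
sub looks_like_perl_file {
    my ( $path ) = @_;
    -T $path or return 0;       # must exist and look like a text file
    return 1 if $path =~ m/ [.] (?: pod | pm ) \z /smx;    # case-sensitive
    return 1 if $path =~ m/ [.] pl \z /smxi;    # .PL in any letter case
    open my $fh, '<', $path or return 0;
    my $first = <$fh>;          # only the first line can be a shebang
    close $fh;
    return ( defined $first && $first =~ m/ \A \#! .* perl /smx ) ? 1 : 0;
}

# Print whichever of the files named on the command line qualify.
print "$_\n" for grep { looks_like_perl_file( $_ ) } @ARGV;
```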
If called in scalar context, this method returns the number of test failures encountered. If called in list context it returns the number of failures, passes, and skipped tests, in that order.
allow_man_spaces
$t->allow_man_spaces()
and say 'Embedded spaces are allowed in man page names';
This method returns the value of the 'allow_man_spaces' attribute.
cache_url_response
$t->cache_url_response()
and say 'URL responses are cached';
This method returns the value of the 'cache_url_response' attribute.
check_external_sections
$t->check_external_sections()
and say 'Sections in external links are checked';
This method returns the value of the 'check_external_sections' attribute.
check_url
$t->check_url() and say 'URL links are checked';
This method returns the value of the 'check_url' attribute.
configuration
say $t->configuration( ' ' );
This convenience method returns a string containing all attributes of the object in human-readable form. The argument, if any, is prefixed to each line of the returned string.
ignore_url
print 'Ignored URLs ', join ', ', $t->ignore_url();
This method returns the value of the 'ignore_url' attribute. If called in scalar context, it returns an array reference. If called in list context it returns an array. Either way, the results will not be in the same order as originally specified to new().
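The forms accepted for the 'ignore_url' attribute (listed under new()) can be illustrated with a small matcher. The function url_is_ignored is hypothetical and is not how the module is implemented; in particular, the order of the checks here is arbitrary, which is consistent with the documented warning that the order is undefined.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $url matches the ignore specification.
sub url_is_ignored {
    my ( $url, $spec ) = @_;
    defined $spec or return 0;              # undef: nothing is ignored
    my $type = ref $spec;
    return $url eq $spec      ? 1 : 0 if $type eq '';       # scalar
    return $url =~ $spec      ? 1 : 0 if $type eq 'Regexp';
    return $url eq ${ $spec } ? 1 : 0 if $type eq 'SCALAR';
    return $spec->{$url}      ? 1 : 0 if $type eq 'HASH';
    if ( $type eq 'CODE' ) {
        local $_ = $url;                    # URL in the topic variable
        return $spec->() ? 1 : 0;
    }
    if ( $type eq 'ARRAY' ) {               # nested arrays recurse
        foreach my $item ( @{ $spec } ) {
            return 1 if url_is_ignored( $url, $item );
        }
        return 0;
    }
    die "Unsupported ignore specification of type $type\n";
}

# Illustrative specification mixing several of the documented forms.
my $spec = [
    qr{ \A https://example[.]invalid/ }smx,
    { 'http://localhost/' => 1 },
    [ sub { m/ [?] /smx } ],                # any URL with a query string
];
print url_is_ignored( 'http://localhost/', $spec ) ? "ignored\n" : "checked\n";
```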
man
$t->man() and say 'man links are checked';
This method returns the value of the 'man' attribute.
module_index
say 'Module indices: ', join ', ', $t->module_index();
This method returns the value of the 'module_index' attribute. If called in scalar context it returns a comma-delimited string.
pod_file_ok
my $failures = $t->pod_file_ok( 'lib/Foo/Bar.pm' );
This method tests the links in the given file. Each failure appears in the TAP output as a test failure. If no failures are found, a passing test will appear in the TAP output.
If called in scalar context, this method returns the number of test failures encountered. If called in list context it returns the number of failures, passes, and skipped tests, in that order.
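The context-sensitive return convention is easy to trip over, so here is a stand-in that behaves the way the text describes. The function report_counts is hypothetical; it only mimics the documented return values and does no testing of its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in that follows the documented return convention.
sub report_counts {
    my ( $fail, $pass, $skip ) = @_;
    return wantarray ? ( $fail, $pass, $skip ) : $fail;
}

# Scalar context: only the number of failures.
my $failures = report_counts( 2, 5, 1 );

# List context: failures, passes, and skips, in that order.
my ( $fail, $pass, $skip ) = report_counts( 2, 5, 1 );

print "failures=$failures pass=$pass skip=$skip\n";
```

Note that parentheses on the left of an assignment, as in my ( $n ) = ..., impose list context even when only one value is wanted.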
prohibit_redirect
$t->prohibit_redirect()
and say 'All URL links must resolve without redirection';
Added in version 0.004.
This method returns the value of the 'prohibit_redirect' attribute.
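When 'prohibit_redirect' is a code reference it receives the object, the HTTP::Tiny response hash, and the link URL, as described under new(). The callback below is a hypothetical policy, not one supplied by the module: it tolerates a redirect only when it merely upgrades http: to https:. It relies on the url key of the HTTP::Tiny response, and assumes the link URL is passed in a form directly comparable to it.

```perl
use strict;
use warnings;

# Hypothetical policy: allow only a plain http: to https: upgrade.
my $only_https_upgrade = sub {
    my ( undef, $resp, $url ) = @_;     # object, response hash, link URL
    my $final = $resp->{url};           # last URL in any redirect chain
    return 0 if $final eq $url;         # not redirected: allow
    ( my $upgraded = $url ) =~ s/ \A http: /https:/smx;
    return $final eq $upgraded ? 0 : 1; # true declares the link in error
};

# Exercise the callback with a hand-built response hash.
my $verdict = $only_https_upgrade->(
    undef,
    { url => 'https://example.com/a' },
    'http://example.com/a',
);
print $verdict ? "link fails\n" : "link allowed\n";
```

Such a callback would be passed as new( prohibit_redirect => $only_https_upgrade ).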
require_installed
$t->require_installed()
and say 'All POD links must be to installed modules';
This method returns the value of the 'require_installed' attribute.
skip_server_errors
$t->skip_server_errors()
and say 'URL links that return status 5xx are skipped';
Added in version 0.002.
This method returns the value of the 'skip_server_errors' attribute.
MANIFEST CONSTANTS
The following manifest constants can be imported by name, or using the :const tag:
ALLOW_REDIRECT_TO_INDEX
Added in version 0.003.
This manifest constant is intended to be used as a value of the 'prohibit_redirect' attribute. It is a reference to a piece of code that accepts old-style redirects of an hierarchical URL ending in a '/' to an index of that leaf of the hierarchy.
Because this is a minimal-dependency module, the code referred to by this constant works by hand-checking for an hierarchical scheme (anything but 'data:', 'mailto:', or 'urn:'). If a URL with an hierarchical scheme ends in '/', the URL in the response has everything after the last '/' removed before comparison to the original URL.
This mess exists because of my bias that old-style redirection to an index is a different beast than indirection in general, and ought to be allowed. If you disagree you can ignore this functionality, or re-implement to suit yourself.
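The comparison described above can be sketched roughly as follows. The function redirect_prohibited is hypothetical and is not the code behind ALLOW_REDIRECT_TO_INDEX; it follows the prose description only, returning true when the redirect should count as an error.

```perl
use strict;
use warnings;

# Hypothetical re-statement of the described check, not the module's code.
sub redirect_prohibited {
    my ( $orig, $final ) = @_;
    # Hierarchical here means any scheme but data:, mailto:, or urn:.
    my $hierarchical = $orig !~ m/ \A (?: data | mailto | urn ) : /smxi;
    if ( $hierarchical && $orig =~ m{ / \z }smx ) {
        # Drop everything after the last '/' in the response URL.
        $final =~ s{ [^/]* \z }{}smx;
    }
    return $final eq $orig ? 0 : 1;
}

print redirect_prohibited(
    'https://example.com/docs/',
    'https://example.com/docs/index.html',
) ? "prohibited\n" : "allowed\n";
```

The constant itself is used by importing it and passing it to the constructor, as in new( prohibit_redirect => ALLOW_REDIRECT_TO_INDEX ).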
MAYBE_IGNORE_GITHUB
Added in version 0.009.
This manifest constant is intended to be used as a value of the 'ignore_url' attribute. It is a reference to a piece of code that ignores GitHub URLs unless the directory specified by environment variable GIT_DIR (default: .git/) exists, and GitHub is a remote for the repository.
This is (maybe) a convenience for developers whose boilerplate includes GitHub links but who have not yet uploaded to GitHub.
SEE ALSO
Test::Pod::LinkCheck by Apocalypse (APOCAL) checks all POD links except for URLs. It is Moose-based.
Test::Pod::Links by Sven Kirmess (SKIRMESS) checks all URLs or URL-like things in the document, whether or not they are actual POD links.
Test::Pod::No404s by Apocalypse (APOCAL) checks URL POD links.
ACKNOWLEDGMENTS
The author would like to acknowledge the following, without whom this module would not exist -- at least, not in anything like its current form.
Mohammed Anwar (MANWAR) who submitted the "broken POD link" ticket that started me thinking about testing for this kind of thing.
The CPAN Testers who, by testing my code under such a broad range of configurations, gave me an opportunity to make this module much more robust than it would otherwise have been. It is probably unfair to single out individual testers, but as the luck of the testing cycle would have it, results from Andreas J. König (ANDK), Slaven Rezić (SREZIC), Chris Williams (BINGOS), and Alceu Rodrigues de Freitas Junior were particularly useful to me.
SUPPORT
Support is by the author. Please file bug reports at https://rt.cpan.org/Public/Dist/Display.html?Name=Test-Pod-LinkCheck-Lite, https://github.com/trwyant/perl-Test-Pod-LinkCheck-Lite/issues, or in electronic mail to the author.
AUTHOR
Thomas R. Wyant, III wyant at cpan dot org
COPYRIGHT AND LICENSE
Copyright (C) 2019-2023 by Thomas R. Wyant, III
This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. For more details, see the full text of the licenses in the directory LICENSES.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.