NAME
Perl6::Perldoc::Parser - Parse Perl 6's documentation mark-up language
VERSION
This document describes Perl6::Perldoc::Parser version 0.0.6
SYNOPSIS
    use Perl6::Perldoc::Parser;

    $representation = Perl6::Perldoc::Parser->parse($file, \%options);

    $errors   = $representation->{errors};
    $warnings = $representation->{warnings};
    $obj_tree = $representation->{tree};
DESCRIPTION
This module parses text marked up with the Perl 6 Pod notation and converts it to a hierarchical object-based representation.
MODULE INTERFACE
$rep = Perl6::Perldoc::Parser->parse($file, \%options)
The parse() method expects as its first argument either:
- a string containing the filename, or
- a filehandle that's already open for input, or
- a reference to a string that contains actual Pod mark-up.
This argument is used as the source of the Pod to be parsed.
You may also (optionally) pass a reference to a hash of options as its second argument. The options that can be passed in this second argument are:
all_pod => $status
-
If $status is true, specifies that the entire text should be considered to be Pod. Any text not inside a Pod block will be treated as a plain paragraph or code block, rather than as ambient source code. Specifying this option is the same as placing a =begin pod/=end pod block around the entire text.

If $status is false, specifies that the text should be considered to be heterogeneous: a mixture of Pod and source code. Any text not inside a Pod block will be treated as ambient source code.
As a special case, if $status is the string 'auto', the option is set automatically by looking at the filename passed to parse(): if that filename ends in '.pod6' or '.pod', the option is set true.

Defaults to false when parse() is passed a filehandle, and to 'auto' when parse() is passed a filename.

allow => \%allowed
-
Specifies that the formatting codes whose names appear as keys of the hash are to be allowed within verbatim blocks. For example, to universally allow the E<> and L<> codes within otherwise verbatim text:

    Perl6::Perldoc::Parser->parse($file, { allow => {E=>1, L=>1} });

Defaults to no allowed codes.
filename => $path
-
If the Pod contains placement links (P<> codes) to relative file paths that are to be inlined, and the source was not passed to parse() as a filename, you must use this option to specify the path of the source file. It will be used to resolve all relative paths to be inlined.
reset_id => $status
-
By default, the parser assigns unique identifiers (increasing integers) to each created object. These identifiers can later be used, for instance, to create unique links.
If $status is true, the parser will reset the identifier generator to start again from 1.
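The rules for all_pod can be restated as a small function. This is a sketch only: resolve_all_pod() is a hypothetical name, not part of Perl6::Perldoc::Parser, and treating a reference to an in-memory string like a filehandle is an assumption, since the documentation above does not cover that case.

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Hypothetical helper (not part of Perl6::Perldoc::Parser): restates the
# documented all_pod rules and returns 1 (treat everything as Pod) or 0.
sub resolve_all_pod {
    my ($source, $status) = @_;

    # A plain string is a filename; filehandles and string refs are not
    my $is_filename = !ref($source) && !openhandle($source);

    # Documented defaults: 'auto' for a filename, false for a filehandle.
    # Assumption: a reference to an in-memory string also defaults to false.
    $status //= $is_filename ? 'auto' : 0;

    # Explicit settings other than 'auto' are simply true or false
    return $status ? 1 : 0 if $status ne 'auto';

    # Assumption: 'auto' with no filename to inspect means false
    return 0 unless $is_filename;

    # 'auto': files ending in .pod6 or .pod are treated as pure Pod
    return $source =~ /\.pod6?\z/ ? 1 : 0;
}
```

In practice you would not write this yourself: passing all_pod => 'auto' (or omitting the option when parsing a named file) lets parse() apply the rule.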
Text is read from the source and parsed as Perl 6 Pod. The method call returns a reference to a hash containing three entries:
$rep->{tree}
-
The root object of a hierarchical representation of the parsed Pod (see "DOM INTERFACE").
$rep->{errors}
-
A reference to an array of error messages generated during the parse. If this array is non-empty then the parse failed and the resulting object tree is not guaranteed to be correct. It is suggested that if parse() returns a non-empty $rep->{errors}, the application calling it should report the errors and abort.

$rep->{warnings}
-
A reference to an array of warning messages generated during the parse. If this array is non-empty the parse probably succeeded and the resulting object tree is very likely to be correct. It is suggested that if parse() returns a non-empty $rep->{warnings}, the application calling it should report the warnings before continuing.
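One way to follow the suggested policy, assuming $rep is the hash reference returned by parse(), is sketched below. check_parse_result() is a hypothetical helper, not part of the module; unlike report_errors() it reports without throwing, so it suits callers that want to decide for themselves how to exit.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): print the diagnostics held in
# the hash returned by Perl6::Perldoc::Parser->parse() and return 1 if the tree
# can be used, 0 if the caller should abort.
sub check_parse_result {
    my ($rep, $fh) = @_;
    $fh //= \*STDERR;    # where to report; defaults to standard error

    my @warnings = @{ $rep->{warnings} || [] };
    my @errors   = @{ $rep->{errors}   || [] };

    # Warnings: the tree is very likely correct, so report and carry on
    for my $msg (@warnings) { chomp(my $m = $msg); print {$fh} "Warning: $m\n" }

    # Errors: the tree is not guaranteed to be correct
    for my $msg (@errors)   { chomp(my $m = $msg); print {$fh} "Error: $m\n" }

    return @errors ? 0 : 1;
}
```

A caller might then write: exit 1 unless check_parse_result(Perl6::Perldoc::Parser->parse($file));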
$rep->report_errors(@optional_message)
This method can be called on the hash returned by parse(). It prints to STDERR any errors or warnings returned from the parse and then throws an exception (containing the optional message) if there were any errors.
If there are no errors, it returns its own invocant, so it can be chained directly to the end of an actual parse:
    $rep = Perl6::Perldoc::Parser->parse($file, \%options)
               ->report_errors('Bad pod');
DOM INTERFACE
The class hierarchy of the objects returned by parse() is as follows (all class names are prefixed with Perl6::Perldoc::):
    Root
        File
        Ambient
        Directive
            Directive::config
            Directive::use
            Directive::encoding
        Block
            Block::pod
            Block::para
            Block::code
            Block::input
            Block::output
            Block::Named
                Block::Named::Whatever
                Block::Named::WhateverElse
                etc.
            Block::head
                Block::head1
                Block::head2
                Block::head3
                Block::head4
                Block::head5
                Block::head6
                etc.
            Block::list
            Block::item
                Block::item1
                Block::item2
                Block::item3
                etc.
            Block::nested
            Block::comment
            Block::table
                Block::table::Row
                Block::table::Cell
            Block::toclist
            Block::tocitem
                Block::tocitem1
                Block::tocitem2
                Block::tocitem3
                Block::tocitem4
                Block::tocitem5
            Block::Semantic
                Block::NAME
                Block::VERSION
                Block::SYNOPSIS
                Block::DESCRIPTION
                etc.
        FormattingCode
            FormattingCode::B
            FormattingCode::C
            FormattingCode::D
            FormattingCode::E
            FormattingCode::I
            FormattingCode::K
            FormattingCode::L
            FormattingCode::M
            FormattingCode::Named
                FormattingCode::Named::Whatever
                FormattingCode::Named::WhateverElse
                etc.
            FormattingCode::N
            FormattingCode::P
            FormattingCode::R
            FormattingCode::S
            FormattingCode::T
            FormattingCode::U
            FormattingCode::V
            FormattingCode::X
            FormattingCode::Z
Every class has a new() constructor, which expects its first argument to be a reference to a hash containing the parsed information for the block. The second, optional argument is a reference to a hash containing any of the global options that may be passed to Perl6::Perldoc::Parser::parse().
The Perl6::Perldoc::Root class (and hence every other class in the DOM hierarchy) has the following methods available, all of which are currently read-only accessors:
typename()
-
Returns the name of the block type, typically the same as the last component of the object's classname. Handy for text (re)generation, but consider using polymorphic methods instead of switching on this value.
style()
-
Returns the style of block that the object was generated from. The possibilities are:
'delimited'
-
The object was derived from a block that was specified with =begin/=end markers.
'paragraph'
-
The object was derived from a block that was specified with a =for marker.
'abbreviated'
-
The object was derived from a block that was specified using the short-form =typename syntax.
'directive'
-
The object was derived from a =use, =config, or =encoding directive.
'formatting'
-
The object was derived from a formatting code.
'implicit'
-
The object was created internally by the parser. Such objects are typically list containers, top-level pod blocks, or representations of raw code or text paragraph blocks.
content()
-
Returns a list of objects and/or strings representing the content of the block. Objects always represent nested blocks or formatting codes; strings are always unformatted text.
range()
-
Returns a reference to a hash specifying the range of lines in which the corresponding block was defined. The entries of the hash are:
    $obj->range->{from}    # Line at which block opened
    $obj->range->{to}      # Line at which block closed
    $obj->range->{file}    # File in which block opened
number()
-
Returns the hierarchical number of the block within its block type. Will be undefined if the block was not numbered, so typically only meaningful for headers and list items.
is_semantic()
-
Returns true if the block is a standard semantic block.
is_verbatim()
-
Returns true if the block is verbatim (that is: a =code block, or a C<> or V<> formatting code).
is_numbered()
-
Returns true if the block has a :numbered option specified (either explicitly, or by preconfiguration).
is_post_numbered()
-
Returns true if the block is special in that its number should appear at the end of its content, not at the start. Typically this is true for certain types of semantic block (for example: =CHAPTER), where a rendering such as:

    Chapter 1

makes more sense than:

    1. Chapter

User-defined blocks are often defined to have their is_post_numbered() methods return true as well. For example:

    =for Image :numbered :caption<Our mascot> :source<file:images/camel.jpg>

is better captioned with post-numbering:

    Image 7: Our mascot
config()
-
Returns a reference to a nested hash containing the configuration (i.e. =config) environment in effect for the block. Each top-level key of the hash is the name of a block type being configured, and each second-level hash contains the configuration options for that block type.
option( $opt_name )
-
Returns the value of the named option for the specific block object. This value may be derived from an explicit option on the declaration, or implicitly from the configuration for the block.
term( \%options )
-
Returns the value of the "term" option of the block. Typically this will be undef unless the block is an =item.

The "term" value is normally returned as a raw string, but you can have the method return a fully parsed Pod subtree by specifying an option on the call:
    $pod_tree = $item->term({ as_objects => 1 })

caption( \%options )
-
Returns the value of the "caption" option of the block. This is most often used for =table blocks, but any block may be given a caption.

The "caption" value is normally returned as a raw string, but you can have the method return a fully parsed Pod subtree by specifying an option on the call:
    $pod_tree = $item->caption({ as_objects => 1 })
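Since every DOM class inherits these accessors from Perl6::Perldoc::Root, a generic tree dumper needs only typename(), range(), and content(). The sketch below is a hypothetical debugging aid (dump_tree() is not part of the module). It tells strings from objects with Scalar::Util::blessed and always calls content() in list context.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: render a Perl6::Perldoc DOM (sub)tree as indented
# lines, using only the documented typename(), range() and content() accessors.
sub dump_tree {
    my ($node, $depth) = @_;
    $depth //= 0;
    my $indent = '  ' x $depth;

    # content() may yield plain strings (unformatted text)...
    return "$indent\"$node\"\n" unless blessed $node;

    # ...or nested objects, labelled here with their type and line range
    my $range = $node->range || {};
    my $where = defined $range->{from}
              ? " (lines $range->{from}-" . ($range->{to} // '?') . ")"
              : '';
    my $out = $indent . $node->typename . "$where\n";

    # List context avoids the "Multivalued accessor" warning
    for my $child ($node->content) {
        $out .= dump_tree($child, $depth + 1);
    }
    return $out;
}
```

For example, print dump_tree($rep->{tree}); after a successful parse() prints one line per object, with text fragments indented beneath their enclosing blocks.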
Some DOM classes offer additional methods, as follows:
Perl6::Perldoc::Directive::config
target()
-
Returns the typename of the block type that the corresponding =config directive configures.
Perl6::Perldoc::Block::table
rows()
-
Returns a list of Perl6::Perldoc::Block::table::Row objects, representing the rows of the table.
Perl6::Perldoc::Block::table::Row
cells()
-
Returns a list of Perl6::Perldoc::Block::table::Cell objects, representing the cells of the table row.
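The rows() and cells() methods combine naturally to turn a table into a plain Perl data structure. This is a sketch under two assumptions: the helper names are hypothetical, and cell text is produced by crudely concatenating the strings found (recursively) in each cell's content(), ignoring what any nested formatting code means.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: flatten content() to plain text, recursing into
# nested objects (e.g. formatting codes inside a cell).
sub plain_text {
    my ($node) = @_;
    return join '', map { blessed($_) ? plain_text($_) : $_ } $node->content;
}

# Hypothetical helper: convert a Perl6::Perldoc::Block::table object into an
# array ref of rows, each an array ref of cell strings, via rows() and cells().
sub table_to_arrays {
    my ($table) = @_;
    return [ map { [ map { plain_text($_) } $_->cells ] } $table->rows ];
}
```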
Perl6::Perldoc::Block::table::Cell
Perl6::Perldoc::FormattingCode::D
synonyms()
-
Returns a list of strings containing the specified synonyms for the corresponding D<> definition.
Perl6::Perldoc::FormattingCode::L
has_distinct_text()
-
Returns true if the formatting code was specified with both a display text and a separate target URI. For example, the method would return true for an object representing:

    L<The Perl development page|http://dev.perl.org>

but would return false for an object representing:

    L<http://dev.perl.org>
Perl6::Perldoc::FormattingCode::L and Perl6::Perldoc::FormattingCode::P
target()
-
Returns a string containing the target URI of the L<> or P<> formatting code represented by the object.
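A renderer can use has_distinct_text() to decide what to display for a link. The sketch below emits an HTML anchor; link_to_html() and its helpers are hypothetical, and it assumes (the documentation above does not say) that when a link has distinct display text, the object's content() holds just that text.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: flatten content() to plain text, recursing into
# nested formatting-code objects.
sub plain_text {
    my ($node) = @_;
    return join '', map { blessed($_) ? plain_text($_) : $_ } $node->content;
}

# Minimal HTML escaping for attribute values and text
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Render an L<> object as an HTML anchor using the documented target() and
# has_distinct_text(). Assumption: with distinct text, content() is that text.
sub link_to_html {
    my ($link) = @_;
    my $target = $link->target;
    my $text   = $link->has_distinct_text ? plain_text($link) : $target;
    return sprintf '<a href="%s">%s</a>', html_escape($target), html_escape($text);
}
```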
Perl6::Perldoc::FormattingCode::X
entries()
-
Returns a list of strings or array references containing the index entries for the corresponding X<> formatting code.

In the special case of an X<self targeting entry> that also contains nested formatting (e.g. X<dE<eacute>jE<agrave> vu>), the entries() method returns a reference to an array containing alternating strings and FormattingCode objects.
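Because entries() can mix plain strings with array references, index-building code usually normalizes them first. The sketch below is hypothetical (index_terms() is not part of the module): it joins each array-reference entry into a single string, rendering embedded FormattingCode objects with a caller-supplied callback, or by crudely flattening their content() if none is given.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: flatten content() to plain text. This is crude:
# an E<eacute> code contributes its raw name "eacute".
sub plain_text {
    my ($node) = @_;
    return join '', map { blessed($_) ? plain_text($_) : $_ } $node->content;
}

# Turn the documented entries() of an X<> object into plain strings.
# Each entry is a string, or an array ref of alternating strings and
# FormattingCode objects; the optional $render callback renders the objects.
sub index_terms {
    my ($x, $render) = @_;
    $render //= \&plain_text;

    my @terms;
    for my $entry ($x->entries) {
        if (ref $entry eq 'ARRAY') {
            push @terms, join '', map { blessed($_) ? $render->($_) : $_ } @$entry;
        }
        else {
            push @terms, $entry;
        }
    }
    return @terms;
}
```

Supplying a callback that maps E<> codes to characters lets an entry such as X<dE<eacute>jE<agrave> vu> come out with real accented letters rather than with the entity names spelled out.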
DIAGNOSTICS
- parse() can't open file %s
-
The parse() method expects as its first argument either an open filehandle, a reference to a string, or a string containing a filename. If the argument isn't a filehandle or a reference, it's assumed to be a filename. This error indicates that this assumption proved to be wrong and that something unexpected was passed instead.
- Unable to open URI in '=use %s'
-
This parser only handles file:path and perl5:module style URIs in a =use directive. The Pod being parsed had something else.
- Missing scheme specifier in M<> formatting code
-
An M<> formatting code must start with a scheme/class name, followed by a colon. For example:

    M<Image: logo.gif>

This error indicates that the initial identifier was missing, as in:

    M<logo.gif>
- No =item%d before =item%d
-
An =item2 can only appear after an =item1; an =item3, only after an =item2; etc.

A common mistake that produces this error is to physically nest =item markers:

    =begin item1
    The choices are:
    =item2 Tom Swift
    =item2 Dick Wittington
    =item2 Harry Houdini
    =end item1

Items are not physically nested in Pod; they are logically nested. The workaround is to rewrite the Pod without nested items:

    =item1 The choices are:
    =item2 Tom Swift
    =item2 Dick Wittington
    =item2 Harry Houdini
- Ignored explicit '=end END'
-
END blocks, no matter how they're specified, run from the line at which they're opened to the very end of the file. An explicit =end END is always ignored (and should be removed, because it's misleading).
- Invalid '=end %s' (not in %s block)
-
The parser came across an =end marker for a block that isn't open at that point. This is usually caused by either misspelling the block name, or accidentally closing an outer block before an inner one.
- Possible attempt to specify extra options too late in %s block
-
Extra options on a block are specified by lines immediately after the block declarator that start with an =, followed by whitespace:

    =begin SomeBlock :option(1)
    = :extra_option
    = :yet_another<here>

As soon as a line that doesn't start with an = is encountered, the rest of the block is considered to be content. So any line that begins with an = after that point is content, not configuration:

    =begin SomeBlock :option(1)

    = :this<content> :!config
Such lines are reported as possible mistakes.
- Unknown reserved block type (%s)
-
Block names that consist of entirely uppercase or entirely lowercase identifiers are reserved for Pod itself. User-defined block types must be mixed-case. The Pod that was parsed contained a reserved identifier that the parser did not recognize. This is reported as a possible future-compatibility problem.
- Trailing junk after %s
-
Any options on a block must be specified in the Perl 6 :name(value) option syntax (or any of its variations). Anything else on an option line is invalid, and reported as "trailing junk".
- No closing delimiter for %s block opened at line %s
-
There was an unbalanced =begin in the Pod. This is often caused by typos in the (supposedly) matching =end directive.
- Multivalued accessor %s called in scalar context
-
Some DOM object accessor methods (for example: Perl6::Perldoc::Root::content()) return a list of values in list context. If these accessors are called in scalar context, only the first value in the list is returned. However, if there is more than one value in the list, a scalar-context call is a source of potential errors, so this warning is issued.
- Internal error: %s
-
The module's internal diagnostics detected a problem in the implementation itself. There is nothing you can do about this error, except report it.
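The Multivalued accessor warning is easy to avoid by always calling such accessors in list context. This hypothetical helper (content_summary() is not part of the module) shows the two usual idioms: assigning to an array to get every value, and assigning to a one-element list to get just the first.

```perl
use strict;
use warnings;

# Hypothetical helper: inspect a node's content() without triggering the
# "Multivalued accessor ... called in scalar context" warning.
sub content_summary {
    my ($node) = @_;
    my @content = $node->content;   # list context: every value
    my ($first) = $node->content;   # list assignment: first value only, still list context
    # my $first = $node->content;   # scalar context: would risk the warning
    return (scalar @content, $first);
}
```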
CONFIGURATION AND ENVIRONMENT
Perl6::Perldoc::Parser requires no configuration files or environment variables.
DEPENDENCIES
version.pm
INCOMPATIBILITIES
None reported.
LIMITATIONS
This parser does not currently fully support =use directives. In particular, only the forms:

    =use file:path/to/file
    =use path/to/file

and:

    =use perl5:Module::Name :options(here)
    =use Module::Name :options(here)

are supported.

The =encoding directive is parsed and internally represented, but ignored.
BUGS
The parser does not assume a default encoding of UTF-8 (as per the specification in Synopsis 26).
Please report any bugs or feature requests to bug-perldoc-parser@rt.cpan.org, or through the web interface at http://rt.cpan.org.
AUTHOR
Damian Conway <DCONWAY@CPAN.org>
LICENCE AND COPYRIGHT
Copyright (c) 2006, Damian Conway <DCONWAY@CPAN.org>. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.