NAME

Perl6::Perldoc::Parser - Parse Perl 6's documentation mark-up language

VERSION

This document describes Perl6::Perldoc::Parser version 0.0.6

SYNOPSIS

  use Perl6::Perldoc::Parser;

  $representation = Perl6::Perldoc::Parser->parse($file, \%options);

  $errors   = $representation->{errors};
  $warnings = $representation->{warnings};

  $obj_tree = $representation->{tree};

DESCRIPTION

This module parses text marked up with the Perl 6 Pod notation and converts it to a hierarchical object-based representation.

MODULE INTERFACE

$rep = Perl6::Perldoc::Parser->parse($file, \%options)

The parse() method expects either:

  • a string containing the filename, or

  • a filehandle that's already open for input, or

  • a reference to a string that contains actual Pod mark-up

as its first argument. This argument is used as the source of the Pod to be parsed.

You may also (optionally) pass a reference to a hash of options as its second argument. The options that can be passed in this second argument are:

all_pod => $status

If $status is true, specifies that the entire text should be considered to be Pod. Any text not inside a Pod block will be treated as a plain paragraph or code block, rather than as ambient source code. Specifying this option is the same as placing a =begin pod/=end pod block around the entire text.

If $status is false, specifies that the text should be considered to be heterogeneous: a mixture of Pod and source code. Any text not inside a Pod block will be treated as ambient source code.

As a special case, if $status is the string 'auto', the option will be set automatically by looking at the filename passed to parse(). If that filename ends in '.pod6' or '.pod', the option will be set true.

Defaults to false when parse() is passed a filehandle and 'auto' when parse() is passed a filename.
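
The 'auto' rule can be pictured as a simple test on the filename. The following is only a sketch of the rule as stated above, not the module's own code; the helper name guess_all_pod() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented 'auto' rule:
# true for names ending in '.pod6' or '.pod', false otherwise.
sub guess_all_pod {
    my ($filename) = @_;
    return $filename =~ m{ \. pod6? \z }x ? 1 : 0;
}

# Try the rule on a few illustrative filenames
my %guess = map { $_ => guess_all_pod($_) }
            qw( manual.pod guide.pod6 Module.pm );
```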

allow => \%allowed

Specifies that the formatting codes whose names appear as keys of the hash value are to be allowed within verbatim blocks. For example, to universally allow the E<> and L<> codes within otherwise verbatim text:

Perl6::Perldoc::Parser->parse($file, { allow => {E=>1, L=>1} });

Defaults to no allowed codes.

filename => $path

If the given Pod contains placement links to relative files that are to be inlined, and the source path is not passed as the first argument, you must use this option to indicate the path of the source file. It will be used to resolve all relative paths to be inlined.

reset_id => $status

By default, the parser assigns unique identifiers (increasing integers) to each created object. These identifiers can later be used, for instance, to create unique links.

If $status is true, the parser will reset the identifier generator to start again from 1.

Text is read from the file and parsed as Perl 6 Pod. The method call returns a hash containing three entries:

$rep->{tree}

The root object of a hierarchical representation of the tree (see "DOM INTERFACE").

$rep->{errors}

A reference to an array of error messages generated during the parse. If this array is non-empty then the parse failed and the resulting object tree is not guaranteed to be correct. It is suggested that if parse() returns a non-empty $rep->{errors} the application calling it should report the errors and abort.

$rep->{warnings}

A reference to an array of warning messages generated during the parse. If this array is non-empty the parse probably succeeded and the resulting object tree is very likely to be correct. It is suggested that if parse() returns a non-empty $rep->{warnings} the application calling it should report the warnings before continuing.
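
The handling suggested above might be sketched as follows. The function name check_parse() is hypothetical, and the hash at the end is built by hand as a stand-in for a real parse() result so that the sketch runs on its own.

```perl
use strict;
use warnings;

# Report any warnings, then give up if there were any errors.
# $rep is expected to look like the hash returned by parse().
sub check_parse {
    my ($rep) = @_;

    warn "Warning: $_\n" for @{ $rep->{warnings} };

    if (@{ $rep->{errors} }) {
        warn "Error: $_\n" for @{ $rep->{errors} };
        die "Pod parse failed\n";
    }
    return $rep->{tree};
}

# Stand-in for a parse() result (illustration only)
my $fake_rep = {
    tree     => 'TREE',
    errors   => [],
    warnings => ['something looked odd'],
};
my $tree = check_parse($fake_rep);    # warns once, then returns the tree
```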

$rep->report_errors(@optional_message)

This method can be called on the hash returned by parse(). It prints to STDERR any errors or warnings returned from the parse and then throws an exception (containing the optional message) if there were any errors.

If there are no errors, it returns its own invocant, so it can be chained directly to the end of an actual parse:

$rep = Perl6::Perldoc::Parser->parse($file, \%options)
                             ->report_errors('Bad pod');

DOM INTERFACE

The class hierarchy of the objects returned by parse() is as follows:

(All classes prefixed with Perl6::Perldoc::)

Root
    File
    Ambient
    Directive
        Directive::config
        Directive::use
        Directive::encoding
    Block
        Block::pod
        Block::para
        Block::code
        Block::input
        Block::output
        Block::Named
            Block::Named::Whatever
            Block::Named::WhateverElse
            etc.
        Block::head
            Block::head1
            Block::head2
            Block::head3
            Block::head4
                Block::head5
                Block::head6
                etc.
        Block::list
        Block::item
            Block::item1
            Block::item2
            Block::item3
            etc.
        Block::nested
        Block::comment
        Block::table
        Block::table::Row
        Block::table::Cell
        Block::toclist
        Block::tocitem
            Block::tocitem1
            Block::tocitem2
            Block::tocitem3
            Block::tocitem4
            Block::tocitem5
        Block::Semantic
            Block::NAME
            Block::VERSION
            Block::SYNOPSIS
            Block::DESCRIPTION
            etc.
    FormattingCode
        FormattingCode::B
        FormattingCode::C
        FormattingCode::D
        FormattingCode::E
        FormattingCode::I
        FormattingCode::K
        FormattingCode::L
        FormattingCode::M
        FormattingCode::Named
            FormattingCode::Named::Whatever
            FormattingCode::Named::WhateverElse
            etc.
        FormattingCode::N
        FormattingCode::P
        FormattingCode::R
        FormattingCode::S
        FormattingCode::T
        FormattingCode::U
        FormattingCode::V
        FormattingCode::X
        FormattingCode::Z

Every class has a new() constructor, which expects its first argument to be a reference to a hash containing the parsed information for the block. The second, optional argument is a reference to a hash containing any of the global options that may be passed to Perl6::Perldoc::Parser::parse().

The Perl6::Perldoc::Root class (and hence every other class in the DOM hierarchy) has the following methods available, all of which are currently read-only accessors:

typename()

Returns the name of the block type, typically the same as the last component of the object's classname. Handy for text (re)generation, but consider using polymorphic methods instead of switching on this value.

style()

Returns the style of block that the object was generated from. The possibilities are:

'delimited'

The object was derived from a block that was specified in =begin/=end markers.

'paragraph'

The object was derived from a block that was specified with a =for marker.

'abbreviated'

The object was derived from a block that was specified using the short-form =typename syntax.

'directive'

The object was derived from a =use, =config, or =encoding directive.

'formatting'

The object was derived from a formatting code.

'implicit'

The object was created internally by the parser. Such objects are typically list containers, top-level pod blocks, or representations of raw code or text paragraph blocks.

content()

Returns a list of objects and/or strings representing the content of the block. Objects always represent nested blocks; strings are always unformatted text.
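
Because content() mixes objects and plain strings, code that walks the tree typically tests each element with ref(). The following is a minimal sketch of such a walk; Demo::Node is a stand-in class invented here so that the example runs without the module, and outline() is a hypothetical helper.

```perl
use strict;
use warnings;

# Return an indented outline of a DOM subtree, one line per node.
sub outline {
    my ($node, $depth) = @_;
    $depth ||= 0;
    my $indent = '  ' x $depth;
    my @lines  = ($indent . $node->typename);
    for my $child ($node->content) {
        push @lines, ref($child)
            ? outline($child, $depth + 1)      # object: recurse into it
            : "$indent  text";                 # plain string: leaf text
    }
    return @lines;
}

{
    # Stand-in class for illustration; NOT part of Perl6::Perldoc
    package Demo::Node;
    sub new {
        my ($class, $type, @kids) = @_;
        return bless { type => $type, kids => \@kids }, $class;
    }
    sub typename { $_[0]{type} }
    sub content  { @{ $_[0]{kids} } }
}

my $root = Demo::Node->new('pod',
    Demo::Node->new('head1', 'Title'),
    Demo::Node->new('para', 'Some ', Demo::Node->new('B', 'bold'), ' text'),
);
my @outline = outline($root);
print "$_\n" for @outline;
```

With a real parse result, the same function could be applied to $rep->{tree}.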

range()

Returns a reference to a hash specifying the range of lines in which the corresponding block was defined. The entries of the hash are:

$obj->range->{from}     # Line at which block opened
$obj->range->{to}       # Line at which block closed
$obj->range->{file}     # File in which block opened

number()

Returns the hierarchical number of the block within its block type. Will be undefined if the block was not numbered, so typically only meaningful for headers and list items.
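
A renderer might combine number() with typename() and is_post_numbered() (both accessors of this same class) to build a label. This is only a sketch: numbered_label() is a hypothetical helper, the label formats are one possible choice, and Demo::Block is a stand-in class invented so that the example runs without the module.

```perl
use strict;
use warnings;

# Build a label such as "Chapter 1" (post-numbered) or "3. Item".
sub numbered_label {
    my ($block) = @_;
    my $name = ucfirst lc $block->typename;
    my $num  = $block->number;
    return $name unless defined $num;          # block was not numbered
    return $block->is_post_numbered ? "$name $num" : "$num. $name";
}

{
    # Stand-in class for illustration; NOT part of Perl6::Perldoc
    package Demo::Block;
    sub new              { my ($class, %arg) = @_; return bless { %arg }, $class }
    sub typename         { $_[0]{typename} }
    sub number           { $_[0]{number} }
    sub is_post_numbered { $_[0]{post} }
}

my $chapter = Demo::Block->new(typename => 'CHAPTER', number => 1, post => 1);
my $item    = Demo::Block->new(typename => 'item',    number => 3, post => 0);
my $para    = Demo::Block->new(typename => 'para');

my @labels = map { numbered_label($_) } $chapter, $item, $para;
```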

is_semantic()

Returns true if the block is a standard semantic block.

is_verbatim()

Returns true if the block is verbatim (that is: a =code, C<>, or V<>).

is_numbered()

Returns true if the block has a :numbered option specified (either explicitly, or by preconfiguration).

is_post_numbered()

Returns true if the block is special in that its number should appear at the end of its content, not at the start. Typically this is true for certain types of semantic block (for example: =CHAPTER) where a rendering such as:

Chapter 1

makes more sense than:

1. Chapter

User-defined blocks are often defined to have their is_post_numbered() methods return true as well. For example:

=for Image :numbered :caption<Our mascot> :source<file:images/camel.jpg>

is better captioned with post-numbering:

Image 7: Our mascot

config()

Returns a reference to a nested hash containing the configuration (i.e. =config) environment in effect for the block. Each top-level key of the hash is the name of a block type being configured, each second-level hash contains the configuration options for that block type.

option( $opt_name )

Returns the value of the named option for the specific block object. This value may be derived from an explicit option on the declaration, or implicitly from the configuration for the block.

term( \%options )

Returns the value of the "term" option of the block. Typically this will be undef unless the block is an =item.

The "term" value is normally returned as a raw string, but you can have the method return a fully parsed Pod subtree by specifying an option on the call:

$pod_tree = $item->term({ as_objects => 1 })

caption( \%options )

Returns the value of the "caption" option of the block. This is most often used for =table blocks, but any block may be given a caption.

The "caption" value is normally returned as a raw string, but you can have the method return a fully parsed Pod subtree by specifying an option on the call:

$pod_tree = $item->caption({ as_objects => 1 })

Some DOM classes offer additional methods, as follows:

Perl6::Perldoc::Directive::config

target()

Returns the typename of the block type that the corresponding =config directive configures.

Perl6::Perldoc::Block::table

rows()

Returns a list of Perl6::Perldoc::Block::table::Row objects, representing the rows of the table.

Perl6::Perldoc::Block::table::Row

cells()

Returns a list of Perl6::Perldoc::Block::table::Cell objects, representing the cells of the table row.

Perl6::Perldoc::Block::table::Cell

is_header()

Returns true if the corresponding table cell is in the header row.
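
The rows(), cells() and is_header() accessors can be combined to walk a table. The sketch below flattens a table to one line of text per row; table_lines() is a hypothetical helper, and Demo::Node is a stand-in for the table, row and cell classes so that the example runs without the module.

```perl
use strict;
use warnings;

# Flatten a table to one line of text per row, flagging header rows.
# Only plain-string cell content is kept, to keep the sketch short.
sub table_lines {
    my ($table) = @_;
    my @lines;
    for my $row ($table->rows) {
        my @cells  = $row->cells;
        my @texts  = map { join '', grep { !ref $_ } $_->content } @cells;
        my $marker = (grep { $_->is_header } @cells) ? '#' : ' ';
        push @lines, "$marker " . join(' | ', @texts);
    }
    return @lines;
}

{
    # Stand-in for the table, row and cell classes (illustration only)
    package Demo::Node;
    sub new       { my ($class, %arg) = @_; return bless { %arg }, $class }
    sub rows      { @{ $_[0]{rows}    } }
    sub cells     { @{ $_[0]{cells}   } }
    sub content   { @{ $_[0]{content} } }
    sub is_header { $_[0]{header} }
}

# Small constructor for a stand-in cell: ($is_header, $text)
my $cell = sub { Demo::Node->new(header => $_[0], content => [ $_[1] ]) };

my $table = Demo::Node->new(rows => [
    Demo::Node->new(cells => [ $cell->(1, 'Name'),  $cell->(1, 'Role')   ]),
    Demo::Node->new(cells => [ $cell->(0, 'Alice'), $cell->(0, 'Author') ]),
]);
my @lines = table_lines($table);
```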

Perl6::Perldoc::FormattingCode::D

synonyms()

Returns a list of strings containing the specified synonyms for the corresponding D<> definition.

Perl6::Perldoc::FormattingCode::L

has_distinct_text()

Returns true if the formatting code was specified with both a display text and a separate target URI. For example, the method would return true for an object representing:

L<The Perl development page|http://dev.perl.org>

but would return false for an object representing:

L<http://dev.perl.org>

Perl6::Perldoc::FormattingCode::L and Perl6::Perldoc::FormattingCode::P

target()

Returns a string containing the target URI of the L<> or P<> formatting code represented by the object.

Perl6::Perldoc::FormattingCode::X

entries()

Returns a list of strings or array references containing the index entries for the corresponding X<> formatting code.

In the special case of an X<self targeting entry> which also contains nested formatting (e.g. X<dE<eacute>jE<agrave> vu>), the entries() method returns a reference to an array containing alternating strings and FormattingCode objects.

DIAGNOSTICS

parse() can't open file %s

The parse() method expects as its first argument either an open filehandle or else a string containing a filename. If the argument isn't a filehandle, it's assumed to be a filename. This error indicates that assumption proved to be wrong and that something unexpected was passed instead.

Unable to open URI in '=use %s'

This parser only handles file:path and perl5:module style URIs in an =use directive. The Pod being parsed had something else.

Missing scheme specifier in M<> formatting code

An M<> formatting code must start with a scheme/class name, followed by a colon. For example:

M<Image: logo.gif>

That initial identifier was missing. For example:

M<logo.gif>

No =item%d before =item%d

An =item2 can only appear after an =item1; an =item3, only after an =item2; etc.

A common mistake that produces this error is to physically nest =item markers:

=begin item1
The choices are:
=item2 Tom Swift
=item2 Dick Wittington
=item2 Harry Houdini
=end item1

Items are not physically nested in Pod; they are logically nested. The workaround is to rewrite the Pod without nested items:

=item1 The choices are:
=item2 Tom Swift
=item2 Dick Wittington
=item2 Harry Houdini

Ignored explicit '=end END'

END blocks, no matter how they're specified, run from the line at which they're opened to the very end of the file. An explicit =end END is always ignored (and should be removed, because it's misleading).

Invalid '=end %s' (not in %s block)

The parser came across an =end marker for a block that isn't open at that point. This is usually caused by either misspelling the block name, or accidentally closing an outer block before an inner one.

Possible attempt to specify extra options too late in %s block

Extra options on a block are specified by lines immediately after the block declarator that start with an =, followed by whitespace:

=begin SomeBlock :option(1)
=                :extra_option
=                :yet_another<here>

As soon as a line that doesn't start with an = is encountered, the rest of the block is considered to be content. So any line that begins with an = after that point is content, not configuration:

=begin SomeBlock :option(1)

= :this<content> :!config

Such lines are reported as possible mistakes.

Unknown reserved block type (%s)

Block names that consist of entirely uppercase or entirely lowercase identifiers are reserved for Pod itself. User-defined block types must be mixed-case. The Pod that was parsed contained a reserved identifier that the parser did not recognize. This is reported as a possible future-compatibility problem.

Trailing junk after %s

Any options on a block must be specified in the Perl 6 :name(value) option syntax (or any of its variations). Anything else on an option line is invalid, and reported as "trailing junk".

No closing delimiter for %s block opened at line %s

There was an unbalanced =begin in the Pod. This is often caused by typos in the (supposedly) matching =end directive.

Multivalued accessor %s called in scalar context

Some DOM object accessor methods (for example: Perl6::Perldoc::Root::content()) return a list of values in list context. If these accessors are called in scalar context, only the first value in the list is returned. However, if there is more than one value in the list, a scalar-context call is a source of potential errors, so this warning is issued.
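
The difference between the two calling contexts can be illustrated with a small stand-alone imitation. The accessor demo_content() below is hypothetical and is not the module's implementation; it only mimics the behaviour described above, using Perl's built-in wantarray.

```perl
use strict;
use warnings;

# Collect warnings so the effect of each call can be inspected
my @warned;
local $SIG{__WARN__} = sub { push @warned, @_ };

# Hypothetical accessor imitating the documented behaviour
sub demo_content {
    my @items = ('first', 'second');
    return @items if wantarray;                # list context: everything
    warn "Multivalued accessor demo_content() called in scalar context\n"
        if @items > 1;
    return $items[0];                          # scalar context: first only
}

my @all   = demo_content();    # list context, no warning
my ($one) = demo_content();    # parentheses keep list context, no warning
my $first = demo_content();    # scalar context, triggers the warning
```

Assigning to a parenthesised list, as in the second call, is the usual way to take just the first value without provoking the warning.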

Internal error: %s

The module's internal diagnostics detected a problem in the implementation itself. There is nothing you can do about this error, except report it.

CONFIGURATION AND ENVIRONMENT

Perl6::Perldoc::Parser requires no configuration files or environment variables.

DEPENDENCIES

version.pm

INCOMPATIBILITIES

None reported.

LIMITATIONS

  • This parser does not currently fully support =use directives. In particular, only the forms:

    =use file:path/to/file
    =use      path/to/file

    and:

    =use perl5:Module::Name  :options(here)
    =use       Module::Name  :options(here)

    are supported.

  • The =encoding directive is parsed and internally represented, but ignored.

BUGS

  • The parser does not assume a default encoding of UTF-8 (as per the specification in Synopsis 26).

Please report any bugs or feature requests to bug-perldoc-parser@rt.cpan.org, or through the web interface at http://rt.cpan.org.

AUTHOR

Damian Conway <DCONWAY@CPAN.org>

LICENCE AND COPYRIGHT

Copyright (c) 2006, Damian Conway <DCONWAY@CPAN.org>. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.