README - metacpan.org


            
NAME
    HTML::HTML5::Parser - parse HTML reliably
SYNOPSIS
      use HTML::HTML5::Parser;
   
      my $parser = HTML::HTML5::Parser->new;
      my $doc    = $parser->parse_string(<<'EOT');
      <!doctype html>
      <title>Foo</title>
      <p><b><i>Foo</b> bar</i>.
      <p>Baz</br>Quux.
      EOT
   
      my $fdoc   = $parser->parse_file( $html_file_name );
      my $fhdoc  = $parser->parse_fh( $html_file_handle );
DESCRIPTION
    This library is substantially the same as the non-CPAN module
    Whatpm::HTML. Changes include:
    *       Provides an XML::LibXML-like DOM interface. If you usually use
            XML::LibXML's DOM parser, this should be a drop-in solution for
            tag soup HTML.
    *       Constructs an XML::LibXML::Document as the result of parsing.
    *       Removes external dependencies on non-CPAN packages through
            bundling and modifications.
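    As a rough illustration of the XML::LibXML-like interface described
    above, the following sketch parses the tag soup from the SYNOPSIS and
    then walks the result with ordinary XML::LibXML DOM calls. It assumes
    that XML::LibXML's "getElementsByLocalName", "textContent" and
    "toString" are what you would normally reach for; nothing here is
    specific to this module beyond "new" and "parse_string".

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;

my $parser = HTML::HTML5::Parser->new;
my $doc    = $parser->parse_string(<<'EOT');
<!doctype html>
<title>Foo</title>
<p><b><i>Foo</b> bar</i>.
<p>Baz</br>Quux.
EOT

# The result is an XML::LibXML::Document, so the usual DOM calls apply.
# Matching on local name sidesteps any question of namespaces.
my @paras = $doc->getElementsByLocalName('p');
printf "%d paragraph(s)\n", scalar @paras;
print $_->textContent, "\n" for @paras;

# Serialise the repaired tree to see how the misnested tags were handled.
print $doc->documentElement->toString, "\n";
```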
  Constructor
    "new"
              $parser = HTML::HTML5::Parser->new;
            The constructor does not do anything interesting.
  XML::LibXML-Compatible Methods
    "parse_file", "parse_html_file"
              $doc = $parser->parse_file( $html_file_name [,\%opts] );
            This function parses an HTML document from a file or the
            network; $html_file_name can be either a filename or a URL.
            Options include 'encoding' to indicate file encoding (e.g.
            'utf-8') and 'user_agent' which should be a blessed
            "LWP::UserAgent" object to be used when retrieving URLs.
            If requesting a URL and the response Content-Type header
            indicates an XML-based media type (such as XHTML),
            XML::LibXML::Parser will be used automatically (instead of the
            tag soup parser). The XML parser can be told to use a DTD
            catalogue by setting the option 'xml_catalogue' to the filename
            of the catalogue.
            HTML (tag soup) parsing can be forced using the option
            'force_html', even when an XML media type is returned. If an
            options hashref was passed, parse_file will set
            $options->{'parser_used'} to the name of the class used to parse
            the URL, to allow the calling code to double-check which parser
            was used afterwards.
            If an options hashref was passed, parse_file will set
            $options->{'response'} to the HTTP::Response object obtained by
            retrieving the URI.
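            The options described above might be used as in the sketch
            below. It parses a local temporary file rather than a URL so
            that it needs no network access; for that reason 'force_html'
            is shown only for illustration, and whether 'parser_used' and
            'response' are filled in for local files is an assumption,
            so the code checks before printing them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::HTML5::Parser;

# Write a small tag-soup document to a temporary file.
my ($fh, $filename) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} "<!doctype html>\n<title>Foo</title>\n<p>Baz</br>Quux.\n";
close $fh or die "Cannot close $filename: $!";

my $parser = HTML::HTML5::Parser->new;

# Pass options as a hashref; parse_file may write results back into it.
my %opts = (
    encoding   => 'utf-8',
    force_html => 1,    # only matters when an XML media type is returned
);
my $doc = $parser->parse_file($filename, \%opts);

# Inspect what the parser reported back, if anything.
print "parser used: ", ($opts{parser_used} // '(not reported)'), "\n";
print "response:    ", (ref $opts{response} || '(not reported)'), "\n";
```

            When fetching a URL instead, a blessed "LWP::UserAgent"
            object can be supplied under the 'user_agent' key of the same
            hashref.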
    "parse_fh", "parse_html_fh"
              $doc = $parser->parse_fh( $io_fh [,\%opts] );
            "parse_fh()" parses an IOREF or a subclass of "IO::Handle".
            Options include 'encoding' to indicate file encoding (e.g.
            'utf-8').
    "parse_string", "parse_html_string"
              $doc = $parser->parse_string( $html_string [,\%opts] );
            This function is similar to "parse_fh()", but it parses an HTML
            document that is available as a single string in memory.
            Options include 'encoding' to indicate the encoding of the
            string (e.g. 'utf-8').
    The push parser and SAX-based parser are not supported. Trying to change
    an option (such as recover_silently) will make HTML::HTML5::Parser carp
    a warning. (But you can inspect the options.)
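    A minimal sketch of "parse_string" and "parse_fh" with the 'encoding'
    option follows. It assumes the input is supplied as UTF-8 encoded
    bytes, and it uses an in-memory filehandle as a stand-in for a real
    file; both choices are made for the example and are not requirements
    stated by the module.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTML::HTML5::Parser;

binmode STDOUT, ':encoding(UTF-8)';

my $parser = HTML::HTML5::Parser->new;

# Encode a character string to UTF-8 bytes so the 'encoding' option applies.
my $bytes = encode('UTF-8',
    "<!doctype html>\n<title>Caf\x{e9}</title>\n<p>Na\x{ef}ve\n");

# Parse from a string held in memory.
my $doc_from_string = $parser->parse_string($bytes, { encoding => 'utf-8' });

# Parse the same bytes through a filehandle.
open my $fh, '<', \$bytes or die "Cannot open in-memory handle: $!";
my $doc_from_fh = $parser->parse_fh($fh, { encoding => 'utf-8' });
close $fh;

# Both calls should yield equivalent documents.
for my $doc ($doc_from_string, $doc_from_fh) {
    my ($title) = $doc->getElementsByLocalName('title');
    print "title: ", $title->textContent, "\n";
}
```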
  Additional Methods
    The module provides a few extra methods to obtain non-DOM data from
    DOM nodes.
    "compat_mode"
              $mode = $parser->compat_mode( $doc );
            Returns 'quirks', 'limited quirks' or undef (standards mode).
    "dtd_public_id"
              $pubid = $parser->dtd_public_id( $doc );
            For an XML::LibXML::Document which has been returned by
            HTML::HTML5::Parser, using this method will tell you the Public
            Identifier of the DTD used (if any).
    "dtd_system_id"
              $sysid = $parser->dtd_system_id( $doc );
            For an XML::LibXML::Document which has been returned by
            HTML::HTML5::Parser, using this method will tell you the System
            Identifier of the DTD used (if any).
    "source_line"
              ($line, $col) = $parser->source_line( $node );
              $line = $parser->source_line( $node );
            In scalar context, "source_line" returns the line number of the
            source code that started a particular node (element, attribute
            or comment).
            In list context, returns a line/column pair. (Tab characters
            count as one column, not eight.)
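    The additional methods might be combined as in the sketch below. The
    HTML 4.01 Strict doctype is an arbitrary choice made so that the
    public and system identifiers have something to report; the fallback
    strings such as '(none)' are likewise just for the example.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;

my $parser = HTML::HTML5::Parser->new;
my $doc    = $parser->parse_string(<<'EOT');
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<title>Foo</title>
<p>Bar
EOT

# compat_mode returns undef for standards mode, so test definedness.
my $mode = $parser->compat_mode($doc);
printf "mode:   %s\n", defined $mode ? $mode : 'standards (undef)';

# Doctype identifiers, if the document declared any.
my $pubid = $parser->dtd_public_id($doc);
my $sysid = $parser->dtd_system_id($doc);
printf "public: %s\n", $pubid // '(none)';
printf "system: %s\n", $sysid // '(none)';

# source_line: list context gives line and column, scalar context the line.
my ($p) = $doc->getElementsByLocalName('p');
my ($line, $col) = $parser->source_line($p);
my $line_only    = $parser->source_line($p);
printf "p starts at line %s, column %s\n", $line // '?', $col // '?';
printf "line only: %s\n", $line_only // '?';
```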
SEE ALSO
    <http://suika.fam.cx/www/markup/html/whatpm/Whatpm/HTML.html>
AUTHOR
    Toby Inkster, <tobyink@cpan.org>
COPYRIGHT AND LICENSE
    Copyright (C) 2007-2010 by Wakaba
    Copyright (C) 2009-2011 by Toby Inkster
    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself, either Perl version 5.8.1 or, at
    your option, any later version of Perl 5 you may have available.