Changes for version 5.900 - 2012-12-15

  • Trial Release by Christopher J. Madsen
    • THINGS THAT MAY BREAK YOUR CODE OR TESTS
      • parse_file (and new_from_file) now try to determine the encoding automatically when given a filename (not a filehandle). To restore the old behavior, set the new encoding attribute to the empty string. To restore it globally, set $HTML::Element::default_encoding = ''.
    • ENHANCEMENTS
      • new_from_file & new_from_url now let you set parsing attributes
      • New shortcut constructor new_from_string is like new_from_content, but allows you to set parsing attributes
      • New shortcut constructor new_from_http for constructing a tree from the content of an HTTP::Message (or subclass like HTTP::Response)
      • Setting the new self_closed_tags attribute to 1 makes TreeBuilder handle XML-style self-closed tags (e.g. <a id="a1" />)
      • New child_nodes method makes for simpler recursion
      • New openw and encode_fh methods for writing a file with the correct encoding
    • DOCUMENTATION
      • new actually does take optional attributes (it has since at least 3.18, although this was undocumented; previously it did not work with ignore_ignorable_whitespace)
      • methods & attributes added in version 4.0 or later are now marked
      • don't recommend the traverse method; give recursive example (RT #48344)
    • TESTS
      • Add test for self_closed_tags attribute.
      • Clarify skip message in construct_tree.t (RT #81371)
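
The encoding change listed first applies only when parse_file is given a filename. A minimal sketch of one way to keep decoding under your own control on any release is to open the file yourself and pass the filehandle. The sample file, its contents, and the choice of UTF-8 are assumptions made for illustration; only long-standing calls (new, parse_file, look_down, as_text, delete) are used.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TreeBuilder;

# Write a small UTF-8 sample file so the sketch is self-contained.
# (Hypothetical content, not taken from the distribution.)
my ($out, $file) = tempfile(SUFFIX => '.html', UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} "<html><head><title>Caf\x{e9}</title></head>",
             "<body><p>Hello</p></body></html>";
close $out or die "Cannot close $file: $!";

# Decode the bytes ourselves. parse_file receives a filehandle here,
# not a filename, so the encoding is the one chosen in this open call.
open my $fh, '<:encoding(UTF-8)', $file
    or die "Cannot open $file: $!";

my $tree = HTML::TreeBuilder->new;
$tree->parse_file($fh);
close $fh;

# The title text is now a character string, not raw bytes.
my $title      = $tree->look_down(_tag => 'title');
my $title_text = defined $title ? $title->as_text : '';

$tree->delete;    # release the tree when finished
```

If the file's encoding is not known in advance, the automatic detection described in the first entry is the alternative; this sketch only covers the case where the caller already knows it.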

Documentation

HTML::Tree::AboutObjects - article: "User's View of Object-Oriented Modules"
HTML::Tree::AboutTrees - article on tree-shaped data structures in Perl
HTML::Tree::Scanning - article: "Scanning HTML"

Modules

HTML::AsSubs - functions that construct a HTML syntax tree
HTML::Element - Class for objects that represent HTML elements
HTML::Element::traverse - discussion of HTML::Element's traverse method
HTML::Parse - Deprecated, a wrapper around HTML::TreeBuilder
HTML::Tree - build and scan parse-trees of HTML
HTML::TreeBuilder - Parser that builds a HTML syntax tree
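
The changelog notes that the documentation now favors a plain recursive function over the traverse method. The following is a rough sketch of that style, not the example from the distribution's documentation. It uses the long-standing content_list method rather than the new child_nodes method, so it should also run on releases that predate 5.900. The tag_names helper and the sample markup are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the tag names of a node and all of its
# descendant elements, depth-first, in document order.
sub tag_names {
    my ($node) = @_;
    my @names = ($node->tag);
    for my $child ($node->content_list) {
        # Text segments are plain strings; only elements are objects,
        # so recurse only into references.
        push @names, tag_names($child) if ref $child;
    }
    return @names;
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><p>One <b>two</b></p></body></html>'
);

my @names = tag_names($tree);
print join(' ', @names), "\n";

$tree->delete;    # release the tree when finished
```

The recursion keeps all state in ordinary lexical variables and return values, which tends to be easier to follow than the callback protocol that traverse requires.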