NAME
Daizu::HTML - functions for handling HTML and XHTML content
FUNCTIONS
The following functions are available for export from this module. None of them are exported by default.
- dom_body_to_html4($doc, [$start_node], [$end_node])
Given an XML::LibXML::Document object for an XHTML document fragment whose root element should be body, returns a string representation of the content in HTML 4 format.
$start_node and $end_node are both independently optional. If either is present, only part of the document will be included in the HTML output. Each must be either undef or a direct child of the root (body) element of the document. $start_node should be the first node to be shown in the output, or undef to start from the beginning. $end_node should be the node after the last node to be output, or undef to continue to the end of the document.
- dom_node_to_html4($node)
Used by the dom_body_to_html4() function above to process individual nodes. The argument should be an XML::LibXML::Node object of some kind. Returns a string of HTML 4 code in which, for example, text is properly escaped.
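The following sketch illustrates what these two functions do, but it is not Daizu's actual code. It assumes XML::LibXML, handles only elements and text (comments and other node types are dropped), returns character strings rather than UTF-8 bytes, and uses hypothetical names (body_to_html4_sketch, node_to_html4_sketch) and an assumed list of HTML 4 empty elements. The main visible difference from XHTML output is that empty elements are written as `<br>` rather than `<br />`.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);   # node type constants such as XML_ELEMENT_NODE

# Assumed list of HTML 4 elements that take no end tag.
my %EMPTY = map { $_ => 1 } qw(area base br col hr img input link meta param);

sub esc_text {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # first, so the entities added below stay intact
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

sub esc_attr {
    my ($s) = esc_text(@_);
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical: serialize a single node as HTML 4.
sub node_to_html4_sketch {
    my ($node) = @_;
    my $type = $node->nodeType;
    return esc_text($node->data)
        if $type == XML_TEXT_NODE || $type == XML_CDATA_SECTION_NODE;
    return '' unless $type == XML_ELEMENT_NODE;    # drop comments, PIs, etc.

    my $name = $node->localname;
    my $html = "<$name";
    for my $attr ($node->attributes) {
        next unless $attr->isa('XML::LibXML::Attr');    # skip xmlns declarations
        $html .= ' ' . $attr->nodeName . '="' . esc_attr($attr->value) . '"';
    }
    $html .= '>';
    return $html if $EMPTY{$name};    # HTML 4 style: "<br>", not "<br />"
    $html .= node_to_html4_sketch($_) for $node->childNodes;
    return "$html</$name>";
}

# Hypothetical: serialize the children of <body> from $start (inclusive)
# up to but not including $end; either may be undef.
sub body_to_html4_sketch {
    my ($doc, $start, $end) = @_;
    my $html = '';
    my $started = !defined $start;
    for my $child ($doc->documentElement->childNodes) {
        last if defined $end && $child->isSameNode($end);
        $started ||= $child->isSameNode($start);
        $html .= node_to_html4_sketch($child) if $started;
    }
    return $html;
}

my $doc = XML::LibXML->load_xml(string =>
    '<body xmlns="http://www.w3.org/1999/xhtml"><p>a &amp; b<br/></p><hr/><p>c</p></body>');
my @kids = $doc->documentElement->childNodes;
print body_to_html4_sketch($doc, undef, $kids[2]), "\n";    # <p>a &amp; b<br></p><hr>
```

Passing the third child as the end node stops output just before it, matching the "node after the last node to be output" convention described above.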
- dom_body_to_text($doc)
Given an XHTML body (as an XML::LibXML::Document object in the usual format), returns a plain text version of the content, with some markup translated into text formatting in a limited way to make it reasonably readable.
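As a rough illustration of this kind of conversion, here is a sketch assuming XML::LibXML. The formatting rules it applies (blank lines between block elements, "* " before list items, underscores around em text, a line break for br, collapsed source whitespace) are assumptions chosen for the example, not necessarily the rules Daizu uses, and body_to_text_sketch and text_of are hypothetical names.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);

# Assumed set of elements rendered as separate paragraphs.
my %BLOCK = map { $_ => 1 } qw(p div blockquote ul ol h1 h2 h3 h4 h5 h6);

# Hypothetical helper: convert one node (recursively) to rough text.
sub text_of {
    my ($node) = @_;
    if ($node->nodeType == XML_TEXT_NODE) {
        (my $t = $node->data) =~ s/\s+/ /g;    # collapse source whitespace
        return $t;
    }
    return '' unless $node->nodeType == XML_ELEMENT_NODE;
    my $name = $node->localname;
    return "\n" if $name eq 'br';
    my $inner = join '', map { text_of($_) } $node->childNodes;
    return "\n* $inner"     if $name eq 'li';
    return "_${inner}_"     if $name eq 'em';
    return "\n\n$inner\n\n" if $BLOCK{$name};
    return $inner;
}

# Hypothetical: plain text version of an XHTML body document.
sub body_to_text_sketch {
    my ($doc) = @_;
    my $text = text_of($doc->documentElement);
    $text =~ s/[ \t]+/ /g;        # single spaces
    $text =~ s/ *\n */\n/g;       # no spaces around line breaks
    $text =~ s/\n{3,}/\n\n/g;     # at most one blank line in a row
    $text =~ s/\A\s+|\s+\z//g;    # trim the ends
    return "$text\n";
}

my $doc = XML::LibXML->load_xml(string => '<body xmlns="http://www.w3.org/1999/xhtml">'
    . '<h1>Title</h1><p>Some <em>new</em> text.<br/>Next line.</p>'
    . '<ul><li>one</li><li>two</li></ul></body>');
print body_to_text_sketch($doc);
# Prints:
# Title
#
# Some _new_ text.
# Next line.
#
# * one
# * two
```

Each element contributes generous newlines and the final clean-up pass normalizes them, which keeps the per-element rules simple.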
- dom_filtered_for_feeds($doc)
Returns a new version of the article content in $doc with markup removed that isn't relevant to, or might be unwelcome in, feed content, such as script elements and style attributes. Also removes span elements, because they're not needed when there's no custom styling and Bloglines currently turns them into invalid HTML, and removes class attributes in case they cause unexpected styling to be applied.
In addition, any elements in the Daizu HTML extension namespace are removed. Elements in other non-XHTML namespaces will cause this function to fail; they shouldn't be there by the time the content is being output anyway.
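A sketch of this kind of filtering, assuming XML::LibXML and XPath, might look like the following. It is not Daizu's implementation: filter_for_feeds_sketch is a hypothetical name, span elements are assumed to be unwrapped (their content kept) rather than deleted with their content, and handling of the Daizu extension namespace is left out because its namespace URI is not given here.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sketch: returns a filtered deep copy and leaves $doc untouched.
sub filter_for_feeds_sketch {
    my ($doc) = @_;
    my $copy = $doc->cloneNode(1);    # independent deep copy
    my $xpc  = XML::LibXML::XPathContext->new($copy);
    $xpc->registerNs(h => 'http://www.w3.org/1999/xhtml');

    # Drop script elements entirely, including their content.
    $_->unbindNode for $xpc->findnodes('//h:script');

    # Strip style and class attributes from every element.
    for my $el ($xpc->findnodes('//*[@style or @class]')) {
        $el->removeAttribute($_) for qw(style class);
    }

    # Unwrap span elements: move their children up, then remove the span.
    for my $span ($xpc->findnodes('//h:span')) {
        my $parent = $span->parentNode;
        $parent->insertBefore($_, $span) for $span->childNodes;
        $span->unbindNode;
    }
    return $copy;
}

my $doc = XML::LibXML->load_xml(string => '<body xmlns="http://www.w3.org/1999/xhtml">'
    . '<p class="note" style="color:red">a <span>b</span><script>x()</script></p></body>');
my $feed = filter_for_feeds_sketch($doc);
```

Working on a clone means the caller's DOM can still be used for the normal page output after the feed version has been produced.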
Both $doc and the return value are XML::LibXML::Document objects of the kind returned by the article_doc() method in Daizu::File. The original DOM in $doc is not altered; the return value is a completely independent copy.
- absolutify_links($doc, $base_url)
Given an XHTML document (as an XML::LibXML::Document object), finds attributes in the markup whose values are relative URLs and resolves them against $base_url to make them absolute. This can be used to prepare content from an article to be published in a different place with a different URL, such as in an RSS feed or on an index page, while ensuring that any links or embedded files continue to work.
The document's elements must be in the XHTML namespace, or they will be ignored.
TODO: some of this could be refactored with the link-replacing code in Daizu::Preview to be more thorough. For now, though, it only handles the href attribute of a elements and the src attribute of img elements, since those catch almost all cases.
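A minimal sketch of the link rewriting described above, assuming XML::LibXML and the URI module from CPAN; absolutify_links_sketch is a hypothetical name. Following the TODO note, it only looks at a href and img src attributes on XHTML elements, and it modifies the document in place, which is an assumption since the return value is not described here.

```perl
use strict;
use warnings;
use XML::LibXML;
use URI;

# Hypothetical sketch: rewrites a/@href and img/@src in place.
sub absolutify_links_sketch {
    my ($doc, $base_url) = @_;
    my $xpc = XML::LibXML::XPathContext->new($doc);
    # Only elements in the XHTML namespace match; others are ignored.
    $xpc->registerNs(h => 'http://www.w3.org/1999/xhtml');

    for my $pair ([a => 'href'], [img => 'src']) {
        my ($elem, $attr) = @$pair;
        for my $node ($xpc->findnodes('//h:' . $elem . '[@' . $attr . ']')) {
            # new_abs leaves URLs that are already absolute unchanged.
            my $abs = URI->new_abs($node->getAttribute($attr), $base_url);
            $node->setAttribute($attr, $abs->as_string);
        }
    }
    return;
}

my $doc = XML::LibXML->load_xml(string => '<body xmlns="http://www.w3.org/1999/xhtml">'
    . '<p><a href="../x.html">x</a> <img src="pic.png" alt=""/></p></body>');
absolutify_links_sketch($doc, 'http://example.com/blog/2006/post.html');
# href is now http://example.com/blog/x.html
# src is now http://example.com/blog/2006/pic.png
```

Resolving against the article's own URL, rather than the feed's, is what keeps relative links pointing at the right files once the content is copied elsewhere.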
- html_escape_text($text)
Escapes $text in a way which makes it safe to include in the content of HTML or XML elements. The characters <, >, and & are escaped. Returns the new value.
The output may not be suitable for use as the value of an HTML or XML attribute.
The return value is always formatted as bytes encoded in UTF-8.
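The escaping described above amounts to something like the following sketch. escape_text_sketch is a hypothetical name, and it assumes $text is a Perl character string (already decoded), which it encodes to UTF-8 bytes after escaping.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sketch: escape text for element content, return UTF-8 bytes.
sub escape_text_sketch {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or the entities below get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return encode('UTF-8', $text);
}

print escape_text_sketch('1 < 2 && 3 > 2'), "\n";    # 1 &lt; 2 &amp;&amp; 3 &gt; 2
```

Double quotes pass through unchanged, which is why output like this should not be placed inside a quoted attribute value.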
- html_escape_attr($text)
Escapes $text in a way which makes it safe to include in the content of HTML or XML elements, or in HTML or XML attribute values enclosed in double quotes. The characters <, >, &, and " are escaped. Returns the new value.
The return value is always formatted as bytes encoded in UTF-8.
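Compared with escaping for element content, the attribute case only adds the double quote to the set of escaped characters. A sketch, assuming a hypothetical escape_attr_sketch that takes a decoded character string and returns UTF-8 bytes:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sketch: escape a value for a double-quoted attribute.
sub escape_attr_sketch {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # first, so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;   # single quotes are left alone
    return encode('UTF-8', $text);
}

my $html = '<a title="' . escape_attr_sketch('Say "hi" & <wave>') . '">hi</a>';
print "$html\n";    # <a title="Say &quot;hi&quot; &amp; &lt;wave&gt;">hi</a>
```

Because single quotes are not escaped, values produced this way belong in double-quoted attributes only.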
COPYRIGHT
This software is copyright 2006 Geoff Richards <geoff@laxan.com>. For licensing information see this page: