NAME
Daizu::HTML - functions for handling HTML and XHTML content
FUNCTIONS
The following functions are available for export from this module. None of them are exported by default.
- dom_body_to_html4($doc, [$start_node], [$end_node])
-
Given an XML::LibXML::Document object for an XHTML document fragment, whose root element should be body, returns a string representation of the content in HTML 4 format.
$start_node and $end_node are both independently optional. If either is present, only part of the document is included in the HTML output. Each must be either undef or a child node of the root (body) element of the document. $start_node should be the first node to include in the output, or undef to start from the beginning. $end_node should be the node after the last node to include, or undef to continue to the end of the document.
- dom_node_to_html4($node)
-
Used by the dom_body_to_html4() function above to process individual nodes. The argument should be an XML::LibXML::Node object of some kind. Returns a string containing HTML 4 code for the node; among other things, any text in it is properly escaped.
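The sketch below illustrates how the work might be divided between a whole-body function and a per-node function. It uses a hypothetical plain-Perl node model (array references) instead of XML::LibXML nodes so that it runs with core modules only, and the names node_to_html4() and body_to_html4() are made up for this example; it is not the module's code. It assumes, as is usual for HTML 4, that empty elements such as br are written without an XML-style slash or end tag. The range handling follows the description of dom_body_to_html4(): the start node is included and the end node is not.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical node model standing in for XML::LibXML nodes:
#   text node:    ['#text', 'some text']
#   element node: [$name, { attr => value, ... }, @children]
# ASSUMPTION: this subset of HTML 4 empty elements is written without end tags.
my %EMPTY = map { $_ => 1 } qw(br hr img);

# Escape <, > and &, plus " when the value goes inside an attribute.
sub esc {
    my ($s, $in_attr) = @_;
    $s =~ s/&/&amp;/g;    # first, so the entities added below aren't re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g if $in_attr;
    return $s;
}

# Serialise a single node (and its descendants) as HTML 4.
sub node_to_html4 {
    my ($node) = @_;
    my ($name, @rest) = @$node;
    return esc($rest[0]) if $name eq '#text';
    my ($attrs, @children) = @rest;
    my $html = "<$name";
    $html .= qq{ $_="} . esc($attrs->{$_}, 1) . '"' for sort keys %$attrs;
    $html .= '>';
    return $html if $EMPTY{$name};    # no "<br/>" and no "</br>"
    $html .= node_to_html4($_) for @children;
    return "$html</$name>";
}

# Output the body's children from $start (inclusive) up to $end (exclusive).
# Either bound may be undef, meaning "from the beginning" / "to the end".
sub body_to_html4 {
    my ($body, $start, $end) = @_;
    my (undef, undef, @children) = @$body;
    my $on = !defined $start;
    my $html = '';
    for my $child (@children) {
        last if defined $end && refaddr($child) == refaddr($end);
        $on ||= refaddr($child) == refaddr($start);
        $html .= node_to_html4($child) if $on;
    }
    return $html;
}

my $p1   = ['p', {}, ['#text', 'Tom & Jerry'], ['br', {}]];
my $p2   = ['p', { class => 'x' }, ['#text', '1 < 2']];
my $p3   = ['p', {}, ['#text', 'end']];
my $body = ['body', {}, $p1, $p2, $p3];

my $part = body_to_html4($body, $p2, $p3);
# $part is '<p class="x">1 &lt; 2</p>'
```

Because every node in this model is a reference, the range bounds are compared by identity with refaddr(), in the same spirit as passing node objects rather than positions.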
- dom_body_to_text($doc)
-
Given an XHTML body (as an XML::LibXML::Document object in the usual format), returns a plain text version of the content, with some markup translated into text formatting in a limited way to keep it reasonably readable.
- dom_filtered_for_feeds($doc)
-
Return a new version of the article content in $doc with markup removed which isn't relevant, or might be unwelcome, in feed content, such as script elements and style attributes. It also removes span elements, because they aren't needed when there's no custom styling and Bloglines currently turns them into invalid HTML, and removes class attributes in case they cause unexpected styling to be applied.
In addition, any elements in the Daizu HTML extension namespace are removed. Elements in other non-XHTML namespaces will cause this function to fail; they shouldn't be present by the time the content is being output anyway.
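A rough sketch of this kind of filtering is shown below, again on a hypothetical array-reference node model rather than XML::LibXML, with a made-up function name filter_for_feeds(). It builds a new tree and never modifies its input, matching the documented behaviour that the original DOM is left alone. Two points are assumptions: span elements are unwrapped (their content is kept, only the tag is dropped), and namespace handling is left out entirely, since the sketch's model has no namespaces.

```perl
use strict;
use warnings;

# Hypothetical node model: ['#text', $text] or [$name, {attrs}, @children].
# Returns a list of zero or more filtered copies of $node.
sub filter_for_feeds {
    my ($node) = @_;
    my ($name, @rest) = @$node;
    return ['#text', $rest[0]] if $name eq '#text';    # copy text nodes
    return () if $name eq 'script';                    # drop scripts and their content
    my ($attrs, @children) = @rest;
    my @kids = map { filter_for_feeds($_) } @children;
    return @kids if $name eq 'span';                   # ASSUMPTION: unwrap, keeping content
    my %kept = %$attrs;                                # copy, so the input isn't changed
    delete @kept{qw(style class)};
    return [$name, \%kept, @kids];
}

my $body = ['body', {},
    ['p', { class => 'intro', style => 'color: red' },
        ['#text', 'Hello '],
        ['span', { class => 'name' }, ['#text', 'world']]],
    ['script', {}, ['#text', 'alert(1)']]];

my ($copy) = filter_for_feeds($body);
# $copy is ['body', {}, ['p', {}, ['#text', 'Hello '], ['#text', 'world']]]
```

Returning a list rather than a single node is what lets one call either drop an element (empty list) or splice its children into the parent (the span case).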
Both $doc and the return value are XML::LibXML::Document objects of the kind returned by the article_doc() method in Daizu::File. The original DOM in $doc is not altered; the return value is a completely independent copy.
- absolutify_links($doc, $base_url)
-
Given an XHTML document (as an XML::LibXML::Document object), finds attributes in the markup which contain relative URLs and rewrites them as absolute URLs, resolved against $base_url. This can be used to prepare content from an article for publication in a different place with a different URL, such as in an RSS feed or on an index page, while ensuring that any links or embedded files continue to work.
The document's elements must be in the XHTML namespace, or they will be ignored.
TODO: some of this could be refactored together with the link-replacing code in Daizu::Preview to make it more thorough. For now it only handles 'a href' and 'img src', since those catch almost all cases.
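The sketch below shows the general idea of resolving relative URLs against a base, limited to 'a href' and 'img src' as the TODO note says. To stay within core Perl it hand-rolls a simplified version of RFC 3986 reference resolution instead of using a URI library; this is an illustration, not how Daizu::HTML necessarily does it, and it skips some corner cases (for example dot segments in network-path references). The node model and the names resolve_url() and absolutify_tree() are hypothetical, and the sketch modifies its tree in place, which is an assumption since the documentation doesn't say either way.

```perl
use strict;
use warnings;

# Hypothetical node model: ['#text', $text] or [$name, {attrs}, @children].
# Per the TODO note, only a/href and img/src are rewritten.
my %URL_ATTR = (a => 'href', img => 'src');

# Simplified form of RFC 3986 section 5.2.4 (remove "." and ".." segments).
sub remove_dot_segments {
    my ($path) = @_;
    my $absolute = $path =~ m{^/};
    my @in = split m{/}, $path, -1;
    shift @in if $absolute;    # drop the empty segment before the leading "/"
    my @out;
    for my $i (0 .. $#in) {
        my $seg  = $in[$i];
        my $last = $i == $#in;
        if ($seg eq '.' || $seg eq '..') {
            pop @out if $seg eq '..';
            push @out, '' if $last;    # "a/b/.." resolves to "a/", not "a"
        }
        else {
            push @out, $seg;
        }
    }
    my $joined = join '/', @out;
    return $absolute ? "/$joined" : $joined;
}

# Resolve $ref against the absolute URL $base (simplified RFC 3986 5.2.2).
sub resolve_url {
    my ($ref, $base) = @_;
    return $ref if $ref =~ m{^[A-Za-z][A-Za-z0-9+.-]*:};    # already has a scheme
    my ($scheme, $authority, $path, $query) = $base =~
        m{^([A-Za-z][A-Za-z0-9+.-]*):(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?}
        or die "base URL must be absolute: $base\n";
    my $prefix = "$scheme:" . (defined $authority ? "//$authority" : '');
    return "$scheme:$ref" if $ref =~ m{^//};                 # network-path reference
    my ($rpath, $rest) = $ref =~ m{^([^?#]*)(.*)}s;
    if ($rpath eq '') {
        # Empty, query-only or fragment-only reference: keep the base path,
        # and the base query unless the reference supplies its own.
        my $q = ($rest !~ /^\?/ && defined $query) ? "?$query" : '';
        return $prefix . $path . $q . $rest;
    }
    my $merged;
    if ($rpath =~ m{^/}) {
        $merged = $rpath;
    }
    elsif (defined $authority && $path eq '') {
        $merged = "/$rpath";
    }
    else {
        ($merged = $path) =~ s{[^/]*\z}{};    # the base URL's "directory"
        $merged .= $rpath;
    }
    return $prefix . remove_dot_segments($merged) . $rest;
}

# Walk the tree, rewriting URL attributes in place.
sub absolutify_tree {
    my ($node, $base) = @_;
    my ($name, $attrs, @children) = @$node;
    return if $name eq '#text';
    my $attr = $URL_ATTR{$name};
    if (defined $attr && defined $attrs->{$attr}) {
        $attrs->{$attr} = resolve_url($attrs->{$attr}, $base);
    }
    absolutify_tree($_, $base) for @children;
    return;
}

my $body = ['body', {},
    ['p', {},
        ['a', { href => '../about.html' }, ['#text', 'About']],
        ['img', { src => 'pic.png', alt => '' }]]];
absolutify_tree($body, 'http://example.com/blog/2006/post.html');
# a/href is now 'http://example.com/blog/about.html'
# img/src is now 'http://example.com/blog/2006/pic.png'
```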
- html_escape_text($text)
-
Escape $text in a way which makes it safe to include in the content of HTML or XML elements. The characters <, >, and & are escaped. Returns the new value.
The output may not be suitable for use as the value of an HTML or XML attribute, because double quotes are left unescaped; html_escape_attr() handles that case.
The return value is always a byte string encoded in UTF-8.
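A minimal sketch of the documented behaviour follows. The name escape_text_sketch() is hypothetical, and it assumes $text arrives as a Perl character string (not bytes that are already encoded), which it encodes to UTF-8 with the core Encode module after escaping.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for html_escape_text(); assumes a character string.
sub escape_text_sketch {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # first, so the entities added below aren't re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return encode('UTF-8', $text);    # documented: the result is UTF-8 bytes
}

my $out = escape_text_sketch("Caf\x{e9} <b> & \"quotes\"");
# $out is the byte string "Caf\xC3\xA9 &lt;b&gt; &amp; \"quotes\""
```

Note that the double quotes come through untouched, which is exactly why this kind of output shouldn't be placed inside a quoted attribute value.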
- html_escape_attr($text)
-
Escape $text in a way which makes it safe to include in the content of HTML or XML elements, or in the values of HTML or XML attributes enclosed in double quotes. The characters <, >, &, and " are escaped. Returns the new value.
The return value is always a byte string encoded in UTF-8.
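The sketch below shows attribute-safe escaping in a single substitution pass, using a lookup table of entities so that an "&" produced by one replacement is never escaped a second time. The name escape_attr_sketch() is hypothetical, and as above it assumes a character string as input. Single quotes are not escaped, so the value is only safe inside double quotes, as documented.

```perl
use strict;
use warnings;
use Encode qw(encode);

my %ATTR_ENTITY = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');

# Hypothetical stand-in for html_escape_attr(); assumes a character string.
sub escape_attr_sketch {
    my ($text) = @_;
    $text =~ s/([<>&"])/$ATTR_ENTITY{$1}/g;    # one pass over the string
    return encode('UTF-8', $text);
}

my $title = 'Say "hi" & <wave>';
my $link  = '<a title="' . escape_attr_sketch($title) . '">link</a>';
# $link is '<a title="Say &quot;hi&quot; &amp; &lt;wave&gt;">link</a>'
```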
COPYRIGHT
This software is copyright 2006 Geoff Richards <geoff@laxan.com>. For licensing information see this page: