NAME
HTML::Blitz - high-performance, selector-based, content-aware HTML template engine
SYNOPSIS
use HTML::Blitz ();
my $blitz = HTML::Blitz->new;
$blitz->add_rules(@rules);
my $template = $blitz->apply_to_file("template.html");
my $html = $template->process($variables);
my $fn = $template->compile_to_sub;
my $html = $fn->($variables);
DESCRIPTION
HTML::Blitz is a high-performance, CSS-selector-based, content-aware template engine for HTML5. Let's unpack that:
You want to generate web pages. Those are written in HTML5.
Your HTML documents are mostly static in nature, but some parts need to be filled in dynamically (often with data obtained from a database query). This is where a template engine shines.
(On the other hand, if you prefer to generate your HTML completely dynamically with ad-hoc code, but you still want to be safe from HTML injection and XSS vulnerabilities, have a look at HTML::Blitz::Builder.)
Most template systems are content agnostic: They can be used for pretty much any format or language as long as it is textual.
HTML::Blitz is different. It is restricted to HTML, but that also means it understands more about the documents it processes, which eliminates certain classes of bugs. (For example, HTML::Blitz will never produce mismatched tags or forget to properly encode HTML entities.)
The format for HTML::Blitz template files is plain HTML. Instead of embedding special template directives in the source document (like with most other template systems), you write a separate piece of Perl code that instructs HTML::Blitz to fill in or repeat elements of the source document. Those elements are targeted with CSS selectors.
Having written the HTML document template and the corresponding processing rules (consisting of CSS selectors and actions to be applied to matching elements), you then compile them together into an HTML::Blitz::Template object. This object provides functions that take a set of input values, insert them into the document template, and return the finished HTML page.
This latter step is quite fast. See "PERFORMANCE" for details.
General flow
In a typical web application, HTML::Blitz is intended to be used in the following way ("compile on startup"):
When the application starts up, do the following steps:
For each template, create an HTML::Blitz object by calling "new".
Tell the object what rules to apply, either by passing them to "new", or by calling "add_rules" afterwards (or both). This doesn't do much yet; it just accumulates rules inside the object.
Apply the rules to the source document by calling "apply_to_file" (if the source document is stored in a file) or "apply_to_html" (if you have the source document in a string). This gives you an HTML::Blitz::Template object.
Turn the HTML::Blitz::Template object into a function by calling "compile_to_sub" in HTML::Blitz::Template. Stash the function away somewhere.
(The previous steps are meant to be performed once, when the application starts up and initializes.)
When a request comes in, retrieve the corresponding template function from where you stashed it in step 4, then call it with the set of variables you want to use to populate the template document. The result is the finished HTML page.
Alternatively, if your application is not persistent (e.g. because it exits after processing each request, like a CGI script) or if you just don't want to spend time recompiling each template on startup, you can use a different model ("precompiled") as follows:
In a separate script, run steps 1 to 3 from the list above in advance.
Serialize each template to a string by calling "compile_to_string" in HTML::Blitz::Template and store it where you can load it back later, e.g. in a database or on disk. In the latter case, you can simply call "compile_to_file" in HTML::Blitz::Template directly.
Take care to recompile your templates as needed by rerunning steps 1 and 2 each time the source documents or processing rules change.
In your application, load your template functions by
eval
'ing the code stored in step 2. In the case of files, you can simply use "do EXPR" in perlfunc. The return value will be a subroutine reference.Call your template functions as described in step 5 above.
Processing model
Conceptually, HTML::Blitz operates in two phases: First all selectors are tested against the source document and their matches recorded. Then, in the second phase, all matching actions are applied.
Consider the following document fragment:
<div class="foo"> ... </div>
And these rules:
[ 'div' => ['remove_all_attributes'] ],
[ '.foo' => ['replace_inner_text', 'Hello!'] ],
The second rule matches against the class
attribute, but the first rule removes all attributes. However, it doesn't matter in what order you define these rules: Both selectors are matched first, and then both actions are applied together. The attribute removal does not prevent the second rule from matching. The result will always come out as:
<div>Hello!</div>
In cases where multiple actions apply to the same element, all actions are run, but their order is unspecified. Consider the following document fragment:
<div class="foo"> ... </div>
And these rules:
[ 'div' => ['replace_inner_text', 'A'], ['replace_inner_text', 'B'] ],
[ '.foo' => ['replace_inner_text', 'B'] ],
All three actions will run and replace the contents of the div
element, but since their order is unspecified, you may end up with any of the following three results (depending on which action runs last):
<div class="foo">A</div>
or
<div class="foo">B</div>
or
<div class="foo">C</div>
Implementation details (results not guaranteed, your mileage may vary, void where prohibited, not financial advice): The current implementation tries to maximize an internal metric called "unhelpfulness". Consider the following document fragment:
<img class="profile" src="dummy.jpg">
And these actions:
['remove_all_attributes'], #1
['set_attribute_text', src => 'kitten.jpg'], #2
['set_attribute_text', alt => "Photo of a sleeping kitten"], #3
['transform_attribute_sub', src => sub { "/media/images/$_[0]" }], #4
Clearly the most sensible way to arrange these actions is from #1 to #4; first removing all existing attributes, giving <img>
, then gradually setting new attributes, giving <img src="kitten.jpg" alt="Photo of a sleeping kitten">
, and finally transforming them. This would result in:
<img src="/media/images/kitten.jpg" alt="Photo of a sleeping kitten">
However, that's too helpful.
In order to maximize unhelpfulness, you would apply these actions from #4 back to #1; first transforming the src
attribute, giving <img class="profile" src="/media/images/dummy.jpg">
, then adding/overwriting other attributes, giving <img class="profile" src="kitten.jpg" alt="Photo of a sleeping kitten">
, and finally removing all attributes. This would result in:
<img>
And that's what HTML::Blitz actually does.
METHODS
new
my $blitz = HTML::Blitz->new;
my $blitz = HTML::Blitz->new(\%options);
my $blitz = HTML::Blitz->new(@rules);
my $blitz = HTML::Blitz->new(\%options, @rules);
Creates a new HTML::Blitz
object.
You can optionally specify initial options by passing a hash reference as the first argument. The following keys are supported:
- keep_doctype
-
Default: true
By default,
<!DOCTYPE html>
declarations in template files are retained. If you set this option to a false value, they are removed instead. - keep_comments_re
-
Default:
qr/\A/
By default, HTML comments in template files are retained. This option accepts a regex object (as created by
qr//
), which is matched against the contents of all HTML comments. Only those that match the regex are retained; all others are removed.For example, to remove all comments except for copyright notices, you could use the following:
HTML::Blitz->new({ keep_comments_re => qr/ \(c\) | \b copyright \b | \N{COPYRIGHT SIGN} /xi, })
If you want to invert this functionality, e.g. to remove comments containing
DELETEME
and keep everything else, use negative look-ahead:HTML::Blitz->new({ keep_comments_re => qr/\A(?!.*DELETEME)/s, })
- dummy_marker_re
-
Default:
qr/\A(?!)/
Sometimes you might have dummy content or filler text in your templates that is intended to be replaced by your processing rules (like "Lorem ipsum" or user details for "Firstname Lastname"). To make sure all such instances are actually found and replaced by your processing rules, come up with a distinctive piece of marker text (e.g. XXX), include it in all of your dummy content, and pass a regex object (as created by
qr//
) that detects it. For example:HTML::Blitz->new({ dummy_marker_re => qr/\bXXX\b/, })
If any of the attribute values or plain text parts of your source template match this regex, template processing will stop and an exception will be thrown.
Note that this only applies to text from the template; strings that are substituted in by your processing rules are not checked.
The default behavior is to not detect/reject dummy content.
All other arguments are interpreted as processing rules:
my $blitz = HTML::Blitz->new(@rules);
is just a shorter way to write
my $blitz = HTML::Blitz->new;
$blitz->add_rules(@rules);
See "add_rules".
set_keep_doctype
$blitz->set_keep_doctype(1);
$blitz->set_keep_doctype(0);
Turns the "keep_doctype" option on/off. See the description of "new" for details.
set_keep_comments_re
$blitz->set_keep_comments_re( qr/copyright/i );
Sets the "keep_comments_re" option. See the description of "new" for details.
set_dummy_marker_re
$blitz->set_dummy_marker_re( qr/\bXXX\b/ );
Sets the "dummy_marker_re" option. See the description of "new" for details.
add_rules
$blitz->add_rules(
[ 'a.info, a.next' =>
[ set_attribute_text => 'href', 'https://example.com/' ],
[ replace_inner_text => "click here" ],
],
[ '#list-container' =>
[ repeat_inner => 'list',
[ '.name' => [ replace_inner_var => 'name' ] ],
[ '.group' => [ replace_inner_var => 'group' ] ],
[ 'hr' => ['separator'] ],
],
],
);
The add_rules
method adds processing rules to the HTML::Blitz
object. It accepts any number of rules (even 0, but calling it without arguments is a no-op).
A rule is an array reference whose first element is a selector and whose remaining elements are processing actions. The actions will be applied to all HTML elements in the template document that match the selector.
A selector is a CSS selector in the form of a string.
An action is an array reference whose first element is a string that specifies the type of the action; the remaining elements are arguments. Different types of actions take different kinds of arguments.
In the current implementation, only a subset of the full CSS selector specification is supported. In particular, anything involving selector combinators (i.e. S1 S2
, S1 > S2
, S1 ~ S2
, or S1 + S2
) is not implemented. A supported selector is a comma-separated list of one or more simple selector sequences. It matches any element matched by any of the sequences in the list.
A simple selector sequence is a sequence of one or more simple selectors separated by nothing (not even whitespace). If a universal or type selector is present, it must come first in the sequence. A sequence matches any element that is matched by all of the simple selectors in the sequence.
A simple selector is one of the following:
- universal selector
-
The universal selector
*
matches all elements. It is generally redundant and can be omitted unless it is the only component of a selector sequence. (Selector sequences cannot be empty.) - type selector
-
A type selector consists of a name. It matches all elements of that name. For example, a selector of
form
matches all form elements,p
matches all paragraph elements, etc. - attribute presence selector
-
A selector of the form
[FOO]
(whereFOO
is a CSS identifier) matches all elements that have aFOO
attribute. - attribute value selector
-
A selector of the form
[FOO=BAR]
(whereBAR
is a CSS identifier or a CSS string in single or double quotes) matches all elements that have aFOO
attribute whose value is exactlyBAR
. - attribute prefix selector
-
A selector of the form
[FOO^=BAR]
(whereBAR
is a CSS identifier or a CSS string in single or double quotes) matches all elements that have aFOO
attribute whose value starts withBAR
. However, ifBAR
is the empty string (i.e. the selector looks like[FOO^=""]
orFOO^='']
), then it matches nothing. - attribute suffix selector
-
A selector of the form
[FOO$=BAR]
(whereBAR
is a CSS identifier or a CSS string in single or double quotes) matches all elements that have aFOO
attribute whose value ends withBAR
. However, ifBAR
is the empty string (i.e. the selector looks like[FOO$=""]
orFOO$='']
), then it matches nothing. - attribute infix selector
-
A selector of the form
[FOO*=BAR]
(whereBAR
is a CSS identifier or a CSS string in single or double quotes) matches all elements that have aFOO
attribute whose value containsBAR
as a substring. However, ifBAR
is the empty string (i.e. the selector looks like[FOO*=""]
orFOO*='']
), then it matches nothing. - attribute word selector
-
A selector of the form
[FOO~=BAR]
(whereBAR
is a CSS identifier or a CSS string in single or double quotes) matches all elements that have aFOO
attribute whose value is a list of whitespace-separated words, one of which is exactlyBAR
. - attribute language prefix selector
-
A selector of the form
[FOO|=BAR]
(whereBAR
is a CSS identifier or a CSS string in single or double quotes) matches all elements that have aFOO
attribute whose value is either exactlyBAR
or starts withBAR
followed by a-
(minus) character. For example,[lang|=en]
would match an attribute of the formlang="en"
, but alsolang="en-us"
,lang="en-uk"
,lang="en-fr"
, etc. - class selector
-
A selector of the form
.FOO
(whereFOO
is a CSS identifier) matches all elements whoseclass
attribute contains a list of whitespace-separated words, one of which is exactlyFOO
. It is equivalent to[class~=FOO]
. - identity selector
-
A selector of the form
#FOO
(whereFOO
is a CSS name) matches all elements whoseid
attribute is exactlyFOO
. It is equivalent to[id=FOO]
. - nth child selector
-
A selector of the form
:nth-child(An+B)
or:nth-child(An-B)
(whereA
andB
are integers) matches all elements that are the An+Bth (or An-Bth, respectively) child of their parent element, for any non-negative integer n. For the purposes of this selector, counting starts at 1.The full syntax is a bit more complicated:
A
can be negative; ifA
is 1, it can be omitted (i.e.1n
can be shortened to justn
); ifA
is 0, the wholeAn
part can be omitted; ifB
is 0, the+B
(or-B
) part can be omitted unless theAn
part is also gone;n
can also be writtenN
.In short, all of these are valid arguments to
:nth-child
:3n+1 3n-2 -4n+7 2n 9 n-2 1n-0
In addition, the special keywords
odd
andeven
are also accepted.:nth-child(odd)
is equivalent to:nth-child(2n+1)
and:nth-child(even)
is equivalent to:nth-child(2n)
. - nth child of type selector
-
A selector of the form
:nth-of-type(An+B)
or:nth-of-type(An-B)
(whereA
andB
are integers) matches all elements that are the An+Bth (or An-Bth, respectively) child of their parent element, only counting elements of the same type, for any non-negative integer n. Counting starts at 1.It accepts the same argument syntax as the "nth child selector", which see for details.
For example,
span:nth-of-type(3)
matches everyspan
element whose list of preceding sibling contains exactly two elements of typespan
. - first child selector
-
A selector of the form
:first-child
matches all elements that have no preceding sibling elements. It is equivalent to:nth-child(1)
. - first child of type selector
-
A selector of the form
:first-of-type
matches all elements that have no preceding sibling elements of the same type. It is equivalent to:nth-of-type(1)
. - negated selector
-
A selector of the form
:not(FOO)
(whereFOO
is any simple selector excluding the negated selector itself) matches all elements that are not matched byFOO
.For example,
img:not([alt])
matches allimg
elements without analt
attribute, and:not(*)
matches nothing.
Other selectors or pseudo-classes are not currently implemented.
In the following section, a variable name refers to a string that starts with a letter or _
) (underscore), followed by 0 or more letters, _
, digits, .
, or -
. Template variables identify sections that are filled in later when the template is expanded (at runtime, so to speak).
The following types of actions are available:
['remove']
-
Removes the matched element. Equivalent to
['replace_outer_text', '']
. ['remove_inner']
-
Removes the contents of the matched element, leaving it empty. Equivalent to
['replace_inner_text', '']
. ['remove_if', VAR]
-
Removes the matched element if VAR (a runtime variable) contains a true value.
['replace_inner_text', STR]
-
Replaces the contents of the matched element by the fixed string STR.
['replace_inner_var', VAR]
-
Replaces the contents of the matched element by the value of the runtime variable VAR, which is interpreted as plain text (and properly HTML escaped).
['replace_inner_template', TEMPLATE]
-
Replaces the contents of the matched element by TEMPLATE, which must be an instance of HTML::Blitz::Template. This action lets you include a sub-template as part of an outer template; all variables of the inner template become variables of the outer template.
['replace_inner_dyn_builder', VAR]
-
Replaces the contents of the matched element by the value of the runtime variable VAR, which must be an instance of HTML::Blitz::Builder. This is the only way to incorporate dynamic HTML in a template (without interpreting the HTML code as text and escaping everything).
['replace_outer_text', STR]
-
Replaces the matched element (and all of its contents) by the fixed string STR.
['replace_outer_var', VAR]
-
Replaces the matched element (and all of its contents) by the value of the runtime variable VAR, which is interpreted as plain text (and properly HTML escaped).
['replace_outer_template', TEMPLATE]
-
Replaces the matched element by TEMPLATE, which must be an instance of HTML::Blitz::Template. This action lets you include a sub-template as part of an outer template; all variables of the inner template become variable of the outer template.
['replace_outer_dyn_builder', VAR]
-
Replaces the matched element by the value of the runtime variable VAR, which must be an instance of HTML::Blitz::Builder. This is the only way to incorporate dynamic HTML in a template (without interpreting the HTML code as text and escaping everything).
['transform_inner_sub', SUB]
-
Collects the text contents of the matched element and all of its descendants in a string and passes it to SUB, which must be a code reference (or an object with an overloaded
&{}
operator). The returned string replaces the previous contents of the matched element.It is analogous to
elem.textContent = SUB(elem.textContent)
in JavaScript. ['transform_inner_var', VAR]
-
Collects the text contents of the matched element and all of its descendants in a string and passes it to the runtime variable VAR, which must be a code reference (or an object with an overloaded
&{}
operator). The returned string replaces the previous contents of the matched element. ['transform_outer_sub', SUB]
-
Collects the text contents of the matched element and all of its descendants in a string and passes it to SUB, which must be a code reference (or an object with an overloaded
&{}
operator). The returned string replaces the entire matched element. (Thus, if SUB returns an empty string, it effectively removes the matched element from the document.) ['transform_outer_var', VAR]
-
Collects the text contents of the matched element and all of its descendants in a string and passes it to the runtime variable VAR, which must be a code reference (or an object with an overloaded
&{}
operator). The returned string replaces the entire matched element. ['remove_attribute', ATTR_NAMES]
-
Removes all attributes from the matched element whose names are listed in ATTR_NAMES, which must be a list of strings.
['remove_all_attributes']
-
Removes all attributes from the matched elements.
['replace_all_attributes', ATTR_HASHREF]
-
Removes all attributes from the matched elements and creates new attributes based on ATTR_HASHREF, which must be a reference to a hash. Its keys are attribute names; its values are array references with two elements: The first is either the string
text
, in which case the second element is the attribute value as a string, or the stringvar
, in which case the second element is a variable name and the attribute value is substituted in at runtime.For example:
['replace_all_attributes', { class => [text => 'button cta-1'], title => [var => 'btn_title'], }]
This specifies that the matched element should only have two attributes:
class
, with a value ofbutton cta-1
, andtitle
, whose final value will come from the runtime variablebtn_title
. ['set_attribute_text', ATTR, STR]
-
Creates an attribute named ATTR with a value of STR (a string) in the matched element. If an attribute of that name already exists, it is replaced.
For example:
['set_attribute_text', href => 'https://example.com/']
['set_attribute_text', HASHREF]
-
If you want to set multiple attributes at once, you can this form. The keys of HASHREF specify the attribute names, and the values specify the attribute values.
For example:
['set_attribute_text', { src => $src, alt => $alt, title => $title }] # is equivalent to: ['set_attribute_text', src => $src], ['set_attribute_text', alt => $alt], ['set_attribute_text', title => $title],
['set_attribute_var', ATTR, VAR]
-
Creates an attribute named ATTR whose value comes from VAR (a runtime variable) in the matched element. If an attribute of that name already exists, it is replaced.
For example:
['set_attribute_var', href => 'target_url']
['set_attribute_var', HASHREF]
-
If you want to set multiple attributes at once, you can this form. The keys of HASHREF specify the attribute names, and the values specify the names of runtime variables from which the attribute values will be taken.
For example:
['set_attribute_var', { src => 'img_src', alt => 'img_alt', title => 'img_title' }] # is equivalent to: ['set_attribute_var', src => 'img_src'], ['set_attribute_var', alt => 'img_alt'], ['set_attribute_var', title => 'img_title'],
['set_attributes', ATTR_HASHREF]
-
Works exactly like "
['replace_all_attributes', ATTR_HASHREF]
", but without removing any existing attributes from the matched element.For example:
['set_attributes', { class => [text => 'button cta-1'], title => [var => 'btn_title'], }]
This specifies that the matched element should have two attributes:
class
, with a value ofbutton cta-1
, andtitle
, whose final value will come from the runtime variablebtn_title
. All other attributes remain unchanged. ['transform_attribute_sub', ATTR, SUB]
-
Calls SUB, which must be a code reference (or an object with an overloaded
&{}
operator), with the value of the attribute named ATTR in the matched element. If there is no such attribute,undef
is passed instead.The return value, normally a string, is used as the new value for ATTR. However, if SUB returns
undef
instead, the attribute is removed entirely. ['transform_attribute_var', ATTR, VAR]
-
Calls the runtime variable VAR, whose value must be a code reference (or an object with an overloaded
&{}
operator), with the value of the attribute named ATTR in the matched element. If there is no such attribute,undef
is passed instead.The return value, normally a string, is used as the new value for ATTR. However, if VAR returns
undef
instead, the attribute is removed entirely. ['add_attribute_word', ATTR, WORDS]
-
Takes the attribute named ATTR from the matched element and treats it as a list of whitespace-separated words. Any words from WORDS (a list of strings) that are not already present in ATTR will be added to it. If the matched element has no ATTR attribute, it is treated as an empty list (thus all WORDS are added).
As a side effect, duplicate words in the original attribute value may be removed.
['remove_attribute_word', ATTR, WORDS]
-
Takes the attribute named ATTR from the matched element and treats it as a list of whitespace-separated words. Any words from WORDS (a list of strings) that are present in ATTR will be removed from it. If the resulting value of ATTR is empty, the attribute is removed entirely. If the matched element has no ATTR attribute to begin with, nothing changes.
As a side effect, duplicate words in the original attribute value may be removed.
['add_class', WORDS]
-
Adds the words in WORDS (a list of strings) to the
class
attribute of the matched element (unless they are already present there). Equivalent to['add_attribute_word', 'class', WORDS]
. ['remove_class', WORDS]
-
Removes the words in WORDS (a list of strings) from the
class
attribute of the matched element (if they exist there). Equivalent to['remove_attribute_word', 'class', WORDS]
. ['repeat_outer', VAR, ACTIONS?, RULES]
-
Clones the matched element (along with its descendants), once for each element of the runtime variable VAR, which must contain an array of variable environments. Each copy of the matched element has RULES (a list of processing rules) applied to it, with variables looked up in the corresponding environment taken from VAR.
For example:
['repeat_outer', 'things', ['.name', ['replace_inner_var', 'name']], ['.phone', ['replace_inner_var', 'phone']], ]
This specifies that the matched element should be repeated once for each element of the
things
variable. In each copy, elements with a class ofname
should have their contents replaced by the value of thename
variable in the current environment (i.e. the current element ofthings
), and elements with a class ofphone
should have their contents replaced by the value of thephone
variable in the current loop environment.The optional ACTIONS argument, if present, is a reference to a reference to an array of actions (yes, that's a reference to a reference). It specifies what to do with the matched element itself within the context of the repetition.
For example, consider the following rule:
['.foo' => ['set_attribute_var', title => 'title'], ['replace_inner_var', 'content'], ['repeat_outer', 'things', ... ], ]
This says that elements with a class of
foo
should have theirtitle
attribute set to the value of the string variabletitle
and their text content replaced by the value of the string variablecontent
, and then be repeated as directed by the array variablethings
. While this will clone the element as many times as there are elements inthings
, the clones will all have the same attributes and content.On the other hand:
['.foo' => ['repeat_outer', 'things', \[ ['set_attribute_var', title => 'title'], ['replace_inner_var', 'content'], ], ... ], ]
With this rule, the
title
attribute and contents of elements with classfoo
are taken from thetitle
andcontent
(sub-)variables insidethings
. That is, the variable referencestitle
andcontent
are scoped within the loop. This way each copy of the matched element will be different.As a special case, if the ACTIONS list only contains one action, the outer array can be omitted. That is, instead of a reference to an array reference of actions, you can use a reference to an action:
['repeat_outer', 'things', \[ ['replace_inner_var', 'content'], ], ... ] # can be simplified to: ['repeat_outer', 'things', \['replace_inner_var', 'content'], ... ]
['repeat_inner', VAR, RULES]
-
Clones the descendants of the matched element (but not the element itself), once for each element of the runtime variable VAR, which must contain an array of variable environments. Each copy of the descendants has RULES (a list of processing rules) applied to it, with variables looked up in the corresponding environment taken from VAR.
This is very similar to "
['repeat_outer', VAR, ACTIONS?, RULES]
", with the following differences:The matched element acts as a list container and is not repeated.
The ACTIONS argument is not supported.
The RULES list may contain the special "
['separator']
" action, which is only allowed in the context ofrepeat_inner
.
['separator']
-
This action is only available within a "
['repeat_inner', VAR, RULES]
" section. It indicates that the matched element is to be removed from the first copy of the repeated elements. The results are probably not useful unless the matched element is the first child of the parent whose contents are repeated.For example, consider the following template code:
<div id="list"> <hr class="sep"> <p class="c1">other stuff</p> <p class="c2">more stuff</p> </div>
... with this set of rules:
['#list' => ['repeat_inner', 'things', ['.c1' => [...]], ['.c2' => [...]], ['.sep' => ['separator']], ], ]
Since the
hr
element targeted by theseparator
action occurs at the beginning of the section, it acts as a separator: It will not appear in the first copy of the section, but every following copy will include it. The result will look Like this:<div id="list"> <p class="c1">...</p> <p class="c2">...</p> <hr class="sep"> <p class="c1">...</p> <p class="c2">...</p> <hr class="sep"> <p class="c1">...</p> <p class="c2">...</p> ... </div>
apply_to_html
my $template = $blitz->apply_to_html($name, $html_code);
Applies the processing rules (added in the constructor or via "add_rules") to the specified source document. The first argument is a purely informational string; it is used to refer to the document in error messages and the like. The second argument is the HTML code of the source document. The returned value is an instance of HTML::Blitz::Template, which see.
There are some restrictions on the HTML code you can pass in. This module does not implement the full HTML specification; in particular, implicit tags of any kind are not supported. For example, the following fragment is valid HTML:
<div>
<p> A
<p> B
<p> C
</div>
But HTML::Blitz requires you to write this instead:
<div>
<p> A </p>
<p> B </p>
<p> C </p>
</div>
This is because implicit closing tags are not supported. In fact, HTML::Blitz thinks <p> A <p> B </p> C </p>
is valid HTML code containing a p
element nested within another p
. (It is not; p
elements don't nest and the second </p>
tag should be a syntax error. So don't do that.)
Similarly, a real HTML parser would not create tr
elements as direct children of a table
:
<table
<tr><td>A</td></tr>
</table>
Here an implicit tbody
element is supposed to be inserted instead:
<table
<tbody>
<tr><td>A</td></tr>
</tbody>
</table>
HTML::Blitz does not do that. If you have a rule with a tbody
selector, it will only apply to elements explicitly written out in the source document.
In other matters, HTML::Blitz tries to follow HTML parsing rules closely. For example, it knows about void elements (i.e. elements that have no content), like br
, img
, or input
. Such elements do not have closing tags:
<!-- this is a syntax error; you cannot "close" a <br> tag -->
<br></br>
It is not generally possible to have a self-closing opening tag:
<!-- syntax error: -->
<div />
<!-- you need to write this instead: -->
<div></div>
However, a trailing slash in the opening tag is accepted (and ignored) in void elements. The following are all equivalent:
<br>
<br/>
<br />
Another exception applies to descendants of math
and svg
elements, which follow slightly different rules:
<svg>
<!-- this is OK; it is parsed as if it were <circle></circle> -->
<circle/>
</svg>
<!-- this is a syntax error: attempt to self-close a non-void tag outside of svg/math -->
<circle />
The (utterly bonkers) special parsing rules for script
elements are faithfully implemented:
<!-- OK: -->
<script> /* <!-- */ </script>
<!-- OK: -->
<script> /* <script> <!-- */ </script>
<!-- still OK (script containing raw "</script>" text): -->
<script> /* <!-- <script> </script> --> */ </script>
<!-- still OK: -->
<script> /* <!-- <script> --> */ </script>
Attributes may contain whitespace around =
:
<img src = "kitten.jpg" alt = "photo of a kitten">
Attribute values don't need to be quoted if they don't contain whitespace or "special" characters (one of <>="'`
):
<img src=kitten.jpg alt="photo of a kitten">
<img src = kitten.jpg alt = photo of a kitten>
Attributes without values are allowed (and implicitly assigned the empty string as a value):
<input disabled class>
<!-- is equivalent to -->
<input disabled="" class="">
apply_to_file
my $template = $blitz->apply_to_file($filename);
A convenience wrapper around "apply_to_html". It reads the contents of $filename
(which must be UTF-8 encoded) and calls apply_to_html($filename, $contents)
.
EXAMPLES
The following is a complete program:
use strict;
use warnings;
use HTML::Blitz ();
my $template_html = <<'EOF';
<!DOCTYPE html>
<html>
<head>
<title>@@@ Hello, people!</title>
</head>
<body>
<h1 id="greeting">@@@ placeholder heading</h1>
<div id="list">
<hr class="between">
<p>
Name: <span class="name">@@@Bob</span> <br>
Age: <span class="age">@@@42</span>
</p>
</div>
</body>
</html>
EOF
my $blitz = HTML::Blitz->new({
# sanity check: die() if any template parts marked '@@@' above are not
# replaced by processing rules
dummy_marker_re => qr/\@\@\@/,
});
$blitz->add_rules(
[ 'html' => ['set_attribute_text', lang => 'en'] ],
[ 'title, #greeting' => ['replace_inner_var', 'title'] ],
[ '#list' =>
[ 'repeat_inner', 'people',
[ '.between' => ['separator'] ],
[ '.name' => ['replace_inner_var', 'name'] ],
[ '.age' => ['replace_inner_var', 'age'] ],
],
],
);
my $template = $blitz->apply_to_html('(inline document)', $template_html);
my $template_fn = $template->compile_to_sub;
my $data = {
title => "Hello, friends, family & other creatures of the sea!",
people => [
{ name => 'Edward', age => 17 },
{ name => 'Marvin', age => 510_119_077_042 },
{ name => 'Bronze', age => '<redacted>' },
],
};
my $html = $template_fn->($data);
print $html;
It produces the following output:
<!DOCTYPE html>
<html lang=en>
<head>
<title>Hello, friends, family & other creatures of the sea!</title>
</head>
<body>
<h1 id=greeting>Hello, friends, family & other creatures of the sea!</h1>
<div id=list>
<p>
Name: <span class=name>Edward</span> <br>
Age: <span class=age>17</span>
</p>
<hr class=between>
<p>
Name: <span class=name>Marvin</span> <br>
Age: <span class=age>510119077042</span>
</p>
<hr class=between>
<p>
Name: <span class=name>Bronze</span> <br>
Age: <span class=age><redacted></span>
</p>
</div>
</body>
</html>
RATIONALE
(I.e. why does this module exist?)
Template systems like Template::Toolkit are both powerful and general. In my opinion, that's a disadvantage: TT is both too powerful and too stupid for its own good. Since TT embeds its own programming language that can call arbitrary methods in Perl, it is possible to write "templates" that send their own database queries, iterate over resultsets, and do pretty much anything they want, completely bypassing the notional "controller" or "model" in an application. On the other hand, since TT knows nothing about the document structure it is generating (to TT it's all just strings being concatenated), you have to make sure to manually HTML escape every piece of text. Anything you overlook may end up being used for HTML injection and XSS exploits.
HTML::Zoom offers an intriguing alternative: Templates are plain HTML without any special template directives or variables at all. Instead, these static HTML documents are manipulated through structure-aware selectors and modification actions. Not only does this eliminate the disadvantages listed above, it also means you can automatically validate your templates to make sure they're well-formed HTML, which is basically impossible with TT.
There is only one tiny problem: HTML::Zoom is slow. A template page that seems fine when fed with 10 or 20 variables during development can suddenly crawl to a near halt when fed with an unexpectedly large dataset (with hundreds or thousands of entries) in production.
(In fact, I once had reports of a single page in a big web app taking 50-60 seconds to load, which is clearly unacceptable. At first I tried to optimize the database queries behind it, but without much success. That's when I realized that >85% of the time was spent in the HTML::Zoom based view, just slowly churning through the template, and nothing I changed in the code before that would significantly improve loading times.)
This module was born from an attempt to retain the general concept behind HTML::Zoom (which I'm a big fan of) while reimplementing every part of the API and code with a focus on pure execution speed.
PERFORMANCE
For benchmarking purposes I set up a simple HTML template and filled it with a medium-sized dataset consisting of 5 "categories" with 40 "products" each (200 in total). Each "product" had a custom image, description, and other bits of metadata.
To get a performance baseline, I timed a hand-written piece of Perl code consisting only of string constants, variables, calls to encode_entities
(from HTML::Entities), concatenation, and nested loops. Everything was hard-coded; nothing was modularized or factored out into subroutines.
Against this, I timed a few template systems (HTML::Blitz, HTML::Zoom, Template::Toolkit) as well as HTML::Blitz::Builder, which is rather the opposite of a template system.
Results:
- baseline
-
457/s (0.0022s per iteration), 100% (of baseline performance, the theoretical maximum)
- HTML::Blitz
-
392/s (0.0026s per iteration), 85.8%
- Template::Toolkit
-
48.0/s (0.0208s per iteration), 10.5%
- HTML::Blitz::Builder
-
40.8/s (0.0245s per iteration), 8.9%
- HTML::Zoom
-
1.39/s (0.7194s per iteration), 0.3%
Conclusions:
HTML::Zoom is slooooooow. Using it in anything but the most simple cases has a noticeable impact on performance.
HTML::Blitz is orders of magnitude faster. It can easily outperform HTML::Zoom by a factor of 200 or 300. A dataset that might take HTML::Blitz 20 milliseconds to zip through would lock up HTML::Zoom for over 5 seconds.
HTML::Blitz is competitive with hand-written code that sacrifices all semblance of maintainability for speed. In fact, it still runs at 80%-90% of that speed.
WHY THE NAME
I'm German, and Blitz is the German word for "lightning" or "flash" (or "thunderbolt"). Because the main motivation behind this module is performance, I wanted something that represents speed. Something lightning-fast, in fact (that's blitzschnell in German). I didn't want to use the English name "Flash" because that name is already taken by the infamous "Flash Player" browser plugin (even if it is currently dead).
The second reason is also connected to speed: I wanted a template system that assembles pages as effortlessly and efficiently as copying around blocks of memory, with minimal additional computation. Something roughly like a bit blit ("bit block transfer") operation. HTML::Blitz "blits" in the sense that it efficiently transfers blocks of HTML.
The third reason relates to my frustration with HTML::Zoom's performance. When I was struggling with HTML::Zoom, fruitlessly trying to come up with ways to optimize or work around its code, I remembered a funny coincidence: It just so happens that (Professor) Zoom is the name of a supervillain in the superhero comic books published by DC Comics. He is the archenemy of the Flash ("the fastest man alive"), whose main ability is super-speed. When the Flash comics were published in Germany in the 1970s and 1980s, his name was translated as der Rote Blitz ("the Red Flash"). Thus: "Blitz" is the hero that triumphs over "Zoom" through superior speed. :-)
AUTHOR
Lukas Mai, <lmai at web.de>
COPYRIGHT & LICENSE
Copyright 2022 Lukas Mai.
This module is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.