NAME

HTML::Stream - HTML output stream class, and some markup utilities

DESCRIPTION

This module provides you with an object-oriented (and subclassable) way of outputting HTML. Basically, you open up an "HTML stream" on an existing filehandle, and then do all of your output to the HTML stream (you can intermix HTML-stream-output and ordinary-print-output, if you like).

Here's a small sample of the different ways you can use this module:

use HTML::Stream;
$HTML = new HTML::Stream \*STDOUT;

# The vanilla interface...
tag  $HTML 'A', HREF=>"$href";
tag  $HTML 'IMG', SRC=>"logo.gif", ALT=>"LOGO";
text $HTML "My caption!";
tag  $HTML '_A';
text $HTML $a_lot_of_text;

# The chocolate interface (with whipped cream)...
$HTML -> A(HREF=>"$href")
      -> IMG(SRC=>"logo.gif", ALT=>"LOGO")
      -> t("My caption!")
      -> _A
      -> t($a_lot_of_text);

# The strawberry interface...
output $HTML [A, HREF=>"$href"], 
             [IMG, SRC=>"logo.gif", ALT=>"LOGO"],
             "My caption!",
             [_A];
output $HTML $a_lot_of_text;

Function interface

Let's start out with the simple stuff. This module provides a collection of non-OO utility functions for escaping HTML text and producing HTML tags, like this:

use HTML::Stream qw(:funcs);        # imports functions from @EXPORT_OK

print html_tag(A, HREF=>$url);
print '&copy; 1996 by ', html_escape($myname), '!';
print html_tag('/A');

By the way: that last line could be rewritten as:

print html_tag(_A);

And if you need to get a parameter in your tag that doesn't have an associated value, supply the undefined value (not the empty string!):

print html_tag(TD, NOWRAP=>undef, ALIGN=>'LEFT');

     <TD NOWRAP ALIGN="LEFT">

print html_tag(IMG, SRC=>'logo.gif', ALT=>'');

     <IMG SRC="logo.gif" ALT="">
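The undef-versus-empty-string rule can be pictured with a small stand-alone sketch. This is not the module's own code: sketch_escape() and sketch_tag() are hypothetical names, and the sketch assumes that every defined value is quoted and escaped and that parameters are emitted in the order given.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the four characters that are special in HTML.
sub sketch_escape {
    my ($text) = @_;
    my %ent = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');
    $text =~ s/([<>&"])/$ent{$1}/g;
    return $text;
}

# Hypothetical tag builder: a leading "_" requests an end-tag, an undefined
# value gives a bare parameter, anything else becomes NAME="escaped value".
sub sketch_tag {
    my ($name, @params) = @_;
    (my $tag = $name) =~ s{^_}{/};
    my $out = "<$tag";
    while (@params) {
        my ($key, $value) = splice(@params, 0, 2);
        $out .= defined $value
            ? qq{ $key="} . sketch_escape($value) . '"'
            : " $key";
    }
    return "$out>";
}

print sketch_tag('TD', NOWRAP => undef, ALIGN => 'LEFT'), "\n";
print sketch_tag('IMG', SRC => 'logo.gif', ALT => ''), "\n";
print sketch_tag('_A'), "\n";
```

The defined() test is the whole trick: an empty string is still a defined value, so it produces ALT="" rather than a bare ALT.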

There are also some routines for reversing the process, like:

$text = "This <i>isn't</i> &quot;fun&quot;...";    
print html_unmarkup($text);
   
     This isn't &quot;fun&quot;...
  
print html_unescape($text);
   
     This isn't "fun"...
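For a feel of what "reversing the process" involves, here is a rough stand-alone sketch. sketch_unmarkup() and sketch_unescape() are hypothetical stand-ins, not the module's routines; the tag-stripping regex is deliberately naive, and only the four basic entities are handled.

```perl
use strict;
use warnings;

# Hypothetical: drop anything that looks like a tag, leave entities alone.
sub sketch_unmarkup {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Hypothetical: strip the markup, then turn the basic entities back into
# plain characters in a single pass (so "&amp;lt;" becomes "&lt;", not "<").
sub sketch_unescape {
    my ($text) = @_;
    my %char = (lt => '<', gt => '>', amp => '&', quot => '"');
    $text = sketch_unmarkup($text);
    $text =~ s/&(lt|gt|amp|quot);/$char{$1}/g;
    return $text;
}

my $text = "This <i>isn't</i> &quot;fun&quot;...";
print sketch_unmarkup($text), "\n";
print sketch_unescape($text), "\n";
```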

Yeah, yeah, yeah, I hear you cry. We've seen this stuff before. But wait! There's more...

OO interface, vanilla

Using the function interface can be tedious... so we also provide an "HTML output stream" class. Messages to an instance of that class generally tell that stream to output some HTML. Here's the above example, rewritten using HTML streams:

use HTML::Stream;
$HTML = new HTML::Stream \*STDOUT;

tag  $HTML 'A', HREF=>$url;
ent  $HTML 'copy';
text $HTML " 1996 by $myname!";
tag  $HTML '_A';

Or, if indirect-object syntax ain't your thang:

$HTML->tag(A, HREF=>$url);
$HTML->ent('copy');
$HTML->text(" 1996 by $myname!");
$HTML->tag(_A);

As you've probably guessed:

ent()      Outputs an HTML entity, like &copy;.
tag()      Outputs an ordinary tag, like <A>, possibly with parameters.
           The parameters will all be HTML-escaped automatically.
text()     Outputs some text, which will be HTML-escaped.

If you're not using indirect-object syntax, you might prefer to use t() and e() instead of text() and ent(): they are absolutely identical... just shorter to type:

$HTML -> tag(A, HREF=>$url);
$HTML -> e('copy');
$HTML -> t(" 1996 by $myname!");
$HTML -> tag(_A);

Now, it wouldn't be nice to give you those text() and ent() shortcuts without giving you one for tag(), would it? Of course not...

OO interface, chocolate

The known HTML tags are even given their own tag-methods, compiled on demand... so the above could be written like this:

$HTML -> A(HREF=>$url);
$HTML -> e('copy');
$HTML -> t(" 1996 by $myname!");
$HTML -> _A;

As you've probably guessed:

A(HREF=>$url)   ==   tag(A, HREF=>$url)   ==   <A HREF="/the/url">
_A              ==   tag(_A)              ==   </A>

All such "tag-methods" use the tagname in all-uppercase. A "_" prefix on any tag-method means that an end-tag is desired. The "_" was chosen for several reasons: (1) it's short and easy to type, (2) it doesn't produce much visual clutter to look at, (3) _TAG looks a little like /TAG because of the straight line.

  • I know, I know... it looks like a private method. You get used to it. Really.

I should stress that this module will only auto-create tag methods for known HTML tags. So you're protected from typos like this (which will cause a fatal exception at run-time):

$HTML -> IMGG(SRC=>$src);

(You're not yet protected from illegal tag parameters, but it's a start, ain't it?)

If you need to make a tag known (sorry, but this is currently a global operation, and not stream-specific), do this:

HTML::Stream->accept_tag('MARQUEE');     # for you MSIE fans...

There is no corresponding "reject_tag". I thought and thought about it, and could not convince myself that such a method would do anything more useful than cause other people's modules to suddenly stop working because some bozo function decided to reject the FONT tag.
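If you're wondering how tag-methods can be "compiled on demand" and still catch typos, the usual Perl mechanism is AUTOLOAD. The sketch below illustrates that general technique only; it is an assumption about the approach, not a copy of the module's internals. The SketchStream package, its tiny tag list, and the {out} buffer are all hypothetical, and tag() here ignores parameters to stay short.

```perl
use strict;
use warnings;

package SketchStream;

our $AUTOLOAD;

# Hypothetical, deliberately tiny list of "known" tags (global, not per-object).
my %Known = map { $_ => 1 } qw(A B BODY BR HEAD HTML IMG P TITLE);

sub new        { my ($class) = @_; return bless { out => '' }, $class }
sub accept_tag { my ($class, $tag) = @_; $Known{uc $tag} = 1; return }
sub DESTROY    { }    # defined explicitly so AUTOLOAD never sees it

# Plain tag output; a leading "_" means an end-tag. Parameters are ignored here.
sub tag {
    my ($self, $name) = @_;
    (my $t = $name) =~ s{^_}{/};
    $self->{out} .= "<$t>";
    return $self;
}

# Called for any method that does not exist yet.
sub AUTOLOAD {
    (my $method = $AUTOLOAD) =~ s/.*:://;
    (my $bare = $method) =~ s/^_//;
    die "No such tag-method: $method\n" unless $Known{$bare};

    # Install the real method once, then jump to it with the original @_.
    no strict 'refs';
    *{$AUTOLOAD} = sub { my $s = shift; $s->tag($method, @_) };
    goto &{$AUTOLOAD};
}

package main;

my $html = SketchStream->new;
$html->A->_A;                       # both methods are created on first use
print $html->{out}, "\n";
print "typo caught\n" unless eval { $html->IMGG; 1 };
```

Because the table of known tags lives in the package rather than in each object, adding a tag is necessarily a global operation in this sketch too.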

OO interface, with whipped cream

In the grand tradition of C++, output method chaining is supported in both the Vanilla Interface and the Chocolate Interface. So you can (and probably should) say:

$HTML -> A(HREF=>$url) 
      -> e('copy') -> t(" 1996 by $myname!") 
      -> _A;
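Chaining of this kind generally works because every output method returns the stream object itself. Here is a minimal sketch of that idea, writing to an in-memory filehandle; the ChainSketch class is hypothetical, and its t() does no escaping, to keep the example short.

```perl
use strict;
use warnings;

package ChainSketch;

# Wrap an already-open filehandle.
sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh }, $class;
}

# Output text as-is and hand back the object, so calls can be chained.
sub t {
    my ($self, $text) = @_;
    print { $self->{fh} } $text;
    return $self;
}

# Output a named entity and hand back the object.
sub e {
    my ($self, $name) = @_;
    print { $self->{fh} } "&$name;";
    return $self;
}

package main;

open(my $fh, '>', \my $buffer) or die "cannot open in-memory handle: $!";
ChainSketch->new($fh)->e('copy')->t(' 1996 by Me!');
close $fh;
print "$buffer\n";
```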

But wait... there's one more flavor...

OO interface, strawberry

I was jealous of the compact syntax of HTML::AsSubs, but I didn't want to worry about clogging the namespace with a lot of functions like p(), a(), etc. (especially when markup-functions like tr() conflict with existing Perl functions). So I came up with this:

output $HTML [A, HREF=>$url], "Here's my $caption", [_A];

Conceptually, arrayrefs are sent to html_tag(), and strings to html_escape().
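That dispatch rule is easy to picture in code. The following is a conceptual sketch only: sketch_output() and sketch_escape() are hypothetical names, the result is returned as a string instead of being printed to a stream, and every tag parameter is assumed to have a defined value.

```perl
use strict;
use warnings;

# Hypothetical escaper for the four HTML-special characters.
sub sketch_escape {
    my ($text) = @_;
    my %ent = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');
    $text =~ s/([<>&"])/$ent{$1}/g;
    return $text;
}

# Hypothetical dispatcher: array references are treated as tags,
# everything else as text to be escaped.
sub sketch_output {
    my (@items) = @_;
    my $out = '';
    for my $item (@items) {
        if (ref($item) eq 'ARRAY') {
            my ($name, @params) = @$item;
            (my $tag = $name) =~ s{^_}{/};      # "_A" means "</A>"
            $out .= "<$tag";
            while (my ($key, $value) = splice(@params, 0, 2)) {
                $out .= qq{ $key="} . sketch_escape($value) . '"';
            }
            $out .= '>';
        }
        else {
            $out .= sketch_escape($item);
        }
    }
    return $out;
}

print sketch_output(['A', HREF => '/the/url'], "Tom & Jerry", ['_A']), "\n";
```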

Newlines

As special cases, some tag-methods (like P, _P, and BR) cause newlines to be output before and/or after the tag, so your HTML is a little more readable when you do stuff like "view source" on a browser. So:

$HTML -> HTML 
      -> HEAD  
      -> TITLE -> t("Hello!") -> _TITLE 
      -> _HEAD
      -> BODY(BGCOLOR=>'#808080');

Actually produces:

<HTML>
<HEAD>
<TITLE>Hello!</TITLE>
</HEAD>
<BODY BGCOLOR="#808080">

(This will improve slightly as time goes on.) You can also output newlines explicitly via the special nl method in the Chocolate Interface:

$HTML->nl;     # one newline
$HTML->nl(6);  # six newlines

Entities

As shown above, you can use the ent() (or e()) method to output an entity:

$HTML->t('Copyright ')->e('copy')->t(' 1996 by Me!');

But this can be a pain, particularly for Europeans:

$HTML -> t('Copyright ') 
      -> e('copy') 
      -> t(' 1996 by Fran') -> e('ccedil') -> t('ois, Inc.!');

Sooooooooo...

Changing the way text is escaped

The default "autoescape" behavior of an HTML stream can be a drag if you've got a lot of character entities that you want to output. So here's how you can use the autoescape() method to change the way an HTML::Stream works at any time:

$HTML->autoescape('ALL');        # escapes [<>"&] - the default
$HTML->autoescape('NON_ENT');    # escapes [<>"] only, and not [&]

You can also install your own autoescape function (note that you might very well want to install it only for a little while, and then de-install it):

    sub my_autoescape {
        my $text = shift;
        $text = HTML::Stream::escape_all($text);   # start with default
        $text =~ s/\(c\)/&copy;/ig;        # (C) becomes copyright
        $text =~ s/\\,(c)/\&$1cedil;/ig;   # \,c becomes a cedilla
        $text;
    }

    # Start using my autoescape:
    my $oldesc = $HTML->autoescape(\&my_autoescape);      # use sub refs ONLY!
    $HTML-> ADDRESS;
    $HTML-> IMG(SRC=>'logo.gif', ALT=>'Fran\,cois, Inc');
    output $HTML 'Copyright (C) 1996 by Fran\,cois, Inc.!';
    $HTML->_ADDRESS;
    
    # Stop using my autoescape:
    $HTML->autoescape($oldesc);

By the way, the following are equivalent:

$HTML->autoescape('ALL');
$HTML->autoescape(\&HTML::Stream::escape_all);

Calling autoescape() with no arguments returns the current autoescape function.
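The install-then-restore pattern can be tried out in isolation with a stand-in class. Everything in this sketch is hypothetical (the EscSketch package, its {esc} and {out} fields, and its two escape subs); it only assumes, as the example above suggests, that setting a new escaper returns the previous one.

```perl
use strict;
use warnings;

package EscSketch;

# Hypothetical escapers: one for [<>"&], one that leaves "&" alone.
my %ENT = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');
sub escape_all     { my ($t) = @_; $t =~ s/([<>"&])/$ENT{$1}/g; return $t }
sub escape_non_ent { my ($t) = @_; $t =~ s/([<>"])/$ENT{$1}/g;  return $t }

my %Named = (ALL => \&escape_all, NON_ENT => \&escape_non_ent);

sub new { my ($class) = @_; return bless { esc => \&escape_all, out => '' }, $class }

# With an argument: install a named or custom escaper, return the old one.
# Without an argument: just return the current one.
sub autoescape {
    my $self = shift;
    my $old  = $self->{esc};
    if (@_) {
        my $new = shift;
        if (ref($new) eq 'CODE') { $self->{esc} = $new }
        else { $self->{esc} = $Named{$new} or die "unknown autoescape name: $new\n" }
    }
    return $old;
}

# Append text to the buffer after running it through the current escaper.
sub t { my ($self, $text) = @_; $self->{out} .= $self->{esc}->($text); return $self }

package main;

my $s = EscSketch->new;
$s->t('a & b');
my $old = $s->autoescape('NON_ENT');   # temporarily let entities through
$s->t(' &copy; <x>');
$s->autoescape($old);                  # back to the previous behavior
$s->t(' &');
print $s->{out}, "\n";
```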

PERFORMANCE

Slower than I'd like. Both the output() method and the various "tag" methods seem to run about 5 times slower than the old just-hardcode-the-darn-stuff approach. That is, in general, this:

### Approach #1...
tag  $HTML 'A', HREF=>"$href";
tag  $HTML 'IMG', SRC=>"logo.gif", ALT=>"LOGO";
text $HTML "My caption!";
tag  $HTML '_A';
text $HTML $a_lot_of_text;

And this:

    ### Approach #2...
    output $HTML [A, HREF=>"$href"],
                 [IMG, SRC=>"logo.gif", ALT=>"LOGO"],
                 "My caption!",
                 [_A];
    output $HTML $a_lot_of_text;

And this:

    ### Approach #3...
    $HTML -> A(HREF=>"$href")
          -> IMG(SRC=>"logo.gif", ALT=>"LOGO")
          -> t("My caption!")
          -> _A
          -> t($a_lot_of_text);

Each run about 5x slower than this:

  ### Approach #4...
  print '<A HREF="', html_escape($href), '>',
        '<IMG SRC="logo.gif" ALT="LOGO">',
        "My caption!",
        '</A>';
  print html_escape($a_lot_of_text);

Of course, I'd much rather use any of the first three (especially #3) if I had to get something done right in a hurry. Or did you not notice the typo in approach #4? ;-)

(BTW, thanks to Benchmark:: for allowing me to... er... benchmark stuff.)
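If you'd like to run this kind of comparison yourself, the core Benchmark module's cmpthese() does the job. The sketch below is not the measurement behind the numbers above: it times a hypothetical method-per-piece class (BenchSketch) against a hardcoded string, both building into memory, so the ratio it prints will depend on your machine and says nothing definite about HTML::Stream itself.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Shared escaper used by both approaches.
my %ENT = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');
sub esc { my ($t) = @_; $t =~ s/([<>&"])/$ENT{$1}/g; return $t }

package BenchSketch;    # hypothetical stand-in for a stream class

sub new  { my ($class) = @_; return bless { out => '' }, $class }
sub tag  { my ($self, $name) = @_; $self->{out} .= "<$name>"; return $self }
sub text { my ($self, $text) = @_; $self->{out} .= main::esc($text); return $self }

package main;

my $caption = 'Tom & Jerry <live>';

# One method call per piece of output.
sub with_methods {
    return BenchSketch->new->tag('B')->text($caption)->tag('/B')->{out};
}

# Everything spelled out by hand.
sub hardcoded {
    return '<B>' . esc($caption) . '</B>';
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { methods => \&with_methods, hardcoded => \&hardcoded });
```

Checking that both subs return the same string before timing them is a cheap way to make sure you are comparing like with like.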

WHY IN THE WORLD DID I WRITE THIS?

I was just mucking about with different ways of generating large HTML documents, seeing which ways I liked the most/least.

VERSION

$Revision: 1.19 $

AUTHOR

Eryq, eryq@rhine.gsfc.nasa.gov .

Enjoy.