NAME

HTML::Stream - HTML output stream class, and some markup utilities

DESCRIPTION

The HTML::Stream module provides you with an object-oriented (and subclassable) way of outputting HTML. Basically, you open up an "HTML stream" on an existing filehandle, and then do all of your output to the HTML stream. You can intermix HTML-stream-output and ordinary-print-output, if you like.

Here's a small sample of the different ways you can use this module:

use HTML::Stream;
$HTML = new HTML::Stream \*STDOUT;

# The vanilla interface...
tag  $HTML 'A', HREF=>"$href";
tag  $HTML 'IMG', SRC=>"logo.gif", ALT=>"LOGO";
text $HTML "My caption!";
tag  $HTML '_A';
text $HTML $a_lot_of_text;

# The chocolate interface (with whipped cream)...
$HTML -> A(HREF=>"$href")
      -> IMG(SRC=>"logo.gif", ALT=>"LOGO")
      -> t("My caption!")
      -> _A
      -> t($a_lot_of_text);

# The strawberry interface...
output $HTML [A, HREF=>"$href"], 
             [IMG, SRC=>"logo.gif", ALT=>"LOGO"],
             "My caption!",
             [_A];
output $HTML $a_lot_of_text;

There's even a small built-in subclass, HTML::Stream::Latin1, which can handle Latin-1 input right out of the box. But all in good time...

Function interface

Let's start out with the simple stuff. This module provides a collection of non-OO utility functions for escaping HTML text and producing HTML tags, like this:

use HTML::Stream qw(:funcs);        # imports functions from @EXPORT_OK

print html_tag(A, HREF=>$url);
print '&copy; 1996 by ', html_escape($myname), '!';
print html_tag('/A');

By the way: that last line could be rewritten as:

print html_tag(_A);

And if you need to get a parameter in your tag that doesn't have an associated value, supply the undefined value (not the empty string!):

print html_tag(TD, NOWRAP=>undef, ALIGN=>'LEFT');

     <TD NOWRAP ALIGN="LEFT">

print html_tag(IMG, SRC=>'logo.gif', ALT=>'');

     <IMG SRC="logo.gif" ALT="">
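
To make the undef-versus-empty-string distinction concrete, here is a minimal sketch of how a tag builder of this kind could behave. The names sketch_escape() and sketch_tag() are hypothetical stand-ins written for this illustration. They are not the module's own html_escape() and html_tag(), which may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical stand-in for html_escape(): escape the four special characters.
sub sketch_escape {
    my $text = shift;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$ent{$1}/g;
    return $text;
}

# Hypothetical stand-in for html_tag(): a leading "_" means an end-tag,
# and an undefined value means "emit the attribute name only".
sub sketch_tag {
    my ($tag, @params) = @_;
    $tag =~ s{^_}{/};
    my $out = "<$tag";
    while (@params) {
        my ($name, $value) = splice(@params, 0, 2);
        $out .= " $name";
        $out .= '="' . sketch_escape($value) . '"' if defined $value;
    }
    return "$out>";
}

print sketch_tag('TD', NOWRAP => undef, ALIGN => 'LEFT'), "\n";
print sketch_tag('IMG', SRC => 'logo.gif', ALT => ''), "\n";
print sketch_tag('_A'), "\n";
```

The only thing separating the two cases is the defined() test: an empty string is still defined, so it gets its (empty) quoted value.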

There are also some routines for reversing the process, like:

$text = "This <i>isn't</i> &quot;fun&quot;...";    
print html_unmarkup($text);
   
     This isn't &quot;fun&quot;...
  
print html_unescape($text);
   
     This isn't "fun"...
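
If you're curious what "reversing the process" amounts to, the following is a rough sketch under my own assumptions: sketch_unmarkup() and sketch_unescape() are hypothetical names, the tag pattern is deliberately naive, and only four entities are decoded. The module's real routines may handle more cases than this.

```perl
use strict;
use warnings;

# Hypothetical stand-in for html_unmarkup(): drop anything that looks like a tag.
sub sketch_unmarkup {
    my $text = shift;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Hypothetical stand-in for html_unescape(): drop tags, then decode a few
# common entities in a single pass. Unknown entities are left untouched.
sub sketch_unescape {
    my $text = sketch_unmarkup(shift);
    my %char = (lt => '<', gt => '>', quot => '"', amp => '&');
    $text =~ s/&(\w+);/exists $char{$1} ? $char{$1} : "&$1;"/ge;
    return $text;
}

my $text = "This <i>isn't</i> &quot;fun&quot;...";
print sketch_unmarkup($text), "\n";
print sketch_unescape($text), "\n";
```

Decoding in one pass matters: text like "&amp;lt;" comes out as "&lt;" rather than being decoded twice.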

Yeah, yeah, yeah, I hear you cry. We've seen this stuff before. But wait! There's more...

OO interface, vanilla

Using the function interface can be tedious... so we also provide an "HTML output stream" class. Messages to an instance of that class generally tell that stream to output some HTML. Here's the above example, rewritten using HTML streams:

use HTML::Stream;
$HTML = new HTML::Stream \*STDOUT;

tag  $HTML 'A', HREF=>$url;
ent  $HTML 'copy';
text $HTML " 1996 by $myname!";
tag  $HTML '_A';

Or, if indirect-object syntax ain't your thang:

$HTML->tag(A, HREF=>$url);
$HTML->ent('copy');
$HTML->text(" 1996 by $myname!");
$HTML->tag(_A);

As you've probably guessed:

ent()      Outputs an HTML entity, like the &copy; or &lt; .
tag()      Outputs an ordinary tag, like <A>, possibly with parameters.
           The parameters will all be HTML-escaped automatically.
text()     Outputs some text, which will be HTML-escaped.

If you're not using indirect-object syntax, you might prefer to use t() and e() instead of text() and ent(): they are absolutely identical... just shorter to type:

$HTML -> tag(A, HREF=>$url);
$HTML -> e('copy');
$HTML -> t(" 1996 by $myname!");
$HTML -> tag(_A);

Now, it wouldn't be nice to give you those text() and ent() shortcuts without giving you one for tag(), would it? Of course not...

OO interface, chocolate

The known HTML tags are even given their own tag-methods, compiled on demand... so the above could be written like this:

$HTML -> A(HREF=>$url);
$HTML -> e('copy');
$HTML -> t(" 1996 by $myname!");
$HTML -> _A;

As you've probably guessed:

A(HREF=>$url)   ==   tag(A, HREF=>$url)   ==   <A HREF="/the/url">
_A              ==   tag(_A)              ==   </A>

All such "tag-methods" use the tagname in all-uppercase. A "_" prefix on any tag-method means that an end-tag is desired. The "_" was chosen for several reasons: (1) it's short and easy to type, (2) it doesn't produce much visual clutter to look at, (3) _TAG looks a little like /TAG because of the straight line.

  • I know, I know... it looks like a private method. You get used to it. Really.

I should stress that this module will only auto-create tag methods for known HTML tags. So you're protected from typos like this (which will cause a fatal exception at run-time):

$HTML -> IMGG(SRC=>$src);

(You're not yet protected from illegal tag parameters, but it's a start, ain't it?)

If you need to make a tag known (sorry, but this is currently a global operation, and not stream-specific), do this:

HTML::Stream->accept_tag('MARQUEE');     # for you MSIE fans...

There is no corresponding "reject_tag". I thought and thought about it, and could not convince myself that such a method would do anything more useful than cause other people's modules to suddenly stop working because some bozo function decided to reject the FONT tag.
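
For the curious, here is one way that "compiled on demand, but only for known tags" could be arranged in Perl. This is a sketch under stated assumptions, not this module's actual source: the class Sketch::Stream, its tiny tag table, and output_text() are all hypothetical, and attribute values are not escaped, to keep things short.

```perl
use strict;
use warnings;

package Sketch::Stream;    # hypothetical class, not HTML::Stream itself

our $AUTOLOAD;
my %Known = map { $_ => 1 } qw(A B I IMG P);    # tiny illustrative tag table

sub new { my $class = shift; my $buf = ''; return bless { out => \$buf }, $class }

# Global registration, in the spirit of accept_tag().
sub accept_tag { my ($class, $tag) = @_; $Known{ uc $tag } = 1; return }

# Hypothetical accessor so the example can show what was collected.
sub output_text { return ${ $_[0]{out} } }

sub AUTOLOAD {
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';
    (my $tag = $name) =~ s/^_//;
    die "No such tag method: $name\n" unless $Known{$tag};

    # Build the method once, install it under its own name, then jump to it.
    my $is_end = ($name ne $tag);
    no strict 'refs';
    *{$AUTOLOAD} = sub {
        my ($self, %attr) = @_;
        my $html = $is_end ? "</$tag" : "<$tag";
        $html .= qq{ $_="$attr{$_}"} for sort keys %attr;
        ${ $self->{out} } .= "$html>";
        return $self;    # returning the stream is what permits chaining
    };
    goto &$AUTOLOAD;
}

package main;

my $s = Sketch::Stream->new;
$s->A(HREF => '/the/url')->_A;

my $ok = eval { $s->IMGG(SRC => 'x.gif'); 1 };    # typo: dies at run-time
print "IMGG rejected: $@" unless $ok;

Sketch::Stream->accept_tag('MARQUEE');
$s->MARQUEE->_MARQUEE;
print $s->output_text, "\n";
```

Because the tag table in this sketch is a single file-scoped hash, registering a tag affects every stream at once, which is the same trade-off as the global accept_tag() described above.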

OO interface, with whipped cream

In the grand tradition of C++, output method chaining is supported in both the Vanilla Interface and the Chocolate Interface. So you can (and probably should) say:

$HTML -> A(HREF=>$url) 
      -> e('copy') -> t(" 1996 by $myname!") 
      -> _A;

But wait... there's one more flavor...

OO interface, strawberry

I was jealous of the compact syntax of HTML::AsSubs, but I didn't want to worry about clogging the namespace with a lot of functions like p(), a(), etc. (especially when markup-functions like tr() conflict with existing Perl functions). So I came up with this:

output $HTML [A, HREF=>$url], "Here's my $caption", [_A];

Conceptually, arrayrefs are sent to html_tag(), and strings to html_escape().
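
That dispatch rule is simple enough to sketch in a few lines. The helpers below (sk_escape(), sk_tag(), sketch_output()) are hypothetical names invented for this illustration, and sketch_output() returns a string instead of printing to a stream. Treat it as a picture of the idea, not as the module's code.

```perl
use strict;
use warnings;

my %ENT = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Hypothetical stand-in for html_escape().
sub sk_escape { my $t = shift; $t =~ s/([&<>"])/$ENT{$1}/g; return $t }

# Hypothetical stand-in for html_tag(): "_" prefix means end-tag.
sub sk_tag {
    my ($tag, %attr) = @_;
    $tag =~ s{^_}{/};
    my $out = "<$tag";
    $out .= qq{ $_="} . sk_escape($attr{$_}) . '"' for sort keys %attr;
    return "$out>";
}

# Dispatch on the kind of argument: arrayref => tag, anything else => escaped text.
sub sketch_output {
    return join '', map { ref($_) eq 'ARRAY' ? sk_tag(@$_) : sk_escape($_) } @_;
}

my $caption = 'Q&A';
print sketch_output(['A', HREF => '/faq'], "Here's my $caption", ['_A']), "\n";
```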

Newlines

As special cases, some tag-methods (like P, _P, and BR) all cause newlines to be output before and/or after the tag, so your HTML is a little more readable when you do stuff like "view source" on a browser. So:

$HTML -> HTML 
      -> HEAD  
      -> TITLE -> t("Hello!") -> _TITLE 
      -> _HEAD
      -> BODY(BGCOLOR=>'#808080');

Actually produces:

<HTML>
<HEAD>
<TITLE>Hello!</TITLE>
</HEAD>
<BODY BGCOLOR="#808080">

(This will improve slightly as time goes on). You can also output newlines explicitly via the special nl method in the Chocolate Interface:

$HTML->nl;     # one newline
$HTML->nl(6);  # six newlines

Entities

As shown above, you can use the ent() (or e()) method to output an entity:

$HTML->t('Copyright ')->e('copy')->t(' 1996 by Me!');

But this can be a pain, particularly for Europeans:

$HTML -> t('Copyright ') 
      -> e('copy') 
      -> t(' 1996 by Fran') -> e('ccedil') -> t('ois, Inc.!');

Sooooooooo...

Changing the way text is escaped

The default "autoescape" behavior of an HTML stream can be a drag if you've got a lot of character entities that you want to output, or if you're using the Latin-1 character set, or some other input encoding. Fortunately, you can use the autoescape() method to change the way a particular HTML::Stream works at any time.

First, here's a couple of special invocations:

$HTML->autoescape('ALL');        # escapes [<>"&] - the default
$HTML->autoescape('NON_ENT');    # escapes [<>"] only, and not [&]
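
To see what the difference between those two modes means for your text, here is a small sketch. The two functions are hypothetical stand-ins that follow the character classes given in the comments above; they are not the module's own escape routines.

```perl
use strict;
use warnings;

my %ENT = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Roughly what 'ALL' is described as doing: escape < > " and &.
sub sketch_escape_all {
    my $t = shift;
    $t =~ s/([<>"&])/$ENT{$1}/g;
    return $t;
}

# Roughly what 'NON_ENT' is described as doing: escape < > " but leave &
# alone, so entities already present in the text pass through untouched.
sub sketch_escape_non_ent {
    my $t = shift;
    $t =~ s/([<>"])/$ENT{$1}/g;
    return $t;
}

my $in = 'Fran&ccedil;ois says "1 < 2"';
print sketch_escape_all($in), "\n";        # the entity gets mangled
print sketch_escape_non_ent($in), "\n";    # the entity survives
```

The price of the second mode is that a stray "&" in your text is no longer escaped for you.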

You can also install your own autoescape function (note that you might very well want to install it for just a little bit only, and then de-install it):

    sub my_autoescape {
        my $text = shift;
        $text = HTML::Stream::escape_all($text);   # start with default
        $text =~ s/\(c\)/&copy;/ig;                # (C) becomes copyright
        $text =~ s/\\,(c)/\&${1}cedil;/ig;         # \,c becomes a cedilla
        $text;
    }

    # Start using my autoescape:
    my $oldesc = $HTML->autoescape(\&my_autoescape);      # use sub refs ONLY!
    $HTML-> ADDRESS;
    $HTML-> IMG(SRC=>'logo.gif', ALT=>'Fran\,cois, Inc');
    output $HTML 'Copyright (C) 1996 by Fran\,cois, Inc.!';
    $HTML->_ADDRESS;
    
    # Stop using my autoescape:
    $HTML->autoescape($oldesc);

If you find yourself in a situation where you're doing this a lot, a better way is to create a subclass of HTML::Stream which installs your custom function when constructed. For example, see the HTML::Stream::Latin1 example in this module, used as follows:

use HTML::Stream;

$HTML = new HTML::Stream::Latin1 \*STDOUT;
output $HTML "\253A right angle is 90\260, \277No?\273\n";
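
As a rough picture of what an escaper for Latin-1 input might do, here is a sketch that turns every character above 127 into a numeric character reference. That choice is my assumption for the illustration: sketch_escape_latin1() is a hypothetical name, and the built-in subclass may well emit named entities or differ in other ways.

```perl
use strict;
use warnings;

# Hypothetical escaper: handle the usual four specials first, then encode
# every character in the range 128..255 as a numeric character reference.
sub sketch_escape_latin1 {
    my $text = shift;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$ent{$1}/g;
    $text =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print sketch_escape_latin1("\253A right angle is 90\260, \277No?\273"), "\n";
```

A function like this could be installed with autoescape(\&sketch_escape_latin1), or by a subclass constructor as described above.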

By the way, the following are equivalent:

$HTML->autoescape('ALL');
$HTML->autoescape(\&HTML::Stream::escape_all);

Calling autoescape() with no arguments returns the current autoescape function.

Outputting HTML to things besides filehandles

As of Revision 1.21, you no longer need to supply new() with a filehandle: any object that responds to a print() method will do. Of course, this includes blessed FileHandles.

If you supply a GLOB reference (like \*STDOUT) or a string (like "Module::FH"), HTML::Stream will automatically create an invisible object for talking to that filehandle (I don't dare bless it into a FileHandle, since it'd get closed when the HTML::Stream is destroyed, and you might not like that).

You say you want to print to a string? For kicks and giggles, try this:

    package StringHandle;
    sub new {
        my $self = '';
        bless \$self, shift;
    }
    sub print {
        my $self = shift;
        $$self .= join('', @_);
    }
    
  
    package main;
    use HTML::Stream;
    
    my $SH = new StringHandle;
    my $HTML = new HTML::Stream $SH;
    $HTML -> H1 -> t("<Hello & welcome!>") -> _H1;
    print "PRINTED STRING: ", $$SH, "\n";

Subclassing

This is where you can make your application-specific HTML-generating code much easier to look at. Consider this:

    package MY::HTML;
    @ISA = qw(HTML::Stream);
     
    sub Aside {
        $_[0] -> FONT(SIZE=>-1) -> I;
    }
    sub _Aside {
        $_[0] -> _I -> _FONT;
    }

Now, you can do this:

my $HTML = new MY::HTML \*STDOUT;

$HTML -> Aside
      -> t("Don't drink the milk, it's spoiled... pass it on...")
      -> _Aside;

If you're defining these markup-like, chocolate-interface-style functions, I recommend using mixed case with a leading capital. You probably shouldn't use all-uppercase, since that's what this module uses for real HTML tags.

PERFORMANCE

Slower than I'd like. Both the output() method and the various "tag" methods seem to run about 5 times slower than the old just-hardcode-the-darn-stuff approach. That is, in general, this:

### Approach #1...
tag  $HTML 'A', HREF=>"$href";
tag  $HTML 'IMG', SRC=>"logo.gif", ALT=>"LOGO";
text $HTML "My caption!";
tag  $HTML '_A';
text $HTML $a_lot_of_text;

And this:

    ### Approach #2...
    output $HTML [A, HREF=>"$href"], 
                 [IMG, SRC=>"logo.gif", ALT=>"LOGO"],
                 "My caption!",
                 [_A];
    output $HTML $a_lot_of_text;

And this:

    ### Approach #3...
    $HTML -> A(HREF=>"$href")
          -> IMG(SRC=>"logo.gif", ALT=>"LOGO")
          -> t("My caption!")
          -> _A
          -> t($a_lot_of_text);

Each run about 5x slower than this:

  ### Approach #4...
  print '<A HREF="', html_escape($href), '>',
        '<IMG SRC="logo.gif" ALT="LOGO">',
        "My caption!",
        '</A>';
  print html_escape($a_lot_of_text);

Of course, I'd much rather use any of the first three (especially #3) if I had to get something done right in a hurry. Or did you not notice the typo in approach #4? ;-)

(BTW, thanks to Benchmark:: for allowing me to... er... benchmark stuff.)
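
If you want to run this kind of comparison yourself, the sketch below shows one way to set it up with the core Benchmark module. The two subs being timed are hypothetical stand-ins (a generic build-the-tag-from-parts function versus hardcoded string pasting) so that the example runs without this module installed. Swap in your real code, and expect the numbers to vary from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my %ENT = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
sub bm_escape { my $t = shift; $t =~ s/([&<>"])/$ENT{$1}/g; return $t }

# Hypothetical stand-in for a generic "build the tag from parts" approach.
sub bm_tag {
    my ($tag, %attr) = @_;
    my $out = "<$tag";
    $out .= qq{ $_="} . bm_escape($attr{$_}) . '"' for sort keys %attr;
    return "$out>";
}

my $href = '/a?b=1&c=2';

my %approach = (
    generic   => sub { bm_tag('A', HREF => $href) . 'My caption!' . '</A>' },
    hardcoded => sub { '<A HREF="' . bm_escape($href) . '">' . 'My caption!' . '</A>' },
);

# Run each approach a fixed number of times ('none' suppresses the
# per-run report), then print a comparison table of the rates.
my $results = timethese(20_000, \%approach, 'none');
cmpthese($results);
```

It is worth checking that the approaches being compared produce identical output before trusting any timing, since a faster sub that emits broken HTML is no bargain.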

WHY IN THE WORLD DID I WRITE THIS?

I was just mucking about with different ways of generating large HTML documents, seeing which ways I liked the most/least.

CHANGE LOG

Version 1.27

Added built-in HTML::Stream::Latin1, which does a very simple encoding of all characters above ASCII 127.

Fixed bug in accept_tag(), where 'my' variable was shadowing argument. Thanks to John D Groenveld for the bug report and the patch.

Version 1.26

Start of history.

VERSION

$Revision: 1.29 $

AUTHOR

Eryq, eryq@rhine.gsfc.nasa.gov .

Enjoy.