NAME

Components.pod - Mason Developer's Manual

DESCRIPTION

This manual is written for content developers who know HTML and at least a little Perl. The goal is to write, run, and debug Mason components.

If you are the webmaster (or otherwise responsible for the Mason installation), you should also read HTML::Mason::Admin. There you will find FAQs about virtual site configuration, performance tuning, component caching, and so on.

I strongly suggest that you have a working Mason installation to play with as you work through these examples. Other component examples can be found in the samples/ directory.

WHAT ARE COMPONENTS?

The component - a mix of Perl and HTML - is Mason's basic building block and computational unit. Under Mason, web pages are formed by combining the output from multiple components. An article page for a news publication, for example, might call separate components for the company masthead, ad banner, left table of contents, and article body. Consider this layout sketch:

+---------+------------------+
|Masthead | Banner Ad        |
+---------+------------------+
|         |                  |
|+-------+|Text of Article ..|
||       ||                  |
||Related||Text of Article ..|
||Stories||                  |
||       ||Text of Article ..|
|+-------+|                  |
|         +------------------+
|         | Footer           |
+---------+------------------+

The top level component decides the overall page layout, perhaps with HTML tables. Individual cells are then filled by the output of subordinate components, one for the Masthead, one for the Footer, etc. In practice pages are built up from as few as one, to as many as twenty or more components.

This component approach reaps many benefits in a web environment. The first benefit is consistency: by embedding standard design elements in components, you ensure a consistent look and make it possible to update the entire site with just a few edits. The second benefit is concurrency: in a multi-person environment, one person can edit the masthead while another edits the table of contents. A last benefit is reusability: a component produced for one site might be useful on another. You can develop a library of generally useful components to employ on your sites and to share with others.

Most components emit chunks of HTML. "Top level" components, invoked from a URL, represent an entire web page. Other, subordinate components emit smaller bits of HTML destined for inclusion in top level components.

Components receive form and query data from HTTP requests. When called from another component, they can accept arbitrary parameter lists just like a subroutine, and optionally return values. This enables a type of component that does not print any HTML, but simply serves as a function, computing and returning a result.

Mason actually compiles components down to Perl subroutines, so you can debug and profile component-based web pages with standard Perl tools that understand the subroutine concept. For example, you can use the Perl debugger to step through components and Devel::DProf to profile their performance.

IN-LINE PERL SECTIONS

Here is a simple component example:

% my ($noun, $timeofday) = ('World');
<%perl>
    my @time = split /[\s:]+/, localtime;   # '+' copes with the padded single-digit day
    $timeofday = "evening";     # default if next tests fail
    if ( $time[3] < 12 ) {
        $timeofday = "morning";
    } elsif ( $time[3] < 18 ) {
        $timeofday = "afternoon";
    }
</%perl>
Good <% $timeofday %>, <% $noun %>!<BR>
How are ya?

After 6 pm, the output of this component is:

Good evening, World!
How are ya?

This short example demonstrates the three primary in-line Perl sections you can embed in your components. By "in-line" I mean these sections are generally embedded within HTML and execute in the order they appear. Other, specialized Perl sections are tied to component events like initialization and cleanup, argument definition, etc. Those are covered later in "Other Perl Sections".

The parsing rules for these Perl sections are as follows:

  1. Blocks of the form <% xxx %> are replaced with the result of evaluating xxx as a single Perl expression. These are often used for variable replacement, such as 'Hello, <% $name %>!'.

  2. Lines beginning with a '%' character are treated as Perl.

  3. Multiline blocks of Perl code can be inserted with the <%perl> .. </%perl> tag. The enclosed text is executed as Perl and the return value, if any, is discarded.

    The <%perl> tag is case-insensitive. It may appear anywhere in the text, and may span any number of lines. <%perl> blocks cannot be nested inside one another.

I've used bad form here for the sake of example; the leading '%' line could simply have been placed into the multi-line <%perl> section.
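
The greeting logic can also be written against the hour field that localtime returns in list context, which avoids parsing the date string. The following is a minimal sketch in plain Perl, run outside of Mason; the helper name time_of_day is hypothetical and not part of Mason.

```perl
use strict;
use warnings;

# Hypothetical helper: map an hour (0-23) to a greeting word.
sub time_of_day {
    my ($hour) = @_;
    return 'morning'   if $hour < 12;
    return 'afternoon' if $hour < 18;
    return 'evening';
}

# (localtime)[2] is the hour of the current local time.
my $hour = (localtime)[2];
print "Good ", time_of_day($hour), ", World!\n";
```

Inside a component, the same helper body could sit in a <%perl> block, with the result printed through a <% %> expression.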

In addition to Perl code, Perl sections may also contain Mason commands. These keywords, identified by their mc_ prefix, collectively provide an interface to Mason services such as data caching, file includes, and so on. The majority of Mason commands are for advanced users, but a few (like mc_comp(), for calling other components) see widespread use. HTML::Mason::Commands is the reference for all Mason commands.

% lines

Most useful for conditional and loop structures - if, while, foreach, etc. - as well as side-effect commands like assignments. Examples:

o Conditional code

% my $ua = $r->header_in('User-Agent');
% if ($ua =~ /msie/i) {
Welcome, Internet Explorer users
...
% } elsif ($ua =~ /mozilla/i) {
Welcome, Netscape users
...
% }

o HTML list formed from array

<ul>
% foreach my $item (@list) {
<li><% $item %>
% }
</ul>

o HTML list formed from hash

<ul>
% while (my ($key,$value) = each(%ENV)) {
<li>
<b><% $key %></b>: <% $value %>
% }
</ul>

o HTML table formed from list of hashes

<table>
<tr>
% foreach my $h (@loh) {
<td><% $h->{foo} %></td>
<td bgcolor=#ee0000><% $h->{bar} %></td>
<td><% $h->{baz} %></td>
% }
</tr>
</table>

For more than three lines of Perl, consider using a <%perl> block.

<% xxx %>

Most useful for printing out variables, as well as more complex expressions. Examples:

Dear <% $name %>: We will come to your house at <% $address %> in the
fair city of <% $city %> to deliver your $<% $amount %> dollar prize!

The answer is <% ($y+8) % 2 %>.

You are <% $age<18 ? 'not' : '' %> permitted to enter this site.

For side-effect commands like assignments, consider using a % line or <%perl> block instead.

<%perl> xxx </%perl>

Useful for Perl blocks of more than a few lines. For a very small block, consider using % lines.

HOW COMPONENTS ARE INVOKED

Components are invoked in two ways: top-level components respond directly to HTTP requests, while other, subordinate components are called more or less like subroutines, using Mason's mc_comp command. Top-level components must reside within the server's DocumentRoot, while other components can live anywhere within Mason's component root (these may or may not be the same).

Top-level Components

A top-level component is simply one that's called from a URL, such as:

http://www.foo.com/mktg/prods.html

Apache resolves this URL to a filename, e.g. /usr/local/www/htdocs/mktg/prods.html. Mason loads and executes that file as a component. It might in turn call other components and execute some Perl code, or it might be nothing more than static HTML.

dhandlers

What happens when a user requests a component that doesn't exist? In this case Mason scans backward through the URI, checking each directory for a component named dhandler ("default handler"). If found, the dhandler is invoked and is expected to use $r->path_info (the virtual location) as the parameter to some access function, perhaps a database lookup or location in another filesystem. In a sense, dhandlers are similar in spirit to Perl's AUTOLOAD feature; they are the "component of last resort" when a URL points to a non-existent component.

Consider the following URL, in which newsfeeds/ exists but not the subdirectory LocalNews nor the component locStory1.html:

http://myserver/newsfeeds/LocalNews/locStory1.html

In this case Mason constructs the following search path:

/newsfeeds/LocalNews/locStory1.html => no such thing
/newsfeeds/LocalNews/dhandler       => no such thing
/newsfeeds/dhandler                 => found! (search ends)
/dhandler

The found dhandler would read "/LocalNews/locStory1.html" from $r->path_info and use it as a retrieval key. Optionally, the Mason command mc_dhandler_arg() returns the same path_info stripped of the leading slash ("LocalNews/locStory1.html"). This is sometimes more useful than the absolute path returned by $r->path_info.
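
To make the search order concrete, here is a small sketch in plain Perl that lists the candidate paths for a request URI in the order described above. It is not Mason's own code: the helper name dhandler_candidates is hypothetical, and it only builds the list without checking the filesystem, whereas Mason stops at the first candidate that exists.

```perl
use strict;
use warnings;

# Hypothetical helper: list the paths checked for a request URI, in order.
sub dhandler_candidates {
    my ($uri) = @_;
    my @parts = grep { length } split m{/}, $uri;
    my @candidates = ('/' . join('/', @parts));   # the component itself
    pop @parts;                                   # drop the file name
    while (1) {
        push @candidates, join('/', '', @parts, 'dhandler');
        last unless @parts;                       # reached the top level
        pop @parts;                               # move up one directory
    }
    return @candidates;
}

print "$_\n" for dhandler_candidates('/newsfeeds/LocalNews/locStory1.html');
```

For the URL in the example this prints the four paths of the search path, ending with /dhandler.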

Components as Subroutines

Mason pages often are built not from a single component, but from multiple components that call each other in a hierarchical fashion.

To call one component from another, use Mason's mc_comp command:

mc_comp (compPath, name=>value, ...[, STORE=>ref ])

compPath:

The component path. With a leading '/', the path is relative to the component root (comp_root). Otherwise, it is relative to the location of the calling component.

name=>value pairs:

Parameters are passed as one or more name=>value pairs, e.g. player=>'M. Jordan'.

The optional STORE parameter takes a scalar reference as an argument, and tells the component to direct its output into the named variable instead of standard output. This is analogous to the difference between sprintf and printf. For example:

<% mc_comp('/shared/mastHead', color=>'salmon', STORE=>\$mh_text) %>
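
The sprintf/printf analogy can be tried outside of Mason with an in-memory filehandle. The sketch below is plain Perl and does not show how Mason implements STORE; mast_head and call_and_store are hypothetical stand-ins for a component and for the capturing call.

```perl
use strict;
use warnings;

# Stand-in for a component that prints HTML (hypothetical).
sub mast_head {
    my (%args) = @_;
    print "<h1 style=\"color:$args{color}\">My Site</h1>\n";
    return;
}

# Run a sub while directing what it prints into a scalar.
sub call_and_store {
    my ($code, $store_ref, @args) = @_;
    $$store_ref = '';
    open my $fh, '>', $store_ref or die "cannot open in-memory handle: $!";
    my $old = select $fh;          # make $fh the default output handle
    my @ret = eval { $code->(@args) };
    my $err = $@;
    select $old;                   # restore even if the sub died
    close $fh;
    die $err if $err;
    return @ret;
}

my $mh_text;
call_and_store(\&mast_head, \$mh_text, color => 'salmon');
print "captured: $mh_text";
```

Nothing reaches standard output while mast_head runs; the text is available in $mh_text afterwards, much as it would be after an mc_comp call with STORE.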

Return Values and Context

Components generally fall into one of two categories: HTML generators, and functions that compute a value. The latter behave like normal Perl functions, while HTML generators (far more common) rarely return values--in the name of performance they stream their output immediately to the browser.

Mason in fact adds a return undef to the bottom of each component to provide an empty default return value. For HTML-generating components, this allows the convenient idiom:

<% mc_comp('foo') %>

which, if you think about it, actually prints two things: foo internally prints some HTML, while <%mc_comp('foo')%> prints the return value of foo (undef).

To return your own value from a component, you must use an explicit return statement. In this case the component behaves like a normal Perl subroutine with regard to return values and scalar/list context:

You are <% mc_comp('isNetscape') ? '' : 'NOT' %> using Netscape!

PASSING PARAMETERS

This section describes Mason's facilities for passing parameters to components (either from HTTP requests or mc_comp calls) and for accessing parameter values inside components.

In Component Calls

Any Perl data type can be passed in an mc_comp() call:

mc_comp('/sales/header', s=>'dog', l=>[2,3,4], h=>{a=>7,b=>8});

This command passes a scalar ($s), a list (@l), and a hash (%h). The list and hash must be passed as references, but they will be automatically dereferenced in the called component.

In HTTP requests

Consider a CGI-style URL with a query string:

http://www.foo.com/mktg/prods.html?str=dog&lst=2&lst=3&lst=4

or an HTTP request with some POST content. Mason automatically parses the GET/POST values and makes them available to the component as parameters.

In fact, internally Mason just treats an HTTP request as a first mc_comp call! In this case:

mc_comp('/mktg/prods.html',str=>'dog',lst=>[2,3,4])

assuming the component and document roots are the same.
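
The mapping from a query string to a parameter list can be sketched in a few lines of plain Perl. This is an illustration of the idea only, not Mason's parser; parse_query is a hypothetical helper, and the rule that a repeated name becomes a list reference is taken from the lst example above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a query string into a hash of parameters.
# A name that appears more than once becomes an array reference.
sub parse_query {
    my ($query) = @_;
    my %args;
    for my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                                # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # decode %XX escapes
        }
        if (!exists $args{$name}) {
            $args{$name} = $value;
        } elsif (ref $args{$name} eq 'ARRAY') {
            push @{ $args{$name} }, $value;
        } else {
            $args{$name} = [ $args{$name}, $value ];
        }
    }
    return %args;
}

my %args = parse_query('str=dog&lst=2&lst=3&lst=4');
print "str = $args{str}\n";
print "lst = @{ $args{lst} }\n";
```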

Accessing Parameters

Component parameters, whether they come from GET/POST or an mc_comp call, can be accessed in two ways.

1. Declared named arguments: Components can define a <%perl_args> section listing argument names, types, and default values. For example:

<%perl_args>
$a
@b
%c
$d=>5
@e=>('foo','baz')
%f=>(joe=>1,bob=>2)
</%perl_args>

Here, $a, @b, and %c are required arguments; the component generates an error if the caller leaves them unspecified. $d, @e, and %f are optional arguments; they are assigned the specified default values if unspecified. All the arguments are available as lexically scoped ("my") variables in the rest of the component.

2. %ARGS hash: This variable, always available, contains all of the parameters passed to the component. It is especially handy when there are many parameters or when parameter names are determined at run-time. %ARGS can be used whether or not you have a <%perl_args> section.

Here's how to pass all of a component's parameters to another component:

mc_comp ("template", %ARGS);

Parameter Passing Examples

The following examples illustrate the different ways to pass and receive parameters.

1. Passing a scalar id with value 5.

In a URL: /my/URL?id=5
In an mc_comp call: mc_comp ('/my/comp', id => 5)
In the called component, if there is a declared argument named...
  $id, then $id will equal 5
  @id, then @id will equal (5)
  %id, then an error occurs
In addition, $ARGS{id} will equal 5.

2. Passing a list colors with values red, blue, and green.

In a URL: /my/URL?colors=red&colors=blue&colors=green
In an mc_comp call: mc_comp ('/my/comp', colors => ['red', 'blue', 'green'])
In the called component, if there is a declared argument named...
  $colors, then $colors will equal ['red', 'blue', 'green']
  @colors, then @colors will equal ('red', 'blue', 'green')
  %colors, then an error occurs
In addition, $ARGS{colors} will equal ['red', 'blue', 'green'].

3. Passing a hash grades with pairs Alice => 92 and Bob => 87.

In a URL: /my/URL?grades=Alice&grades=92&grades=Bob&grades=87
In an mc_comp call: mc_comp ('/my/comp', grades => {Alice => 92, Bob => 87})
In the called component, if there is a declared argument named...
  $grades, then $grades will equal {Alice => 92, Bob => 87}
  @grades, then @grades will equal ('Alice', 92, 'Bob', 87)
  %grades, then %grades will equal (Alice => 92, Bob => 87)
In addition, $ARGS{grades} will equal {Alice => 92, Bob => 87}.
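
The three examples follow one rule: the type character of the declared argument decides how the value in %ARGS is unpacked. Here is a sketch of that rule in plain Perl. It is not Mason's code, and coerce_arg is a hypothetical helper. It assumes that a list is accepted for a hash argument only when it has an even number of elements, which is consistent with examples 2 and 3 but not stated explicitly above.

```perl
use strict;
use warnings;

# Hypothetical helper: produce the value(s) a declared argument would hold,
# given its type character and the value found in %ARGS.
sub coerce_arg {
    my ($sigil, $value) = @_;
    return $value if $sigil eq '$';
    if ($sigil eq '@') {
        return @$value if ref $value eq 'ARRAY';
        return %$value if ref $value eq 'HASH';   # flattened key/value list
        return ($value);
    }
    if ($sigil eq '%') {
        return %$value if ref $value eq 'HASH';
        # Assumption: an even-length list is read as key/value pairs.
        return @$value if ref $value eq 'ARRAY' && @$value % 2 == 0;
        die "cannot assign this value to a hash argument\n";
    }
    die "unknown type character '$sigil'\n";
}

my @colors = coerce_arg('@', ['red', 'blue', 'green']);
my %grades = coerce_arg('%', ['Alice', 92, 'Bob', 87]);
print "colors: @colors\n";
print "Alice: $grades{Alice}, Bob: $grades{Bob}\n";
```

Passing a plain scalar or an odd-length list for a % argument makes the helper die, mirroring the "an error occurs" cases in examples 1 and 2.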

OTHER PERL SECTIONS

In this section we describe other specialized sections you can place in your component. Several are tied to phases of the component execution sequence, which goes something like this:

1. Initialize arguments
2. <%perl_init> section
3. Output HTTP headers (if not output already)
4. Primary section (HTML + embedded Perl sections)
5. <%perl_cleanup> section

<%perl_init> xxx </%perl_init>

Used for initialization code. For example: connecting to a database and selecting out rows; opening a file and reading its contents into a list.

Technically a <%perl_init> block is equivalent to a <%perl> block at the beginning of the component. However, there is an aesthetic advantage to placing this block at the end of the component rather than the beginning: it lets us hide the technical details at the bottom. In the following example, a database query is used to preload the @persons list-of-hashes.

<H2>Birthdays Next Week</H2>
<TABLE BORDER=1>
<TR><TH>Name</TH><TH>Birthday</TH></TR>
% foreach (@persons) {
    <TR><TD><%$_->{name}%></TD><TD><%$_->{birthday}%></TD></TR>
% }
</TABLE>

<%PERL_INIT>
# Assuming DBI/DBD and Date::Manip are already loaded ...
# Query MySQL for employees with birthdays next week.
# Results are stored in the @persons list-of-hashes.

my (@persons, $name, $birthday);    # local vars

# Calculate "MM-DD" dates for this and next Sunday
my $Sun = UnixDate(&ParseDate("Sunday"), "%m-%d");
my $nextSun = UnixDate(&DateCalc("Sunday", "+7 days"), "%m-%d");

my $dbh = DBI->connect('DBI:mysql:myDB', 'nobody' );
my $sth = $dbh->prepare(
   qq{ SELECT name, DATE_FORMAT(birthday, '%m-%d')
       FROM emp
       WHERE DATE_FORMAT(birthday,'%m-%d') BETWEEN '$Sun' AND '$nextSun'
     } );
$sth->execute;		# other DBDs want this after the bind
$sth->bind_columns(undef, \($name, $birthday) );

while ($sth->fetch) {
    push (@persons, {name=>$name, birthday=>$birthday} );
}
</%PERL_INIT>

Since <%perl_init> sections fire before any HTTP headers are sent, they should do their work quickly to prevent dead time on the browser side.

<%perl_cleanup> xxx </%perl_cleanup>

Useful for cleanup code. For example: closing a database connection or closing a file handle.

Technically a <%perl_cleanup> block is equivalent to a <%perl> block at the end of the component, but has aesthetic value as marking a cleanup section.

Recall that the end of a component corresponds to the end of a subroutine block. Since Perl is so darned good at cleaning up stuff at the end of blocks, <%perl_cleanup> sections are rarely needed.

<%perl_args> xxx </%perl_args>

xxx contains a list of argument declarations, one per line. Each declaration contains a type character ($, @, or %), a name, and optionally '=>' followed by a default value. The default value must be a valid Perl expression of matching type (scalar, list, hash). See "Accessing Parameters" above for usage and examples.

\ at end of line

Useful for suppressing unwanted newlines before Perl lines and block tags. In HTML components, this is mostly useful for fixed width areas like <PRE> tags, since browsers ignore white space for the most part. An example:

<PRE>
foo
%if ($b == 2) {
bar
%}
baz
</PRE>

outputs

foo
bar
baz

because of the newlines on lines 1 and 3. (Lines 2 and 4 do not generate a newline because the entire line is taken by Perl.) To suppress the newlines:

<PRE>
foo\
%if ($b == 2) {
bar\
%}
baz
</PRE>

which prints

foobarbaz

The backslash has no special meaning outside this context. In particular, you cannot use it to escape a newline before a plain text line.

<%perl_doc> xxx </%perl_doc>

Most useful for a component's main documentation. One can easily write a program to sift through a set of components and pull out their <%perl_doc> blocks to form a reference page.

Can also be used for in-line comments, though I admit it is a somewhat cumbersome comment marker. Another option is '%#':

%# this is a comment

These comments differ from HTML comments in that they do not appear in the HTML.

<%perl_off>

Turns off processing of Perl sections; useful when documenting Mason itself from a component:

<%perl_off>
% This is an example of a Perl line.
<% This is an example of an expression block. %>
</%perl_off>

This works for almost everything, but doesn't let you output </%perl_off> itself! When all else fails, use mc_out():

%mc_out('The tags are <%perl_off> and </%perl_off>.');

DATA CACHING

Mason's mc_cache() and mc_cache_self() commands let components save and retrieve the results of computation for improved performance. Anything may be cached, from a block of HTML to a complex data structure.

Each component gets a private data cache. Except under special circumstances, one component does not access another component's cache. Each cached value may be set to expire under certain conditions or at a certain time.

To use data caching, your Mason installation must be configured with a good DBM package like Berkeley DB (DB_File) or GDBM. See HTML::Mason::Admin for more information.

Basic Usage

Here's the typical usage of mc_cache:

my $result = mc_cache(action=>'retrieve');
if (!defined($result)) {
    ... compute $result ...
    mc_cache(action=>'store', value=>$result);
}

The first mc_cache call attempts to retrieve this component's cache value. If the value is available it is placed in $result. If the value is not available, $result is computed and stored in the cache by the second mc_cache call.

The default action for mc_cache is 'retrieve', so the first line can be written as

my $result = mc_cache();

Multiple Keys/Values

A cache file can store multiple keys and values. A value can be a scalar, list reference, or hash reference:

mc_cache(action=>'store',key=>'name',value=>$name);
mc_cache(action=>'store',key=>'friends',value=>\@lst);
mc_cache(action=>'store',key=>'map',value=>\%hsh);

The key defaults to 'main' when unspecified, as in the first example above.
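
The retrieve-or-compute idiom can be exercised without a Mason installation by substituting a small in-memory stand-in for mc_cache. In the sketch below, my_cache is hypothetical and only mimics the 'retrieve' and 'store' actions and the default key 'main' described above; it has no expiration and no persistence.

```perl
use strict;
use warnings;

my %store;          # stand-in for the component's cache file
my $computed = 0;   # counts how often the value is really computed

# Hypothetical stand-in that mimics the 'retrieve' and 'store' actions.
sub my_cache {
    my (%opt) = @_;
    my $action = $opt{action} || 'retrieve';
    my $key    = defined $opt{key} ? $opt{key} : 'main';
    return $store{$key} if $action eq 'retrieve';
    $store{$key} = $opt{value} if $action eq 'store';
    return;
}

sub expensive_result {
    my $result = my_cache();
    if (!defined $result) {
        $computed++;
        $result = join ',', map { $_ * $_ } 1 .. 5;
        my_cache(action => 'store', value => $result);
    }
    return $result;
}

print expensive_result(), "\n" for 1 .. 3;
print "computed $computed time(s)\n";
```

The value is computed on the first call only; the two later calls are served from the stand-in cache.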

Mason uses the MLDBM package to store and retrieve from its cache files, meaning that Mason can cache arbitrarily deep data structures composed of lists, hashes, and simple scalars.

Expiration

Typical cache items have a useful lifetime after which they must expire. Mason supports three types of expiration:

By Time

(e.g. the item expires in an hour, or at midnight). To expire an item by time, pass one of these options to the 'store' action.

expire_at: takes an absolute expiration time, in Perl time() format (number of seconds since the epoch)

expire_in: takes a relative expiration time of the form "<num><unit>", where <num> is a positive number and <unit> is one of seconds, minutes, hours, days, or weeks, or any abbreviation thereof. E.g. "10min", "1hour".

expire_next: takes a string, either 'hour' or 'day'. It indicates an expiration time at the top of the next hour or day.

Examples:

mc_cache(action=>'store', expire_in=>'2 hours');
mc_cache(action=>'store', expire_next=>'hour');
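
As an illustration of the expire_in format, the following plain-Perl sketch converts such a string into a number of seconds. It is not Mason's parser: expire_in_seconds is a hypothetical helper, and it reads "any abbreviation" as "any prefix of a unit name", which is an assumption.

```perl
use strict;
use warnings;

my %seconds_per = (
    seconds => 1,
    minutes => 60,
    hours   => 60 * 60,
    days    => 24 * 60 * 60,
    weeks   => 7 * 24 * 60 * 60,
);

# Hypothetical helper: convert an expire_in string such as "10min" or
# "2 hours" into a number of seconds.
sub expire_in_seconds {
    my ($spec) = @_;
    my ($num, $unit) = $spec =~ /^\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*$/i
        or die "bad expire_in value '$spec'\n";
    $unit = lc $unit;
    # Assumption: an abbreviation is any prefix of a unit name.
    my @match = grep { index($_, $unit) == 0 } sort keys %seconds_per;
    die "unknown unit '$unit'\n" unless @match == 1;
    return $num * $seconds_per{ $match[0] };
}

print "$_ => ", expire_in_seconds($_), " seconds\n"
    for '2 hours', '10min', '1hour';
```

Adding the result to time() gives an absolute value of the kind expire_at expects.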

By Condition

(e.g. the item expires if a certain file or database table changes). To expire an item based on events rather than current time, pass the 'expire_if' option to the 'retrieve' action.

expire_if: calls a given anonymous subroutine and expires if the subroutine returns a non-zero value. The subroutine is called with one parameter, the time when the cache value was last written.

Example:

# expire the cache if 'myfile' is newer
mc_cache(action => 'retrieve',
      expire_if => sub { [stat 'myfile']->[9] > $_[0] });
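
The callback itself is ordinary Perl and can be tested on its own. The sketch below builds such a subroutine for a temporary file and calls it with two different "last written" times. The helper name newer_than_cache is hypothetical, and treating a missing file as a reason to expire is an assumption of this sketch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build an expire_if-style callback for a given file: it receives the time
# the cache value was last written and returns true if the file is newer.
sub newer_than_cache {
    my ($file) = @_;
    return sub {
        my ($cache_written) = @_;
        my $mtime = (stat $file)[9];
        return 1 unless defined $mtime;    # assumption: missing file expires the value
        return $mtime > $cache_written ? 1 : 0;
    };
}

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
my $now = time;
utime $now, $now, $file or die "utime failed: $!";

my $expire_if = newer_than_cache($file);
print "cache written after the file changed:  ", $expire_if->($now + 60), "\n";
print "cache written before the file changed: ", $expire_if->($now - 60), "\n";
```

A subroutine built this way could be passed as the expire_if option in place of the anonymous sub shown in the example.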

By Explicit Action

(e.g. a shell command or web interface is responsible for explicitly expiring the item). To expire an item from a Perl script, for any component, use access_data_cache. It takes the same arguments as mc_cache plus one additional argument, cache_file. See the administration manual for details on where cache files are stored and how they are named.

use HTML::Mason::Utils 'access_data_cache';
access_data_cache (cache_file=>'/usr/local/mason/cache/foo::bar',
                   action=>'expire' [, key=>'fookey']);

The 'expire' action can also take multiple keys (as a list reference); this can be used in conjunction with the 'keys' action to expire all keys matching a particular pattern.

use HTML::Mason::Utils 'access_data_cache';
my @keys = access_data_cache (cache_file=>'/usr/local/mason/cache/foo::bar',
                              action=>'keys');
access_data_cache (cache_file=>'/usr/local/mason/cache/foo::bar',
                   action=>'expire', key=>[grep(/^sales/,@keys)]);

Busy Locks

The code shown in "Basic Usage" above,

my $result = mc_cache(action=>'retrieve');
if (!defined($result)) {
    ... compute $result ...
    mc_cache(action=>'store', value=>$result);
}

can suffer from a kind of race condition for caches that are accessed frequently and take a long time to recompute.

Suppose that a particular cache value is accessed five times a second and takes three seconds to recompute. When the cache expires, the first process comes in, sees that it is expired, and starts to recompute the value. The second process comes in and does the same thing. This sequence continues until the first process finishes and stores the new value. On average, the value will be recomputed and written to the cache 15 times!

The solution here is to have the first process notify the others that it has started recomputing. This can be accomplished with the busy_lock flag:

mc_cache(action=>'retrieve',busy_lock=>'10sec',...);

With this flag, the first process sets a lock in the cache that effectively says "I'm busy recomputing this value, don't bother." Subsequent processes see the lock and return the old value. The lock is good for 10 seconds (in this case) and is ignored after that. Thus the time value you pass to busy_lock indicates how long you're willing to allow this component to use an expired cache value.
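
The effect of a busy lock can be simulated in a single process. The sketch below is plain Perl with a hypothetical in-memory cache entry and hypothetical retrieve and store helpers; it does not show how Mason stores its locks, only the decision logic described above, replayed for the "five requests a second, three seconds to recompute" scenario.

```perl
use strict;
use warnings;

# In-memory stand-in for one cache entry (hypothetical layout).
my %entry = (value => 'old', expires_at => 0, busy_until => 0);

# Returns the cached value, or undef when the caller should recompute.
sub retrieve {
    my ($now, $busy_lock_secs) = @_;
    return $entry{value} if $now < $entry{expires_at};   # still fresh
    return $entry{value} if $now < $entry{busy_until};   # someone else is recomputing
    $entry{busy_until} = $now + $busy_lock_secs;         # claim the recompute
    return undef;
}

sub store {
    my ($now, $value, $lifetime) = @_;
    @entry{qw(value expires_at busy_until)} = ($value, $now + $lifetime, 0);
    return;
}

# 15 requests arrive over three seconds while the first is still recomputing.
my $recomputes = 0;
for my $tick (0 .. 14) {
    my $now = 1000 + $tick / 5;
    $recomputes++ unless defined retrieve($now, 10);
}
store(1003, 'new', 3600);
print "recomputes started: $recomputes\n";
```

With the lock in place only the first request starts a recomputation; the other fourteen receive the old value until the new one is stored.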

Would some of your caches benefit from busy locks? One way to find out is to turn on cache logging in the Mason system logs. If you see large clusters of writes to the same cache in a short time span, then you might want to use busy locks when writing to that cache.

Keeping In Memory

The keep_in_memory flag indicates that the cache value should be kept in memory after it is stored or retrieved. Since every child process will store its own copy, this flag should be used only for small, frequently retrieved cache values. If used, this flag should be passed to both the store and retrieve commands.

Caching All Output

Occasionally you will need to cache the complete output of a component. One way to accomplish this is to replace the component with a placeholder that simply calls the component, then caches and prints the result. For example, if the component were named "foo", we might rename it to "foo_main" and put this component in its place:

<% $foo_out %>
<%perl_init>
    my $foo_out;
    if (!defined ($foo_out = mc_cache())) {
        mc_comp('foo_main', STORE=>\$foo_out);
        mc_cache(action=>'store',
              expire_in=>'3 hours', value=>$foo_out);
    }
</%perl_init>

This works, but is cumbersome. Mason offers a better shortcut: the mc_cache_self() command, which lets a component cache its own output and eliminates the need for a dummy component. It is typically used right at the top of a <%perl_init> section:

<%perl_init>
    return if mc_cache_self(expire_in=>'3 hours'[, key=>'fookey']);
    ... <rest of perl_init> ...
</%perl_init>

mc_cache_self is built on top of mc_cache, so it inherits all the expiration options described earlier.
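
The idea behind caching all output (run once, remember what was printed, replay it afterwards) can be sketched in plain Perl. This is not how mc_cache_self is implemented: print_cached and foo_main are hypothetical names, the cache is a plain hash, and there is no expiration.

```perl
use strict;
use warnings;

my %output_cache;    # stand-in for the component's data cache
my $runs = 0;        # counts how often the real work is done

# Hypothetical wrapper: run $code once, remember everything it printed,
# and replay the saved text on later calls.
sub print_cached {
    my ($key, $code) = @_;
    if (!defined $output_cache{$key}) {
        my $buf = '';
        open my $fh, '>', \$buf or die "cannot open in-memory handle: $!";
        my $old = select $fh;          # capture default-handle output
        eval { $code->() };
        my $err = $@;
        select $old;
        close $fh;
        die $err if $err;
        $output_cache{$key} = $buf;
    }
    print $output_cache{$key};
    return 1;
}

sub foo_main { $runs++; print "<p>expensive HTML</p>\n" }

print_cached(main => \&foo_main) for 1 .. 3;
print "foo_main ran $runs time(s)\n";
```

The HTML is printed three times but generated only once, which is the behavior the placeholder component above achieves with STORE and mc_cache.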

Guarantees (or lack thereof)

Mason will make a best effort to cache data until it expires, but will not guarantee it. The data cache is not a permanent reliable store in itself; you should not place in the cache critical data (e.g. user session information) that cannot be regenerated from another source such as a database. You should write your code as if the cache might disappear at any time. In particular,

o If the 'store' action cannot get a write lock on the cache, it simply fails quietly.

o Your Mason administrator will be required to remove cache files periodically when they get too large; this can happen any time.

On the other hand, expiration in its various forms is guaranteed, because Mason does not want you to rely on bad data to generate your content. If you use the 'expire' action and it cannot get a write lock, it will repeat the attempt several times and finally die with an error.

ACCESSING SERVER INTERNALS

Mason is built on top of mod_perl, an Apache extension that embeds a persistent Perl interpreter into the web server. Mason makes the powerful $r "request object" available as a global in all components, granting access to a variety of server internals, HTTP request data, and server API methods.

$r is fully described in the Apache documentation -- here is a sampling of methods useful to component developers:

$r->uri             # the HTTP request URI
$r->header_in(..)   # the named HTTP header line
$r->server->port    # (note two arrows!) port # (usu. 80)
$r->content_type    # set or retrieve content-type

$r->content()       # don't use this one! (see Tips and Traps)

SENDING HTTP HEADERS

Mason sends a standard HTTP header with content type text/html when it reaches the primary HTML section of a component (after any <%perl_init> section).

That means if you want to send your own HTTP header, you have to do it in the <%perl_init> section. You send headers with Apache commands headers_out and send_http_header.

To prevent Mason from sending out the default header, call mc_suppress_http_header(1). Here's an example:

<%perl_init>
...
mc_suppress_http_header(1);   # necessary because of next line
my $registered = mc_comp('isUserRegistered');
if (!$registered) {
     mc_comp('/shared/http/redirect',url=>'/registerScreen');
}
...
</%perl_init>

The component isUserRegistered returns 0 or 1 indicating whether the user has registered (e.g. by looking for a cookie). If the result is 0, we use an HTTP redirect to go to the registration screen. Mason would normally send the default header upon reaching the primary section of isUserRegistered - that is why we must call mc_suppress_http_header.

To cancel header suppression, call mc_suppress_http_header(0).

USING THE PERL DEBUGGER

The Perl debugger is an indispensable tool for identifying and fixing bugs in Perl programs. Unfortunately, in a mod_perl environment one is normally unable to use the debugger since programs are run from a browser. Mason removes this limitation by optionally creating a debug file for each page request, allowing the request to be replayed from the command line or Perl debugger.

Note: in early 1999 a new module, Apache::DB, was released that makes it substantially easier to use the Perl debugger directly in conjunction with a real Apache server. Since this mechanism is still new, we continue to support Mason debug files, and there may be reasons to prefer Mason's method (e.g. no need to start another Apache server). However we acknowledge that Apache::DB may eventually eliminate the need for debug files. For now we encourage you to try both methods and see which one works best.

Using debug files

Here is a typical sequence for debugging a Mason page:

1. Find the debug file:

When Mason is running in debug mode, requests generate "debug files", cycling through filenames "1" through "20". To find a request's debug file, simply do a "View Source" in your browser after the request and look for a comment like this at the very top:

<!--
Debug file is '3'.
Full debug path is '/usr/local/mason/debug/anon/3'.
-->

2. Run the debug file:

Debug files basically contain two things: a copy of the entire HTTP request (serialized with Data::Dumper), and all the plumbing needed to route that request through Mason. In other words, if you simply run the debug file like this:

perl /usr/local/mason/debug/anon/3

you should see the HTTP headers and content that the component would normally send to the browser.

3. Debug the debug file:

Now you merely add a -d option to run the debug file in Perl's debugger -- at which point you have to deal with the problem of anonymous subroutines.

Mason compiles components down to anonymous subroutines which are not easily breakpoint'able (Perl prefers line numbers or named subroutines). Therefore, immediately before each component call, Mason calls a nonce subroutine called debug_hook just so you can breakpoint it like this:

b HTML::Mason::Interp::debug_hook

Since debug_hook is called with the component name as the second parameter, you can also breakpoint specific components using a conditional on $_[1]:

b HTML::Mason::Interp::debug_hook $_[1] =~ /component name/

You can avoid all that typing by adding the following to your ~/.perldb file:

# Perl debugger aliases for Mason
$DB::alias{mb} = 's/^mb\b/b HTML::Mason::Interp::debug_hook/';

which reduces the previous examples to just:

mb
mb $_[1] =~ /component name/

The use of debug files opens lots of other debugging options. For instance, you can read a debug file into the Emacs editor, with its nifty interface to Perl's debugger. This allows you to set break points visually or (in trace mode) watch a cursor bounce through your code in single-step or continue mode.

Specifying when to create debug files

Details about configuring debug mode can be found in HTML::Mason::Admin. In particular, the administrator must decide which of three debugging modes to activate:

never (no debug files)

always (create debug files for each request)

error (only generate a debug file when an error occurs)

How debug files work

At the beginning of a request, Mason calls almost every one of the mod_perl API methods ($r->xxx), trapping its result in a hash. That hash is then serialized by Data::Dumper and output into a new debug file along with some surrounding code.

When the debug file is executed, a new object is created of the class "HTML::Mason::FakeApache", passing the saved hash as initialization. The FakeApache object acts as a fake $r, responding to each method by getting or setting data in its hash. For most purposes it is indistinguishable from the original $r except that print methods go to standard output. The debug file then executes your handler() function with the simulated $r.
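
The "fake $r" idea can be illustrated with a few lines of plain Perl. The class below is a hypothetical sketch named My::FakeRequest, not the real HTML::Mason::FakeApache: every method call simply gets or sets the hash entry of the same name, and print goes to standard output.

```perl
use strict;
use warnings;

package My::FakeRequest;    # hypothetical name, not Mason's FakeApache

our $AUTOLOAD;

sub new {
    my ($class, %saved) = @_;
    my $self = { %saved };          # copy of the saved request data
    return bless $self, $class;
}

# Any other method call gets or sets the hash entry of the same name.
sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';
    $self->{$name} = shift if @_;
    return $self->{$name};
}

# Output methods go to standard output instead of a browser.
sub print { my $self = shift; CORE::print(@_); return 1 }

package main;

my $r = My::FakeRequest->new(uri          => '/mktg/prods.html',
                             content_type => 'text/html');
$r->content_type('text/plain');
$r->print($r->uri, " served as ", $r->content_type, "\n");
```

Code that only calls simple get/set methods cannot tell such an object from a real request object, which is why many pages replay correctly from a debug file.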

When debug files don't work

The vast majority of mod_perl API methods are simple get/set functions (e.g. $r->uri, $r->content_type) which are easy to simulate. Many pages only make use of these methods and can be successfully simulated in debug mode.

However, a few methods perform tasks requiring the presence of a true Apache server. These cannot be properly simulated. Some, such as log_error and send_cgi_header, are generally tangential to the debugging effort; for these Mason simply returns without doing anything and hopes for the best. Others, such as internal_redirect and lookup_uri, perform such integral functions that they cannot be ignored, and for these FakeApache aborts with an error. This category includes any method call expected to return an Apache::Table object.
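
The three categories (simulated, silently ignored, and fatal) can be sketched as follows. The class My::FakeRequest and its error message are hypothetical, written only for this illustration; the method names are the ones mentioned above, and the real lists are presumably longer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::FakeRequest;    # hypothetical; not HTML::Mason::FakeApache itself

our $AUTOLOAD;

# Method names taken from the text above; the real lists are longer.
my %ignored     = map { $_ => 1 } qw(log_error send_cgi_header);
my %unsupported = map { $_ => 1 } qw(internal_redirect lookup_uri);

sub new {
    my ($class, %data) = @_;
    return bless { %data }, $class;
}

sub DESTROY { }

sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;

    # Tangential methods: do nothing and hope for the best.
    return if $ignored{$name};

    # Methods that need a real server: abort with an error.
    die "$name cannot be simulated outside a real Apache server\n"
        if $unsupported{$name};

    # Everything else is a simple get/set on the saved hash.
    $self->{$name} = shift if @_;
    return $self->{$name};
}

package main;

my $r = My::FakeRequest->new(uri => '/index.html');

print "uri: ", $r->uri, "\n";             # simulated get
$r->log_error("something went wrong");    # silently ignored

eval { $r->internal_redirect('/other.html') };
print "error: $@" if $@;                  # aborts with an error
```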

In addition, FakeApache is playing something of a catch-up game: every time a new mod_perl release comes out with new API methods, those methods will not be recognized by FakeApache until it is updated in the next Mason release.

The combination of these problems and the existence of the new Apache::DB package may eventually lead us to stop further work on FakeApache/debug files. For now, though, we'll continue to support them as best we can.

USING THE PERL PROFILER (new in 0.4)

Debug files, mentioned in the previous section, can be used in conjunction with Devel::DProf to profile a web request.

To use profiling, pass the -p flag to the debug file:

% ./3 -p

This executes the debug file under Devel::DProf and, for convenience, runs dprofpp. If you wish you can rerun dprofpp with your choice of options.

Because components are implemented as anonymous subroutines, any time spent in components would normally be reported under an unreadable label like CODE(0xb6cbc). To remedy this, the -p flag automatically adjusts the tmon.out file so that components are reported by their component paths.

Much of the time spent in a typical debug file is initialization, such as loading Mason and other Perl modules. The effects of initialization can swamp profile results and obscure the time actually spent in components. One remedy is to run multiple iterations of the request inside the debug file, thus reducing the influence of initialization time. Pass the number of desired iterations via the -r flag:

% ./3 -p -r20
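
To see why extra iterations help, here is a back-of-the-envelope sketch. The function init_share and the timings are made up for illustration; they are not measurements of Mason.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fraction of the total run time spent in one-time initialization when
# the request is repeated $iterations times.
sub init_share {
    my ($init_secs, $request_secs, $iterations) = @_;
    return $init_secs / ($init_secs + $request_secs * $iterations);
}

# Made-up timings: 2 seconds of startup, 0.1 seconds per request.
for my $n (1, 20, 100) {
    printf "-r%-3d  initialization is about %2.0f%% of the run\n",
        $n, 100 * init_share(2.0, 0.1, $n);
}
```

With these assumed numbers, initialization dominates a single-request run but drops to about half of the run at 20 iterations, which is why the component timings become easier to read.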

Currently there are no special provisions for other profiling modules such as Devel::SmallProf. You can try simply:

% perl -d:SmallProf ./3 -r20

However, this crashes on our Unix system -- apparently some bad interaction between Mason and SmallProf -- so it is unsupported for now.

THE PREVIEWER

Mason comes with a web-based debugging utility that lets you test your components by throwing fake requests at them. Adjustable parameters include UserAgent, Time, HTTP Referer, O/S, and so on. For example, imagine a component whose color scheme is supposed to change each morning, noon, and night. Using the Previewer, it would be simple to set the perceived time forward 1, 5, or 8 hours to test the component at various times of day.

The Previewer also provides a debug trace of a page, showing all components being called and indicating the portion of HTML each component is responsible for. For pages constructed from more than a few components, these traces are quite useful for finding the component that is outputting a particular piece of HTML.

Your administrator will give you the main Previewer URL, and a set of preview ports that you will use to view your site under various conditions. For the purpose of this discussion we'll assume the Previewer is up and working, that the Previewer URL is http://www.yoursite.com/preview, and the preview ports are 3001 to 3005.

Take a look at the main Previewer page. The top part contains the most frequently used options, such as time and display mode. The middle part contains a table of your saved configurations; if this is your first time using the Previewer, it will be empty. The bottom part contains less frequently used options, such as setting the user agent and referer.

Try clicking "Save". This will save the displayed settings under the chosen preview port, say 3001, and redraw the page. Under "Saved Port Settings", you should see a single row showing this configuration. Your configurations are saved permanently in a file. If a username/password is required to access the Previewer, then each user has their own configuration file.

The "View" button should display your site's home page. If not, then the Previewer may not be set up correctly; contact your administrator or see the Administrator's Manual.

Go back to the main Previewer page, change the display mode from "HTML" to "debug", change the preview port to 3002, and click "Save" again. You should now see a second saved configuration.

Click "View". This time instead of seeing the home page as HTML, you'll get a debug trace with several sections. The first section shows a numbered hierarchy of components used to generate this page. The second section is the HTML source, with each line annotated on the left with the number of the component that generated it. Try clicking on the numbers in the first section; this brings you to the place in the second section where that component first appears. If there's a particular piece of HTML you want to change on a page, searching in the annotated source will let you quickly determine which component is responsible.

The final section of the debug page shows input and output HTTP headers. Note that some of these are simulated due to your Previewer settings. For example, if you specified a particular user agent in your Previewer configuration, then the User-Agent header is simulated; otherwise it reflects your actual browser.

TIPS AND TRAPS

No Subroutines

Do not declare named subroutines within your components. Mason wraps your component's code within its own sub {..}, so a named subroutine declared there becomes a subroutine within a subroutine, which makes Perl very unhappy. Instead, use local anonymous subroutines, or create another component and call it with mc_comp.
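
A minimal sketch of the anonymous-subroutine alternative follows. The outer sub is only a stand-in for a compiled component (Mason's generated wrapper is more involved), and the names $component and $format_price are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for a compiled component: an anonymous sub around your code.
my $component = sub {
    my ($currency) = @_;

    # Instead of "sub format_price { ... }", keep the helper in a lexical.
    # It is created afresh on every call, so it sees this call's $currency.
    my $format_price = sub {
        my ($amount) = @_;
        return sprintf "%s%.2f", $currency, $amount;
    };

    return join ", ", map { $format_price->($_) } 5, 12.5;
};

print $component->('$'),    "\n";
print $component->('EUR '), "\n";
```

A named sub declared in the same spot would be compiled only once, so it could not reliably see the lexical variables of the enclosing call; the anonymous version avoids that problem.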

Do Not Call $r->content or "new CGI"

Mason calls $r->content itself to read request input, emptying the input buffer and leaving a trap for the unwary: subsequent calls to $r->content hang the server. This is a mod_perl "feature" that may be fixed in an upcoming release.

For the same reason you should not create a CGI object like

my $query = new CGI;

when handling a POST; the CGI module will try to reread request input and hang. Instead, create an empty object:

my $query = new CGI ("");

Such an object can still be used for all of CGI's useful HTML output functions. Or, if you really want to use CGI's input functions, initialize the object from %ARGS:

my $query = new CGI (\%ARGS);

Separating Perl From HTML

In our experience, the most readable components, especially for non-programmer designers and editors, contain full HTML in one continuous block at the top, with simple substitutions for dynamic elements (<%$name%>, <%$salary%>) but no intervening blocks of Perl code. At the bottom, a <%perl_init> block sets up the substitution variables -- getting $name from the database, calculating $salary, etc. This organization allows non-programmers to work with the HTML without being distracted or discouraged by Perl code.

This technique does sacrifice some performance for readability.

AUTHOR

Jonathan Swartz, swartz@transbay.net

SEE ALSO

HTML::Mason, HTML::Mason::Commands