NAME

Daily Update - downloads and integrates dynamic information into your webpage

SYNOPSIS

DailyUpdate.pl [-anr] [-i inputfile] [-o outputfile] [-c configfile]

DESCRIPTION

Daily Update grabs dynamic information from the internet and integrates it into your webpage. Features include modular extensibility, timeouts that keep dead servers from hanging the script, user-defined update times, automatic installation of modules, and compatibility with cgiwrap.

Daily Update takes an input HTML file that includes special tags of the form <!--dailyupdate name=X-->, where X names a data source such as "apnews" or "weather". When it encounters such a tag, Daily Update attempts to load and execute the corresponding handler to acquire the data, and then replaces the tag with that data. If the handler cannot be found, the script asks for permission to download it from the central repository at http://www.cs.virginia.edu/~dwc3q/code/DailyUpdate/handlers.html.
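The tag substitution described above can be sketched as follows. This is a minimal Python illustration, not the script's actual Perl code: the names TAG, HANDLERS, and expand_tags are hypothetical, the handler output is placeholder text, and the pattern assumes that the name attribute comes first in the tag.

```python
import re

# Matches tags such as <!--dailyupdate name=weather-->. Any attributes after
# the name are tolerated but ignored in this sketch.
TAG = re.compile(r"<!--\s*dailyupdate\s+name=(\w+)[^>]*?-->")

# Hypothetical handler table: data source name -> function returning HTML.
HANDLERS = {
    "weather": lambda: "<p>placeholder weather report</p>",
}

def expand_tags(html, handlers=HANDLERS):
    """Replace each dailyupdate tag with the output of its handler.

    A tag whose handler is missing is left untouched here; the real script
    would instead ask for permission to download the handler.
    """
    def substitute(match):
        handler = handlers.get(match.group(1))
        return handler() if handler else match.group(0)

    return TAG.sub(substitute, html)
```

Calling expand_tags on a page that contains <!--dailyupdate name=weather--> returns the page with that tag replaced by the placeholder paragraph.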

The output contains comment tags with timestamps, which are used by the script to determine when the data needs to be refreshed. Update times are specified in the configuration file, which also allows one to specify the input and output files, and proxy settings.
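One way to picture the refresh decision is the Python sketch below. It rests on assumptions: that update times are hours of the day, and that data is stale once a scheduled hour has passed since the timestamp recorded in the output. The function needs_refresh and its parameters are hypothetical, and the real timestamp format is not shown here.

```python
from datetime import datetime, timedelta

def needs_refresh(last_update, now, update_hours):
    """Return True if a scheduled update hour falls in (last_update, now]."""
    # Start at midnight of the day of the last update and walk forward a day
    # at a time until the current moment has been passed.
    day = last_update.replace(hour=0, minute=0, second=0, microsecond=0)
    while day <= now:
        for hour in update_hours:
            scheduled = day + timedelta(hours=hour)
            if last_update < scheduled <= now:
                return True
        day += timedelta(days=1)
    return False
```

Under these assumptions, data stamped at 08:00 with update hours 7 and 10 stays fresh until 10:00 and is refreshed on the first run after that.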

HANDLERS

Daily Update has a modular architecture, in which handlers implement the acquisition and output of data gathered from the internet. To use new data sources, first locate an interesting one at http://www.cs.virginia.edu/~dwc3q/code/DailyUpdate/handlers.html, then place the tag <!--dailyupdate name=NAME--> in your input file. Then run Daily Update once manually, and it will prompt you for permission to download and install the handler.

To help handler developers, a utility called MakeHandler.pl is included with the Daily Update distribution. It is a generator that asks several questions, and then creates a basic handler. Handler development is supported by two APIs, AcquisitionFunctions and OutputFunctions.

AcquisitionFunctions consists of:

  • GetUrl: Grabs all the content from a URL

  • GetText: Grabs text data from a block of HTML, without formatting

  • GetHtml: Grabs a block of HTML from a URL's content

  • GetImages: Grabs images from a block of HTML

  • GetLinks: Grabs hyperlinks from a block of HTML
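To give a feel for what an acquisition function does, here is a rough Python analogue of GetLinks. It is only an illustration: the real function is written in Perl and its signature and return format are not documented in this section, so get_links and _LinkCollector are hypothetical names.

```python
from html.parser import HTMLParser

class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Anchors without an href (for example <a name="top">) are skipped.
            if name == "href" and value:
                self.links.append(value)

def get_links(html):
    """Return the hyperlink targets found in a block of HTML, in order."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```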

OutputFunctions consists of:

  • OutputList: Takes a reference to an array, a style (ul, ol, or free text), and an integer representing the number of columns.

  • OutputUnorderedList: Takes a reference to an array.

  • OutputOrderedList: Takes a reference to an array.

  • OutputTwoColumns: Takes a reference to an array.

  • OutputListOrColumns: Outputs either an unordered list or a two column table, depending on the value of the "style" attribute to the tag. Takes a reference to an array.
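The choice made by OutputListOrColumns can be sketched in Python as shown below. Several details are assumptions rather than documented behavior: the style value "twocol", the column-first fill order, and the exact markup. Items are treated as ready-made HTML fragments and are not escaped.

```python
def output_unordered_list(items):
    """Render items as a <ul> list."""
    return "<ul>" + "".join(f"<li>{item}</li>" for item in items) + "</ul>"

def output_two_columns(items):
    """Render items as a two-column table, filling the left column first."""
    half = (len(items) + 1) // 2          # left column gets the extra item
    left, right = items[:half], items[half:]
    rows = []
    for i in range(half):
        second = right[i] if i < len(right) else ""
        rows.append(f"<tr><td>{left[i]}</td><td>{second}</td></tr>")
    return "<table>" + "".join(rows) + "</table>"

def output_list_or_columns(items, style="ul"):
    """Pick the layout from the tag's style attribute (values are assumed)."""
    if style == "twocol":
        return output_two_columns(items)
    return output_unordered_list(items)
```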

OPTIONS AND ARGUMENTS

-i

Override the input file specified in the configuration file.

-o

Override the output file specified in the configuration file.

-c

Use the specified file as the configuration file, instead of DailyUpdate.cfg.

-a

Automatically download all handlers that are not installed locally.

-n

Check for new versions of handlers while processing the input file.

-r

Reload the content from the proxy server even on a cache hit. This prevents Daily Update from using stale data when constructing the output file.
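The option syntax from the SYNOPSIS maps directly onto a getopt specification. The Python sketch below mirrors it for illustration; the settings keys and the parse_args function are hypothetical, and a value of None for input or output stands for "use the value from the configuration file".

```python
import getopt

def parse_args(argv):
    """Parse DailyUpdate.pl-style arguments into a settings dictionary."""
    # "a", "n" and "r" are switches; "i:", "o:" and "c:" each take a file name.
    opts, _ = getopt.getopt(argv, "anri:o:c:")
    settings = {
        "input": None,
        "output": None,
        "config": "DailyUpdate.cfg",
        "auto_download": False,
        "check_new_versions": False,
        "bypass_proxy_cache": False,
    }
    flags = {"-a": "auto_download", "-n": "check_new_versions",
             "-r": "bypass_proxy_cache"}
    files = {"-i": "input", "-o": "output", "-c": "config"}
    for opt, value in opts:
        if opt in flags:
            settings[flags[opt]] = True
        else:
            settings[files[opt]] = value
    return settings
```

For example, the arguments -an -i in.html switch on automatic download and version checking and override only the input file.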

CONFIGURATION

The file DailyUpdate.cfg contains the configuration. Daily Update will first look for this file in ~/.DailyUpdate, and then in the system-wide location specified during installation. If both of these fail, it will search the standard Perl library path. In this file you can specify the following:

  • Multiple input and output files.

  • The timeout value for the script. This puts a limit on the total time the script can execute, which prevents it from hanging.

  • The timeout value for socket connections. This allows the script to recover from unresponsive servers.

  • Your proxy host. For example, "http://proxy.host.com:8080/".

  • The locations of handlers. For example, ['dir1','dir2'] would look for handlers in dir1/DailyUpdate/Handler/ and dir2/DailyUpdate/Handler/. Note that new handlers are installed into the first directory listed.

  • Custom times at which to update the data for each handler. Handlers typically update their data at set times, but this can be customized in the configuration.

See the file DailyUpdate.cfg for examples.
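The lookup order for the configuration file and the expansion of handler locations can be sketched as follows. This is a Python illustration under assumptions: ~/.DailyUpdate is treated as a directory, Unix-style paths are used, and find_config, handler_dirs, and their parameters are hypothetical names.

```python
import os
import posixpath

def find_config(home, system_dir, lib_paths, name="DailyUpdate.cfg",
                exists=os.path.isfile):
    """Return the first existing configuration file, or None.

    Search order: the user's ~/.DailyUpdate directory, the system-wide
    location chosen at installation, then each library path in turn.
    """
    candidates = [
        posixpath.join(home, ".DailyUpdate", name),
        posixpath.join(system_dir, name),
    ]
    candidates += [posixpath.join(path, name) for path in lib_paths]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None

def handler_dirs(locations):
    """Expand configured locations into the directories searched for handlers."""
    return [posixpath.join(d, "DailyUpdate", "Handler") for d in locations]
```

With locations ['dir1', 'dir2'], handler_dirs yields dir1/DailyUpdate/Handler and dir2/DailyUpdate/Handler, the first of which is where new handlers would be installed.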

RUNNING

You can run DailyUpdate.pl from the command line, but a better way is to run the script as a cron job. To do this, create a .crontab file containing an entry similar to the following, which runs the script on the hour every three hours from 7:00 to 22:00:

    0 7,10,13,16,19,22 * * * /users/dwc3q/public_html/cgi-bin/DailyUpdate.pl

You can also have cgiwrap run the script each time your startup page is requested, but you would then have to wait for the script to execute (2 to 30 seconds, depending on the staleness of the information). To do this, place DailyUpdate.pl and DailyUpdate.cfg in your public_html/cgi-bin directory, and use a URL similar to the following:

    http://www.cs.virginia.edu/cgi-bin/cgiwrap?user=dwc3q&script=DailyUpdate.pl

PREREQUISITES

This script requires the LWP::UserAgent (part of libwww-perl), URI, HTML-Tree, and HTML::Parser modules, in addition to others that are included in the standard Perl distribution. All of them can be downloaded from CPAN at http://www.perl.com/CPAN/modules/by-module/.

Handlers that you download may require additional modules.

AUTHOR

David Coppit, <coppit@cs.virginia.edu>, http://www.cs.virginia.edu/~dwc3q/index.html