NAME
Daily Update - downloads and integrates dynamic information into your webpage
SYNOPSIS
DailyUpdate.pl [-i inputfile] [-o outputfile] [-c configfile]
DESCRIPTION
Daily Update grabs dynamic information from the internet and integrates it into your webpage. Features include modular extensibility, timeouts to handle dead servers without hanging the script, user-defined update times, automatic installation of modules, and compatibility with cgiwrap.
Daily Update takes an input HTML file, which includes special tags of the form \<dailyupdate name=X\>. X represents a data source, such as "apnews", "weather", etc. When such a tag is encountered, Daily Update attempts to load and execute the corresponding handler to acquire the data, then replaces the tag with that data. If the handler cannot be found, the script asks for permission to download it from the central repository at http://www.cs.virginia.edu/~dwc3q/code/DailyUpdate/handlers.html.
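The tag substitution described above can be sketched roughly as follows. This is a minimal Python illustration, not the script's actual Perl code: the HANDLERS table, the expand_tags function, and the placeholder comment emitted for an unknown handler are all hypothetical.

```python
import re

# Matches tags such as <dailyupdate name=weather> or
# <dailyupdate name="apnews" style=...>. Any extra attributes are ignored here.
TAG_PATTERN = re.compile(r'<dailyupdate\s+name="?([\w-]+)"?[^>]*>', re.IGNORECASE)

# Hypothetical handler table: each handler returns the HTML for its data source.
HANDLERS = {
    "weather": lambda: "<p>Sunny, 72F</p>",
}


def expand_tags(html, handlers=HANDLERS):
    """Replace each dailyupdate tag with the output of its handler."""

    def replace(match):
        name = match.group(1)
        handler = handlers.get(name)
        if handler is None:
            # The real script offers to download a missing handler;
            # this sketch only leaves a note in the output.
            return "<!-- no handler for %s -->" % name
        return handler()

    return TAG_PATTERN.sub(replace, html)


print(expand_tags("<h1>Today</h1><dailyupdate name=weather>"))
```

Everything outside the tags passes through unchanged, so the input file stays an ordinary HTML page.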
The output contains comment tags with timestamps, which the script uses to determine when the data needs to be refreshed. Update times are specified in the configuration file, which also specifies the input and output files and the proxy settings.
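The refresh decision could look something like the Python sketch below. The comment format, the needs_refresh function, and the example update intervals are assumptions made for illustration; this document does not specify how the script actually encodes its timestamps.

```python
import re
from datetime import datetime, timedelta

# Hypothetical comment format; the real script's timestamp comments may differ.
STAMP_PATTERN = re.compile(r"<!--\s*dailyupdate\s+(\w+)\s+updated\s+(\S+)\s*-->")


def needs_refresh(output_html, name, max_age, now):
    """Return True if the data for `name` is missing or older than max_age."""
    for match in STAMP_PATTERN.finditer(output_html):
        if match.group(1) == name:
            stamped = datetime.fromisoformat(match.group(2))
            return now - stamped > max_age
    # No timestamp found: the data has never been fetched.
    return True


page = "<!-- dailyupdate weather updated 2000-01-01T07:00:00 -->"
now = datetime(2000, 1, 1, 9, 0, 0)
print(needs_refresh(page, "weather", timedelta(hours=3), now))
print(needs_refresh(page, "weather", timedelta(hours=1), now))
```

In this sketch, data stamped two hours ago is kept under a three-hour update time and refetched under a one-hour update time.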
HANDLERS
Daily Update has a modular architecture, in which handlers implement the acquisition and output of data gathered from the internet. To use new data sources, first locate an interesting one at http://www.cs.virginia.edu/~dwc3q/code/DailyUpdate/handlers.html, then place the tag \<dailyupdate name=NAME\> in your input file. Then run Daily Update once manually, and it will prompt you for permission to download and install the handler.
To help handler developers, a utility called MakeHandler.pl is included with the Daily Update distribution. It is a generator that asks several questions, and then creates a basic handler. Handler development is supported by two APIs, AcquisitionFunctions and OutputFunctions.
AcquisitionFunctions consists of:
GetUrl: Grabs all the content from a URL
GetText: Grabs text data from a block of HTML, without formatting
GetHtml: Grabs a block of HTML from a URL's content
GetImages: Grabs images from a block of HTML
GetLinks: Grabs hyperlinks from a block of HTML
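To give a feel for the kind of work these functions do, here is a small Python sketch that pulls hyperlinks and unformatted text out of a block of HTML. It is not the AcquisitionFunctions API, whose Perl signatures are not given here; the LinkAndTextExtractor class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collects hyperlink targets and unformatted text from a block of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag that has one.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep non-empty text runs, dropping the surrounding markup.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


block = '<p>Top story: <a href="http://example.com/story">read <b>more</b></a></p>'
extractor = LinkAndTextExtractor()
extractor.feed(block)
extractor.close()
print(extractor.links)
print(" ".join(extractor.text_parts))
```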
OutputFunctions consists of:
OutputUnorderedList: Outputs an unordered list. Takes a reference to an array.
OutputOrderedList: Outputs an ordered list. Takes a reference to an array.
OutputTwoColumns: Outputs a two column table. Takes a reference to an array.
OutputListOrColumns: Outputs either an unordered list or a two column table, depending on the value of the "style" attribute to the tag. Takes a reference to an array.
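The choice made by OutputListOrColumns can be sketched in Python as below. The function names mirror the list above, but the bodies are a guess at the behavior, not the real Perl implementation. In particular, the style values "list" and "columns", the row-by-row filling of the table, and the assumption that items are already HTML fragments are all hypothetical.

```python
def output_unordered_list(items):
    """Render items (assumed to be HTML fragments) as an unordered list."""
    rows = "".join("<li>%s</li>" % item for item in items)
    return "<ul>%s</ul>" % rows


def output_two_columns(items):
    """Render items as a two column table, filled row by row."""
    # Pad an odd-length list with an empty cell so items pair up evenly.
    padded = list(items) + [""] * (len(items) % 2)
    rows = []
    for left, right in zip(padded[0::2], padded[1::2]):
        rows.append("<tr><td>%s</td><td>%s</td></tr>" % (left, right))
    return "<table>%s</table>" % "".join(rows)


def output_list_or_columns(items, style="list"):
    """Pick the layout from the tag's style attribute (values are assumed)."""
    if style == "columns":
        return output_two_columns(items)
    return output_unordered_list(items)


print(output_list_or_columns(["a", "b", "c"], style="columns"))
```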
OPTIONS AND ARGUMENTS
-i: Override the input file specified in the configuration file.
-o: Override the output file specified in the configuration file.
-c: Use the specified file as the configuration file, instead of DailyUpdate.cfg.
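The precedence between command-line options and the configuration file can be illustrated with a short Python sketch. The CONFIG_DEFAULTS values and the resolve_settings function are hypothetical stand-ins; the real script reads its settings from the configuration file named by -c.

```python
import argparse

# Hypothetical stand-in for values read from the configuration file.
CONFIG_DEFAULTS = {"input": "index.in.html", "output": "index.html"}


def resolve_settings(argv, config=CONFIG_DEFAULTS):
    """Let -i and -o override the configured files; -c picks the config file."""
    parser = argparse.ArgumentParser(prog="DailyUpdate.pl")
    parser.add_argument("-i", dest="input")
    parser.add_argument("-o", dest="output")
    parser.add_argument("-c", dest="configfile", default="DailyUpdate.cfg")
    args = parser.parse_args(argv)
    return {
        "configfile": args.configfile,
        # A command-line value wins; otherwise fall back to the configuration.
        "input": args.input or config["input"],
        "output": args.output or config["output"],
    }


print(resolve_settings(["-o", "today.html"]))
```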
RUNNING
You can run DailyUpdate.pl from the command line, but a better way is to run the script as a cron job. To do this, create a .crontab file with something similar to the following:
0 7,10,13,16,19,22 * * * /users/dwc3q/public_html/cgi-bin/DailyUpdate.pl
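The first two fields of that entry are the minute and the hour, so the script runs on the hour, six times a day. The Python sketch below expands them into clock times. The run_times helper is hypothetical and handles only comma-separated values, not ranges or the * wildcard.

```python
def run_times(minute_field, hour_field):
    """Expand comma-separated cron minute and hour fields into HH:MM strings."""
    minutes = [int(m) for m in minute_field.split(",")]
    hours = [int(h) for h in hour_field.split(",")]
    return ["%02d:%02d" % (h, m) for h in hours for m in minutes]


entry = "0 7,10,13,16,19,22 * * * /users/dwc3q/public_html/cgi-bin/DailyUpdate.pl"
minute_field, hour_field = entry.split()[:2]
print(run_times(minute_field, hour_field))
```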
You can also run the script on demand through cgiwrap and use its URL as your startup page, but this means waiting for the script to execute (2 to 30 seconds, depending on the staleness of the information). To do this, place DailyUpdate.pl and DailyUpdate.cfg in your public_html/cgi-bin directory, and use a URL similar to the following:
http://www.cs.virginia.edu/cgi-bin/cgiwrap?user=dwc3q&script=DailyUpdate.pl
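A URL of this shape can be assembled for another account as in the Python sketch below. The cgiwrap_url helper is hypothetical, and the /cgi-bin/cgiwrap path is taken from the example above; other servers may install cgiwrap elsewhere.

```python
from urllib.parse import urlencode


def cgiwrap_url(host, user, script):
    """Build a cgiwrap URL that runs `script` as `user` on `host`."""
    query = urlencode({"user": user, "script": script})
    return "http://%s/cgi-bin/cgiwrap?%s" % (host, query)


print(cgiwrap_url("www.cs.virginia.edu", "dwc3q", "DailyUpdate.pl"))
```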
PREREQUISITES
This script requires the LWP::UserAgent, URI, and HTML::Parser modules, in addition to others that are included in the standard Perl distribution. Download them from %CPAN%/modules/by-module/LWP/. (Go to http://www.perl.com/ if you don't know how to get to CPAN.)
Handlers that you download may require additional modules.
AUTHOR
David Coppit, <coppit@cs.virginia.edu>, http://www.cs.virginia.edu/~dwc3q/index.html