NAME
Daily Update - downloads and integrates dynamic information into your webpage
SYNOPSIS
DailyUpdate.pl [-anrv] [-i inputfile] [-o outputfile] [-c configfile]
DESCRIPTION
Daily Update grabs dynamic information from the internet and integrates it into your webpage. Features include modular extensibility, timeouts that keep dead servers from hanging the script, user-defined update times, automatic installation of modules, and compatibility with cgiwrap.
Daily Update takes an input HTML file, which includes special tags of the form:
<!--dailyupdate
<input name=X>
<filter name=Y>
<output name=Z>
-->
where X names a data source such as "apnews" or "slashdot", Y names a filter, and Z names an output handler. When Daily Update encounters such a tag, it attempts to load and execute the handler for X to acquire the data. The data is then sent to the filter named by Y, and from there to the output handler named by Z. If a handler cannot be found, the script asks for permission to attempt to download it from the central repository at http://www.cs.virginia.edu/~dwc3q/code/DailyUpdate/handlers.html.
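As a rough illustration of the tag format described above, the following Python sketch extracts the input, filter, and output names from each dailyupdate tag. It is not Daily Update's own parser (the script itself is written in Perl); the function name parse_tags and both regular expressions are assumptions made for this example, and the sketch assumes one input, one filter, and one output per tag with no extra attributes.

```python
import re

# Matches one <!--dailyupdate ... --> comment block, spanning lines.
TAG_RE = re.compile(r"<!--\s*dailyupdate\b(.*?)-->", re.IGNORECASE | re.DOTALL)
# Matches <input name=X>, <filter name=Y>, or <output name=Z> inside a block.
STAGE_RE = re.compile(r"<(input|filter|output)\s+name=\"?([\w.-]+)\"?", re.IGNORECASE)


def parse_tags(html):
    """Return one dict per dailyupdate tag, mapping stage -> handler name.

    Hypothetical helper for illustration; not part of Daily Update.
    """
    tags = []
    for block in TAG_RE.finditer(html):
        stages = {kind.lower(): name
                  for kind, name in STAGE_RE.findall(block.group(1))}
        tags.append(stages)
    return tags


page = """<html><body>
<!--dailyupdate
<input name=slashdot>
<filter name=Y>
<output name=Z>
-->
</body></html>"""

print(parse_tags(page))
```

Each resulting dictionary corresponds to one acquire, filter, output pipeline in the input file.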
HANDLERS
Daily Update has a modular architecture, in which handlers implement the acquisition and output of data gathered from the internet. To use new data sources, first locate an interesting one at http://www.cs.virginia.edu/~dwc3q/code/DailyUpdate/handlers.html, then place the Daily Update tag in your input file. Then run Daily Update once manually, and it will prompt you for permission to download and install the handler.
You can control, at a high level, the format of the output data by using the built-in filters and handlers described on the handlers web page. For more control over the style of output data, you can write your own handlers in Perl. For more information, see the on-line user's manual at http://www.cs.virginia.edu/~dwc3q/code/DailyUpdate/manual.html.
To help handler developers, a utility called MakeHandler.pl is included with the Daily Update distribution. It is a generator that asks several questions, and then creates a basic handler. Handler development is supported by two APIs, AcquisitionFunctions and HTMLTools. For a complete description of these APIs, as well as suggestions on how to write handlers, visit http://www.cs.virginia.edu/~dwc3q/code/DailyUpdate/handlers.html.
OPTIONS AND ARGUMENTS
-i inputfile
    Override the input file specified in the configuration file.
-o outputfile
    Override the output file specified in the configuration file.
-c configfile
    Use the specified file as the configuration file, instead of DailyUpdate.cfg.
-a
    Automatically download all handlers that are not installed locally.
-n
    Check for new versions of handlers while processing the input file.
-r
    Reload the content from the proxy server even on a cache hit. This prevents Daily Update from using stale data when constructing the output file.
-v
    Verbose output. Output a copy of the information sent to the output file to standard output.
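The option syntax in the SYNOPSIS (four flags that take no value, three options that take a file name) can be sketched with Python's standard getopt module. This is only an illustration of how such a command line is interpreted, not the script's actual Perl option handling; the function name parse_args and the settings dictionary are assumptions made for this example.

```python
import getopt


def parse_args(argv):
    """Parse DailyUpdate.pl-style options (hypothetical helper).

    -a -n -r -v are boolean flags; -i -o -c each take a file name.
    """
    opts, rest = getopt.getopt(argv, "anrvi:o:c:")
    settings = {"a": False, "n": False, "r": False, "v": False,
                "i": None, "o": None, "c": None}
    for flag, value in opts:
        key = flag.lstrip("-")
        # Options with a value store the file name; flags store True.
        settings[key] = value if key in "ioc" else True
    return settings, rest


settings, rest = parse_args(["-av", "-i", "in.html", "-c", "my.cfg"])
print(settings)
```

Flags may be combined, as in -av, while -i, -o, and -c must each be followed by their argument.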
CONFIGURATION
The file DailyUpdate.cfg contains the configuration. Daily Update will first look for this file in ~/.DailyUpdate, and then in the system-wide location specified during installation. If both of these fail, it will search the standard Perl library path. In this file you can specify the following:
Multiple input and output files.
The timeout value for the script. This puts a limit on the total time the script can execute, which prevents it from hanging.
The timeout value for socket connections. This allows the script to recover from unresponsive servers.
Your proxy host. For example, "http://proxy.host.com:8080/"
The locations of handlers. For example, ['dir1','dir2'] would look for handlers in dir1/DailyUpdate/Handler/ and dir2/DailyUpdate/Handler/. Note that new handlers are installed into the first directory listed.
The size and location of the HTML cache Daily Update uses to store data in between the update times specified by the handlers.
The location of Daily Update's modules, in case they aren't in the standard Perl module path. (Set during installation.)
The maximum age and location of images stored locally by the cacheimages filter.
The time zone, for DOS/Windows users. (Set during installation.)
See the file DailyUpdate.cfg for examples.
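The handler location rule above (each configured directory is searched under DailyUpdate/Handler/, and the first directory is used for installation) can be sketched as follows. This Python fragment is only an illustration of the lookup order; the function names and the ".pm" file extension are assumptions made for this example and are not taken from Daily Update itself.

```python
import os
import posixpath


def handler_dirs(locations):
    """Expand each configured location to its DailyUpdate/Handler directory."""
    return [posixpath.join(loc, "DailyUpdate", "Handler") for loc in locations]


def find_handler(name, locations, exists=os.path.exists):
    """Return the first existing path for handler `name`, or None.

    The ".pm" extension is an assumption for illustration only.
    """
    for directory in handler_dirs(locations):
        candidate = posixpath.join(directory, name + ".pm")
        if exists(candidate):
            return candidate
    return None


def install_dir(locations):
    """New handlers go into the first configured location."""
    return handler_dirs(locations)[0]


print(handler_dirs(["dir1", "dir2"]))
print(install_dir(["dir1", "dir2"]))
```

Passing the existence check as a parameter keeps the search order easy to exercise without touching the file system.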
RUNNING
You can run DailyUpdate.pl from the command line, but a better way is to run the script as a cron job. To do this, create a .crontab file containing an entry similar to the following:
0 7,10,13,16,19,22 * * * /users/dwc3q/public_html/cgi-bin/DailyUpdate.pl
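To make the schedule in the crontab entry above explicit: the first field is the minute and the second is a comma-separated list of hours, so the script runs six times a day. The Python sketch below expands those two fields into clock times. The function name run_times is an assumption made for this example, and the sketch handles only plain numbers and comma lists, not ranges or step values.

```python
def run_times(crontab_line):
    """Expand the minute and hour fields of a crontab entry into HH:MM strings.

    Hypothetical helper; supports only a single minute and a comma list of hours.
    """
    minute, hour = crontab_line.split()[:2]
    return ["%02d:%02d" % (int(h), int(minute)) for h in hour.split(",")]


entry = "0 7,10,13,16,19,22 * * * /users/dwc3q/public_html/cgi-bin/DailyUpdate.pl"
print(run_times(entry))
```

The remaining three asterisks mean every day of the month, every month, and every day of the week.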
You can also use a cgiwrap URL as your startup page, but then you must wait for the script to execute on each visit (2 to 30 seconds, depending on the staleness of the information). To do this, place DailyUpdate.pl and DailyUpdate.cfg in your public_html/cgi-bin directory, and use a URL similar to the following:
http://www.cs.virginia.edu/cgi-bin/cgiwrap?user=dwc3q&script=DailyUpdate.pl
PREREQUISITES
This script requires the Date::Manip, LWP::UserAgent (part of libwww), URI, HTML-Tree, and HTML::Parser modules, in addition to others that are included in the standard Perl distribution. Download them all from CPAN at http://www.perl.com/CPAN/modules/by-module/.
Handlers that you download may require additional modules.
AUTHOR
David Coppit, <coppit@cs.virginia.edu>, http://www.cs.virginia.edu/~dwc3q/index.html