NAME

News Clipper - downloads and integrates dynamic information into your webpage

SYNOPSIS

NewsClipper.pl [-adnrv] [-i inputfile] [-o outputfile] [-c configfile]

DESCRIPTION

News Clipper grabs dynamic information from the internet and integrates it into your webpage. Features include modular extensibility, timeouts that keep dead servers from hanging the script, user-defined update times, automatic installation of modules, and compatibility with cgiwrap.

News Clipper takes an input HTML file, which includes special tags of the form:

<!--newsclipper
  <input name=X>
  <filter name=Y>
  <output name=Z>
-->

where X names a data source, such as "apnews" or "slashdot". When News Clipper encounters such a tag, it attempts to load and execute the handler for X to acquire the data. The data is then sent to the filter named by Y, and the filtered result is passed on to the output handler named by Z. If a handler cannot be found, the script asks for permission to download it from the central repository.
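As an illustration of the tag format, the following sketch writes a small input file containing a single News Clipper tag. The data source "slashdot" is one of the names mentioned above; EXAMPLE_FILTER and EXAMPLE_OUTPUT are placeholders, not real handler names, and the file name input.html is likewise an assumption. Substitute filter and output handlers listed on the handlers web page.

```shell
#!/bin/sh
# Write a minimal input page containing one News Clipper tag.
# "slashdot" is a data source named in this document. EXAMPLE_FILTER and
# EXAMPLE_OUTPUT are placeholders only; replace them with real handler names.
cat > input.html <<'EOF'
<html>
<body>
<h1>Headlines</h1>
<!--newsclipper
  <input name=slashdot>
  <filter name=EXAMPLE_FILTER>
  <output name=EXAMPLE_OUTPUT>
-->
</body>
</html>
EOF

# Sanity check: print how many News Clipper tags the file contains.
grep -c '<!--newsclipper' input.html
```

Because the tag is an HTML comment, a browser that loads the unprocessed input file simply ignores it.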

HANDLERS

News Clipper has a modular architecture, in which handlers implement the acquisition and output of data gathered from the internet. To use new data sources, first locate an interesting one at http://www.newsclipper.com/handlers.html, then place the News Clipper tag in your input file. Then run News Clipper once manually, and it will prompt you for permission to download and install the handler.

You can control, at a high level, the format of the output data by using the built-in filters and handlers described on the handlers web page. For more control over the style of output data, you can write your own handlers in Perl.

To help handler developers, a utility called MakeHandler.pl is included with the News Clipper distribution. It is a generator that asks several questions, and then creates a basic handler. Handler development is supported by two APIs, AcquisitionFunctions and HTMLTools. For a complete description of these APIs, as well as suggestions on how to write handlers, visit http://www.newsclipper.com/handlers.html.

OPTIONS AND ARGUMENTS

-i

Override the input file specified in the configuration file.

-o

Override the output file specified in the configuration file.

-c

Use the specified file as the configuration file, instead of NewsClipper.cfg.

-a

Automatically download all handlers that are not installed locally.

-n

Check for new versions of handlers while processing input file.

-r

Reload the content from the proxy server even on a cache hit. This prevents News Clipper from using stale data when constructing the output file.

-d

Enable debug mode, which prints extra information about the execution of News Clipper. Output is sent to the screen instead of the output file.

-v

Verbose output. Output a copy of the information sent to the output file to standard output. Does not work on Windows or DOS.
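The options can be combined on one command line. The sketch below shows a few plausible combinations. It is a dry run by default: the variable NC (an assumption made for this example, not part of News Clipper) expands to "echo NewsClipper.pl", so each line prints the command rather than executing it. The file names are also illustrative.

```shell
#!/bin/sh
# Dry run by default: each line below prints the command it would run.
# Set NC=/path/NewsClipper.pl in the environment to execute for real.
# NC and all file names here are assumptions for illustration.
NC=${NC:-echo NewsClipper.pl}

# Process one page using an alternate configuration file.
$NC -c ./test.cfg -i input.html -o output.html

# Download missing handlers automatically and check for new handler versions.
$NC -a -n

# Reload content even on a cache hit and print debug information.
$NC -r -d
```

Since -d sends output to the screen instead of the output file, it is best kept to manual runs.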

CONFIGURATION

The file NewsClipper.cfg contains the configuration. News Clipper will first look for this file in the system-wide location specified by the NEWSCLIPPER environment variable. News Clipper will then load the user's NewsClipper.cfg from $home/.NewsClipper. Options that appear in the personal configuration file override those in the system-wide configuration file. In this file you can specify the following:

  • Multiple input and output files.

  • The timeout value for the script. This puts a limit on the total time the script can execute, which prevents it from hanging.

  • The timeout value for socket connections. This allows the script to recover from unresponsive servers.

  • Your proxy host. For example, "http://proxy.host.com:8080/"

  • The locations of handlers. For example, ['dir1','dir2'] would look for handlers in dir1/NewsClipper/Handler/ and dir2/NewsClipper/Handler/. Note that new handlers are always installed into the first directory.

  • The size and location of the HTML cache News Clipper uses to store data in between the update times specified by the handlers.

  • The location of News Clipper's modules, in case they aren't in the standard Perl module path. (Set during installation.)

  • The maximum age and location of images stored locally by the cacheimages filter.

  • The time zone, for DOS/Windows users. (Set during installation.)

See the file NewsClipper.cfg for examples.
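To make the handler location rule concrete, the following sketch expands a list of configured base directories into the directories that would be searched, using the dir1 and dir2 example from the list above. The function name handler_dirs is hypothetical and exists only for this illustration; it is not part of News Clipper.

```shell
#!/bin/sh
# Print the handler directory for each configured base directory, in
# search order. handler_dirs is a hypothetical helper for illustration.
handler_dirs() {
  for base in "$@"; do
    printf '%s/NewsClipper/Handler/\n' "$base"
  done
}

# With the example locations, the first line printed is also the directory
# into which new handlers are installed.
handler_dirs dir1 dir2
```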

RUNNING

You can run NewsClipper.pl from the command line, but a better way is to run the script as a cron job. To do this, create a .crontab file with something similar to the following:

    0 7,10,13,16,19,22 * * * /path/NewsClipper.pl

You can also have cgiwrap call your startup page, but this would mean having to wait for the script to execute (2 to 30 seconds, depending on the staleness of the information). To do this, place NewsClipper.pl and NewsClipper.cfg in your public_html/cgi-bin directory, and use a URL similar to the following:

    http://www.server.com/cgi-bin/cgiwrap?user=USER&script=NewsClipper.pl

PREREQUISITES

This script requires the Time::CTime, Time::ParseDate, LWP::UserAgent (part of libwww), URI, HTML-Tree, and HTML::Parser modules, in addition to others that are included in the standard Perl distribution. See the News Clipper distribution's README file for more information.

Handlers that you download may require additional modules.

AUTHOR

Spinnaker Software, Inc. David Coppit, <david@coppit.org>, http://coppit.org/