NAME

WWW::Scraper::Typo3 - Clean up files managed by the CMS called Typo3

Synopsis

Note: The code assumes you are running a web server locally, so the scripts can both read and write files, and use LWP::Simple::getstore to process files.

cd ~/misc
wget -o wget.log --limit-rate=100k -w 4 -r -k -P tewoaf -E -p http://tewoaf.org.au
cd tewoaf
rm *eID* # This removes pop-up files generated by clicking on images.
cd $DR   # This is doc root for your web server.
rm -rf tewoaf
cp -r ~/misc/tewoaf .
cd ~/perl.modules/WWW-Scraper-Typo3
perl scripts/rename.files.pl -d $DR/tewoaf -v 1
perl scripts/patch.files.pl -d $DR/tewoaf -v 1
perl scripts/report.files.pl -b /tewoaf -v 1

patch.files.pl is the only program which overwrites files.

Description

WWW::Scraper::Typo3 is a pure Perl module.

It processes the set of files downloaded from a web site whose files are managed by the CMS called Typo3.

Distributions

This module is available as a Unix-style distro (*.tgz).

See http://savage.net.au/Perl-modules.html for details.

See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing.

Constructor and initialization

new(...) returns an object of type WWW::Scraper::Typo3.

This is the class's constructor.

Usage: WWW::Scraper::Typo3 -> new().

This method takes a hash of options.

Call new() as new(option_1 => value_1, option_2 => value_2, ...).

Available options:

base_url aURL

The script report.files.pl uses "http://$host:$port$base_url$home_page" as the URL where processing starts.

If necessary, both a leading '/' and a trailing '/' are added to the value you supply.

The default value is '/'.

This parameter is mandatory for the script report.files.pl.

dir aDirName

This option is used by the two scripts rename.files.pl and patch.files.pl.

It is the directory where these scripts read and write files.

As the synopsis shows, I suggest downloading the site's files to a directory outside your local web server's doc root, and working on a copy of the files within that doc root.

The default value is ''.

This parameter is optional.

home_page aHTMLFileName

The name of the home page of the site.

The default value is index.html.

This parameter is mandatory for the script report.files.pl.

host aHostName

The domain name or IP address of the host.

The default value is 127.0.0.1.

This parameter is mandatory for the script report.files.pl.

port aPortNumber

The number of the port to use.

The default value is 80.

This parameter is mandatory for the script report.files.pl.

verbose #

Display more (1) or less (0) output.

The default is 0.

This parameter is optional.
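
The way the host, port, base_url and home_page options combine into the starting URL for report.files.pl can be sketched as follows. This is a minimal illustration only, not the module's own code: the helper names normalize_base_url() and start_url() are hypothetical, and the sketch simply applies the defaults and the leading/trailing '/' rule described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): add a leading and a
# trailing '/' to the base URL only when they are missing.
sub normalize_base_url {
    my ($base_url) = @_;
    $base_url = '/' if !defined $base_url || $base_url eq '';
    $base_url = "/$base_url" if $base_url !~ m{^/};
    $base_url = "$base_url/" if $base_url !~ m{/$};
    return $base_url;
}

# Hypothetical helper (not part of the module): build the starting URL
# from the options, falling back to the documented defaults.
sub start_url {
    my (%option) = @_;
    my $host      = $option{host}      // '127.0.0.1';
    my $port      = $option{port}      // 80;
    my $home_page = $option{home_page} // 'index.html';
    my $base_url  = normalize_base_url($option{base_url});
    return "http://${host}:${port}${base_url}${home_page}";
}

print start_url(base_url => '/tewoaf'), "\n"; # http://127.0.0.1:80/tewoaf/index.html
print start_url(), "\n";                      # http://127.0.0.1:80/index.html
```

With the synopsis value -b /tewoaf and every other option left at its default, the sketch yields http://127.0.0.1:80/tewoaf/index.html.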

Method: patch_files()

Run the code which patches various aspects of Typo3-managed files.

See scripts/patch.files.pl.

Method: rename_files()

Run the code which renames Typo3-managed files.

See scripts/rename.files.pl.

Method: report_files()

Run the code which reports on various aspects of Typo3-managed files.

See scripts/report.files.pl.
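
The three methods above might be driven from a single script, in the same order as the synopsis, roughly as follows. This is a sketch under stated assumptions rather than a copy of the shipped scripts: the directory /var/www/tewoaf is a hypothetical doc-root copy of the site, each step gets its own object (mirroring the three separate scripts), and the module is loaded at run time so that nothing happens if it or the directory is missing.

```perl
use strict;
use warnings;

# Hypothetical path: a copy of the downloaded site inside the doc root.
my $dir = '/var/www/tewoaf';

# One entry per step, in the order used in the synopsis.
# Option names are those listed under "Available options".
my @step = (
    [rename_files => {dir => $dir, verbose => 1}],
    [patch_files  => {dir => $dir, verbose => 1}], # the only step which overwrites files
    [report_files => {base_url => '/tewoaf', verbose => 1}],
);

# Load the module at run time so the sketch does nothing harmful
# when the module is not installed.
my $loaded = eval { require WWW::Scraper::Typo3; 1 };
my $status;

if ($loaded && -d $dir) {
    for my $step (@step) {
        my ($method, $option) = @$step;
        # Assumption: a fresh object per step, as if running each script in turn.
        WWW::Scraper::Typo3->new(%$option)->$method();
    }
    $status = 'done';
} else {
    $status = 'skipped';
}

print "Status: $status\n";
```

As with report.files.pl, the report_files() step assumes a local web server is serving the files under the given base URL.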

Author

WWW::Scraper::Typo3 was written by Ron Savage <ron@savage.net.au> in 2010.

Home page: http://savage.net.au/index.html

Copyright

Australian copyright (c) 2010 Ron Savage.

All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Artistic License, a copy of which is available at:
http://www.opensource.org/licenses/index.html