NAME
WWW::Scraper::Typo3
- Clean up files managed by the CMS called Typo3
Synopsis
Note: The code assumes you are running a web server locally, so the scripts can both read and write files, and use LWP::Simple::getstore to process files.
cd ~/misc
wget -o wget.log --limit-rate=100k -w 4 -r -k -P tewoaf -E -p http://tewoaf.org.au
cd tewoaf
rm *eID* # This removes pop-up files generated by clicking on images.
cd $DR # This is doc root for your web server.
rm -rf tewoaf
cp -r ~/misc/tewoaf .
cd ~/perl.modules/WWW-Scraper-Typo3
perl scripts/rename.files.pl -d $DR/tewoaf -v 1
perl scripts/patch.files.pl -d $DR/tewoaf -v 1
perl scripts/report.files.pl -b /tewoaf -v 1
patch.files.pl is the only program which overwrites files.
Description
WWW::Scraper::Typo3 is a pure Perl module.
It processes the set of files downloaded from a web site whose files are managed by the CMS called Typo3.
Distributions
This module is available as a Unix-style distro (*.tgz).
See http://savage.net.au/Perl-modules.html for details.
See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing.
Constructor and initialization
new(...) returns an object of type WWW::Scraper::Typo3.
This is the class's constructor.
Usage: WWW::Scraper::Typo3 -> new().
This method takes a hash of options.
Call new() as new(option_1 => value_1, option_2 => value_2, ...).
Available options:
- base_url aURL
The script report.files.pl uses "http://$host:$port$base_url$home_page" as the URL where processing starts.
If necessary, both a leading '/' and a trailing '/' are added to the value you supply.
The default value is '/'.
This parameter is mandatory for the script report.files.pl.
- dir aDirName
This option is used by the 2 scripts rename.files.pl and patch.files.pl.
It is the directory where these scripts read and write files.
From the synopsis, you can see I suggest you download the site's files to a directory outside your local web server's doc root, and work on a copy of the files within that doc root.
The default value is ''.
This parameter is optional.
- home_page aHTMLFileName
The name of the home page of the site.
The default value is index.html.
This parameter is mandatory for the script report.files.pl.
- host aHostName
The domain name or IP address of the host.
The default value is 127.0.0.1.
This parameter is mandatory for the script report.files.pl.
- port aPortNumber
The number of the port to use.
The default value is 80.
This parameter is mandatory for the script report.files.pl.
- verbose #
Display more (1) or less (0) output.
The default is 0.
This parameter is optional.
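The way base_url, host, port and home_page combine into the starting URL can be sketched as follows. This is only an illustration of the rule described above, not the module's own code: the helper names normalize_base_url() and start_url() are hypothetical, and the defaults are the ones listed for each option.

```perl
use strict;
use warnings;

# Hypothetical helper: add a leading and a trailing '/' only when missing.
sub normalize_base_url {
    my ($base_url) = @_;

    $base_url = '/' if !defined $base_url || $base_url eq '';
    $base_url = "/$base_url" if $base_url !~ m{\A/};
    $base_url = "$base_url/" if $base_url !~ m{/\z};

    return $base_url;
}

# Hypothetical helper: build "http://$host:$port$base_url$home_page",
# falling back to the documented defaults for any missing option.
sub start_url {
    my (%option) = @_;

    my $host      = $option{host}      // '127.0.0.1';
    my $port      = $option{port}      // 80;
    my $home_page = $option{home_page} // 'index.html';
    my $base_url  = normalize_base_url($option{base_url});

    return "http://$host:$port$base_url$home_page";
}

print start_url(), "\n";                       # http://127.0.0.1:80/index.html
print start_url(base_url => '/tewoaf'), "\n";  # http://127.0.0.1:80/tewoaf/index.html
```

With the synopsis value -b /tewoaf, the sketch adds only the missing trailing '/', so report.files.pl would start at the copy of the site under the local doc root.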
Method: patch_files()
Run the code which patches various aspects of Typo3-managed files.
See scripts/patch.files.pl.
Method: rename_files()
Run the code which renames Typo3-managed files.
See scripts/rename.files.pl.
Method: report_files()
Run the code which reports on various aspects of Typo3-managed files.
See scripts/report.files.pl.
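The three methods can also be called directly rather than through the scripts. The following is a minimal sketch under stated assumptions: it reuses the option values from the synopsis, it assumes the working copy lives at $DR/tewoaf, and it does nothing if the module is not installed or that directory is missing. The shipped scripts may be organised differently.

```perl
use strict;
use warnings;

# Assumption: the working copy is at $DR/tewoaf, as in the synopsis.
my $dir = defined $ENV{DR} ? "$ENV{DR}/tewoaf" : '';

# One entry per step, in the order the synopsis runs the scripts.
my @steps = (
    [rename_files => {dir => $dir, verbose => 1}],
    [patch_files  => {dir => $dir, verbose => 1}],  # the only step which overwrites files
    [report_files => {base_url => '/tewoaf', verbose => 1}],
);

# Load the module at run time so the sketch degrades gracefully.
my $have_module = eval { require WWW::Scraper::Typo3; 1 };

if ($have_module && length $dir && -d $dir) {
    for my $step (@steps) {
        my ($method, $options) = @$step;

        # new() takes a hash of options; the method is called by name.
        WWW::Scraper::Typo3->new(%$options)->$method();
    }
}
else {
    print "Skipping: WWW::Scraper::Typo3 is not installed or the working directory is missing\n";
}
```

Because patch_files() overwrites files, it is safest to point dir at a copy of the download, as the synopsis does.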
Author
WWW::Scraper::Typo3 was written by Ron Savage <ron@savage.net.au> in 2010.
Home page: http://savage.net.au/index.html
Copyright
Australian copyright (c) 2010 Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Artistic License, a copy of which is available at:
http://www.opensource.org/licenses/index.html