NAME
URI::Sequin - Extract information from the URLs of Search-Engines
SYNOPSIS
use URI::Sequin qw/se_extract key_extract log_extract %log_types/;
$url = &log_extract($line_from_log_file, 'NCSA');
$log_types{'MyLogType'} = '^(.+?) -> .+$';
$url = &log_extract($line_from_log_file, 'MyLogType');
$keyword_string = &key_extract($url);
($search_engine_name, $search_engine_url) = @{&se_extract($url)};
DESCRIPTION
This module provides three tools to aid people trying to analyse Search-Engine URLs. It’s meant mainly for those who want to analyse referrer logs and pick out key information about site visitors, such as which Search-Engine and keywords they used to find the site.
The functions and globals provided by this module (and exported by default) are:
- log_extract($log_line, 'Type')
This will pick out the referring URL from a line of a logfile. The 'Type' can be one of the built-in types or a user-created one. For more information, see %log_types below. This subroutine accepts two scalars (the log line and the type name) and returns a scalar.
- key_extract($url)
This will try to determine the keywords used in $url. It accepts a scalar and returns a scalar. Should nothing be found, it returns an undefined value.
- se_extract($url)
This will try to determine the name of the Search-Engine used and its URL. It accepts a scalar and returns a reference to an array containing firstly the Search-Engine’s name and secondly the Search-Engine’s URL. Should the URL appear not to be from a search query, it returns a reference to an empty array.
- %log_types
There are five built-in logfile types already in this hash. They are:
IIS1 - Microsoft IIS 2.0 and 3.0
IIS2 - Microsoft IIS 4.0 (W3SVC format)
NCSA - For Apache, Netscape and any other NCSA format logs
ORW - O'Reilly WebSite format
General - A generalised one that will work with most logfiles
It’s easy to add another one: add a key to the hash whose value is a regex. Parenthesise the part that matches the referring URL, as the module uses $1 to obtain the URL (see the example in the Synopsis section).
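To illustrate how a custom pattern captures the referring URL, here is a minimal sketch that runs without URI::Sequin installed. The hash %my_log_types and the helper extract_referrer are hypothetical stand-ins, not part of the module, and the log line is made up. Only the pattern itself comes from the Synopsis. The module's real implementation may differ, but the idea of returning whatever $1 captured is the one described above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's %log_types hash (an assumption,
# used here so the sketch is self-contained).
my %my_log_types = (
    # Pattern from the Synopsis: referrer, then ' -> ', then the rest.
    MyLogType => '^(.+?) -> .+$',
);

# Hypothetical helper mimicking the documented behaviour: match the line
# against the pattern for the given type and return what $1 captured.
sub extract_referrer {
    my ($line, $type) = @_;
    my $pattern = $my_log_types{$type};
    return undef unless defined $pattern;          # unknown type
    return $line =~ /$pattern/ ? $1 : undef;       # undef when no match
}

# Made-up log line in a 'referrer -> requested page' layout.
my $line     = 'http://www.example.com/search?q=perl+modules -> /index.html';
my $referrer = extract_referrer($line, 'MyLogType');
my $no_match = extract_referrer('a line with no arrow in it', 'MyLogType');

print defined $referrer ? "Referrer: $referrer\n" : "No referrer found\n";
```

Because the first group is non-greedy, it stops at the first ' -> ' in the line. Only the first parenthesised group matters here, so any further grouping in a custom pattern is best written as non-capturing, (?:...).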
AUTHOR
Peter Sergeant <pete_sergeant@hotmail.com>
COPYRIGHT
Copyright 2000 Peter Sergeant.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.