NAME

URI::Sequin - Extract information from the URLs of Search-Engines

SYNOPSIS

use URI::Sequin qw/se_extract key_extract log_extract %log_types/;

$url = &log_extract($line_from_log_file, 'NCSA');

$log_types{'MyLogType'} = '^(.+?) -> .+$';
$url = &log_extract($line_from_log_file, 'MyLogType');

$keyword_string = &key_extract($url);

($search_engine_name, $search_engine_url) = @{&se_extract($url)};

DESCRIPTION

This module provides three tools to aid people trying to analyse Search-Engine URLs. It's meant mainly for those who want to analyse referrer logs and pick out key information about site visitors, such as which Search-Engine and keywords they used to find the site.

The functions and globals provided (and exported by default) from this module are:

log_extract($log_line, 'Type')

This will pick out the referring URL from a line of a logfile. The 'Type' can be one of the built-in types or a user-created one. For more information, see %log_types below. This subroutine accepts two scalars (the log line and the name of the log type) and returns a scalar.

key_extract($url)

This will try to determine the keywords used in $url. It accepts a scalar and returns a scalar. Should nothing be found, it returns an undefined value.

se_extract($url)

This will try to determine the name of the Search-Engine used and its URL. It accepts a scalar, and returns a reference to an array containing firstly the Search-Engine's name and secondly the Search-Engine's URL. Should the URL appear not to be from a Search Query, it returns a reference to an empty array.
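The return conventions described above (an undefined value from key_extract, a reference to an empty array from se_extract) can be handled along the following lines. This is only a sketch: describe_referrer is a hypothetical helper that is not part of URI::Sequin, and the engine name and URL are invented stand-ins for real results.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of URI::Sequin. It takes the values the
# documentation says key_extract() and se_extract() return.
sub describe_referrer {
    my ($keywords, $engine) = @_;

    # se_extract(): a reference to an empty array means "not a search query".
    return 'not a search-engine referrer' unless $engine && @$engine;

    my ($name, $url) = @$engine;

    # key_extract(): an undefined value means no keywords were found.
    my $terms = defined $keywords ? $keywords : '(no keywords found)';
    return "$name <$url>: $terms";
}

# With the module installed, the call would look something like:
#   use URI::Sequin qw/se_extract key_extract/;
#   print describe_referrer(scalar key_extract($url), se_extract($url)), "\n";

# Invented values standing in for real results:
print describe_referrer('perl modules',
    ['ExampleSearch', 'http://search.example.com']), "\n";
print describe_referrer(undef, []), "\n";
```

Checking the array reference before dereferencing it into a name and URL avoids treating an ordinary referrer as a search hit.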

%log_types

There are five built-in logfile types already in this hash. They are:

  • IIS1 - Microsoft IIS 3.0 and 2.0

  • IIS2 - Microsoft IIS 4.0 (W3SVC format)

  • NCSA - For Apache, Netscape and any other NCSA format logs

  • ORW - O'Reilly WebSite format

  • General - A generalised one that will work with most logfiles

It's easy to add another one. Simply add a key to the hash, with a value that is a regex. Parenthesise the part that is the referring URL, as the module uses $1 to obtain the URL. See the example in the SYNOPSIS section.
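To see how a parenthesised pattern picks out the referring URL, the pattern from the SYNOPSIS can be tried on its own before it is added to %log_types. The sketch below does not call the module: %my_log_types and extract_referrer are hypothetical stand-ins, and the log line is an invented example of a "referrer -> requested page" layout.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's %log_types hash, so this sketch
# runs without URI::Sequin installed.
my %my_log_types = (
    MyLogType => '^(.+?) -> .+$',    # pattern taken from the SYNOPSIS
);

# Hypothetical helper: apply the named pattern and return the first capture.
sub extract_referrer {
    my ($line, $type) = @_;
    my $pattern = $my_log_types{$type};
    return unless defined $pattern;            # unknown log type
    return $line =~ /$pattern/ ? $1 : undef;   # $1 is the parenthesised part
}

# Invented log line in a "referrer -> requested page" layout.
my $line = 'http://search.example.com/find?q=perl+modules -> /index.html';
my $referrer = extract_referrer($line, 'MyLogType');
print defined $referrer ? "$referrer\n" : "no match\n";
```

Here the non-greedy (.+?) stops at the first " -> ", so only the referring URL ends up in $1.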

AUTHOR

Peter Sergeant <pete_sergeant@hotmail.com>

COPYRIGHT

Copyright 2000 Peter Sergeant.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
