NAME
SWISH::Prog::DBI - index DB records with Swish-e
SYNOPSIS
    use SWISH::Prog::DBI;
    use Carp;

    my $prog_dbi = SWISH::Prog::DBI->new(
        db => [
            "DBI:mysql:database=movies;host=localhost;port=3306",
            'some_user', 'some_secret_pass',
            {
                RaiseError  => 1,
                HandleError => sub { confess(shift) },
            }
        ],
        alias_columns => 1
    );

    $prog_dbi->create(
        tables => {
            'moviesIlike' => {
                title    => 1,
                synopsis => 1,
                year     => 1,
                director => 1,
                producer => 1,
                awards   => 1
            }
        }
    );
DESCRIPTION
SWISH::Prog::DBI is a SWISH::Prog subclass designed for providing full-text search for your databases with Swish-e.
Since SWISH::Prog::DBI inherits from SWISH::Prog, read the SWISH::Prog docs first. Any overridden methods are documented here.
METHODS
new( db => DBI_connect_info, alias_columns => 0|1 )
Create new indexer object. DBI_connect_info is passed directly to DBI's connect() method, so see the DBI docs for syntax. If DBI_connect_info is a DBI handle object, it is accepted as is. If DBI_connect_info is an array ref, it will be dereferenced and passed to connect(). Otherwise it will be passed to connect as is.
The alias_columns flag indicates whether all columns should be searchable under the default MetaName of swishdefault. The default is 1 (true). This is not the default behaviour of swish-e; it is a feature of SWISH::Prog.
NOTE: The new() method simply inherits from SWISH::Prog, so any params valid for that method are allowed here.
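The three accepted forms of DBI_connect_info can be pictured with a small sketch. This is not the module's actual code: classify_connect_info() is a hypothetical helper written for illustration, and it only reports what would be handed to connect() rather than opening a connection.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper (not part of SWISH::Prog::DBI): classify the value
# given as `db` and show what would be passed on to DBI's connect().
sub classify_connect_info {
    my ($info) = @_;

    # An object is assumed to be an existing DBI handle: accepted as is.
    return ( 'handle', $info ) if blessed($info);

    # An array ref is dereferenced into the connect() argument list.
    return ( 'connect', @$info ) if ref($info) eq 'ARRAY';

    # Anything else is passed to connect() unchanged.
    return ( 'connect', $info );
}

my @args = classify_connect_info(
    [ 'DBI:mysql:database=movies', 'some_user', 'some_secret_pass' ]
);
print join( ' | ', @args ), "\n";
```

Passing an already-connected handle is useful when the calling application manages its own connection and wants the indexer to reuse it.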
init
Initialize the object. Overrides the base SWISH::Prog init() method.
init_indexer
Adds the special table MetaName to the Config object before opening the indexer.
DESTROY
Calls the DBI disconnect() method on the cached dbh before calling the SWISH::Prog::DESTROY method.
NOTE: Internal method only.
info
Internal method for retrieving db meta data.
cols
Internal method for retrieving db column data.
table_meta
Get/set all the table/column info for the current db.
create( opts )
Create index. The default is for all tables to be indexed, with each table name saved in the tablename MetaName.
opts supports the following options:
- tables
Only index the following tables (and optionally, columns within tables).
Example:
If you only want to index the table foo and only the columns bar and gab, pass this:
    $dbi->create( tables => { foo => { columns => { bar => 1, gab => 1 } } } );
To index all columns:
    $dbi->create( tables => { foo => 1 } );
- TODO
Make the column hash value the MetaRankBias for that column.
NOTE: create() just loops over all the relevant tables and calls index_sql() to actually create each index. If you want to tailor your SQL (using JOINs etc.) then you probably want to call index_sql() directly.
Returns number of rows indexed.
index_sql( %opts )
Fetch rows from the DB, convert to XML and pass to inherited index() method. %opts should include at least the following:
- sql
The SQL statement to execute.
%opts may also contain:
- table
The name of the table. Used for creating virtual XML documents passed to the indexer.
- title
Which column to use as the title of the virtual document. If not defined, the title will be the empty string.
- desc
Which columns to include in the swishdescription property. Default is none. Should be a hashref with column names as keys.
%opts may contain any other param that SWISH::Prog::Index->new() accepts.
Example:
    $prog_dbi->index_sql(
        sql   => 'SELECT * FROM `movies`',
        title => 'Movie_Title'
    );
row2xml( table_name, row_hash_ref, title )
Converts row_hash_ref to an XML string. Returns the XML.
The table_name is included in a <table> tagset within each row. You can use the table MetaName to limit searches to a specific table.
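To make the conversion concrete, here is a minimal sketch of turning a row hash into an XML string with the table name in a <table> element. It is only an illustration: sketch_row2xml() and xml_escape() are hypothetical names, and every element name other than <table> (including <row> and <swishtitle>) is an assumption, not the layout the module necessarily emits.

```perl
use strict;
use warnings;

# Escape the XML special characters in a text value.
sub xml_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical stand-in for row2xml(). Only the <table> element comes
# from the documentation; the other element names are assumptions.
sub sketch_row2xml {
    my ( $table, $row, $title ) = @_;
    my $xml = '<row>';
    $xml .= '<table>' . xml_escape($table) . '</table>';
    $xml .= '<swishtitle>' . xml_escape($title) . '</swishtitle>';

    # One element per column, in sorted order so output is stable.
    for my $col ( sort keys %$row ) {
        $xml .= "<$col>" . xml_escape( $row->{$col} ) . "</$col>";
    }
    $xml .= '</row>';
    return $xml;
}

print sketch_row2xml(
    'movies',
    { year => 1999, director => 'A & B' },
    'Some Title'
), "\n";
```

Because every row carries its table name, a search can then be restricted to one table through the table MetaName.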
title_filter( row_hash_ref )
Override this method if you do not provide a title column in index_sql(). The return value of title_filter() will be used as the swishtitle for the row's virtual XML document.
row_filter( row_hash_ref )
Override this method if you need to alter the data returned from the db prior to it being converted to XML for indexing.
This method is called prior to title_filter() so all row data is affected.
NOTE: This is different from the row() method in the ::Doc subclass. This row_filter() gets called before the Doc object is created.
See the FILTERS section.
FILTERS
There are several filtering methods in this module. Here's a summary of what they do and when they are called, so you have a better idea of how to best use them. Pay special attention to those called before converting the row to XML as opposed to after conversion.
row_filter
Called by index_sql() for each row fetched from the database. This is the first filter called in the chain. Called before the row is converted to XML.
title_filter
Called by index_sql() after row_filter() but only if an explicit title opt param was not passed to index_sql(). Called before the row is converted to XML.
SWISH::Prog::DBI::Doc *_filter() methods
Each of the normal SWISH::Prog::Doc attributes has a *_filter() method. These are called after the row is converted to XML. See SWISH::Prog::Doc.
NOTE: There is not a SWISH::Prog::DBI::Doc row_filter() method.
filter
The normal SWISH::Prog filter() method is called as usual just before passing to ok() inside index(). Called after the row is converted to XML.
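The call order of the two pre-XML filters can be sketched with a small stand-in class. Sketch::Indexer and process_row() are hypothetical names used only for this illustration; the real work is done inside index_sql(), and the post-XML Doc filters are not modelled here.

```perl
use strict;
use warnings;

package Sketch::Indexer;

sub new { return bless { log => [] }, shift }

# Stand-in for row_filter(): records the call and fills in a default.
sub row_filter {
    my ( $self, $row ) = @_;
    push @{ $self->{log} }, 'row_filter';
    $row->{year} ||= 'unknown';
    return;
}

# Stand-in for title_filter(): records the call and builds a title.
sub title_filter {
    my ( $self, $row ) = @_;
    push @{ $self->{log} }, 'title_filter';
    return "Row from $row->{year}";
}

# Hypothetical per-row driver following the order described above.
sub process_row {
    my ( $self, $row, %opts ) = @_;
    $self->row_filter($row);    # always the first filter in the chain

    my $title =
        defined $opts{title}
      ? $row->{ $opts{title} }        # explicit title column was given
      : $self->title_filter($row);    # otherwise ask title_filter()

    push @{ $self->{log} }, 'to_xml';    # XML conversion would happen here
    return $title;
}

package main;

my $indexer = Sketch::Indexer->new;
my $title   = $indexer->process_row( { name => 'x' } );
print "$title: @{ $indexer->{log} }\n";
```

Because row_filter() runs first, any change it makes to the row is visible to title_filter() and to the XML conversion.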
ENCODINGS
Since Swish-e version 2 does not support UTF-8 encodings, you may need to convert or transliterate your text prior to indexing. Swish-e offers the TranslateCharacters config option, but that does not work well with multi-byte characters.
Here's one way to handle the issue. Use Search::Tools::Transliterate and the row_filter() method to convert your UTF-8 text to single-byte characters. You can do this by subclassing SWISH::Prog::DBI and overriding the row_filter() method.
Example:
    package My::DBI;
    use base qw( SWISH::Prog::DBI );
    use POSIX qw(locale_h);
    use locale;
    use Encode;
    use Search::Tools::Transliterate;

    my $trans = Search::Tools::Transliterate->new;

    # Take the charset from the locale (the part after the dot), falling
    # back to iso-8859-1. The match must be in list context to capture it.
    my ($charset) = ( setlocale(LC_CTYPE) || '' ) =~ m/^.+?\.(.+)/;
    $charset ||= 'iso-8859-1';

    sub row_filter
    {
        my $self = shift;
        my $row  = shift;

        # We transliterate everything in each row and append as a charset column.
        # This means we can search for it but it'll not show in any property.
        # Instead we'll get the UTF-8 text in the property value.
        # The downside is that you can't do 'meta=asciitext' because the charset string
        # is not stored under any but the swishdefault metaname.
        # You could get around that by using MetaNameAlias in config() to alias
        # each column to column_charset.
        for (keys %$row)
        {
            # if it's not already UTF-8, make it so.
            unless ($trans->is_valid_utf8($row->{$_}))
            {
                $row->{$_} = Encode::encode_utf8(Encode::decode($charset, $row->{$_}, 1));
            }

            # then transliterate to single-byte chars
            $row->{$_ . '_' . $charset} = $trans->convert($row->{$_});
        }
    }

    1;

    # then, in your indexing script:
    use My::DBI;

    my $dbi_prog = My::DBI->new(
        config => SWISH::Config->new(
            # also use Swish-e's feature so that all text is searchable as ASCII
            TranslateCharacters => ':ascii:'
        ),
    );

    $dbi_prog->create;
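If Search::Tools::Transliterate is not available, the same idea (store an ASCII copy of each value next to the UTF-8 original) can be roughly sketched with core modules only. This is a much cruder approach than a real transliterator and is not what the module does: to_ascii() is a hypothetical helper that strips combining marks after Unicode decomposition and replaces any remaining non-ASCII character with "?".

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: reduce UTF-8 encoded bytes to plain ASCII text.
sub to_ascii {
    my ($bytes) = @_;
    my $text = decode( 'UTF-8', $bytes );    # bytes -> Perl characters
    $text = NFD($text);                      # split base letters from accents
    $text =~ s/\p{Mn}//g;                    # drop the combining marks
    $text =~ s/[^\x00-\x7F]/?/g;             # replace anything still non-ASCII
    return $text;
}

# Add an ASCII copy next to the original column, as row_filter() might.
my %row = ( title => "Am\xC3\xA9lie" );      # UTF-8 bytes for "Amélie"
$row{title_ascii} = to_ascii( $row{title} );
print "$row{title_ascii}\n";
```

Characters that have no decomposition to an ASCII base letter end up as "?" with this approach, which is why a dedicated transliteration module is usually the better choice.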
SEE ALSO
SWISH::Prog, SWISH::Prog::DBI::Doc, Search::Tools
AUTHOR
Peter Karman, <perl@peknet.com>
Thanks to Atomic Learning for supporting the development of this module.
COPYRIGHT AND LICENSE
Copyright 2006 by Peter Karman
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.