NAME

SWISH::Filters::Base - base class for SWISH::Filters

DESCRIPTION

Each filter is a subclass of SWISH::Filters::Base. A number of methods are available by default (and some can be overridden). Others are useful when writing your new() constructor.

METHODS

filter

You must override this method in your filter subclass.

parent_filter

This method is no longer supported.

type

This method fetches the type of the filter. The value returned sets the primary sort key for sorting the filters. You can override this in your filter, or just set it as an attribute in your object. The default is 2.

The idea of the "type" is to create groups of filters, if needed. For example, you might have a set of filters that are used for uncompressing some documents before passing on to another group for filtering.

priority

This method fetches the priority of the filter. The value returned sets the secondary sort key for sorting the filters. You can override this in your filter, or just set it as an attribute in your object. The default method returns 50.

The priority is useful if you have multiple filters for the same content type that use different methods for filtering (say one uses wvWare and another uses catdoc for filtering MS Word files). You might give the wvWare filter a lower priority number so it runs before the catdoc filter if both wvWare AND catdoc happen to be installed at the same time.

A lower priority value is given preference over a higher priority value.
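
The two-level ordering described above can be sketched in plain Perl. This is only an illustration, not the module's own sorting code: the hash records and their names are hypothetical stand-ins for filter objects, and it assumes that a lower type value sorts first, as a lower priority value does.

```perl
use strict;
use warnings;

# Hypothetical records; real filters would be objects whose type() and
# priority() methods return these values.
my @filters = (
    { name => 'catdoc', type => 2, priority => 50 },
    { name => 'wvware', type => 2, priority => 40 },
    { name => 'gunzip', type => 1, priority => 50 },
);

# Primary key: type. Secondary key: priority.
# Assumption: lower values come first for both keys.
my @sorted = sort {
    $a->{type} <=> $b->{type}
        || $a->{priority} <=> $b->{priority}
} @filters;

my @order = map { $_->{name} } @sorted;
print join( ', ', @order ), "\n";
```

Under these assumptions the type 1 filter runs first, and within type 2 the filter with priority 40 runs before the one with priority 50.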

mimetypes

Returns the list of mimetypes (as regular expressions) set for the filter.

can_filter_mimetype( content_type )

Returns true if the passed-in content type matches one of the filter's mimetypes. The true value returned is the pattern that matched.
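
As a rough sketch of this kind of matching, the following standalone code checks a content type against a list of regular expressions and returns the first pattern that matches. The pattern list and the first_matching_pattern() helper are hypothetical; they are not part of SWISH::Filters::Base.

```perl
use strict;
use warnings;

# Hypothetical pattern list, standing in for what mimetypes() might return.
my @mimetypes = ( qr{^application/pdf$}, qr{^application/(x-)?msword$} );

# Return the first pattern that matches, or undef if none do.
sub first_matching_pattern {
    my ( $content_type, @patterns ) = @_;
    for my $pattern (@patterns) {
        return $pattern if $content_type =~ $pattern;
    }
    return;
}

my $hit  = first_matching_pattern( 'application/msword', @mimetypes );
my $miss = first_matching_pattern( 'text/plain',         @mimetypes );

print defined $hit  ? "msword: matched\n" : "msword: no match\n";
print defined $miss ? "plain: matched\n"  : "plain: no match\n";
```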

mywarn( message )

Prints message on STDERR if debugging is enabled with the FILTER_DEBUG environment variable.

set_programs( @program_list );

Creates a method for each program with the "run_" prefix. Returns undef if any program cannot be found.

If all the programs listed in @program_list are found and can be executed as the current user, set_programs() returns $self, so you can chain methods together.

For example, in your constructor you might do:

return $self->set_programs( qw/ pdftotext pdfinfo / );

Then in your filter() method:

my $content = $self->run_pdfinfo( $doc->fetch_filename, @options );
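
To illustrate how a method with a "run_" prefix can be created at run time, here is a minimal sketch using a hypothetical class, My::Sketch, and a hypothetical install_runners() method. It is not the module's implementation: the generated method only builds a command string, where a real runner would execute the program.

```perl
use strict;
use warnings;

package My::Sketch;    # hypothetical class, not part of SWISH::Filters

sub new { return bless {}, shift }

# Install one run_<name> method per program name given.
sub install_runners {
    my ( $self, %path_for ) = @_;
    for my $prog ( keys %path_for ) {
        my $path = $path_for{$prog};
        no strict 'refs';    # needed to assign to a symbol table entry by name
        *{ __PACKAGE__ . "::run_$prog" } = sub {
            my ( $self, @args ) = @_;
            # A real runner would execute $path here; this one only
            # returns the command it would run.
            return join ' ', $path, @args;
        };
    }
    return $self;            # allow chaining, as set_programs() does
}

package main;

# The path below is a made-up example value.
my $obj = My::Sketch->new->install_runners( pdfinfo => '/usr/bin/pdfinfo' );
my $cmd = $obj->run_pdfinfo('file.pdf');
print "$cmd\n";
```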

find_binary( prog );

Use in a filter's new() method to test for a necessary program located in $ENV{PATH}. Returns the path to the program if prog exists and passes the built-in -x test. Returns undef otherwise.
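
A PATH search of this kind can be sketched with the core File::Spec module. The find_in_path() helper below is hypothetical and is not the module's code; it also ignores platform details such as file extensions on Windows.

```perl
use strict;
use warnings;
use File::Spec;

# Search each directory in $ENV{PATH} for an executable file named $prog.
# Returns the full path, or undef if nothing suitable is found.
sub find_in_path {
    my ($prog) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $prog );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $found   = find_in_path('perl');
my $missing = find_in_path('no-such-program-xyzzy');

print defined $found   ? "perl found at $found\n" : "perl not in PATH\n";
print defined $missing ? "unexpected hit\n"       : "xyzzy not found\n";
```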

use_modules( @module_list );

Attempts to load each of the modules listed and call its import() method.

Use to test and load required modules within a filter without aborting.

return unless $self->use_modules( qw/ Spreadsheet::ParseExcel  HTML::Entities / );

If the module name is an array reference, the first item is considered the module name and the second the minimum version required.

return unless $self->use_modules( [ 'Foo::Bar' => '0.123' ] );

Returns undef if any module is unavailable. A warning message is displayed if the FILTER_DEBUG environment variable is true.

Returns $self on success.
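
The load-without-aborting idea can be sketched with eval and require. The try_modules() function below is a hypothetical standalone helper, not the module's implementation; it accepts either a module name or an array reference of name and minimum version, mirroring the calling convention described above.

```perl
use strict;
use warnings;

# Try to load each module. An array ref is [ name, minimum_version ].
# Returns 1 if every module loads, undef otherwise.
sub try_modules {
    my (@specs) = @_;
    for my $spec (@specs) {
        my ( $module, $version ) = ref $spec ? @$spec : ($spec);
        ( my $file = "$module.pm" ) =~ s{::}{/}g;
        my $ok = eval {
            require $file;
            # VERSION() dies if the installed version is too old.
            $module->VERSION($version) if defined $version;
            $module->import;
            1;
        };
        return unless $ok;    # failure is trapped, not fatal
    }
    return 1;
}

my $have_core = try_modules( 'File::Spec', [ 'List::Util' => '1.0' ] );
my $have_fake = try_modules('No::Such::Module::Xyzzy');

print $have_core ? "core modules loaded\n" : "core modules missing\n";
print $have_fake ? "fake module loaded\n"  : "fake module unavailable\n";
```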

run_program( program, @args );

Runs program with @args. The @args list is required.

Under Windows, this method calls IPC::Open2, which may pass data through the shell. Double-quotes are escaped (backslashed) and each parameter is wrapped in double-quotes.

On other platforms, fork() and exec() are used to avoid passing any data through the shell.

Returns a reference to a scalar containing the output from your program, or croaks.

This method is intended to read output from a program that converts one format into text. The output is read back in text mode -- on systems like Windows this means \r\n (CRLF) will be converted to \n.
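
For comparison, the shell-free approach on Unix-like systems can be sketched with the list form of Perl's open(), which forks and execs the program directly. This is a swapped-in technique for illustration, not the method's actual code, and capture_output() is a hypothetical name. Like run_program(), it returns a reference to a scalar holding the output.

```perl
use strict;
use warnings;

# Run a program with a list of arguments and capture its standard output.
# With more than one argument, the list form of a '-|' open does not
# involve the shell on Unix-like systems.
sub capture_output {
    my ( $prog, @args ) = @_;
    open( my $fh, '-|', $prog, @args )
        or die "Cannot run $prog: $!";
    local $/;    # slurp mode: read everything in one go
    my $output = <$fh>;
    close $fh or die "$prog failed: status $?";
    return \$output;
}

# $^X is the path to the running perl, so the example needs no other program.
my $out_ref = capture_output( $^X, '-e', 'print "hello\n"' );
print ${$out_ref};
```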

escapeXML( string )

Escapes the 5 primary XML characters & < > ' and ", plus all ASCII control characters. Returns the escaped string.
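
A minimal sketch of this kind of escaping follows. The escape_xml_sketch() function is hypothetical and is not the module's implementation. In particular, the handling of control characters is an assumption: this sketch replaces them with a space and leaves tab, line feed, and carriage return alone, which may differ from what escapeXML actually does.

```perl
use strict;
use warnings;

my %entity_for = (
    '&'  => '&amp;',
    '<'  => '&lt;',
    '>'  => '&gt;',
    q{'} => '&apos;',
    q{"} => '&quot;',
);

sub escape_xml_sketch {
    my ($text) = @_;
    return '' unless defined $text;
    # One pass over the five special characters, so an ampersand that is
    # part of a replacement entity is never escaped a second time.
    $text =~ s/([&<>'"])/$entity_for{$1}/g;
    # Assumption: other ASCII control characters become a single space.
    $text =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]/ /g;
    return $text;
}

my $escaped = escape_xml_sketch(q{a < b & "c"});
print "$escaped\n";
```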

format_meta_headers( meta_hash_ref )

Returns XHTML-compliant meta tags as a scalar, suitable for inserting into the head tagset of HTML or anywhere in an XML doc.

meta_hash_ref should be a hash ref of name/content pairs. Both name and content will be run through escapeXML for you, so do not escape them yourself or you run the risk of double-escaped text.
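
To show the shape of the output, here is a standalone sketch that builds meta tags from a hash ref. Both meta_tags_sketch() and its escape() helper are hypothetical; the exact tag layout and key ordering produced by format_meta_headers may differ.

```perl
use strict;
use warnings;

my %entity_for = (
    '&'  => '&amp;',
    '<'  => '&lt;',
    '>'  => '&gt;',
    q{'} => '&apos;',
    q{"} => '&quot;',
);

# Hypothetical stand-in for escapeXML.
sub escape {
    my ($text) = @_;
    $text =~ s/([&<>'"])/$entity_for{$1}/g;
    return $text;
}

# Build one self-closing meta tag per name/content pair.
# Assumption: keys are sorted so the output order is predictable.
sub meta_tags_sketch {
    my ($meta) = @_;
    return join '', map {
        sprintf qq{<meta name="%s" content="%s" />\n},
            escape($_), escape( $meta->{$_} )
    } sort keys %$meta;
}

# Pass raw, unescaped values; the function escapes them once.
my $tags = meta_tags_sketch( { author => 'A & B', title => 'x < y' } );
print $tags;
```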

TESTING

Filters can be tested with the swish-filter-test program in the example/ directory. Run:

swish-filter-test -man

for documentation.

SUPPORT

Please contact the Swish-e discussion list. http://swish-e.org

AUTHOR

Bill Moseley

Currently maintained by Peter Karman perl@peknet.com.

COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.