NAME
SWISH::3 - Perl interface to libswish3
SYNOPSIS
use SWISH::3;
my $swish3 = SWISH::3->new(
    config  => 'path/to/config.xml',
    handler => \&my_handler,
    regex   => qr/\w+(?:'\w+)*/,
);
$swish3->parse( 'path/to/file.xml' )
    or die "failed to parse file: " . $swish3->error;
printf "libxml2 version %s\n", $swish3->xml2_version;
printf "libswish3 version %s\n", $swish3->version;
DESCRIPTION
SWISH::3 is a Perl interface to the libswish3 C library.
CONSTANTS
All the SWISH_* constants defined in libswish3.h are available and can be optionally imported with the :constants keyword.
use SWISH::3 qw(:constants);
See the SWISH::3::Constants section below.
In addition, the SWISH::3 Perl class defines some Perl-only constants:
- SWISH_DOC_FIELDS
An array of method names that can be called on a SWISH::3::Doc object in your handler method.
- SWISH_TOKEN_FIELDS
An array of method names that can be called on a SWISH::3::Token object.
- SWISH_DOC_FIELDS_MAP
A hashref of method names to id integer values. The integer values are assigned in libswish3.h.
- SWISH_DOC_PROP_MAP
A hashref of built-in property names to docinfo attribute names. The values of SWISH_DOC_PROP_MAP are the keys of SWISH_DOC_FIELDS_MAP.
FUNCTIONS
default_handler
The handler used if you do not specify one. By default it simply prints the contents of SWISH::3::Data to stderr.
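As an alternative to the default handler, a custom handler can be built as a closure. The following is only a sketch: make_uri_collector() is a hypothetical helper name, not part of SWISH::3, and it relies solely on the doc() and uri() accessors documented further down. Whether the parser inspects the handler's return value is not stated here, so the sketch returns nothing.

```perl
use strict;
use warnings;

# Build a handler that records the URI of every parsed document.
# make_uri_collector() is a hypothetical helper, not part of SWISH::3.
sub make_uri_collector {
    my ($seen) = @_;    # array ref that receives the URIs
    return sub {
        my $data = shift;    # expected to be a SWISH::3::Data object
        push @$seen, $data->doc->uri;
        return;
    };
}

# Intended use (requires SWISH::3 and libswish3 to be installed):
#   my @uris;
#   my $s3 = SWISH::3->new( handler => make_uri_collector( \@uris ) );
#   $s3->parse('path/to/file.xml');
```

A closure keeps the collected state next to the code that created it, which can be simpler than the Stash object when the handler only needs a single variable.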
CLASS METHODS
xml2_version
Returns the libxml2 version used by libswish3.
version
Returns the libswish3 version.
refcount( object )
Returns the Perl reference count for object.
OBJECT METHODS
new( args )
args should be a list of key/value pairs. See SYNOPSIS.
Returns a new SWISH::3 instance.
slurp( filename )
Returns the contents of filename as a scalar string.
parse( filename_or_filehandle_or_string )
Wrapper around parse_file(), parse_buffer() and parse_fh() that tries to Do the Right Thing.
parse_file( filename )
Calls the C function of the same name on filename.
parse_buffer( str )
Calls the C function of the same name on str. Note that str should contain the API headers.
parse_fh( filehandle )
Not yet implemented.
set_config( swish_3_config )
Set the Config object.
get_config
Returns SWISH::3::Config object.
config
Alias for get_config().
set_analyzer( swish_3_analyzer )
Set the Analyzer object.
get_analyzer
Returns SWISH::3::Analyzer object.
analyzer
Alias for get_analyzer().
set_parser( swish_3_parser )
Set the Parser object.
get_parser
Returns SWISH::3::Parser object.
parser
Alias for get_parser().
set_handler( \&handler )
Set the parser handler CODE ref.
get_handler
Returns a CODE ref for the handler.
set_data_class( class_name )
Default class_name is SWISH::3::Data.
get_data_class
Returns class name.
set_parser_class( class_name )
Default class_name is SWISH::3::Parser.
get_parser_class
Returns class name.
set_config_class( class_name )
Default class_name is SWISH::3::Config.
get_config_class
Returns class name.
set_analyzer_class( class_name )
Default class_name is SWISH::3::Analyzer.
get_analyzer_class
Returns class name.
set_regex( qr/\w+(?:'\w+)*/ )
Set the regex used in tokenize().
get_regex
Returns the regex used in tokenize().
regex
Alias for get_regex().
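To see what the example pattern qr/\w+(?:'\w+)*/ matches, it can be tried in plain Perl. This sketch does not call SWISH::3->tokenize(), which may apply further processing; match_words() is a hypothetical helper used only for illustration.

```perl
use strict;
use warnings;

# The regex from the SYNOPSIS: runs of word characters, optionally
# joined by apostrophes (so contractions stay in one piece).
my $word_re = qr/\w+(?:'\w+)*/;

# Return every substring of $text that the pattern matches.
sub match_words {
    my ($text) = @_;
    my @words = $text =~ /($word_re)/g;
    return @words;
}

# Hyphens are not word characters, so "rock-n-roll" yields three matches.
print join( '|', match_words("Don't stop the rock-n-roll") ), "\n";
# prints: Don't|stop|the|rock|n|roll
```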
get_stash
Returns the SWISH::3::Stash object used internally by the SWISH::3 object. You typically do not need to access this object as a user of SWISH::3, but if you are developing code that needs to access other objects within a handler function, you can store them in the Stash object and retrieve them later.
Example:
my $s3 = SWISH::3->new( handler => \&handler );
my $stash = $s3->get_stash();
$stash->set('my_indexer' => $indexer);
# later..
sub handler {
    my $data = shift;
    my $indexer = $data->s3->get_stash->get('my_indexer');
    $indexer->add_doc( $data );
}
tokenize( string [, metaname, context ] )
Returns a SWISH::3::TokenIterator object representing string. The tokenizer uses the regex defined in set_regex().
tokenize_native( string [, metaname, context ] )
Returns a SWISH::3::TokenIterator object representing string. The tokenizer uses the built-in libswish3 tokenizer, not a regex.
DEVELOPER METHODS
ref_cnt
Returns the internal reference count for the underlying C struct pointer.
debug([n])
Get/set the internal debugging level.
describe( object )
Like calling Devel::Peek::Dump on object.
mem_debug
Calls the C function swish_memcount_debug().
get_memcount
Returns the global C malloc counter value.
dump
A wrapper around Devel::Peek::Dump() and Data::Dump::dump().
SWISH::3::Analyzer
new( swish_3_config )
Returns a new SWISH::3::Analyzer instance.
set_regex( qr/\w+/ )
Set the regex used in SWISH::3->tokenize().
get_regex
Returns a qr// regex object.
get_tokenize
Get the tokenize flag. Default is true.
set_tokenize( 0|1 )
Toggle the tokenize flag. Default is true (tokenize contents when file is parsed).
SWISH::3::Config
set_default
set_properties
get_properties
set_metanames
get_metanames
set_mimes
get_mimes
set_parsers
get_parsers
set_aliases
get_aliases
set_index
get_index
set_misc
get_misc
debug
add(file_or_xml)
An alias for add() is merge().
delete
delete() is NOT YET IMPLEMENTED.
read( filename )
write( filename )
SWISH::3::Data
s3
Get the parent SWISH::3 object.
config
Get the parent SWISH::3::Config object.
property( name )
Returns the string value of PropertyName name.
metaname( name )
Returns the string value of MetaName name.
properties
Returns a hashref of name/value pairs.
metanames
Returns a hashref of name/value pairs.
doc
Returns a SWISH::3::Doc object.
tokens
Returns a SWISH::3::TokenIterator object.
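The hashrefs returned by properties() and metanames() can be walked like any other Perl hash. A minimal sketch follows; format_pairs() is a hypothetical helper, and the handling of array-ref values is a defensive assumption, since the exact value type is not stated here.

```perl
use strict;
use warnings;

# Render a name/value hashref as sorted "name=value" lines.
# Assumption: a value is either a plain string or an array ref of strings.
sub format_pairs {
    my ($pairs) = @_;
    my @lines;
    for my $name ( sort keys %$pairs ) {
        my $value = $pairs->{$name};
        $value = join( '; ', @$value ) if ref $value eq 'ARRAY';
        push @lines, "$name=" . ( defined $value ? $value : '' );
    }
    return @lines;
}

# Inside a handler this could be called as:
#   print STDERR "$_\n" for format_pairs( $data->properties );
```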
SWISH::3::Doc
mtime
Returns the last modified time as epoch int.
size
Returns the size in bytes.
nwords
Returns the number of tokenized words in the Doc.
encoding
Returns the string encoding of Doc.
uri
Returns the URI value.
ext
Returns the file extension.
mime
Returns the mime type.
parser
Returns the name of the parser used (TXT, HTML, or XML).
action
Returns the intended action (e.g., add, delete, update).
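The accessors above can be gathered into a plain hash, for example for logging. This is a sketch only: doc_to_hash() is a hypothetical helper, and the field list is hard-coded from the method names documented in this section rather than taken from the SWISH_DOC_FIELDS constant.

```perl
use strict;
use warnings;

# Accessor names documented for SWISH::3::Doc.
my @doc_fields = qw( mtime size nwords encoding uri ext mime parser action );

# Copy the accessor values of a Doc-like object into a plain hash ref.
sub doc_to_hash {
    my ($doc) = @_;
    my %info;
    for my $field (@doc_fields) {
        # Call the method whose name is stored in $field.
        $info{$field} = $doc->$field;
    }
    return \%info;
}

# Inside a handler this could be called as:
#   my $info = doc_to_hash( $data->doc );
```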
SWISH::3::MetaName
new( name )
Returns a new SWISH::3::MetaName instance.
TODO: there are no set methods so this isn't of much use.
id
Returns the id integer.
name
Returns the name string.
bias
Returns the bias integer.
alias_for
Returns the alias_for string.
SWISH::3::MetaNameHash
get( name )
Get the SWISH::3::MetaName object for name.
set( name, swish_3_metaname )
Set the SWISH::3::MetaName for name.
keys
Returns an array of names.
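Combining keys() and get() gives a simple way to list every MetaName. The sketch below uses a hypothetical helper, describe_metanames(), and assumes only the methods documented here; because it is not stated whether keys() returns a list or an array ref, the code accepts either.

```perl
use strict;
use warnings;

# Summarize each entry of a MetaNameHash-like object as "name (id=N, bias=N)".
sub describe_metanames {
    my ($hash) = @_;
    my @names = $hash->keys;

    # Assumption: keys() may hand back either a list or a single array ref.
    @names = @{ $names[0] } if @names == 1 && ref $names[0] eq 'ARRAY';

    my @out;
    for my $name ( sort @names ) {
        my $meta = $hash->get($name);
        push @out, sprintf( '%s (id=%d, bias=%d)', $name, $meta->id, $meta->bias );
    }
    return @out;
}
```

The same loop shape would apply to SWISH::3::PropertyHash, with the SWISH::3::Property accessors substituted for id() and bias().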
SWISH::3::Property
id
Returns the id integer.
name
Returns the name string.
ignore_case
Returns the ignore_case boolean.
type
Returns the type integer.
verbatim
Returns the verbatim boolean.
max
Returns the max integer.
sort
Returns the sort boolean.
alias_for
Returns the alias_for string.
SWISH::3::PropertyHash
get( name )
Get the SWISH::3::Property object for name.
set( name, swish_3_property )
Set the SWISH::3::Property for name.
keys
Returns an array of names.
SWISH::3::Stash
get( key )
set( key, value )
keys
values
SWISH::3::Token
value
Returns the value string.
meta
Returns the SWISH::3::MetaName object for the Token.
meta_id
Returns the id integer for the related MetaName.
context
Returns the context string.
pos
Returns the position integer.
len
Returns the length in bytes of the Token.
SWISH::3::TokenIterator
next
Returns the next SWISH::3::Token.
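A typical way to consume an iterator of this kind is a while loop over next(). The sketch below is not taken from the library: collect_tokens() is a hypothetical helper, and it assumes that next() returns undef once the tokens are exhausted, which is not stated above.

```perl
use strict;
use warnings;

# Drain a TokenIterator-like object into a list of [ pos, value ] pairs.
# Assumption: next() returns undef once the tokens are exhausted.
sub collect_tokens {
    my ($iter) = @_;
    my @rows;
    while ( defined( my $token = $iter->next ) ) {
        push @rows, [ $token->pos, $token->value ];
    }
    return \@rows;
}

# With SWISH::3 installed, the iterator could come from tokenize():
#   my $rows = collect_tokens( $swish3->tokenize('some text') );
```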
SWISH::3::xml2Hash
get( key )
set( key, value )
keys
SWISH::3::Constants
The following constants are imported directly from libswish3 and are defined there.
- SWISH_ALIAS
- SWISH_BODY_TAG
- SWISH_BUFFER_CHUNK_SIZE
- SWISH_CASCADE_META_CONTEXT
- SWISH_CLASS_ATTRIBUTES
- SWISH_CONTRACTIONS
- SWISH_DATE_FORMAT_STRING
- SWISH_DEFAULT_ENCODING
- SWISH_DEFAULT_METANAME
- SWISH_DEFAULT_MIME
- SWISH_DEFAULT_PARSER
- SWISH_DEFAULT_PARSER_TYPE
- SWISH_DEFAULT_VALUE
- SWISH_ENCODING_ERROR
- SWISH_ESTRAIER_FORMAT
- SWISH_EXT_SEP
- SWISH_FALSE
- SWISH_HEADER_FILE
- SWISH_HEADER_ROOT
- SWISH_INCLUDE_FILE
- SWISH_INDEX
- SWISH_INDEX_FILEFORMAT
- SWISH_INDEX_FILENAME
- SWISH_INDEX_FORMAT
- SWISH_INDEX_LOCALE
- SWISH_INDEX_STEMMER_LANG
- SWISH_INDEX_NAME
- SWISH_KINOSEARCH_FORMAT
- SWISH_LOCALE
- SWISH_MAXSTRLEN
- SWISH_MAX_FILE_LEN
- SWISH_MAX_HEADERS
- SWISH_MAX_SORT_STRING_LEN
- SWISH_MAX_WORD_LEN
- SWISH_META
- SWISH_MIME
- SWISH_MIN_WORD_LEN
- SWISH_PARSERS
- SWISH_PARSER_HTML
- SWISH_PARSER_TXT
- SWISH_PARSER_XML
- SWISH_PREFIX_MTIME
- SWISH_PREFIX_URL
- SWISH_PROP
- SWISH_PROP_DATE
- SWISH_PROP_DBFILE
- SWISH_PROP_DESCRIPTION
- SWISH_PROP_DOCID
- SWISH_PROP_DOCPATH
- SWISH_PROP_ENCODING
- SWISH_PROP_INT
- SWISH_PROP_MIME
- SWISH_PROP_MTIME
- SWISH_PROP_NWORDS
- SWISH_PROP_PARSER
- SWISH_PROP_RANK
- SWISH_PROP_RECCNT
- SWISH_PROP_SIZE
- SWISH_PROP_STRING
- SWISH_PROP_TITLE
- SWISH_RD_BUFFER_SIZE
- SWISH_SPECIAL_ARG
- SWISH_STACK_SIZE
- SWISH_SWISH_FORMAT
- SWISH_TITLE_METANAME
- SWISH_TITLE_TAG
- SWISH_TOKENIZE
- SWISH_TOKENPOS_BUMPER
- SWISH_TOKEN_LIST_SIZE
- SWISH_TRUE
- SWISH_URL_LENGTH
- SWISH_VERSION
- SWISH_WORDS
- SWISH_XAPIAN_FORMAT
AUTHOR
Peter Karman <perl@peknet.com>
COPYRIGHT
Copyright 2008 Peter Karman. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
SWISH::Prog