NAME
Search::Circa - a search engine and indexer running on MySQL
DESCRIPTION
Search::Circa is a module that provides functions to search with Circa, a WWW search engine running on MySQL. Circa can index your own Web site or a list of sites, in the way Altavista does. It reads and parses pages and adds all the URLs found in them. It stores the URLs and words in MySQL for use at search time.
Circa can be used to index from 100 to 100,000 URLs.
Notes:
Accents are removed both at search time and at indexing time.
Searches are case-insensitive.
Search::Circa::Search works with the results of Search::Circa::Indexer. Search::Circa::Search is a Perl interface, but this package also includes a PHP client.
Search::Circa is the root class for Search::Circa::Indexer and Search::Circa::Search.
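The notes on accents and case can be illustrated with a small sketch. This is not Circa's own code: normalize_term is a hypothetical helper, and the use of Unicode::Normalize is an assumption made only to show the idea of folding a term before it is indexed or searched.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper, not part of Search::Circa: fold a term so that
# accented and unaccented, upper-case and lower-case spellings compare equal.
sub normalize_term {
    my ($term) = @_;
    my $decomposed = NFD($term);     # split letters from their accents
    $decomposed =~ s/\p{Mn}//g;      # drop the combining accent marks
    return lc $decomposed;           # case-insensitive comparison
}

# "\x{C9}cole Fran\x{E7}aise" is "Ecole Francaise" written with its accents.
print normalize_term("\x{C9}cole Fran\x{E7}aise"), "\n";
```

Applying the same folding at indexing time and at search time is what lets a query without accents match an accented word in a page.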
SYNOPSIS
See Search::Circa::Search, Search::Circa::Indexer
FEATURES
Search Features
Boolean query language support: or (default), and ("+"), not ("-"). Ex: perl + faq -cgi returns documents that contain faq, possibly perl, and not cgi.
Perl or PHP client.
Can browse sites by directory / category.
Search by different criteria: news, last modified date, language, URL / site.
Full text indexing
Different weights for the title, keywords, description and the rest of the HTML page can be set in the configuration.
Inherits the features of the LWP suite:
Supports the HTTP://, FTP:// and FILE:// protocols (a filesystem can be indexed without going through a Web server).
Full support of the robots exclusion standard (robots.txt). Identifies itself as CircaIndexer/0.1, mail alian@alianwebserver.com. Requests to the same server are delayed by 8 seconds. "It's not a bug, it's a feature!" This is a basic rule for limiting HTTP server load.
Supports HTTP proxies.
Builds the index in MySQL.
Reads HTML and plain text.
Several kinds of indexing: full, incremental, or only on a particular server.
Documents that have not been updated are not reindexed.
Every request for a file is preceded by an HTTP HEAD request, to get information such as validity, last update, size, etc. The size of documents read can be restricted (Ex: don't get documents > 5 MB), for use with low-bandwidth connections or computers that do not have much memory.
HTML templates can be easily customized for your needs.
Admin functions are available through a browser interface or the command line.
Indexes the different links found in a CGI (everything after name_of_file?).
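The boolean query syntax listed under Search Features can be sketched as follows. parse_query is a hypothetical helper, not part of Search::Circa, and the way Circa itself tokenizes a query may differ; the sketch only shows how a query could be split into optional, required ("+") and excluded ("-") words.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Circa: split a query into
# optional (or), required (+) and excluded (-) words.
sub parse_query {
    my ($query) = @_;
    my %terms = (optional => [], required => [], excluded => []);
    # An operator may be attached to its word ("+faq") or separated ("+ faq").
    while ($query =~ /([+-]?)\s*(\w+)/g) {
        my ($op, $word) = ($1, lc $2);
        my $bucket = $op eq '+' ? 'required'
                   : $op eq '-' ? 'excluded'
                   :              'optional';
        push @{ $terms{$bucket} }, $word;
    }
    return \%terms;
}

my $parsed = parse_query('perl + faq -cgi');
for my $kind (qw(optional required excluded)) {
    printf "%s: %s\n", $kind, join(',', @{ $parsed->{$kind} });
}
```

For the example query from the feature list, this puts perl in the optional bucket, faq in the required bucket and cgi in the excluded bucket.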
FREQUENTLY ASKED QUESTIONS
Q: Where are the example clients?
A: See the demo directory. For the command line, see circa_admin and circa_search; for CGI, take a look in cgi-bin/circa. They are installed with make cgi.
Q: Where are the global parameters for connecting to Circa?
A: In the lib/CircaConf.pm file.
Q: What is an account in Circa?
A: It's like a project, or a database: a namespace for whatever you want.
Q: How do I get started with the indexer?
A: See the man page of circa_admin.
Q: Did you succeed in using Circa with mod_perl?
A: Yes.
Public interface
These methods are used through Search::Circa::Indexer and Search::Circa::Search objects.
- connect user, password, database, host
-
Connect Circa to MySQL. Return 1 on success, 0 otherwise.
user : MySQL user
password : MySQL password
database : MySQL database
host : IP address of the MySQL server
- close
-
Close the connection to MySQL. This method is called by the DESTROY method of this class.
- pre_tbl
-
Get or set the table name prefix, so that Circa can be used more than once on the same database.
- fill_template masque, ref_hash
-
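masque : path of the template
vars : hash ref with the keys/values to substitute
Return the template with its variables replaced. Ex, for a template file (here the illustrative path age.tmpl) that contains I am <? $age ?> old:
$circa->fill_template('age.tmpl', { 'age' => '12 years' });
Will return:
I am 12 years old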
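The substitution performed by fill_template can be sketched in plain Perl. fill_template_sketch is a hypothetical stand-in written for this page, not the module's own code; it assumes that placeholders have the form <? $name ?> and that a name missing from the hash is replaced by an empty string.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for fill_template; Circa's own code may differ.
sub fill_template_sketch {
    my ($path, $vars) = @_;
    open my $in, '<', $path or die "Can't read $path: $!";
    my $text = do { local $/; <$in> };   # slurp the whole template
    close $in;
    # Replace each "<? $name ?>" with its value (assumed: empty if missing).
    $text =~ s/<\?\s*\$(\w+)\s*\?>/$vars->{$1} \/\/ ''/ge;
    return $text;
}

# Write a throwaway template file, then fill it.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} 'I am <? $age ?> old';
close $out;

print fill_template_sketch($path, { age => '12 years' }), "\n";
```

The sketch takes a path and a hash reference, matching the masque and ref_hash parameters described for the method.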
- fetch_first request
-
Execute the SQL request on the database and return the first row. In list context, return the full row; otherwise return just the first column.
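The context rule can be illustrated without a database. first_row below is a hypothetical function that receives an already fetched row; the column values are invented for the example. It only shows how Perl's wantarray distinguishes the two kinds of call.

```perl
use strict;
use warnings;

# Hypothetical illustration of the context rule; no database is involved.
sub first_row {
    my (@row) = @_;    # stands in for the first row of a result set
    return wantarray ? @row : $row[0];
}

# List context: the full row comes back.
my @full = first_row('http://www.example.org/', 'Example title');

# Scalar context: only the first column comes back.
my $single = first_row('http://www.example.org/', 'Example title');

print scalar(@full), " columns, first is $single\n";
```

The scalar form is convenient for requests that select a single value, such as a count.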
- trace level, msg
-
Print message msg on standard error if the debug level of the script is higher than level.
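A minimal sketch of this kind of level-based tracing follows. The $DEBUG variable and trace_sketch are hypothetical, and the strict comparison is an assumption drawn from the wording above; the module may compare the levels differently.

```perl
use strict;
use warnings;

my $DEBUG = 2;    # assumed script-wide debug level (hypothetical variable)

# Hypothetical stand-in for trace; returns 1 if the message was printed.
sub trace_sketch {
    my ($level, $msg) = @_;
    return 0 unless $DEBUG > $level;    # assumption: strictly higher
    print STDERR "$msg\n";
    return 1;
}

trace_sketch(1, 'Connected to MySQL');    # printed: 2 is higher than 1
trace_sketch(3, 'Row-level detail');      # skipped: 2 is not higher than 3
```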
- prompt message, default_value
-
Ask on STDIN for a parameter, showing message and default_value, and return the value entered.
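The behaviour described for prompt can be sketched as below. prompt_sketch is a hypothetical stand-in, and falling back to the default on an empty answer is an assumption about how default_value is meant to be used.

```perl
use strict;
use warnings;

# Hypothetical stand-in for prompt; Circa's own code may differ.
sub prompt_sketch {
    my ($message, $default) = @_;
    print "$message [$default]: ";
    my $answer = <STDIN>;
    $answer = '' unless defined $answer;    # end of input counts as empty
    chomp $answer;
    # Assumption: an empty answer means "keep the default".
    return length($answer) ? $answer : $default;
}

# Usage (reads from the terminal, so it is left as a comment here):
# my $host = prompt_sketch('MySQL host', 'localhost');
```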
SEE ALSO
Search::Circa::Indexer, Indexer module
Search::Circa::Search, Searcher module
Search::Circa::Annuaire, manages the Circa directory
Search::Circa::Url, manages Circa URLs
Search::Circa::Categorie, manages Circa categories
VERSION
$Revision: 1.18 $
AUTHOR
Alain BARBET alian@alianwebserver.com