Dezi::Tutorial - getting started with the Dezi search platform
Install the Dezi server from CPAN:
% cpan -i Dezi
Install the Dezi client from CPAN:
% cpan -i Dezi::Client
Beginner - Hello World
Start the Dezi server:
% dezi
In a separate terminal, add a small test document to the index:
% echo '<doc><title>bar</title>hello world</doc>' > test.xml
% dezi-client test.xml
Search the index to confirm your test document was indexed:
% dezi-client -q bar
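The same document can also be added from Perl rather than from the command line. The following is a minimal sketch, assuming that the Dezi::Client index() method accepts a scalar reference together with a URI and a content type. The build_doc helper and the hello.xml URI are hypothetical names used only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: wrap a title and body in the same <doc> markup
# used for test.xml, escaping XML special characters along the way.
sub build_doc {
    my ( $title, $body ) = @_;
    for ( $title, $body ) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
    }
    return "<doc><title>$title</title>$body</doc>";
}

# Only contact the server when a title and a body are given as arguments.
if ( @ARGV == 2 ) {
    require Dezi::Client;    # loaded lazily; installed with: cpan -i Dezi::Client
    my $client = Dezi::Client->new( server => 'http://localhost:5000' );
    my $xml    = build_doc(@ARGV);

    # Assumption: a scalar ref means in-memory content, so a URI must be supplied.
    my $resp = $client->index( \$xml, 'hello.xml', 'application/xml' );
    die 'Failed to index: ' . $resp->status_line unless $resp->is_success;
    print "Indexed hello.xml\n";
}
```

Run it with a title and a body as its two arguments, for example bar and 'hello world', while the Dezi server is running.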
Intermediate - The Dezi Demo
The Intermediate tutorial details the specifics behind the Dezi demo available at
Download the Reuters corpus
The Reuters News Corpus for Text Classification (Reuters-21578) is a common document corpus used for information retrieval projects. Other document collections have become more popular since the Reuters corpus first appeared (e.g., the Wikipedia database), but the Reuters corpus is a nice, medium-sized collection for demonstrating Dezi.
You can find the corpus many places on the internet. The version used for the demo came from The
script at that URL will convert the original SGML documents to valid XML and split them into about 21k individual documents.
Unpack the tar.gz file somewhere and run the
script as described in the script's comments.
Create a Swish3 configuration file
As described in Dezi::Architecture, Dezi is based on Swish3. You can index the Reuters corpus with the swish3 command that comes with SWISH::Prog (one of the Dezi dependencies).
First, you'll need a configuration file. Here's the one used for the Dezi demo:
DefaultContents XML*
StoreDescription XML* <text> 10000
PropertyNameAlias swishtitle title
MetaNames dates topics people places orgs author swishdocpath
PropertyNames dates topics people places orgs author dateline
FuzzyIndexingMode Stemming_en1
Save the file as swish.conf.
More details on Swish3 configuration can be found at
Index the XML
If your Reuters docs are in a directory called reuters, you can create an index with a command like:
% swish3 -c swish.conf -F lucy -f dezi.index -i reuters
You can index all kinds of document types, not just XML, but for the purposes of this tutorial, we'll keep it simple.
Create a Dezi configuration file
Here's the contents of the demo config file, named
{
    engine_config => {
        facets => {
            names => [qw( topics people places orgs author )],
        },
    },
    ui_class => 'Dezi::UI',
    base_uri => '',
    username => 'deziuser',
    password => 'a-secret',
}
NOTE that the username/password is there to prevent unwanted modification of the index. Since Dezi supports POST, PUT and DELETE HTTP actions on an index, it's a good idea to protect an index, particularly if it is on the open internet.
NOTE too that the Dezi::UI class is enabled. That requires a separate installation from CPAN:
% cpan -i Dezi::UI
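Because the demo config sets a username and password, any client that modifies the index has to supply them. The following is a minimal sketch, assuming that Dezi::Client->new accepts username and password parameters and sends them with index requests. The client_params helper and the DEZI_SERVER, DEZI_USERNAME and DEZI_PASSWORD environment variable names are invented for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build Dezi::Client constructor arguments from an
# environment hash, so the password stays out of the script itself.
sub client_params {
    my ($env) = @_;
    my %params = ( server => $env->{DEZI_SERVER} || 'http://localhost:5000' );

    # Only pass credentials when both halves are present.
    if ( defined $env->{DEZI_USERNAME} && defined $env->{DEZI_PASSWORD} ) {
        $params{username} = $env->{DEZI_USERNAME};
        $params{password} = $env->{DEZI_PASSWORD};
    }
    return %params;
}

# Index the files named on the command line using those credentials.
if (@ARGV) {
    require Dezi::Client;    # loaded lazily; installed with: cpan -i Dezi::Client
    my $client = Dezi::Client->new( client_params( \%ENV ) );
    for my $file (@ARGV) {
        my $resp = $client->index($file);
        die "Failed to index $file: " . $resp->status_line
            unless $resp->is_success;
    }
}
```

Keeping the credentials in the environment means the same script works against both a protected and an unprotected server.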
Start the Dezi server
% dezi --dezi-config
From a separate terminal, you can search the index containing text from the Reuters corpus:
% dezi-client -q 'some words'
Thanks to the Dezi::UI module, you can also search via a web browser. Assuming you are running the demo on a local machine, you can point your browser at http://localhost:5000/ui and explore the index contents graphically.
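Searching does not require Dezi::Client at all, since the server speaks plain HTTP. The following is a minimal sketch using only core modules, assuming the server answers GET requests at /search with a q parameter and returns JSON containing total and results keys, where each result has uri, title and score. Those paths and keys are assumptions, and search_url and summarize are hypothetical helper names.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw( decode_json );

# Hypothetical helper: build a search URL, percent-encoding the query.
sub search_url {
    my ( $base, $query ) = @_;
    utf8::encode($query);    # escape the UTF-8 bytes, not the characters
    $query =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return "$base/search?q=$query";
}

# Hypothetical helper: turn a decoded response into printable lines.
sub summarize {
    my ($data) = @_;
    my @lines = ( sprintf( 'hits: %d', $data->{total} || 0 ) );
    for my $r ( @{ $data->{results} || [] } ) {
        my @fields = map { defined $_ ? $_ : '' } @{$r}{qw( uri title score )};
        push @lines, sprintf( '%s  %s  (score %s)', @fields );
    }
    return @lines;
}

# Query the local demo server with the words given on the command line.
if (@ARGV) {
    my $res = HTTP::Tiny->new->get( search_url( 'http://localhost:5000', "@ARGV" ) );
    die "Search failed: $res->{status} $res->{reason}\n" unless $res->{success};
    print "$_\n" for summarize( decode_json( $res->{content} ) );
}
```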
Advanced - Roll Your Own
Write your own client application
% cat
#!/usr/bin/env perl
use strict;
use warnings;
use Dezi::Client;
use File::Find;

my $client = Dezi::Client->new(
    server => 'http://localhost:5000',
);

# walk every directory named on the command line
find({
    wanted   => \&add_to_index,
    follow   => 1,
    no_chdir => 1,
}, @ARGV);

my $resp = $client->commit();
print $resp->content;

sub add_to_index {
    my $file = $File::Find::name;
    # we only want .xml files
    return unless $file =~ m/\.xml$/;
    my $resp = $client->index($file);
    if (!$resp->is_success) {
        die "Failed to index $file: " . $resp->status_line;
    }
}
Start your Dezi server
% dezi
Run your indexer
In a separate terminal:
% perl path/to/xml/docs
Search with dezi-client
After you're done indexing, look for something:
% dezi-client -q foo
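Searching can be done from your own code as well. The following is a minimal sketch, assuming the Dezi::Client search() method takes a q parameter and returns a response object with results() and total() methods, and that each result offers uri(), title() and score(). The format_hit helper is a hypothetical name used only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: format one hit as a single line of output.
sub format_hit {
    my ( $uri, $title, $score ) = @_;
    $title = 'untitled' unless defined $title && length $title;
    return sprintf( '%s %s (%s)', $score, $uri, $title );
}

# Search for the words given on the command line.
if (@ARGV) {
    require Dezi::Client;    # loaded lazily; installed with: cpan -i Dezi::Client
    my $client   = Dezi::Client->new( server => 'http://localhost:5000' );
    my $response = $client->search( q => "@ARGV" )
        or die "Search failed\n";

    # Assumption: results() returns an array ref of result objects.
    for my $result ( @{ $response->results } ) {
        print format_hit( $result->uri, $result->title, $result->score ), "\n";
    }
    printf "%d total hits\n", $response->total;
}
```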
Peter Karman, <karman at>
Please report any bugs or feature requests to bug-dezi at , or through the web interface at . I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find this documentation with the perldoc command.
perldoc Dezi::Tutorial
You can also look for information at:
#dezisearch at freenode
Mailing list
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
Copyright 2011 Peter Karman.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See for more information.
Dezi::Client, Search::OpenSearch, SWISH::3, SWISH::Prog::Lucy, Plack, Lucy