NAME
ElasticSearch - An API for communicating with ElasticSearch
VERSION
Version 0.01 - this is an alpha release
DESCRIPTION
ElasticSearch is an Open Source (Apache 2 license), distributed, RESTful Search Engine based on Lucene, and built for the cloud, with a JSON API.
Check out its features: http://www.elasticsearch.com/products/elasticsearch/
This module is a thin API which makes it easy to communicate with an ElasticSearch cluster.
It maintains a list of all servers/nodes in the ElasticSearch cluster, and spreads the load randomly across these nodes. If the current active node disappears, then it attempts to connect to another node in the list.
Forking a process triggers a server list refresh, and a new connection to a randomly chosen node in the list.
SYNOPSIS
use ElasticSearch;
my $e = ElasticSearch->new( servers => 'search.foo.com', debug => 1 );
$e->index(
    index => 'twitter',
    type  => 'tweet',
    id    => 1,
    data  => {
        user     => 'kimchy',
        postDate => '2009-11-15T14:12:12',
        message  => 'trying out Elastic Search'
    }
);
$data = $e->get(
    index => 'twitter',
    type  => 'tweet',
    id    => 1
);
$results = $e->search(
    index => 'twitter',
    type  => 'tweet',
    query => {
        term => { user => 'kimchy' },
    }
);
GETTING ElasticSearch
You can download the latest release from http://www.elasticsearch.com/download/, or build it from source on Unix:
cd ~
git clone git://github.com/elasticsearch/elasticsearch.git
cd elasticsearch
./gradlew clean devRelease
cd /path/where/you/want/elasticsearch
unzip ~/elasticsearch/distributions/elasticsearch*
To start a test server in the foreground:
./bin/elasticsearch -f
You can start multiple servers by repeating this command - they will autodiscover each other.
More instructions are available here: http://www.elasticsearch.com/docs/elasticsearch/setup/installation
CALLING CONVENTIONS
I've tried to follow the same terminology as used in the ElasticSearch docs when naming methods, so it should be easy to tie the two together.
Some methods require a specific index and a specific type, while others allow a list of indices or types, or allow you to specify all indices or types. I distinguish between them as follows:
$e->method( index => multi, type => single, ...)
multi values can be:
index => 'twitter' # specific index
index => ['twitter','user'] # list of indices
index => undef # (or not specified) = all indices
single values must be a scalar, and are required parameters:
type => 'tweet'
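As a rough illustration of the multi convention, the sketch below turns an index or type value into a single URL path segment. The helper name multi_to_path is hypothetical and not part of this module. Joining names with commas and using _all for "all indices" follows the ElasticSearch REST conventions, but how this module builds its URLs internally is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the ElasticSearch module: turn a
# "multi" value (scalar, ARRAY ref or undef) into one URL path segment.
sub multi_to_path {
    my ( $value, $all ) = @_;
    return $all unless defined $value;                   # undef = all
    return join ',', @$value if ref $value eq 'ARRAY';   # list of names
    return $value;                                       # single name
}

print multi_to_path( 'twitter',             '_all' ), "\n";
print multi_to_path( [ 'twitter', 'user' ], '_all' ), "\n";
print multi_to_path( undef,                 '_all' ), "\n";
```

A single value would simply be passed through as a scalar, which is why single parameters cannot be omitted.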
RETURN VALUES AND EXCEPTIONS
Methods that query the ElasticSearch cluster return the raw data structure that the cluster returns. This may change in the future, but as these data structures are still in flux, I thought it safer not to try to interpret them.
Anything that is known to be an error throws an exception, eg trying to delete a non-existent index.
METHODS
Creating a new ElasticSearch instance
new()
$e = ElasticSearch->new(
    servers    => '127.0.0.1:9200'                             # single server
                  | ['es1.foo.com:9200', 'es2.foo.com:9200'],  # multiple servers
    debug      => 1 | 0,
    ua_options => { LWP::UserAgent options },
);
servers is a required parameter and can be either a single server or an ARRAY ref with a list of servers. These servers are used to retrieve a list of all servers in the cluster, after which one is chosen at random to be the current_server.
See also: debug, ua_options, refresh_servers, servers, current_server
Document-indexing methods
index()
$result = $e->index(
    index   => single,
    type    => single,
    id      => $document_id,           # optional, otherwise auto-generated
    data    => { key => value, ... },
    timeout => eg '1m' or '10s'        # optional
    create  => 1 | 0                   # optional
);
eg:
$result = $e->index(
    index => 'twitter',
    type  => 'tweet',
    id    => 1,
    data  => {
        user     => 'kimchy',
        postDate => '2009-11-15T14:12:12',
        message  => 'trying out Elastic Search'
    },
);
Used to add a document to a specific index as a specific type with a specific id. If the index/type/id combination already exists, then that document is updated, otherwise it is created.
Note:
If the id is not specified, then ElasticSearch autogenerates a unique ID and a new document is always created.
If create is true, then a new document is created, even if the same index/type/id combination already exists! create can be used to slightly increase performance when creating documents that are known not to exist in the index.
See also: http://www.elasticsearch.com/docs/elasticsearch/json_api/index and create_mapping
set()
set() is a synonym for index
create()
create is a synonym for index but sets create to true
get()
$result = $e->get(
    index => single,
    type  => single,
    id    => single,
);
Returns the document stored at index/type/id or throws an exception if the document doesn't exist.
Example:
$e->get( index => 'twitter', type => 'tweet', id => 1)
Returns:
{
    _id     => 1,
    _index  => "twitter",
    _source => {
        message  => "trying out Elastic Search",
        postDate => "2009-11-15T14:12:12",
        user     => "kimchy",
    },
    _type   => "tweet",
}
See also: "KNOWN ISSUES", http://www.elasticsearch.com/docs/elasticsearch/json_api/get
delete()
$result = $e->delete(
    index => single,
    type  => single,
    id    => single,
);
Deletes the document stored at index/type/id or throws an exception if the document doesn't exist.
Example:
$e->delete( index => 'twitter', type => 'tweet', id => 1);
See also: http://www.elasticsearch.com/docs/elasticsearch/json_api/delete
delete_by_query()
$result = $e->delete_by_query(
    index => multi,
    type  => multi,
    query => {...}
);
Deletes any documents matching the query. Documents can be matched against multiple indices and multiple types, eg:
$result = $e->delete_by_query(
    index => undef,              # all
    type  => ['user','tweet'],
    query => { term => {user => 'kimchy' }}
);
See also search, http://www.elasticsearch.com/docs/elasticsearch/json_api/delete_by_query
count()
$result = $e->count(
    index => multi,
    type  => multi,
    query => {...}
);
Counts the number of documents matching the query. Documents can be matched against multiple indices and multiple types, eg:
$result = $e->count(
    index => undef,              # all
    type  => ['user','tweet'],
    query => { term => {user => 'kimchy' }}
);
See also search, http://www.elasticsearch.com/docs/elasticsearch/json_api/count
search()
$result = $e->search(
    index => multi,
    type  => multi,
    query => {...}
);
Searches for all documents matching the query. Documents can be matched against multiple indices and multiple types, eg:
$result = $e->search(
    index => undef,              # all
    type  => ['user','tweet'],
    query => { term => {user => 'kimchy' }}
);
For all of the options that can be included in the query parameter, see http://www.elasticsearch.com/docs/elasticsearch/json_api/search
Index Admin methods
index_status()
$result = $e->index_status(
    index => multi,
);
Returns the status of one or more indices, eg:
$result = $e->index_status();                                # all
$result = $e->index_status( index => ['twitter','buzz'] );
$result = $e->index_status( index => 'twitter' );
See http://www.elasticsearch.com/docs/elasticsearch/json_api/admin/indices/status
create_index()
$result = $e->create_index(
    index => single,
    defn  => {...}    # optional
);
Creates a new index, optionally setting certain parameters, eg:
$result = $e->create_index(
    index => 'twitter',
    defn  => {
        numberOfShards   => 3,
        numberOfReplicas => 2,
    }
);
Throws an exception if the index already exists.
See http://www.elasticsearch.com/docs/elasticsearch/json_api/admin/indices/create_index
delete_index()
$result = $e->delete_index( index => single );
Deletes an existing index, or throws an exception if the index doesn't exist, eg:
$result = $e->delete_index( index => 'twitter' );
See http://www.elasticsearch.com/docs/elasticsearch/json_api/admin/indices/delete_index
flush_index()
$result = $e->flush_index( index => multi );
Flushes one or more indices. The flush process of an index basically frees memory from the index by flushing data to the index storage and clearing the internal transaction log. By default, ElasticSearch uses memory heuristics in order to automatically trigger flush operations as required in order to clear memory.
Example:
$result = $e->flush_index( index => 'twitter' );
See http://www.elasticsearch.com/docs/elasticsearch/json_api/admin/indices/flush
refresh_index()
$result = $e->refresh_index( index => multi );
Explicitly refreshes one or more indices, making all operations performed since the last refresh available for search. The (near) real-time capabilities depend on the index engine used. For example, the robin one requires refresh to be called, but by default a refresh is scheduled periodically.
Example:
$result = $e->refresh_index( index => 'twitter' );
See http://www.elasticsearch.com/docs/elasticsearch/json_api/admin/indices/refresh
gateway_snapshot()
$result = $e->gateway_snapshot( index => multi );
Explicitly performs a snapshot through the gateway of one or more indices (backs them up). By default, each index gateway periodically snapshots changes, though this can be disabled and controlled completely through this API.
Example:
$result = $e->gateway_snapshot( index => 'twitter' );
See http://www.elasticsearch.com/docs/elasticsearch/json_api/admin/indices/gateway_snapshot and http://www.elasticsearch.com/docs/elasticsearch/modules/gateway
snapshot_index()
snapshot_index() is a synonym for gateway_snapshot
create_mapping()
$result = $e->create_mapping(
    index      => multi,
    type       => single,
    properties => { ... }    # required
);
A mapping is the data definition of a type. If no mapping has been specified, then ElasticSearch tries to infer the type of each field in a document by looking at its contents, eg:
'foo' => string
123   => integer
1.23  => float
However, these heuristics can be confused, so it is safer (and much more powerful) to specify an official mapping instead, eg:
$result = $e->create_mapping(
    index      => ['twitter','buzz'],
    type       => 'tweet',
    properties => {
        user     => {type => "string", index => "not_analyzed"},
        message  => {type => "string", nullValue => "na"},
        postDate => {type => "date"},
        priority => {type => "integer"},
        rank     => {type => "float"}
    }
);
See also: http://www.elasticsearch.com/docs/elasticsearch/json_api/admin/indices/create_mapping and http://www.elasticsearch.com/docs/elasticsearch/mapping
Cluster admin methods
cluster_state()
$result = $e->cluster_state();
Returns cluster state information.
See http://www.elasticsearch.com/docs/elasticsearch/json_api/admin/cluster/state/
nodes()
$result = $e->nodes(
    nodes    => multi,
    settings => 1 | 0    # optional
);
Returns information about one or more nodes or servers in the cluster. If settings is true, then it includes the node settings information.
See: http://www.elasticsearch.com/docs/elasticsearch/json_api/admin/cluster/nodes_info
Module-specific methods
servers()
$servers = $e->servers
Returns a list of the servers/nodes known to be in the cluster the last time that refresh_servers was called.
refresh_servers()
$e->refresh_servers( $server | [$server_1, ...$server_n])
$e->refresh_servers()
Tries to contact each server in the list to retrieve a list of servers/nodes currently in the cluster. If it succeeds, then it updates servers and randomly selects one server to be the current_server.
If no servers are passed in, then it uses the list from servers (ie the last known good list) instead.
Throws an exception if no servers can be found.
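The refresh logic described above can be sketched roughly as follows. This is not the module's actual code: the function name refresh_servers_sketch and the $fetch callback (a stand-in for the HTTP call that asks one server for the cluster's node list) are hypothetical, and the cluster is simulated so that the example runs without a live server.

```perl
use strict;
use warnings;

# Hypothetical sketch of the refresh logic; not the module's actual code.
# $fetch stands in for the HTTP call that asks one server for the list
# of nodes in the cluster. It returns an ARRAY ref, or dies on failure.
sub refresh_servers_sketch {
    my ( $candidates, $fetch ) = @_;
    for my $server (@$candidates) {
        my $nodes = eval { $fetch->($server) } or next;   # unreachable: try next
        next unless @$nodes;
        my $current = $nodes->[ int rand @$nodes ];       # random current server
        return ( $nodes, $current );
    }
    die "No servers could be found\n";
}

# Simulated cluster: es1 is down, es2 answers with the node list.
my $fetch = sub {
    my $server = shift;
    die "Can't connect to $server\n" if $server eq 'es1.foo.com:9200';
    return [ 'es2.foo.com:9200', 'es3.foo.com:9200' ];
};

my ( $servers, $current ) = refresh_servers_sketch(
    [ 'es1.foo.com:9200', 'es2.foo.com:9200' ], $fetch );
print "servers: @$servers\n";
print "current: $current\n";
```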
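refresh_servers is called, for example, from current_server (when no current server is set) and from request (after a connection failure).
current_server()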
$current_server = $e->current_server()
Returns the current server for the current PID, or if none is set, then it tries to get a new current server by calling refresh_servers.
ua()
$ua = $e->ua
Returns the current LWP::UserAgent instance for the current PID. If there is none, then it creates a new instance, with any options specified in ua_options.
Keep-alive is used by default (via LWP::ConnCache).
ua_options()
$ua_options = $e->ua_options({....})
Gets/sets the current list of options to be used when creating a new LWP::UserAgent instance. You may, for instance, want to set timeout.
This is best set when creating a new instance of ElasticSearch with new.
JSON()
$json_xs = $e->JSON
Returns the current JSON::XS object which is used to encode and decode all JSON when communicating with ElasticSearch.
If you need to change the JSON settings you can do (eg):
$e->JSON->utf8
It is probably better not to fiddle with this! ElasticSearch expects all data to be provided as Perl strings (not as UTF8 encoded byte strings) and returns all data from ElasticSearch as Perl strings.
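To see why the utf8 setting matters, the sketch below compares the two encoder modes. It uses JSON::PP only as a stand-in, because it ships with Perl and shares the JSON::XS interface; the module itself uses JSON::XS. The sample document is made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;    # core module with the same interface as JSON::XS

my $doc = { message => "caf\x{e9} \x{263a}" };    # a Perl character string

# Without utf8: encode() returns a Perl character string.
my $chars = JSON::PP->new->encode($doc);

# With utf8: encode() returns UTF-8 encoded bytes instead.
my $bytes = JSON::PP->new->utf8->encode($doc);

printf "characters: %d, bytes: %d\n", length $chars, length $bytes;
```

The two results differ in length because the non-ASCII characters take more than one byte each once encoded, which is the kind of mismatch that switching the setting on an existing object can introduce.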
request()
$result = $e->request({
    method => 'GET|PUT|POST|DELETE',
    cmd    => url,          # eg '/twitter/tweet/123'
    data   => $hash_ref     # converted to JSON document
})
The request() method is used to communicate with the ElasticSearch current_server. If any request fails with a Can't connect error, then request() tries to refresh the server list, and repeats the request.
Any other error will throw an exception.
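The retry behaviour described above might look something like the following sketch. It is not the module's actual code: request_with_retry, the $send and $refresh callbacks, and the $max_attempts bound are all assumptions introduced here so that the example is self-contained and cannot loop forever.

```perl
use strict;
use warnings;

# Hypothetical sketch of the retry behaviour; not the module's actual code.
# $send stands in for the HTTP call, $refresh for refresh_servers().
sub request_with_retry {
    my ( $send, $refresh, $server, $max_attempts ) = @_;
    for my $attempt ( 1 .. $max_attempts ) {
        my $result;
        return $result if eval { $result = $send->($server); 1 };
        my $error = $@;

        # Only connection failures are retried; anything else is rethrown.
        die $error
            unless $error =~ /Can't connect/ && $attempt < $max_attempts;
        $server = $refresh->();    # pick a new current server
    }
}

# Simulated transport: es1 is down, es2 answers.
my %down = ( 'es1.foo.com:9200' => 1 );
my $send = sub {
    my $server = shift;
    die "Can't connect to $server\n" if $down{$server};
    return { ok => 1, server => $server };
};
my $refresh = sub { 'es2.foo.com:9200' };

my $result = request_with_retry( $send, $refresh, 'es1.foo.com:9200', 3 );
print "answered by $result->{server}\n";
```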
throw()
$e->throw('ErrorType','ErrorMsg', {vars})
Throws an exception of class ref $e . '::Error::' . $error_type, eg:
$e->throw('Param', 'Missing required param', { params => $params})
... will throw an error of class ElasticSearch::Error::Param.
Any vars passed in will be available as $error->{-vars}.
If debug is true, then $error->{-stacktrace} will contain a stacktrace.
debug()
$e->debug(1|0);
If debug() is true, then exceptions include a stack trace.
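The way throw() and debug() fit together can be sketched with a small stand-in class. My::Client and the -text key are hypothetical and used only for illustration; the -vars and -stacktrace keys and the class naming scheme are taken from the description above. Using Carp::longmess for the stack trace is an assumption, not necessarily what the module does.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical stand-in class; not the module's actual code.
package My::Client;

sub new {
    my ( $class, %args ) = @_;
    my $self = {%args};
    return bless $self, $class;
}

sub throw {
    my ( $self, $type, $msg, $vars ) = @_;

    # Exception class is derived from the caller's class and the error type.
    my $class = ref($self) . '::Error::' . $type;
    my $error = bless { -text => $msg, -vars => $vars }, $class;

    # Only attach a stack trace when debug is on.
    $error->{-stacktrace} = Carp::longmess() if $self->{debug};
    die $error;
}

package main;

my $e = My::Client->new( debug => 1 );
eval { $e->throw( 'Param', 'Missing required param', { params => ['index'] } ) };
my $error = $@;
print ref($error), ': ', $error->{-text}, "\n";
```

Callers can then branch on ref($error) to tell one error type from another.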
AUTHOR
Clinton Gormley, <drtech at cpan.org>
KNOWN ISSUES
- get
The _source key that is returned from a get contains the original JSON string that was used to index the document initially. ElasticSearch parses JSON more leniently than JSON::XS, so if invalid JSON is used to index the document (eg unquoted keys) then $e->get(....) will fail with a JSON exception.
Any documents indexed via this module will not be susceptible to this problem.
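The failure mode can be reproduced without a server. The sketch below uses JSON::PP as a stand-in for JSON::XS, since it ships with Perl and also rejects unquoted keys by default; the two sample strings are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;    # core stand-in; rejects unquoted keys by default

my $strict_json  = '{"user":"kimchy"}';
my $lenient_json = '{user:"kimchy"}';    # unquoted key: not valid JSON

my $json = JSON::PP->new;

my $ok    = eval { $json->decode($strict_json) };
my $bad   = eval { $json->decode($lenient_json) };
my $error = $@;    # set by the failed decode above

print "strict:  ", ( $ok  ? "decoded" : "failed" ), "\n";
print "lenient: ", ( $bad ? "decoded" : "failed" ), "\n";
```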
TODO
Currently there are no tests in this module, as testing would require a live ElasticSearch server - I plan to add these shortly.
BUGS
This is an alpha module, so there will be bugs, and the API is likely to change in the future, as the API of ElasticSearch itself changes.
If you have any suggestions for improvements, or find any bugs, please report them to bug-elasticsearch at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=ElasticSearch. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc ElasticSearch
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
Thanks to Shay Banon, the ElasticSearch author, for producing an amazingly easy to use search engine.
LICENSE AND COPYRIGHT
Copyright 2010 Clinton Gormley.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.