NAME
Hailo - A pluggable Markov engine analogous to MegaHAL
SYNOPSIS
This is the synopsis for using Hailo as a module. See hailo for command-line invocation.
# Hailo requires Perl 5.10
use 5.010;
use strict;
use warnings;
use Hailo;
# Construct a new in-memory Hailo using the SQLite backend. See
# backend documentation for other options.
my $hailo = Hailo->new;
# Various ways to learn
my @train_this = qw< I like big butts and I can not lie >;
$hailo->learn(\@train_this);
$hailo->learn($_) for @train_this;
# Heavy-duty training interface. Backends may drop some safety
# features like journals or synchronous IO to train faster using
# this mode.
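$hailo->train("megahal.trn");
# ...or from an already-open filehandle
open my $filehandle, '<:encoding(UTF-8)', 'megahal.trn' or die "Can't open megahal.trn: $!";
$hailo->train($filehandle);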
# Make the brain babble
say $hailo->reply("hello good sir.");
DESCRIPTION
Hailo is a fast and lightweight Markov engine intended to replace AI::MegaHAL. It has a Mouse (or Moose) based core with pluggable storage and tokenizer backends.
It is similar to MegaHAL in functionality. The main differences (with the default backends) are better scalability, drastically lower memory usage, an improved tokenizer, and tidier output.
With this distribution, you can create, modify, and query Hailo brains. To use Hailo in event-driven POE applications, you can use the POE::Component::Hailo wrapper. One example is POE::Component::IRC::Plugin::Hailo, which implements an IRC chat bot.
Etymology
Hailo is a portmanteau of HAL (as in MegaHAL) and failo.
Backends
Hailo supports pluggable storage and tokenizer backends. It also supports a pluggable UI backend, which is used by the hailo command-line utility.
Storage
Hailo can currently store its data in an SQLite, PostgreSQL or MySQL database. More backends were supported in earlier versions, but they were removed as they had no redeeming qualities.
SQLite is the primary target for Hailo. It's much faster and uses fewer resources than the other two, so it's highly recommended that you use it.
This benchmark shows how the backends compare when training on the small testsuite dataset as reported by the utils/hailo-benchmark utility (found in the distribution):
                     Rate DBD::Pg DBD::mysql DBD::SQLite/file DBD::SQLite/memory
DBD::Pg            2.22/s      --       -33%             -49%               -56%
DBD::mysql         3.33/s     50%         --             -23%               -33%
DBD::SQLite/file   4.35/s     96%        30%               --               -13%
DBD::SQLite/memory 5.00/s    125%        50%              15%                 --
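The comparison columns read as "how much faster or slower the row is than the column". The following is a minimal sketch of that arithmetic, assuming the table follows the usual Benchmark-style layout; relative_percent is a hypothetical helper, not part of Hailo. Because the printed rates are themselves rounded, a recomputed cell can differ from the table by a percentage point (DBD::SQLite/file against DBD::mysql comes out as 31% here rather than 30%).

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the table above.
my @rates = (
    [ 'DBD::Pg'            => 2.22 ],
    [ 'DBD::mysql'         => 3.33 ],
    [ 'DBD::SQLite/file'   => 4.35 ],
    [ 'DBD::SQLite/memory' => 5.00 ],
);

# Hypothetical helper: how much faster (positive) or slower (negative)
# the row backend is than the column backend, as a rounded percentage.
sub relative_percent {
    my ($row_rate, $col_rate) = @_;
    return sprintf '%.0f%%', 100 * ($row_rate / $col_rate - 1);
}

# Print one line per backend, comparing it against every backend in turn.
for my $row (@rates) {
    my @cells = map {
        $_->[0] eq $row->[0] ? '--' : relative_percent($row->[1], $_->[1])
    } @rates;
    printf "%-18s %s\n", $row->[0], join ' ', map { sprintf '%5s', $_ } @cells;
}
```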
Under real-world workloads, SQLite is much faster than these results indicate, since the time it takes to train/reply is relative to the existing database size. Here's how long it took to train on a 214,710-line IRC log on a Linode 1080 with Hailo 0.18:
SQLite
real 8m38.285s user 8m30.831s sys 0m1.175s
MySQL
real 48m30.334s user 8m25.414s sys 4m38.175s
PostgreSQL
real 216m38.906s user 11m13.474s sys 4m35.509s
In the case of PostgreSQL, it's actually much faster to first train with SQLite, dump that database and then import it with psql(1). See failo's README for how to do that.
Replying with an existing database (benchmarked using utils/hailo-benchmark-replies) yields different results, however. SQLite can reply really quickly without being warmed up (which is the typical use case for chatbots), but once PostgreSQL and MySQL are warmed up they start replying faster.
Here's a comparison of doing 10 replies:
                    Rate PostgreSQL MySQL SQLite-file SQLite-file-28MB SQLite-memory
PostgreSQL        71.4/s         --  -14%        -14%             -29%          -50%
MySQL             83.3/s        17%    --          0%             -17%          -42%
SQLite-file       83.3/s        17%    0%          --             -17%          -42%
SQLite-file-28MB 100.0/s        40%   20%         20%               --          -30%
SQLite-memory      143/s       100%   71%         71%              43%            --
In this test MySQL uses around 28MB of memory (using Debian's my-small.cnf) and PostgreSQL around 34MB. Plain SQLite uses 2MB of cache but it's also tested with 28MB of cache as well as with the entire database in memory.
But doing 10,000 replies is very different:
                   Rate SQLite-file PostgreSQL SQLite-file-28MB MySQL SQLite-memory
SQLite-file      85.1/s          --        -7%             -18%  -27%          -38%
PostgreSQL       91.4/s          7%         --             -12%  -21%          -33%
SQLite-file-28MB  103/s         21%        13%               --  -11%          -25%
MySQL             116/s         37%        27%              13%    --          -15%
SQLite-memory     137/s         61%        50%              33%   18%            --
Once MySQL gets more memory (using Debian's my-large.cnf) and a chance to warm up, it starts yielding better results (I couldn't find out how to make PostgreSQL take as much memory as it wanted):
               Rate MySQL SQLite-memory
MySQL         121/s    --          -12%
SQLite-memory 138/s   14%            --
Tokenizer
By default Hailo will use the word tokenizer to split up input by whitespace, taking into account things like quotes, sentence terminators and more.
There's also the character tokenizer. It's not generally useful for a conversation bot, but it can be used to, for example, generate new words given a list of existing words.
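To make the difference between the two tokenizers concrete, here is a rough sketch of the two splitting strategies. Both functions are simplified stand-ins written for illustration; they are not Hailo's tokenizer classes, and the real word tokenizer handles far more cases (quotes, for one).

```perl
use strict;
use warnings;

# Simplified stand-in for a word tokenizer: split on whitespace, then
# peel trailing sentence terminators off into their own tokens.
sub word_tokens {
    my ($line) = @_;
    my @tokens;
    for my $chunk (split ' ', $line) {
        if ($chunk =~ /\A(.+?)([.!?]+)\z/) {
            push @tokens, $1, $2;
        }
        else {
            push @tokens, $chunk;
        }
    }
    return @tokens;
}

# Simplified stand-in for a character tokenizer: one token per character.
sub char_tokens {
    my ($line) = @_;
    return split //, $line;
}

print join('|', word_tokens('hello good sir.')), "\n";   # prints: hello|good|sir|.
print join('|', char_tokens('hailo')), "\n";             # prints: h|a|i|l|o
```

With character tokens the chain links letters rather than words, which is why that mode lends itself to generating new words from a word list.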
UPGRADING
Hailo makes no promises about brains generated with earlier versions being compatible with future versions, and due to the way Hailo works there's no practical way to make that promise.
If you're maintaining a Hailo brain that you want to keep using, you should save the input you trained it on and re-train when you upgrade.
The reason for not offering a database schema upgrade for Hailo is twofold:
We're too lazy to maintain database upgrade scripts for every version.
Even if we weren't there's no way to do it right.
The reason it can't be done right is that Hailo is always going to destroy information present in the input you give it. How input tokens get split up and saved to the storage backend depends on the version of the tokenizer being used and how that input gets saved to the database.
For instance, if an earlier version of Hailo tokenized "foo+bar" simply as "foo+bar" but a later version split that up into "foo", "+", "bar", an input of "foo+bar are my favorite metasyntactic variables" wouldn't take into account the existing "foo+bar" string in the database.
Because of tokenizer changes alone, carrying over brains like this would accumulate dead parts of the database and leave other parts in a state they wouldn't otherwise have gotten into. There have been similar changes to the database format itself.
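The "foo+bar" scenario can be sketched in a few lines. Everything here is hypothetical: tokenize_old and tokenize_new stand in for two tokenizer versions, and a plain hash of token counts stands in for the brain.

```perl
use strict;
use warnings;

# Hypothetical old tokenizer: whitespace only, so "foo+bar" is one token.
sub tokenize_old {
    my ($line) = @_;
    return split ' ', $line;
}

# Hypothetical new tokenizer: additionally splits around "+".
sub tokenize_new {
    my ($line) = @_;
    return grep { length } map { split /(\+)/, $_ } split ' ', $line;
}

# A "brain" trained under the old tokenizer, reduced to token counts.
my %seen;
$seen{$_}++ for tokenize_old('foo+bar is great');

# After the upgrade, new input is split differently and never touches
# the old "foo+bar" entry, which is now dead weight.
my @new_tokens = tokenize_new('foo+bar are my favorite metasyntactic variables');
my $hits = grep { exists $seen{$_} } @new_tokens;

print "new tokens: @new_tokens\n";
print "matches against the old brain: $hits\n";   # prints: matches against the old brain: 0
```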
ATTRIBUTES
brain_resource
The name of the resource (file name, database name) to use as storage. There is no default. Whether this gets used at all depends on the storage backend; currently only SQLite uses it.
save_on_exit
A boolean value indicating whether Hailo should save its state before its object gets destroyed. This defaults to true and will simply call save at DEMOLISH time.
order
The Markov order (chain length) you want to use for an empty brain. The default is 2.
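As an illustration of what the order means, the toy model below maps every run of N consecutive tokens to the tokens observed after it. This is only a sketch of the general idea; build_chain is a hypothetical helper and does not reflect how Hailo stores its data.

```perl
use strict;
use warnings;

# Toy order-N model: map every run of $order tokens to the tokens seen
# immediately after it.
sub build_chain {
    my ($order, @tokens) = @_;
    my %next;
    for my $i (0 .. $#tokens - $order) {
        my $key = join ' ', @tokens[$i .. $i + $order - 1];
        push @{ $next{$key} }, $tokens[$i + $order];
    }
    return \%next;
}

my @tokens = qw(the cat sat on the mat and the cat ran);
my $chain  = build_chain(2, @tokens);

print "the cat -> @{ $chain->{'the cat'} }\n";   # prints: the cat -> sat ran
```

A higher order gives each lookup more context and tends to produce more coherent output, at the cost of needing more training data before a given context has been seen.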
storage_class
The storage backend to use. Default: 'SQLite'.
tokenizer_class
The tokenizer to use. Default: 'Words'.
ui_class
The UI to use. Default: 'ReadLine'.
storage_args
tokenizer_args
ui_args
A HashRef of arguments for storage/tokenizer/ui backends. See the documentation for the backends for what sort of arguments they accept.
METHODS
new
This is the constructor. It accepts the attributes specified in "ATTRIBUTES".
learn
Takes a string or an array reference of strings and learns from them.
train
Takes a filename, filehandle or array reference and learns from all its lines. If a filename is passed, the file is assumed to be UTF-8 encoded. Unlike learn, this method sacrifices some safety (disables the database journal, fsyncs, etc.) for speed while learning.
reply
Takes an optional line of text and generates a reply that might be relevant.
learn_reply
Takes a string argument, learns from it, and generates a reply that might be relevant. This is equivalent to calling learn followed by reply.
save
Tells the underlying storage backend to save its state. Any arguments to this method will be passed as-is to the backend.
stats
Takes no arguments. Returns the number of tokens, expressions, previous token links and next token links.
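The methods above can be combined roughly as follows. This is a sketch rather than a reference: the attribute values and sample sentences are arbitrary, hailo_demo is a hypothetical wrapper, and it assumes stats returns its four counts as a list. Since Hailo is not a core module, the wrapper returns nothing when it isn't installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

sub hailo_demo {
    # Hailo is not a core module; return nothing if it is not installed.
    return unless eval { require Hailo; 1 };

    # In-memory brain; the attribute values are illustrative only.
    my $hailo = Hailo->new(order => 2, save_on_exit => 0);

    # learn: a single string or an array reference of strings.
    $hailo->learn('I like Markov chains.');
    $hailo->learn([ 'Markov chains like me.', 'Do you like chains?' ]);

    # train: bulk input, here from a throwaway UTF-8 file.
    my ($fh, $filename) = tempfile(UNLINK => 1);
    binmode $fh, ':encoding(UTF-8)';
    print {$fh} "Chains are fun.\n", "Fun chains are rare.\n";
    close $fh or die "Can't close $filename: $!";
    $hailo->train($filename);

    # learn_reply: learn from the input, then reply to it.
    my $reply = $hailo->learn_reply('Are chains fun?');

    # stats: tokens, expressions, previous token links, next token links.
    my @stats = $hailo->stats;
    return { reply => $reply, stats => \@stats };
}

my $result = hailo_demo();
if ($result) {
    my ($tokens, $expressions, $prev_links, $next_links) = @{ $result->{stats} };
    print "reply: ", $result->{reply} // '(none)', "\n";
    print "tokens=$tokens expressions=$expressions\n";
}
else {
    print "Hailo is not installed; skipping.\n";
}
```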
PRIVATE METHODS
run
Run Hailo in accordance with the attributes that were passed to it. This method is called by the hailo command-line utility and the Hailo test suite; its behavior is subject to change.
SUPPORT
You can join the IRC channel #hailo on FreeNode if you have questions.
BUGS
Bugs, feature requests and other issues are tracked in Hailo's issue tracker on GitHub.
SEE ALSO
Hailo::UI::Web - A Catalyst and jQuery powered web interface to Hailo
POE::Component::Hailo - A non-blocking POE wrapper around Hailo
POE::Component::IRC::Plugin::Hailo - A Hailo IRC bot plugin
http://github.com/hinrik/failo - Failo, an IRC bot that uses Hailo
LINKS
http://bit.ly/hailo_rewrite_of_megahal - Hailo: A Perl rewrite of MegaHAL, A blog posting about the motivation behind Hailo
AUTHORS
Hinrik Örn Sigurðsson, hinrik.sig@gmail.com
Ævar Arnfjörð Bjarmason <avar@cpan.org>
LICENSE AND COPYRIGHT
Copyright 2010 Hinrik Örn Sigurðsson and Ævar Arnfjörð Bjarmason <avar@cpan.org>
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.