NAME
Festival::Client::Async - Non-blocking interface to a Festival server
SYNOPSIS
use Festival::Client::Async qw(parse_lisp);
my $fest = Festival::Client::Async->new($host, $port);
$fest->server_eval_sync($lisp, \%actions); # blocking
$fest->server_eval($lisp); # just queues $lisp for writing
if ($fest->write_pending) {
while (defined(my $b = $fest->write_more)) {
last if $b == 0;
}
}
while (defined(my $b = $fest->read_more)) {
last if $b == 0;
}
if ($fest->error_pending) {
# Oops
}
while ($fest->wave_pending) {
my $waveform_data = $fest->dequeue_wave;
# Do something with it
}
while ($fest->lisp_pending) {
my $lisp = $fest->dequeue_lisp
my $arr = parse_lisp($lisp);
# Do something with it
}
DESCRIPTION
This module provides Yet Another interface to a Festival speech synthesis server.
Why should you use this module instead of the already existing ones?
Non-blocking operation. This means that this module can interoperate with a Tk, Gtk, Event, or POE event loop without the need to fork a separate process.
More flexible interface for blocking operation. You can register separate callbacks to handle the results of Scheme evaluation, waveform data, and OK/error messages.
This module is simply a very thin wrapper around Festival's Scheme read-evaluate-print loop. If you don't know what that is, you may not want this module. If you do know what that is, this will allow you to use this module to do basically anything you could do at the actual Festival interpreter prompt. You are not limited to doing simple text-to-speech to the local audio device.
USAGE
new
-
my $fest = Festival::Client::Async->new($host, $port) or die "couldn't connect: $!";
The
$host
and$port
parameters are optional, and default to 'localhost' and 1314, respectively.If the connection to the server fails, this returns undef.
fh
-
$kernel->select_read($fest->fh, 'foo_state');
This method returns the actual filehandle for the socket connection to the server. You'll probably need it for non-blocking operation.
Now that you've connected to the server, what do you do with it? As mentioned above, this module is just a client for the Festival repl. Therefore, the answer to the question is 'evaluate S-expressions'. This is accomplished using the server_eval_sync
and server_eval
methods, described below.
A few S-expressions you might want to evaluate are:
(Parameter.set 'Wavefiletype 'raw) ;; send back raw wave data
(tts_textall "$text" nil) ;; text-to-speech and send the waveform
(SayText "$text") ;; text-to-speech to the local audio device
(voice_$foo) ;; switch to the $foo voice
The Festival server can send you back a few things:
- Results of evaluation
-
These are accessed via the
LP
callback for blocking operation, or thelisp_pending
/dequeue_lisp
methods in non-blocking operation. The server sends back a single result on a line by itself, followed by an empty line. (Note: I'm not sure if this is actually guaranteed anywhere in the Festival code, and you can probably get it rather confused if you embed newlines in strings and such. Just Don't Do That.)Festival::Client::Async
exports (or rather, can export - you will have to specify it explicitly in theuse
statement) one subroutine,parse_lisp
, which is a convenience function for de-lisp-ifying results sent back from Festival. It tries its best to create Perl data structures approximating the Lisp ones spat out by Festival. Symbols, numbers, and strings are all converted to scalars, while lists are turned into arrays. For example,((foo 123) ("bar" "baz") ())
will be parsed as:
[["foo" 123] ["bar" "baz"] []]
- Waveform data
-
These are accessed via the
WV
callback for blocking operation, or thewaveform_pending
/dequeue_wave
methods in non-blocking operation. They will be in whatever format is set in theWavefiletype
parameter in the Festival server. By default this is NIST audio files, which contain some metadata that will sound funny if you try to play it to an audio device.Unfortunately there is no way to find out what sample rate Festival is going to send at you except by examining the headers in the audio data it sends. The individual voices each have their own sampling rate, which is typically 16000Hz, but can vary. So, if this bothers you, you must either use a headered file format (NIST is textual and thus easy to parse), or if you use raw data, you must know ahead of time what rates your voices use, or build an utterance structure manually and resample it to the desired rate. Here's an example of a Scheme expression that will synthesize some text, resample it, and send the waveform to the client (substitute your text and sampling rate for $text and $sampling_rate, obviously):
(let ((utt (Utterance Text "$text"))) (begin (utt.synth utt) (utt.wave.resample utt $sampling_rate) (utt.send.wave.client utt)))
You might need to resample anyway if you're stuck with, say, an Intel on-board audio device that only does 48kHz. Resampling is expensive, don't do it if you can help it.
- Acknowledgements or errors
-
After sending the results of evaluation, Festival will send an OK. You can't capture this in the blocking mode, since it just gets translated into the return value of the
server_eval_sync
method. In non-blocking mode, this gets queued as a timestamp.If an error occurs, you may not get results of evaluatoin, and you certainly won't get an OK, but rather an error. Again, this gets translated to the return value in blocking mode, but is queued as a timestamp (since Festival doesn't actually send you any meaningful data with the error message).
There is no method to call to disconnect from the server. This will just happen automatically if the client object goes out of scope; you can also force it to occur by calling undef
on that variable.
Blocking Operation
$fest->server_eval_sync($sexp,
{
LP => sub {
my $lisp = shift;
# .. etc ...
},
WV => sub {
my $wave = shift;
# .. etc ...
}
}) or die "error from server";
The blocking mode of operation will evaluate a single S-expression on the server, and wait for it to complete. Access to the results of that expression is done by giving callbacks to the server_eval_sync
method. These callbacks will be called with the individual chunks of Lisp or waveform data as they come in.
There are no callbacks for acknowledgements or errors, since only one S-expression can be evaluated at a time. Instead, the method will return a true value for a successful evaluation, or undef
for failure (unfortunately there's no good way to find out exactly what failed, since the server won't tell you).
Non-blocking Operation
The following example is kind of long-winded ... bear with me.
use IO::Select;
my $s = IO::Select->new($fest->fh);
$fest->server_eval($sexp);
# In a real event loop, you'd want to make sure $fest->fh is being
# watched for ability to accept output at this point.
EVENT:
while (1) {
if ($s->can_write) {
my $b = $fest->write_more;
if ($b == 0) {
# In a real event loop, you'd want to stop or suspend
# watching $fest->fh for output at this point.
}
}
if ($s->can_read) {
my $b = $fest->read_more;
last EVENT if $b == 0;
if ($fest->waveform_pending) {
while (defined(my $wav = $fest->dequeue_wave)) {
# ... do something ...
}
}
if ($fest->lisp_pending) {
while (defined(my $lisp = $fest->dequeue_wave)) {
if ($lisp) {
# ... do something ...
} else {
# That was the end of one evaluation, if you're
# keeping track
}
}
}
if ($fest->ok_pending) {
while (defined(my $ok_time = $fest->dequeue_ok)) {
# ... do something ...
}
}
if ($fest->error_pending) {
while (defined(my $error_time = $fest->dequeue_error)) {
# ... do something ...
}
}
}
}
This mode of operation is meant to be used from a select
loop, or some more sophisticated event loop, such as the POE or Gtk ones, which has the ability to watch filehandles for activity.
To use it, you will want to add the filehandle for the Festival client object (obtained using the fh
method) to your set of filehandles being watched for input and output (urgent data is not used, so there is no need to watch it for exceptions). Then, you want to call the read_more
and write_more
methods in response to read and write ready events, respectively.
read_more
will return zero if the server closes the connection, or if it fails to read any data. It is guaranteed (I hope) that it will successfully read data if called in the manner described above.
write_more
will return zero if there is no more data to be written. At this point, you'll want to stop watching the filehandle for output until you have more data to write; otherwise you'll spin endlessly since the filehandle will continue to be ready for output, thus causing your select
call to be woken up.
To start evaluation, call the server_eval
method. This doesn't actually send anything to the server, but places an S-expression on the output queue.
After calling read_more
, you can check to see if there are any results available using the various foo_pending
methods, and get the available data using the various dequeue_foo
methods, as shown above (the example shows them all, I'm not going to repeat them here).
BUGS
The non-blocking mode does no tracking of which Lisp result corresponds to which expression; it's trivial to do this in a higher level (and in fact, POE::Component::Festival
does this), but arguably it should be in this module.
People might want convenience functions to shield them from all that nasty Lisp.
It's probably possible to confuse the protocol handling in here with things like Lisp results containing embedded newlines; the Festival client-to-server protocol is kind of wonky.
SEE ALSO
The Festival web site (http://www.cstr.ed.ac.uk/projects/festival/), the Festvox website (http://www.festvox.org/), the documentation included in the Festival distribution, IO::Select, and perl(1p).
AUTHOR
David Huggins-Daines <dhd@cepstral.com>