NAME
Lingua::Awkwords - randomly generates outputs from a given pattern
SYNOPSIS
use feature qw(say);
use Lingua::Awkwords;
use Lingua::Awkwords::Subpattern;
# V is a pre-defined subpattern, ^ filters out aa from the list
# of two vowels that the two VV generate
my $la = Lingua::Awkwords->new( pattern => q{ [VV]^aa } );
say $la->render for 1..10;
# define our own C, V
Lingua::Awkwords::Subpattern->set_patterns(
C => [qw/j k l m n p s t w/],
V => [qw/a e i o u/],
);
# and a pattern somewhat suitable for Toki Pona...
$la->pattern(q{
[a/*2]
(CV*5)^ji^ti^wo^wu
(CV*2)^ji^ti^wo^wu
[CV/*2]^ji^ti^wo^wu
[n/*5]
});
say $la->render for 1..10;
DESCRIPTION
This is a Perl implementation of http://akana.conlang.org/tools/awkwords/ though it is not an exact replica of that parser; http://akana.conlang.org/tools/awkwords/help.html details the format that this code is based on. The syntax is summarized below.
SYNTAX
- [] or ()
-
Denote a unit or group; they are identical except that (a) is equivalent to [a/]; that is, it represents the possibility of generating the empty string in addition to any other terms supplied. Units can be nested recursively. There is an implicit unit at the top level of the pattern.
- /
-
Introduces a choice within a unit; without this [Vx] would generate whatever V represents (a list of vowels by default) followed by the letter x, while [V/x] by contrast generates only a vowel or the letter x.
- *
-
The asterisk followed by an integer in the range 1..INT_MAX weights the current term of the alternation, if any. That is, while [a/] generates each term with equal probability, [a/*2] would generate the empty string at twice the probability of the letter a.
- ^
-
The caret introduces a filter that must follow a unit (there is an implicit unit at the top level of a pattern). An example would be [VV]^aa or the equivalent VV^aa that (by default) generates two vowels, but replaces aa with the empty string. More than one filter may be specified.
- A-Z
-
Capital ASCII letters denote subpatterns; several of these are set by default. See Lingua::Awkwords::Subpattern for how to customize them. V, for example, is by default equivalent to the more verbose [a/i/u].
- "
-
Use double quotes to denote a quoted string; this prevents other characters (besides " itself) from being interpreted as some non-string value.
- anything-else
-
Anything else not otherwise accounted for above is treated as part of a string, so ["abc"/abc] generates either the string abc or the string abc, as this is two ways of saying the same thing.
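The rules above can be tried out directly. The following is a minimal sketch, assuming the default V subpattern (a/i/u, as stated above) has not been redefined; the helpers sample and show are hypothetical names used only for this illustration and are not part of the module.

```perl
use strict;
use warnings;
use feature qw(say);
use Lingua::Awkwords;

# Hypothetical helper (not part of the module): render a pattern
# $count times and return the results as a list.
sub sample {
    my ( $pattern, $count ) = @_;
    my $la = Lingua::Awkwords->new( pattern => $pattern );
    return map { $la->render } 1 .. $count;
}

# Hypothetical helper: print results, marking empty strings so that
# they remain visible in the output.
sub show {
    my ( $label, @words ) = @_;
    say "$label: ", join ' ', map { length($_) ? $_ : '(empty)' } @words;
}

show( 'sequence', sample( q{[Vx]},        5 ) );    # a vowel, then x
show( 'choice',   sample( q{[V/x]},       5 ) );    # a vowel or x
show( 'optional', sample( q{(a)},         5 ) );    # a or nothing
show( 'weighted', sample( q{[a/*2]},      5 ) );    # nothing is twice as likely as a
show( 'filtered', sample( q{[VV]^aa},     5 ) );    # aa is replaced by the empty string
show( 'quoted',   sample( q{["abc"/abc]}, 5 ) );    # always abc
```

The output is random, so repeated runs will differ.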
ATTRIBUTES
- pattern
-
Awkword pattern. If this is not supplied, any call to render will throw an exception.
- tree
-
Where the parse tree is stored.
FUNCTIONS
These can be called with a fully qualified name, e.g. Lingua::Awkwords::set_filter, or can be imported via
use Lingua::Awkwords qw(weights2str weights_from);
- percentize hashref
-
Modifies the values of the given hashref to be percentages of the sum of the values. Will croak if the sum is 0. Use this to help compare the output of weights_from for different corpora.
- set_filter filter-value
-
Utility routine for use with walk. Returns a subroutine that sets the filter_with attribute to the given value.
$la->walk( Lingua::Awkwords::set_filter('X') );
- weights2str hash-reference
-
Constructs an awkwords choice string from a given hash-reference of values and weights, e.g.
use Lingua::Awkwords qw(weights2str weights_from);
weights2str( ( weights_from("toki sin li toki pona") )[-1] )
will return a weight string of
a*1/i*4/k*2/l*1/n*2/o*3/p*1/s*1/t*2
that can then be used as a pattern for this module.
- weights_from string-or-filehandle
-
Parses the frequency of characters appearing in the input string or filehandle, and returns four hash references (first, mid, last, and all). These contain the character counts of the first letters of the "words" in the input, of the characters that appear in the middle, of the characters at the end, and a tally of all three of these positions together.
"words" is used in scare quotes because there is "no generally accepted and completely satisfactory definition of what constitutes a word" (Philip Durkin. "The Oxford Guide to Etymology". p.37) and because instead syllables could be fed in and then patterns generated using those syllable-specific weights.
METHODS
- new
-
Constructor. Typically this should be passed a pattern argument.
- parse_string pattern
-
Returns the parse tree of the given pattern without setting the tree attribute. "COMPLICATIONS" shows one use for this.
- render
-
Returns a string rendered from the awkword pattern. This may be the empty string if filters have removed all the text.
- walk callback
-
Provides a means to recurse through the parse tree: every object in the tree calls the callback with $self as the sole argument and then, if necessary, iterates through all of the possibilities it contains, calling walk on each of those.
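As an illustration of walk, the sketch below first applies the set_filter utility described under FUNCTIONS and then passes a custom callback. It assumes the default V subpattern is in effect and that every object handed to the callback is a blessed reference; the %seen tally is only an example of what a callback might do.

```perl
use strict;
use warnings;
use feature qw(say);
use Lingua::Awkwords;

my $la = Lingua::Awkwords->new( pattern => q{ [VV]^aa } );

# replace filtered text with X instead of the default empty string
$la->walk( Lingua::Awkwords::set_filter('X') );
say join ' ', map { $la->render } 1 .. 10;

# a custom callback receives each object of the parse tree in turn;
# here it only counts the objects by class
my %seen;
$la->walk( sub { my ($node) = @_; $seen{ ref $node }++ } );
say "$_: $seen{$_}" for sort keys %seen;
```

With the filter value set to X, a filtered word stays visible in the output instead of silently becoming shorter or empty.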
COMPLICATIONS
More complicated structures can be built by attaching parse trees to subpatterns. For example, Toki Pona could be extended to allow optional diphthongs (mostly in the second syllable) via
use feature qw(say);
use Lingua::Awkwords::Subpattern;
use Lingua::Awkwords;
my $cv = Lingua::Awkwords->parse_string(q{
CV^ji^ti^wo^wu
});
my $cvv = Lingua::Awkwords->parse_string(q{
CVV^ji^ti^wo^wu^aa^ee^ii^oo^uu
});
Lingua::Awkwords::Subpattern->set_patterns(
A => $cv,
B => $cvv,
C => [qw/j k l m n p s t w/],
V => [qw/a e i o u/],
);
my $tree = Lingua::Awkwords->new( pattern => q{
[ a[B/BA/BAA/A/AA/AAA] / [AB/ABA/ABAA/A/AA/AAA] ] [n/*5]
});
say join ' ', map { $tree->render } 1 .. 10;
The default filter of the empty string can be problematic, as one may not know whether a filter has been applied to the result, or the word may be filtered into an incorrect form. Consult the eg/ directory of this module's distribution for example code that customizes the filter value.
Code that makes use of non-ASCII encodings may need appropriate settings made, e.g. to use the locale for input and output and to allow UTF-8 in the program text.
use open IO => ':locale';
use utf8;
Lingua::Awkwords::Subpattern->set_patterns(
S => [qw/... UTF-8 data here .../],
);
BUGS
Reporting Bugs
Please report any bugs or feature requests to bug-lingua-awkwords at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-Awkwords.
Patches might best be applied towards:
https://github.com/thrig/Lingua-Awkwords
Known Issues
There are various incompatibilities with the original version of the code; these are detailed in the parser module as they concern how e.g. weights are parsed.
See also the "Known Issues" section in all the other modules in this distribution.
SEE ALSO
Lingua::Awkwords::ListOf, Lingua::Awkwords::OneOf, Lingua::Awkwords::Parser, Lingua::Awkwords::String, Lingua::Awkwords::Subpattern
AUTHOR
thrig - Jeremy Mates (cpan:JMATES) <jmates at cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2017 by Jeremy Mates
This program is distributed under the (Revised) BSD License: http://www.opensource.org/licenses/BSD-3-Clause