NAME

Lingua::Deva - Convert between Latin and Devanagari Sanskrit text

SYNOPSIS

use v5.12.1;
use strict;
use utf8;
use charnames ':full';
use Lingua::Deva;

# Basic usage
my $d = Lingua::Deva->new();
say $d->to_latin('आसीद्राजा'); # prints 'āsīdrājā'
say $d->to_deva('Nalo nāma'); # prints 'नलो नाम'

# With configuration: strict, allow Danda, 'w' for 'v'
my %c = %Lingua::Deva::Maps::Consonants;
$d = Lingua::Deva->new(
    strict => 1,
    allow  => [ "\N{DEVANAGARI DANDA}" ],
    C      => do { $c{'w'} = delete $c{'v'}; \%c },
);
say $d->to_deva('ziwāya'); # 'zइवाय', warning for 'z'
say $d->to_latin('सर्वम्।'); # 'sarvam।', no warnings

DESCRIPTION

This module provides facilities for converting Sanskrit in Latin transliteration to Devanagari and vice versa. The principal interface is exposed through instances of the Lingua::Deva class. "Deva" is the name for the Devanagari (devanāgarī) script according to ISO 15924.

Using the module is as simple as creating a Lingua::Deva instance and calling to_deva() or to_latin() with appropriate string arguments.

my $d = Lingua::Deva->new();
say $d->to_latin('कामसूत्र');
say $d->to_deva('Kāmasūtra');

The default translation maps adhere to the IAST transliteration scheme, but it is easy to customize these mappings. This is done by copying and modifying a map from Lingua::Deva::Maps and passing it to the Lingua::Deva constructor.

# Copy and modify the consonants map
my %c = %Lingua::Deva::Maps::Consonants;
$c{"c\x{0327}"} = delete $c{"s\x{0301}"};

# Pass a reference to the modified map to the constructor
my $d = Lingua::Deva->new( C => \%c );

Behind the scenes, all translation is done via an intermediate object representation called "aksara" (Sanskrit akṣara). These objects are instances of Lingua::Deva::Aksara, which provides an interface to inspect and manipulate individual aksaras.

# Create an array of aksaras
my $a = $d->l_to_aksaras('Kāmasūtra');

# Print vowel in the fourth Aksara
say $a->[3]->vowel();

Having the intermediate Lingua::Deva::Aksara representation comes with a slight penalty in efficiency, but gives you the advantage of having aksara structure available for precise analysis and validation.

Methods

new()

Constructor. Takes optional arguments which are described below.

strict => 0 or 1

In strict mode, warnings are emitted for invalid input. Invalid means either not a Devanagari token (e.g. "q") or structurally ill-formed (e.g. a Devanagari diacritic vowel following an independent vowel).

Off by default.

allow => [ ... ]

In strict mode, the allow array can be used to exempt certain characters from being flagged as invalid even though they normally would be.

C => { consonants map }
V => { independent vowels map }
D => { diacritic vowels map }
F => { finals map }

Translation maps in the direction Latin to Devanagari.

DC => { consonants map }
DV => { independent vowels map }
DD => { diacritic vowels map }
DF => { finals map }

Translation maps in the direction Devanagari to Latin.

The default maps are in Lingua::Deva::Maps. To customize, make a copy of an existing mapping hash and pass it to one of these parameters. Note that the map keys need to be in Unicode NFD form (see Unicode::Normalize).
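
The NFD requirement is easy to miss when a key is typed as a single precomposed character. The following is a minimal sketch using only the core module Unicode::Normalize; the %map hash is a hypothetical stand-in for a copy of one of the Lingua::Deva::Maps hashes, not the module's own data.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical stand-in for a copy of one of the Lingua::Deva::Maps hashes.
my %map;

# LATIN SMALL LETTER S WITH ACUTE as one precomposed character (U+015B) ...
my $precomposed = "\x{015B}";

# ... decomposes to "s" followed by COMBINING ACUTE ACCENT (U+0301) in NFD.
my $key = NFD($precomposed);

# Use the decomposed form as the map key (value: DEVANAGARI LETTER SHA).
$map{$key} = "\x{0936}";

printf "%d character(s) before, %d after NFD\n",
    length $precomposed, length $key;
```

Passing every custom key through NFD() before storing it keeps hand-written keys consistent with the normalized input.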

l_to_tokens()

Converts a string of Latin characters into "tokens" and returns a reference to an array of tokens. A "token" is either a character sequence which may constitute a single Devanagari grapheme or a single non-Devanagari character. In the first sense, a token is simply any key in the translation maps.

my $t = $d->l_to_tokens("Bhārata\n");
# $t now refers to the array ['Bh','ā','r','a','t','a',"\n"]

The input string will be normalized (NFD). No chomping takes place. Upper case and lower case distinctions are preserved.
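
To illustrate how such a token list can arise, here is a rough sketch of longest-match tokenization over an NFD-normalized string. It is not the module's implementation: the key list is a tiny hypothetical subset, and tokenize_sketch() is an invented name used for illustration only.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Hypothetical, tiny set of map keys in NFD form; the real maps are larger.
my @keys = map { NFD($_) } qw(bh b r t a ā);

# Try longer keys first so that "bh" wins over "b".
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @keys;

# Invented helper: split a Latin string into known keys and single characters.
sub tokenize_sketch {
    my ($text) = @_;
    # Match a known key case-insensitively, else fall back to one character.
    # The /s flag lets "." match a newline, so nothing is dropped.
    my @tokens = NFD($text) =~ /($alternation|.)/gis;
    return \@tokens;
}

my $t = tokenize_sketch("Bhārata\n");
print scalar(@$t), " tokens\n";
```

The case-insensitive match keeps the original capitalization in the token ("Bh"), and the single-character fallback passes the trailing newline through as its own token.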

l_to_aksaras()

Converts its argument into "aksaras" and returns a reference to an array of aksaras (see Lingua::Deva::Aksara). The argument can be a Latin string, or a reference to an array of tokens.

use Test::More;

my $a = $d->l_to_aksaras('hyaḥ');
is( ref($a->[0]), 'Lingua::Deva::Aksara', 'one aksara object' );
done_testing();

Input tokens which cannot be part of an aksara are passed through untouched. This means that the resulting array can contain both aksara objects and separate tokens.

In strict mode, warnings are emitted for invalid tokens.
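
Because the returned array can mix aksara objects with plain tokens, calling code may want to separate the two before inspecting aksaras. The sketch below assumes only that aksaras are blessed into Lingua::Deva::Aksara, as stated above; partition_aksaras() is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Split a mixed result array into aksara objects and pass-through tokens.
# partition_aksaras() is a hypothetical helper, not part of Lingua::Deva.
sub partition_aksaras {
    my ($items) = @_;
    my (@aksaras, @tokens);
    for my $item (@$items) {
        if (blessed($item) && $item->isa('Lingua::Deva::Aksara')) {
            push @aksaras, $item;
        }
        else {
            push @tokens, $item;    # plain string, e.g. punctuation
        }
    }
    return (\@aksaras, \@tokens);
}
```

With a Lingua::Deva instance $d, the helper could be called as partition_aksaras($d->l_to_aksaras($text)), which returns two array references.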

d_to_aksaras()

Converts a Devanagari string into "aksaras" and returns a reference to an array of aksaras.

my $text = 'बुद्धः';
my $a = $d->d_to_aksaras($text);

my $o = $a->[1]->onset();
# $o now refers to the array ['d','dh']

Input tokens which cannot be part of an aksara are passed through untouched. This means that the resulting array can contain both aksara objects and separate tokens.

In strict mode, warnings are emitted for invalid tokens.

to_deva()

Converts a Latin string or an array of aksaras to a Devanagari string.

say $d->to_deva('Kāmasūtra');

# same as
my $a = $d->l_to_aksaras('Kāmasūtra');
say $d->to_deva($a);

Aksaras are assumed to be well-formed.

to_latin()

Converts a Devanagari string or an array of aksaras to an equivalent string in Latin transliteration.

Aksaras are assumed to be well-formed.

EXAMPLES

The synopsis gives the simplest usage patterns. Here are a few more.

To use "ring below" instead of "dot below" for syllabic r:

my %v = %Lingua::Deva::Maps::Vowels;
$v{"r\x{0325}"}         = delete $v{"r\x{0323}"};
$v{"r\x{0325}\x{0304}"} = delete $v{"r\x{0323}\x{0304}"};
my %d = %Lingua::Deva::Maps::Diacritics;
$d{"r\x{0325}"}         = delete $d{"r\x{0323}"};
$d{"r\x{0325}\x{0304}"} = delete $d{"r\x{0323}\x{0304}"};

my $d = Lingua::Deva->new( V => \%v, D => \%d );
say $d->to_deva('Kr̥ṣṇa');

Use the aksara objects to produce simple statistics.

# Count occurrences of each rhyme in @aksaras
my %rhymes;
for my $a (grep { defined $_->get_rhyme() } @aksaras) {
    $rhymes{ join '', @{$a->get_rhyme()} }++;
}

# Print number of 'au' rhymes
say $rhymes{'au'};

The following script converts a Latin input file "in.txt" to Devanagari.

#!/usr/bin/env perl
use v5.12.1;
use strict;
use warnings;
use open ':encoding(UTF-8)';
use Lingua::Deva;

open my $in,  '<', 'in.txt'  or die "Cannot open in.txt: $!";
open my $out, '>', 'out.txt' or die "Cannot open out.txt: $!";

my $d = Lingua::Deva->new();
while (my $line = <$in>) {
    print $out $d->to_deva($line);
}

On a Unicode-capable terminal one-liners are also possible:

echo 'Himālaya' | perl -CS -MLingua::Deva -e 'print Lingua::Deva->new()->to_deva(<>);'

DEPENDENCIES

There are no requirements apart from standard Perl modules.

Note that a modern, Unicode-capable version of Perl >= 5.12 is required.

AUTHOR

glts <676c7473@gmail.com>

BUGS

Report bugs to the author or at https://github.com/glts/Lingua-Deva

COPYRIGHT

This program is free software. You may copy or redistribute it under the same terms as Perl itself.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.1 or, at your option, any later version of Perl 5 you may have available.

