NAME

Lingua::Deva - Convert between Latin and Devanagari Sanskrit text

SYNOPSIS

use v5.12.1;
use strict;
use utf8;
use charnames ':full';
use Lingua::Deva;

# Basic usage
my $d = Lingua::Deva->new();
say $d->to_latin('आसीद्राजा'); # prints 'āsīdrājā'
say $d->to_deva('Nalo nāma'); # prints 'नलो नाम'

# With configuration: strict, allow Danda, 'w' for 'v'
my %c = %Lingua::Deva::Maps::Consonants;
$d = Lingua::Deva->new(
    strict => 1,
    allow  => [ "\N{DEVANAGARI DANDA}" ],
    C      => do { $c{'w'} = delete $c{'v'}; \%c },
);
say $d->to_deva('ziwāya'); # 'zइवाय', warning for 'z'
say $d->to_latin('सर्वम्।'); # 'sarwam।', no warnings

DESCRIPTION

The Lingua::Deva module provides facilities for converting Sanskrit in various Latin transliterations to Devanagari and vice versa. "Deva" is the name for the Devanagari (devanāgarī) script according to ISO 15924.

The module exposes its functionality through instances of the Lingua::Deva class. A number of configuration options can be passed to the constructor.

Using the module is as simple as creating a Lingua::Deva instance and calling its methods to_deva() or to_latin() with appropriate string arguments.

my $d = Lingua::Deva->new();
say $d->to_latin('कामसूत्र');
say $d->to_deva('Kāmasūtra');

By default, transliteration follows the widely used IAST conventions. Three other ready-made transliteration schemes are also included with this module: ISO 15919 (ISO15919), Harvard-Kyoto (HK), and ITRANS.

my $d = Lingua::Deva->new(map => 'HK');
say $d->to_latin('कामसूत्र'); # prints 'kAmasUtra'
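
To compare the ready-made schemes side by side, one can loop over the scheme names accepted by the map option. The following is a minimal sketch that uses only the constructor and to_latin() as documented here; the binmode call and the variable names are illustrative assumptions.

```perl
use v5.12.1;
use utf8;
use Lingua::Deva;

binmode STDOUT, ':encoding(UTF-8)';

# Transliterate the same word with each ready-made scheme
my %latin;
for my $scheme (qw(IAST ISO15919 HK ITRANS)) {
    my $d = Lingua::Deva->new(map => $scheme);
    $latin{$scheme} = $d->to_latin('कामसूत्र');
    say "$scheme: $latin{$scheme}";
}
```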

For additional flexibility, all mappings can be customized, and users can also supply their own.

use Lingua::Deva::Maps::ISO15919;
my %f = %Lingua::Deva::Maps::ISO15919::Finals;
my $d = Lingua::Deva->new(
    map           => 'IAST', # use IAST transliteration
    casesensitive => 1,      # do not case fold
    F             => \%f,    # ISO 15919 mappings for finals
);
say $d->to_deva('Vṛtraṁ'); # prints 'Vऋत्रं'

For more information on customization see Lingua::Deva::Maps.

Behind the scenes, all transliteration is done via an intermediate object representation called "Aksara" (Sanskrit akṣara). These objects are instances of Lingua::Deva::Aksara, which provides an interface to inspect and manipulate individual Aksaras.

# Create an array of Aksaras
my $a = $d->l_to_aksaras('Kāmasūtra');

# Print vowel in the fourth Aksara
say $a->[3]->vowel();

The methods and options of Lingua::Deva are described below.

Methods

new()

Constructor. Takes the following optional arguments.

map => 'IAST'|'ISO15919'|'HK'|'ITRANS'

Selects one of the ready-made transliteration schemes.

casesensitive => (0|1)

Determines whether case is treated as distinctive or not. Some schemes (e.g. Harvard-Kyoto) set this to 1 while others (e.g. IAST) set it to 0.

Default is 0.

strict => (0|1)

In strict mode, invalid input is flagged with warnings. Input is invalid if it contains a character that does not map to any Devanagari token (e.g. q) or if it is structurally ill-formed (e.g. a Devanagari diacritic vowel following an independent vowel).

Default is 0.

allow => [ ... ]

In strict mode, the allow array can be used to exempt certain characters from being flagged as invalid even though they normally would be.

avagraha => "'"

Specifies the Latin character used for the transcription of avagraha (ऽ).

Default is "'" (apostrophe).

C => { consonants map }
V => { independent vowels map }
D => { diacritic vowels map }
F => { finals map }

Transliteration maps in the direction from Latin to Devanagari script.

DC => { consonants map }
DV => { independent vowels map }
DD => { diacritic vowels map }
DF => { finals map }

Transliteration maps in the direction from Devanagari to Latin script. When these are not given, reversed versions of the Latin to Devanagari maps are used.

The default maps are in Lingua::Deva::Maps. To customize, make a copy of an existing mapping hash (or create your own) and pass it to one of these parameters.
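
The reversal of a Latin to Devanagari map can be pictured with plain Perl. This is a minimal sketch over a hypothetical three-entry map, not the module's actual data or implementation; it only shows why a reversed map is usable when every Devanagari value is unique.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical, abbreviated Latin-to-Devanagari consonant map (illustration only)
my %c = (
    k => "\x{0915}",    # DEVANAGARI LETTER KA
    g => "\x{0917}",    # DEVANAGARI LETTER GA
    v => "\x{0935}",    # DEVANAGARI LETTER VA
);

# reverse swaps keys and values; duplicate values would silently collapse
my %dc = reverse %c;

print "$_ => $dc{$_}\n" for sort keys %dc;
```

If a custom Latin map assigns the same Devanagari value to two keys, passing explicit DC, DV, DD, or DF maps avoids depending on which key survives the reversal.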

l_to_tokens()

Converts a string of Latin characters into tokens and returns a reference to an array of tokens. A "token" is either a character sequence which may constitute a single Devanagari grapheme or a single non-Devanagari character. In the first sense, a token is simply any key in the transliteration maps.

my $t = $d->l_to_tokens("Bhārata\n");
# $t now refers to the array ['Bh','ā','r','a','t','a',"\n"]

The input string is normalized with Unicode::Normalize::NFD. No chomping takes place. Upper case and lower case distinctions are preserved.
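
The effect of normalization and of matching map keys can be sketched with core Perl only. The key list below is a hypothetical subset, and the longest-match regular expression is one plausible way to tokenize, not necessarily how l_to_tokens() is implemented.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical subset of map keys, in decomposed (NFD) form
my @keys = ('bh', 'b', 'h', 'r', 't', 'a', "a\x{0304}");

# Try longer keys first so that 'bh' wins over 'b'
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @keys;

# NFD turns precomposed U+0101 into 'a' plus COMBINING MACRON (U+0304)
my $input = NFD("Bh\x{0101}rata\n");

# A key matches case-insensitively; anything else becomes a one-character token
my @tokens = $input =~ /($alternation|.)/gis;

printf "%d tokens\n", scalar @tokens;    # prints "7 tokens"
```

Because the input is decomposed before matching, keys in custom maps need to be written in decomposed form as well, as in "r\x{0323}" for r with dot below.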

l_to_aksaras()

Converts a Latin string (or a reference to an array of tokens) into Aksaras and returns a reference to an array of Aksaras.

use Test::More;

my $a = $d->l_to_aksaras('hyaḥ');
is( ref($a->[0]), 'Lingua::Deva::Aksara', 'one aksara object' );
done_testing();

Input tokens which cannot be part of an Aksara pass through untouched. Thus, the resulting array can contain both Lingua::Deva::Aksara objects and separate tokens.

In strict mode, warnings are issued for invalid tokens.
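
Because the returned array can mix objects and plain tokens, code that calls Aksara methods on every element should check each element first. The following is a minimal sketch; the input string and the variable names are illustrative, and Scalar::Util is a core module.

```perl
use v5.12.1;
use utf8;
use Scalar::Util qw(blessed);
use Lingua::Deva;

my $d       = Lingua::Deva->new();
my $aksaras = $d->l_to_aksaras('nalo nāma');

# Separate Aksara objects from tokens that passed through untouched
my (@objects, @plain);
for my $item (@$aksaras) {
    if (blessed($item) && $item->isa('Lingua::Deva::Aksara')) {
        push @objects, $item;
    }
    else {
        push @plain, $item;
    }
}

say scalar(@objects), ' Aksara objects, ', scalar(@plain), ' plain tokens';
```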

d_to_aksaras()

Converts a Devanagari string into Aksaras and returns a reference to an array of Aksaras.

use Test::More;

my $aksaras = $d->d_to_aksaras('बुद्धः');
my $onset = $aksaras->[1]->onset();
is_deeply( $onset, ['d', 'dh'], 'onset of second aksara' );
done_testing();

Input tokens which cannot be part of an Aksara pass through untouched. Thus, the resulting array can contain both Lingua::Deva::Aksara objects and separate tokens.

In strict mode, warnings are issued for invalid tokens.

to_deva()

Converts a Latin string (or a reference to an array of Aksaras) into Devanagari and returns a Devanagari string.

say $d->to_deva('Kāmasūtra');

# same as
my $a = $d->l_to_aksaras('Kāmasūtra');
say $d->to_deva($a);

Aksaras are assumed to be well-formed.

to_latin()

Converts a Devanagari string (or a reference to an array of Aksaras) into Latin transliteration and returns a Latin string.

Aksaras are assumed to be well-formed.
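
The two accepted argument forms can be illustrated in the same way as for to_deva(). This is a minimal sketch that relies only on the methods described above; the binmode call and the variable names are illustrative, and both calls should yield the same string.

```perl
use v5.12.1;
use utf8;
use Lingua::Deva;

binmode STDOUT, ':encoding(UTF-8)';

my $d = Lingua::Deva->new();

# Convert a Devanagari string directly
my $direct = $d->to_latin('कामसूत्र');

# Go through the intermediate Aksara representation instead
my $aksaras     = $d->d_to_aksaras('कामसूत्र');
my $via_aksaras = $d->to_latin($aksaras);

say $direct;
say $via_aksaras;
```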

EXAMPLES

The synopsis gives the simplest usage patterns. Here are a few more.

Use default transliteration, but use "ring below" instead of "dot below" for syllabic r:

my %v = %Lingua::Deva::Maps::Vowels;
$v{"r\x{0325}"}         = delete $v{"r\x{0323}"};
$v{"r\x{0325}\x{0304}"} = delete $v{"r\x{0323}\x{0304}"};
my %d = %Lingua::Deva::Maps::Diacritics;
$d{"r\x{0325}"}         = delete $d{"r\x{0323}"};
$d{"r\x{0325}\x{0304}"} = delete $d{"r\x{0323}\x{0304}"};

my $d = Lingua::Deva->new( V => \%v, D => \%d );
say $d->to_deva('Kr̥ṣṇa');

Use the Aksara objects to produce simple statistics.

# Count distinct rhymes in @aksaras
my %rhymes;
for my $a (grep { defined $_->get_rhyme() } @aksaras) {
    $rhymes{ join '', @{$a->get_rhyme()} }++;
}

# Print number of 'au' rhymes
say $rhymes{'au'};

The following script reads Latin input from a file and writes the converted output into another file.

#!/usr/bin/env perl
use v5.12.1;
use strict;
use warnings;
use open ':encoding(UTF-8)';
use Lingua::Deva;

open my $in,  '<', 'in.txt'  or die "Cannot open in.txt: $!";
open my $out, '>', 'out.txt' or die "Cannot open out.txt: $!";

my $d = Lingua::Deva->new();
while (my $line = <$in>) {
    print $out $d->to_deva($line);
}

On a Unicode-capable terminal one-liners are also possible:

echo 'Himālaya' | perl -CS -MLingua::Deva -e 'print Lingua::Deva->new()->to_deva(<>);'

DEPENDENCIES

Lingua::Deva depends only on standard Perl modules. It requires a modern, Unicode-capable Perl, version 5.12 or later.

AUTHOR

glts <676c7473@gmail.com>

BUGS

Report bugs to the author or at https://github.com/glts/Lingua-Deva/issues.

COPYRIGHT

This program is free software. You may copy or redistribute it under the same terms as Perl itself.

Copyright (c) 2012 by glts <676c7473@gmail.com>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.1 or, at your option, any later version of Perl 5 you may have available.