NAME
Net::IDN::Encode - Internationalizing Domain Names in Applications (UTS #46)
SYNOPSIS
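use utf8;   # the string literals below contain non-ASCII characters
use Net::IDN::Encode ':all';
my $ascii   = domain_to_ascii('müller.example.org');
my $email   = email_to_ascii('POSTMASTER@例。テスト');   # single quotes: no array interpolation after the at sign
my $unicode = domain_to_unicode('EXAMPLE.XN--11B5BS3A9AJ6G');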
NOTE
This developer release implements UTS #46. The documentation below has not yet been updated to match, so treat it with caution.
DESCRIPTION
This module provides an easy-to-use interface for encoding and decoding Internationalized Domain Names (IDNs).
IDNs use characters drawn from a large repertoire (Unicode), but IDNA allows the non-ASCII characters to be represented using only the ASCII characters already allowed in so-called host names today (letter-digit-hyphen, /[A-Z0-9-]/i).
FUNCTIONS
By default, this module does not export any subroutines. You may use the :all tag to import everything. You can also use regular expressions such as /^to_/ or /^email_/ to select some of the functions; see Exporter for details.
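As a minimal sketch of the import styles described above (assuming the module is installed and that the pattern is handed to Exporter unchanged), the following imports the two email_ functions by pattern and one further function by name:

```perl
use strict;
use warnings;
use utf8;                                   # source contains non-ASCII literals

# Import by Exporter pattern: every exportable name starting with "email_".
use Net::IDN::Encode qw(/^email_/);
# Import a single function by name.
use Net::IDN::Encode qw(domain_to_ascii);

print domain_to_ascii('müller.example.org'), "\n";
print email_to_ascii('postmaster@müller.example.org'), "\n";
```

Importing only what is needed keeps short names such as to_ascii out of the caller's namespace unless they are asked for.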
The following functions are available:
- to_ascii( $label [, AllowUnassigned => 0] [, UseSTD3ASCIIRules => 1 ] )
  Converts a single label $label to ASCII. Throws an exception on invalid input.
  This function takes the following optional parameters:
  - AllowUnassigned
    (boolean) If set to a false value, unassigned code points in the label are not allowed.
    The default is determined by Net::IDN::Nameprep::nameprep.
  - UseSTD3ASCIIRules
    (boolean) If set to a true value, checks the label for compliance with STD 3 (RFC 1123) syntax for host name parts.
    The default is false (unlike domain_to_ascii).
  This function does not try to handle strings that consist of multiple labels (such as domain names).
  This function implements the ToASCII operation from RFC 3490.
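A short sketch of calling to_ascii on a single label, assuming the parameters are passed as a key/value list after the label as the signature above suggests. The label 'host_name' is made up for illustration; whether it is rejected may depend on the installed version, so the code reports either outcome:

```perl
use strict;
use warnings;
use utf8;
use Net::IDN::Encode qw(to_ascii);

# A single label, not a whole domain name.
my $ace = to_ascii('müller');
print "$ace\n";                             # xn--mller-kva

# Invalid input raises an exception, so wrap untrusted input in eval.
# The underscore in this hypothetical label is outside STD 3 host name syntax.
my $checked = eval { to_ascii('host_name', UseSTD3ASCIIRules => 1) };
if (defined $checked) {
    print "accepted: $checked\n";
} else {
    print "rejected: $@";
}
```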
- to_unicode( $label [, AllowUnassigned => 0] [, UseSTD3ASCIIRules => 1 ] )
  Converts a single label $label to Unicode. to_unicode never fails.
  This function takes the same optional parameters as to_ascii, with the same defaults.
  This function does not try to handle strings that consist of multiple labels (such as domain names).
  This function implements the ToUnicode operation from RFC 3490.
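The reverse direction can be sketched the same way. The example decodes one label and then re-encodes it to show the round trip; the output encoding layer is an assumption about the terminal, not something the module requires:

```perl
use strict;
use warnings;
use utf8;
use Net::IDN::Encode qw(to_ascii to_unicode);

binmode STDOUT, ':encoding(UTF-8)';         # the decoded label is non-ASCII

my $label = to_unicode('xn--mller-kva');
print "$label\n";                           # müller

# Round trip: encoding the decoded label gives back the ACE form.
print to_ascii($label), "\n";               # xn--mller-kva
```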
- domain_to_ascii( $domain [, AllowUnassigned => 0] [, UseSTD3ASCIIRules => 1 ] )
  Converts all labels of the hostname $domain (with labels separated by dots) to ASCII. Throws an exception on invalid input.
  This function takes the following optional parameters:
  - AllowUnassigned
    (boolean) If set to a false value, unassigned code points in the label are not allowed.
    The default is determined by Net::IDN::Nameprep::nameprep.
  - UseSTD3ASCIIRules
    (boolean) If set to a true value, checks the label for compliance with STD 3 (RFC 1123) syntax for host name parts.
    The default is true (unlike to_ascii).
  The following characters are recognized as dots: U+002E (full stop), U+3002 (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61 (halfwidth ideographic full stop).
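To illustrate the dot handling, here is a hedged sketch that encodes the same name twice, once with ordinary full stops and once with U+3002 as the label separator. The second call is expected to split the name into the same three labels, but the exact output is left unannotated because it has not been checked against this developer version:

```perl
use strict;
use warnings;
use utf8;
use Net::IDN::Encode qw(domain_to_ascii);

my $plain = domain_to_ascii('müller.example.org');
print "$plain\n";                           # xn--mller-kva.example.org

# The same name written with U+3002 (ideographic full stop) as separator.
my $cjk_dots = "müller\x{3002}example\x{3002}org";
my $from_cjk = domain_to_ascii($cjk_dots);
print "$from_cjk\n";
```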
- domain_to_unicode( $domain [, AllowUnassigned => 0] [, UseSTD3ASCIIRules => 1 ] )
  Converts all labels of the hostname $domain (with labels separated by dots) to Unicode. Any input is valid.
  This function takes the same optional parameters as domain_to_ascii, with the same defaults.
  The following characters are recognized as dots: U+002E (full stop), U+3002 (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61 (halfwidth ideographic full stop).
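Decoding a whole domain name might look like the following sketch. Only the label carrying the xn-- prefix changes; the UTF-8 output layer is again an assumption about where the result is printed:

```perl
use strict;
use warnings;
use utf8;
use Net::IDN::Encode qw(domain_to_unicode);

binmode STDOUT, ':encoding(UTF-8)';

my $decoded = domain_to_unicode('xn--mller-kva.example.org');
print "$decoded\n";                         # müller.example.org

# Labels without the xn-- prefix are plain ASCII and pass through unchanged.
my $untouched = domain_to_unicode('www.example.org');
print "$untouched\n";                       # www.example.org
```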
- email_to_ascii( $email )
  Converts the domain part (the right-hand side, after the at sign) of the RFC 2821/2822 email address to ASCII. May throw an exception on invalid input.
  This function currently does not handle internationalization of the local-part (left-hand side). This may change in future versions.
  The following characters are recognized as at signs: U+0040 (commercial at), U+FF20 (fullwidth commercial at).
- email_to_unicode( $email )
  Converts the domain part (the right-hand side, after the at sign) of the RFC 2821/2822 email address to Unicode. May throw an exception on invalid input.
  This function currently does not handle internationalization of the local-part (left-hand side). This may change in future versions.
  The following characters are recognized as at signs: U+0040 (commercial at), U+FF20 (fullwidth commercial at).
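The two email functions can be sketched together as a round trip. The address is made up for illustration; note the single quotes, which stop Perl from treating the text after the at sign as an array to interpolate:

```perl
use strict;
use warnings;
use utf8;
use Net::IDN::Encode qw(email_to_ascii email_to_unicode);

binmode STDOUT, ':encoding(UTF-8)';

# Only the domain part is converted; the local-part is left as it is.
my $ascii = email_to_ascii('postmaster@müller.example.org');
print "$ascii\n";                           # postmaster@xn--mller-kva.example.org

my $unicode = email_to_unicode($ascii);
print "$unicode\n";                         # postmaster@müller.example.org
```

Since both functions may throw on invalid input, addresses from untrusted sources are best converted inside an eval block.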
AUTHOR
Claus Färber <CFAERBER@cpan.org>
LICENSE
Copyright 2007-2010 Claus Färber.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
Net::IDN::Nameprep, Net::IDN::Punycode, RFC 3490 (http://www.ietf.org/rfc/rfc3490.txt)