=head1 NAME

perldata - Perl data types

=head1 DESCRIPTION

=head2 Variable names
X<variable, name> X<variable name> X<data type> X<type>

Perl has three built-in data types: scalars, arrays of scalars, and
associative arrays of scalars, known as "hashes".  A scalar is a
single string (of any size, limited only by the available memory),
number, or a reference to something (which will be discussed
in L<perlref>).  Normal arrays are ordered lists of scalars indexed
by number, starting with 0.  Hashes are unordered collections of scalar
values indexed by their associated string key.

Values are usually referred to by name, or through a named reference.
The first character of the name tells you to what sort of data
structure it refers.  The rest of the name tells you the particular
value to which it refers.  Usually this name is a single I<identifier>,
that is, a string beginning with a letter or underscore, and
containing letters, underscores, and digits.  In some cases, it may
be a chain of identifiers, separated by C<::> (or by the slightly
archaic C<'>); all but the last are interpreted as names of packages,
to locate the namespace in which to look up the final identifier
(see L<perlmod/Packages> for details).  For a more in-depth discussion
on identifiers, see L</Identifier parsing>.  It's possible to
substitute for a simple identifier, an expression that produces a reference
to the value at runtime.   This is described in more detail below
and in L<perlref>.
X<identifier>

Perl also has its own built-in variables whose names don't follow
these rules.  They have strange names so they don't accidentally
collide with one of your normal variables.  Strings that match
parenthesized parts of a regular expression are saved under names
containing only digits after the C<$> (see L<perlop> and L<perlre>).
In addition, several special variables that provide windows into
the inner working of Perl have names containing punctuation characters.
These are documented in L<perlvar>.
X<variable, built-in>

Scalar values are always named with '$', even when referring to a
scalar that is part of an array or a hash.  The '$' symbol works
semantically like the English word "the" in that it indicates a
single value is expected.
X<scalar>

    $days               # the simple scalar value "days"
    $days[28]           # the 29th element of array @days
    $days{'Feb'}        # the 'Feb' value from hash %days
    $#days              # the last index of array @days

Entire arrays (and slices of arrays and hashes) are denoted by '@',
which works much as the word "these" or "those" does in English,
in that it indicates multiple values are expected.
X<array>

    @days               # ($days[0], $days[1],... $days[n])
    @days[3,4,5]        # same as ($days[3],$days[4],$days[5])
    @days{'a','c'}      # same as ($days{'a'},$days{'c'})

Entire hashes are denoted by '%':
X<hash>

    %days               # (key1, val1, key2, val2 ...)

In addition, subroutines are named with an initial '&', though this
is optional when unambiguous, just as the word "do" is often redundant
in English.  Symbol table entries can be named with an initial '*',
but you don't really care about that yet (if ever :-).

Every variable type has its own namespace, as do several
non-variable identifiers.  This means that you can, without fear
of conflict, use the same name for a scalar variable, an array, or
a hash--or, for that matter, for a filehandle, a directory handle, a
subroutine name, a format name, or a label.  This means that $foo
and @foo are two different variables.  It also means that C<$foo[1]>
is a part of @foo, not a part of $foo.  This may seem a bit weird,
but that's okay, because it is weird.
X<namespace>

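The separate namespaces can be seen in a small sketch.  The variable
names and values here are purely illustrative:

```perl
use strict;
use warnings;

# Three unrelated variables that happen to share the name "foo".
my $foo = "scalar";
my @foo = ("first", "second");
my %foo = (key => "value");

# $foo[1] is an element of @foo, and $foo{key} is a value in %foo;
# neither has anything to do with the scalar $foo.
my $summary = join ", ", $foo, $foo[1], $foo{key};
print "$summary\n";    # prints "scalar, second, value"
```

The sigil plus the subscript style (none, C<[]>, or C<{}>) is what
selects the namespace, not the name itself.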
Because variable references always start with '$', '@', or '%', the
"reserved" words aren't in fact reserved with respect to variable
names.  They I<are> reserved with respect to labels and filehandles,
however, which don't have an initial special character.  You can't
have a filehandle named "log", for instance.  Hint: you could say
C<open(LOG,'logfile')> rather than C<open(log,'logfile')>.  Using
uppercase filehandles also improves readability and protects you
from conflict with future reserved words.  Case I<is> significant--"FOO",
"Foo", and "foo" are all different names.  Names that start with a
letter or underscore may also contain digits and underscores.
X<identifier, case sensitivity>
X<case>

It is possible to replace such an alphanumeric name with an expression
that returns a reference to the appropriate type.  For a description
of this, see L<perlref>.

Names that start with a digit may contain only more digits.  Names
that do not start with a letter, underscore, digit or a caret are
limited to one character, e.g.,  C<$%> or
C<$$>.  (Most of these one character names have a predefined
significance to Perl.  For instance, C<$$> is the current process
id.  And all such names are reserved for Perl's possible use.)

=head2 Identifier parsing
X<identifiers>

Up until Perl 5.18, the actual rules of what a valid identifier
was were a bit fuzzy.  However, in general, anything defined here should
work on previous versions of Perl, while the opposite -- edge cases
that work in previous versions, but aren't defined here -- probably
won't work on newer versions.
As an important side note, the following applies only
to bareword identifiers as found in Perl source code, not identifiers
introduced through symbolic references, which have far fewer
restrictions.

If working under the effect of the C<use utf8;> pragma, the following
rules apply:

    / (?[ ( \p{Word} & \p{XID_Start} ) + [_] ])
      (?[ ( \p{Word} & \p{XID_Continue} ) ]) *    /x

That is, a "start" character followed by any number of "continue"
characters.  Perl requires every character in an identifier to also
match C<\w> (this prevents some problematic cases); and Perl
additionally accepts identifier names beginning with an underscore.

If not under C<use utf8>, the source is treated as ASCII + 128 extra
generic characters, and identifiers should match

    / (?aa) (?!\d) \w+ /x

That is, any word character in the ASCII range, as long as the first
character is not a digit.

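As a rough illustration of the non-C<utf8> rule, the pattern above can
be anchored and applied to candidate names.  The helper
C<looks_like_ascii_identifier> is hypothetical, introduced only for
this sketch, and it checks a single name with no sigil or package
separators:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $name would be a valid basic
# identifier when "use utf8" is not in effect, 0 otherwise.
sub looks_like_ascii_identifier {
    my ($name) = @_;
    return $name =~ / \A (?aa) (?!\d) \w+ \z /x ? 1 : 0;
}

my %valid = map { $_ => looks_like_ascii_identifier($_) }
    ("foo", "_private", "x9", "9lives", "caf\xE9");

for my $name (sort keys %valid) {
    printf "%-10s %s\n", $name, $valid{$name} ? "ok" : "rejected";
}
```

C<9lives> is rejected because of the leading digit, and the last name
is rejected because C<(?aa)> restricts C<\w> to the ASCII range.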
There are two package separators in Perl: A double colon (C<::>) and a single
quote (C<'>).  Normal identifiers can start or end with a double colon, and
can contain several parts delimited by double colons.
Single quotes have similar rules, but with the exception that they are not
legal at the end of an identifier: That is, C<$'foo> and C<$foo'bar> are
legal, but C<$foo'bar'> is not.

Additionally, if the identifier is preceded by a sigil --
that is, if the identifier is part of a variable name -- it
may optionally be enclosed in braces.

While you can mix double colons with single quotes, the quotes must come
after the colons: C<$::::'foo> and C<$foo::'bar> are legal, but C<$::'::foo>
and C<$foo'::bar> are not.

Put together, a grammar to match a basic identifier becomes

    /
     (?(DEFINE)
         (?<variable>
             (?&sigil)
             (?:
                     (?&normal_identifier)
                 |   \{ \s* (?&normal_identifier) \s* \}
             )
         )

         (?<normal_identifier>
             (?: :: )* '?
              (?&basic_identifier)
              (?: (?= (?: :: )+ '? | (?: :: )* ' ) (?&normal_identifier) )?
             (?: :: )*
         )

         (?<basic_identifier>
             (?(?{ (caller(0))[8] & $utf8::hint_bits })
                 (?&Perl_XIDS) (?&Perl_XIDC)*
               | (?aa) (?!\d) \w+
             )
         )

         (?<sigil> [&*\$\@\%])

         (?<Perl_XIDS> (?[ ( \p{Word} & \p{XID_Start} ) + [_] ]) )
         (?<Perl_XIDC> (?[ \p{Word} & \p{XID_Continue} ]) )
     )
    /x

Meanwhile, special identifiers don't follow the above rules; for the most
part, all of the identifiers in this category have a special meaning given
by Perl.  Because they have special parsing rules, these generally can't be
fully-qualified.  They come in six forms (but don't use forms 5 and 6):

=over

=item 1.

A sigil, followed solely by digits matching C<\p{POSIX_Digit}>, like
C<$0>, C<$1>, or C<$10000>.

=item 2.

A sigil followed by a single character matching the C<\p{POSIX_Punct}>
property, like C<$!> or C<%+>, except the character C<"{"> doesn't work.

=item 3.

A sigil, followed by a caret and any one of the characters
C<[][A-Z^_?\]>, like C<$^V> or C<$^]>.

=item 4.

Similar to the above, a sigil, followed by bareword text in braces,
where the first character is a caret.  The next character is any one of
the characters C<[][A-Z^_?\]>, followed by ASCII word characters.  An
example is C<${^GLOBAL_PHASE}>.

=item 5.

A sigil, followed by any single character in the range C<[\xA1-\xAC\xAE-\xFF]>
when not under C<S<"use utf8">>.  (Under C<S<"use utf8">>, the normal
identifier rules given earlier in this section apply.)  Use of
non-graphic characters (the C1 controls, the NO-BREAK SPACE, and the
SOFT HYPHEN) has been disallowed since v5.26.0.
The use of the other characters is unwise, as these are all
reserved to have special meaning to Perl, and none of them currently
do have special meaning, though this could change without notice.

Note that an implication of this form is that there are identifiers only
legal under C<S<"use utf8">>, and vice-versa, for example the identifier
C<$E<233>tat> is legal under C<S<"use utf8">>, but is otherwise
considered to be the single character variable C<$E<233>> followed by
the bareword C<"tat">, the combination of which is a syntax error.

=item 6.

This is a combination of the previous two forms.  It is valid only when
not under S<C<"use utf8">> (normal identifier rules apply when under
S<C<"use utf8">>).  The form is a sigil, followed by text in braces,
where the first character is any one of the characters in the range
C<[\x80-\xFF]> followed by ASCII word characters up to the trailing
brace.

The same caveats as the previous form apply:  The non-graphic
characters are no longer allowed with S<"use utf8">, it is unwise
to use this form at all, and utf8ness makes a big difference.

=back

Prior to Perl v5.24, non-graphical ASCII control characters were also
allowed in some situations; this had been deprecated since v5.20.

=head2 Context
X<context> X<scalar context> X<list context>

The interpretation of operations and values in Perl sometimes depends
on the requirements of the context around the operation or value.
There are two major contexts: list and scalar.  Certain operations
return list values in contexts wanting a list, and scalar values
otherwise.  If this is true of an operation it will be mentioned in
the documentation for that operation.  In other words, Perl overloads
certain operations based on whether the expected return value is
singular or plural.  Some words in English work this way, like "fish"
and "sheep".

In a reciprocal fashion, an operation provides either a scalar or a
list context to each of its arguments.  For example, if you say

    int( <STDIN> )

the integer operation provides scalar context for the <>
operator, which responds by reading one line from STDIN and passing it
back to the integer operation, which will then find the integer value
of that line and return that.  If, on the other hand, you say

    sort( <STDIN> )

then the sort operation provides list context for <>, which
will proceed to read every line available up to the end of file, and
pass that list of lines back to the sort routine, which will then
sort those lines and return them as a list to whatever the context
of the sort was.

Assignment is a little bit special in that it uses its left argument
to determine the context for the right argument.  Assignment to a
scalar evaluates the right-hand side in scalar context, while
assignment to an array or hash evaluates the right-hand side in list
context.  Assignment to a list (or slice, which is just a list
anyway) also evaluates the right-hand side in list context.

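The effect of the left-hand side on context can be sketched with a
plain array.  The variable names are illustrative only:

```perl
use strict;
use warnings;

my @lines = ("one", "two", "three");

# Scalar on the left: the array is evaluated in scalar context
# and yields its length.
my $count = @lines;

# Parenthesized list on the left: the array is evaluated in list
# context and the first element is assigned.
my ($first) = @lines;

# Array on the left: list context, all elements are copied.
my @copy = @lines;

print "$count $first ", scalar(@copy), "\n";    # prints "3 one 3"
```

The only difference between the first two assignments is the pair of
parentheses around the left-hand side.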
When you use the C<use warnings> pragma or Perl's B<-w> command-line
option, you may see warnings
about useless uses of constants or functions in "void context".
Void context just means the value has been discarded, such as a
statement containing only C<"fred";> or C<getpwuid(0);>.  It still
counts as scalar context for functions that care whether or not
they're being called in list context.

User-defined subroutines may choose to care whether they are being
called in a void, scalar, or list context.  Most subroutines do not
need to bother, though.  That's because both scalars and lists are
automatically interpolated into lists.  See L<perlfunc/wantarray>
for how you would dynamically discern your function's calling
context.

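A minimal sketch of a context-aware subroutine follows.  The name
C<context_name> is hypothetical; only C<wantarray> itself is a Perl
built-in:

```perl
use strict;
use warnings;

# Hypothetical subroutine that reports the context of its own call.
# wantarray is true in list context, false but defined in scalar
# context, and undefined in void context.
sub context_name {
    return wantarray          ? "list"
         : defined(wantarray) ? "scalar"
         :                      "void";
}

my @in_list   = context_name();    # list assignment: list context
my $in_scalar = context_name();    # scalar assignment: scalar context

print "$in_list[0] $in_scalar\n";  # prints "list scalar"
```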
=head2 Scalar values
X<scalar> X<number> X<string> X<reference>

All data in Perl is a scalar, an array of scalars, or a hash of
scalars.  A scalar may contain one single value in any of three
different flavors: a number, a string, or a reference.  In general,
conversion from one form to another is transparent.  Although a
scalar may not directly hold multiple values, it may contain a
reference to an array or hash which in turn contains multiple values.

Scalars aren't necessarily one thing or another.  There's no place
to declare a scalar variable to be of type "string", type "number",
type "reference", or anything else.  Because of the automatic
conversion of scalars, operations that return scalars don't need
to care (and in fact, cannot care) whether their caller is looking
for a string, a number, or a reference.  Perl is a contextually
polymorphic language whose scalars can be strings, numbers, or
references (which includes objects).  Although strings and numbers
are considered pretty much the same thing for nearly all purposes,
references are strongly-typed, uncastable pointers with builtin
reference-counting and destructor invocation.
X<truth> X<falsehood> X<true> X<false> X<!> X<not> X<negation> X<0>
X<boolean> X<bool>

A scalar value is interpreted as FALSE in the Boolean sense
if it is undefined, the null string or the number 0 (or its
string equivalent, "0"), and TRUE if it is anything else.  The
Boolean context is just a special kind of scalar context where no
conversion to a string or a number is ever performed.
Negation of a true value by C<!> or C<not> returns a special false value.
When evaluated as a string it is treated as C<"">, but as a number, it
is treated as 0.  Most Perl operators
that return true or false behave this way.

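The truth rules can be checked directly.  This is a small sketch with
illustrative variable names; the values tested are the ones listed in
the text plus a few strings that look like zero but are not C<"0">:

```perl
use strict;
use warnings;

# The only false scalars: undef, the null string, 0, and "0".
my @false_values = (undef, "", 0, "0");

# Everything else is true, including strings that merely look like zero.
my @true_values = ("00", "0.0", " ", "0E0", -1);

my $false_count = grep { !$_ } @false_values;
my $true_count  = grep {  $_ } @true_values;

# The special false value returned by negation is "" as a string
# and 0 as a number.
my $negated   = !1;
my $as_string = "$negated";
my $as_number = $negated + 0;

print "$false_count false, $true_count true\n";   # prints "4 false, 5 true"
```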
There are actually two varieties of null strings (sometimes referred
to as "empty" strings), a defined one and an undefined one.  The
defined version is just a string of length zero, such as C<"">.
The undefined version is the value that indicates that there is
no real value for something, such as when there was an error, or
at end of file, or when you refer to an uninitialized variable or
element of an array or hash.  Although in early versions of Perl,
an undefined scalar could become defined when first used in a
place expecting a defined value, this no longer happens except for
rare cases of autovivification as explained in L<perlref>.  You can
use the defined() operator to determine whether a scalar value is
defined (this has no meaning on arrays or hashes), and the undef()
operator to produce an undefined value.
X<defined> X<undefined> X<undef> X<null> X<string, null>

To find out whether a given string is a valid non-zero number, it's
sometimes enough to test it against both numeric 0 and also lexical
"0" (although this will cause noises if warnings are on).  That's
because strings that aren't numbers count as 0, just as they do
in B<awk>:

    if ($str == 0 && $str ne "0")  {
        warn "That doesn't look like a number";
    }

That method may be best because otherwise you won't treat IEEE
notations like C<NaN> or C<Infinity> properly.  At other times, you
might prefer to determine whether string data can be used numerically
by calling the POSIX::strtod() function or by inspecting your string
with a regular expression (as documented in L<perlre>).

    warn "has nondigits"        if     /\D/;
    warn "not a natural number" unless /^\d+$/;             # rejects -3
    warn "not an integer"       unless /^-?\d+$/;           # rejects +3
    warn "not an integer"       unless /^[+-]?\d+$/;
    warn "not a decimal number" unless /^-?\d+\.?\d*$/;     # rejects .2
    warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
    warn "not a C float"
        unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;

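The POSIX::strtod() approach mentioned above might look like the
following sketch.  The wrapper name C<is_numeric> is hypothetical.  In
list context POSIX::strtod() returns the parsed number and the count of
characters it could not parse, so a count of zero means the whole
string was consumed.  The result depends on the current numeric locale
and on what the platform's C<strtod> accepts, so treat this as an
illustration rather than a portable validator:

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical wrapper: 1 if the whole string parses as a number
# according to the C library's strtod(), 0 otherwise.
sub is_numeric {
    my ($str) = @_;
    return 0 unless defined $str && length $str;
    my ($number, $unparsed) = POSIX::strtod($str);
    return $unparsed == 0 ? 1 : 0;
}

my %check = map { $_ => is_numeric($_) } ("3.14", "-7", "12abc", "abc");

for my $str (sort keys %check) {
    print "$str: ", ($check{$str} ? "numeric" : "not numeric"), "\n";
}
```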
The length of an array is a scalar value.  You may find the length
of array @days by evaluating C<$#days>, as in B<csh>.  However, this
isn't the length of the array; it's the subscript of the last element,
which is a different value since there is ordinarily a 0th element.
Assigning to C<$#days> actually changes the length of the array.
Shortening an array this way destroys intervening values.  Lengthening
an array that was previously shortened does not recover values
that were in those elements.
X<$#> X<array, length>

You can also gain some minuscule measure of efficiency by pre-extending
an array that is going to get big.  You can also extend an array
by assigning to an element that is off the end of the array.  You
can truncate an array down to nothing by assigning the null list
() to it.  The following are equivalent:

    @whatever = ();
    $#whatever = -1;

If you evaluate an array in scalar context, it returns the length
of the array.  (Note that this is not true of lists, which return
the last value, like the C comma operator, nor of built-in functions,
which return whatever they feel like returning.)  The following is
always true:
X<array, length>

    scalar(@whatever) == $#whatever + 1;

Some programmers choose to use an explicit conversion so as to
leave nothing to doubt:

    $element_count = scalar(@whatever);

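The relationship between C<$#array> and the array length, and the
effect of assigning to C<$#array>, can be sketched as follows (the
array contents are illustrative):

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed Thu Fri);

my $last_index = $#days;          # 4: subscript of the last element
my $length     = scalar(@days);   # 5: number of elements

# Shortening through $#days discards the elements past the new end.
$#days = 1;
my $after_shrink = scalar(@days); # 2

# Lengthening again does not bring the old values back; the new
# elements are undefined.
$#days = 4;
my $recovered = defined $days[2] ? 1 : 0;   # 0

print "$last_index $length $after_shrink $recovered\n";   # prints "4 5 2 0"
```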
If you evaluate a hash in scalar context, it returns a false value
if the hash is empty.  If there are any key/value pairs, it returns a
true value.  A more precise definition is version dependent.

Prior to Perl 5.25 the value returned was a string consisting of the
number of used buckets and the number of allocated buckets, separated
by a slash.  This is pretty much useful only to find out whether
Perl's internal hashing algorithm is performing poorly on your data
set.  For example, you stick 10,000 things in a hash, but evaluating
%HASH in scalar context reveals C<"1/16">, which means only one out
of sixteen buckets has been touched, and presumably contains all
10,000 of your items.  This isn't supposed to happen.

As of Perl 5.25 the return was changed to be the count of keys in the
hash.  If you need access to the old behavior you can use
C<Hash::Util::bucket_ratio()> instead.

If a tied hash is evaluated in scalar context, the C<SCALAR> method is
called (with a fallback to C<FIRSTKEY>).
X<hash, scalar context> X<hash, bucket> X<bucket>

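Because the exact scalar value of a hash is version dependent, code
that must run on several Perl versions can rely on the truth value
only and count keys explicitly.  A short sketch, with illustrative
hash names:

```perl
use strict;
use warnings;

my %empty;
my %users = (alice => 1, bob => 2, carol => 3);

# Only the truth value is stable across Perl versions.
my $empty_is_true = %empty ? 1 : 0;   # 0
my $users_is_true = %users ? 1 : 0;   # 1

# keys in scalar context returns the number of keys on every version.
my $key_count = scalar(keys %users);  # 3

print "$empty_is_true $users_is_true $key_count\n";   # prints "0 1 3"
```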
You can preallocate space for a hash by assigning to the keys() function.
This rounds up the allocated buckets to the next power of two:

    keys(%users) = 1000;		# allocate 1024 buckets

=head2 Scalar value constructors
X<scalar, literal> X<scalar, constant>

Numeric literals are specified in any of the following floating point or
integer formats:

    12345
    12345.67
    .23E-10             # a very small number
    3.14_15_92          # a very important number
    4_294_967_296       # underscore for legibility
    0xff                # hex
    0xdead_beef         # more hex
    0377                # octal (only numbers, begins with 0)
    0o12_345            # alternative octal notation
    0b011011            # binary
    0x1.999ap-4         # hexadecimal floating point (the 'p' is required)

You are allowed to use underscores (underbars) in numeric literals
between digits for legibility (but not multiple underscores in a row:
C<23__500> is not legal; C<23_500> is).
You could, for example, group binary
digits by threes (as for a Unix-style mode argument such as 0b110_100_100)
or by fours (to represent nibbles, as in 0b1010_0110) or in other groups.
X<number, literal>

String literals are usually delimited by either single or double
quotes.  They work much like quotes in the standard Unix shells:
double-quoted string literals are subject to backslash and variable
substitution; single-quoted strings are not (except for C<\'> and
C<\\>).  The usual C-style backslash rules apply for making
characters such as newline, tab, etc., as well as some more exotic
forms.  See L<perlop/"Quote and Quote-like Operators"> for a list.
X<string, literal>

Hexadecimal, octal, or binary representations in string literals
(e.g. '0xff') are not automatically converted to their integer
representation.  The hex() and oct() functions make these conversions
for you.  See L<perlfunc/hex> and L<perlfunc/oct> for more details.

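The difference between a numeric literal and a string that merely
looks like one can be sketched as follows.  The variable names are
illustrative; the C<numeric> warning is silenced locally only to keep
the demonstration quiet:

```perl
use strict;
use warnings;

# A string such as "0xff" is not parsed as hexadecimal when used as
# a number; numeric conversion stops at the "x" and yields 0.
my $plain = do { no warnings 'numeric'; "0xff" + 0 };

# hex() and oct() perform the conversion explicitly.  oct() also
# understands the 0x and 0b prefixes.
my $from_hex    = hex("0xff");        # 255
my $from_octal  = oct("0377");        # 255
my $from_binary = oct("0b011011");    # 27
my $oct_of_hex  = oct("0xff");        # 255

print "$plain $from_hex $from_octal $from_binary $oct_of_hex\n";
# prints "0 255 255 27 255"
```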
Hexadecimal floating point can start just like a hexadecimal literal,
and it can be followed by an optional fractional hexadecimal part,
but it must be followed by C<p>, an optional sign, and a power of two.
The format is useful for accurately presenting floating point values,
avoiding conversions to or from decimal floating point, and therefore
avoiding possible loss in precision.  Notice that while most current
platforms use the 64-bit IEEE 754 floating point, not all do.  Another
potential source of (low-order) differences are the floating point
rounding modes, which can differ between CPUs, operating systems,
and compilers, and which Perl doesn't control.

You can also embed newlines directly in your strings, i.e., they can end
on a different line than they begin.  This is nice, but if you forget
your trailing quote, the error will not be reported until Perl finds
another line containing the quote character, which may be much further
on in the script.  Variable substitution inside strings is limited to
scalar variables, arrays, and array or hash slices.  (In other words,
names beginning with $ or @, followed by an optional bracketed
expression as a subscript.)  The following code segment prints out "The
price is $Z<>100."
X<interpolation>

    $Price = '$100';	# not interpolated
    print "The price is $Price.\n";	# interpolated

There is no double interpolation in Perl, so the C<$100> is left as is.

By default floating point numbers substituted inside strings use the
dot (".")  as the decimal separator.  If C<use locale> is in effect,
and POSIX::setlocale() has been called, the character used for the
decimal separator is affected by the LC_NUMERIC locale.
See L<perllocale> and L<POSIX>.

=head3 Demarcated variable names using braces

As in some shells, you can enclose the variable name in braces as a
demarcator to disambiguate it from following alphanumerics and
underscores or other text.  You must also do this when interpolating a
variable into a string to separate the variable name from a following
double-colon or an apostrophe since these would be otherwise treated as
a package separator:
X<interpolation>

    $who = "Larry";
    print PASSWD "${who}::0:0:Superuser:/:/bin/perl\n";
    print "We use ${who}speak when ${who}'s here.\n";

Without the braces, Perl would have looked for a $whospeak, a
C<$who::0>, and a C<$who's> variable.  The last two would be the
$0 and the $s variables in the (presumably) non-existent package
C<who>.

In fact, a simple identifier within such curly braces is forced to be a
string, and likewise within a hash subscript.  Neither need quoting.  Our
earlier example, C<$days{'Feb'}> can be written as C<$days{Feb}> and the
quotes will be assumed automatically.  But anything more complicated in
the subscript will be interpreted as an expression.  This means for
example that C<$version{2.0}++> is equivalent to C<$version{2}++>, not
to C<$version{'2.0'}++>.

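The subscript rules can be observed by inspecting the keys that end up
in the hash.  A short sketch, with an illustrative hash name:

```perl
use strict;
use warnings;

my %version;

$version{2.0}++;      # expression: the number 2.0 stringifies as "2"
$version{'2.0'}++;    # quoted: the key is the string "2.0"
$version{Feb}++;      # simple identifier: treated as the string "Feb"

my @keys = sort keys %version;
print join(", ", @keys), "\n";    # prints "2, 2.0, Feb"
```

The unquoted C<2.0> and the quoted C<'2.0'> produce two distinct keys,
which is easy to miss when version numbers are used as hash keys.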
There is a similar problem with interpolation with text that looks like
array or hash access notation.  Placing a simple variable like C<$who>
immediately in front of text like C<"[1]"> or C<"{foo}"> would cause the
variable to be interpolated as accessing an element of C<@who> or a
value stored in C<%who>:

    $who = "Larry Wall";
    print "$who[1] is the father of Perl.\n";

would attempt to access index 1 of an array named C<@who>.  Again, using
braces will prevent this from happening:

    $who = "Larry Wall";
    print "${who}[1] is the father of Perl.\n";

will be treated the same as

    $who = "Larry Wall";
    print $who . "[1] is the father of Perl.\n";

This notation also applies to more complex variable descriptions,
such as array or hash access with subscripts.  For instance

    @name = qw(Larry Curly Moe);
    print "Also ${name[0]}[1] was a member\n";

Without the braces the above example would be parsed as a two level
array subscript in the C<@name> array, and under C<use strict> would
likely produce a fatal exception, as it would be parsed like this:

    print "Also " . $name[0][1] . " was a member\n";

and not as the intended:

    print "Also " . $name[0] . "[1] was a member\n";

A similar result may be derived by using a backslash on the first
character of the subscript or package notation that is not part of
the variable you want to access.  Thus the above example could also
be written:

    @name = qw(Larry Curly Moe);
    print "Also $name[0]\[1] was a member\n";

However, for some special variables (multi-character caret variables) the
demarcated form using curly braces is the B<only> way you can reference
the variable at all, and the only way you can access a subscript of the
variable via interpolation.

Consider the magic array C<@{^CAPTURE}> which is populated by the
regex engine with the contents of all of the capture buffers in a
pattern (see L<perlvar> and L<perlre>).  The B<only> way you can
access one of these members inside of a string is via the braced
(demarcated) form:

    "abc"=~/(.)(.)(.)/
        and print "Second buffer is ${^CAPTURE[1]}";

is equivalent to

    "abc"=~/(.)(.)(.)/
        and print "Second buffer is " . ${^CAPTURE}[1];

Saying C<@^CAPTURE> is a syntax error, so it B<must> be referenced as
C<@{^CAPTURE}>, and to access one of its elements in normal code you
would write C< ${^CAPTURE}[1] >.  However when interpolating in a string
C<"${^CAPTURE}[1]"> would be equivalent to C<${^CAPTURE} . "[1]">,
which does not even refer to the same variable!  Thus the subscripts must
B<also> be placed B<inside> of the braces: C<"${^CAPTURE[1]}">.

The demarcated form using curly braces can be used with all the
different types of variable access, including array and hash slices.  For
instance code like the following:

    @name = qw(Larry Curly Moe);
    local $" = " and ";
    print "My favorites were @{name[1,2]}.\n";

would output

    My favorites were Curly and Moe.

=head3 Special floating point: infinity (Inf) and not-a-number (NaN)

Floating point values include the special values C<Inf> and C<NaN>,
for infinity and not-a-number.  The infinity can be also negative.

The infinity is the result of certain math operations that overflow
the floating point range, like 9**9**9.  The not-a-number is the
result when the result is undefined or unrepresentable.  Though note
that you cannot get C<NaN> from some common "undefined" or
"out-of-range" operations like dividing by zero, or square root of
a negative number, since Perl generates fatal errors for those.

The infinity and not-a-number have their own special arithmetic rules.
The general rule is that they are "contagious": C<Inf> plus one is
C<Inf>, and C<NaN> plus one is C<NaN>.  Where things get interesting
is when you combine infinities and not-a-numbers: C<Inf> minus C<Inf>
and C<Inf> divided by C<Inf> are C<NaN> (while C<Inf> plus C<Inf> is
C<Inf> and C<Inf> times C<Inf> is C<Inf>).  C<NaN> is also curious
in that it does not equal any number, I<including> itself:
C<NaN> != C<NaN>.

Perl doesn't understand C<Inf> and C<NaN> as numeric literals, but
you can have them as strings, and Perl will convert them as needed:
"Inf" + 1.  (You can, however, import them from the POSIX extension;
C<use POSIX qw(Inf NaN);> and then use them as literals.)

Note that on input (string to number) Perl accepts C<Inf> and C<NaN>
in many forms.   Case is ignored, and the Win32-specific forms like
C<1.#INF> are understood, but on output the values are normalized to
C<Inf> and C<NaN>.

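The arithmetic rules above can be exercised with values produced by
overflow.  This sketch assumes a platform with IEEE 754 style floating
point, which is the common case but is not guaranteed; the variable
names are illustrative:

```perl
use strict;
use warnings;

my $inf = 9**9**9;       # overflows the floating point range
my $nan = $inf - $inf;   # Inf minus Inf is not-a-number

# Infinity is "contagious" under addition and multiplication.
my $inf_plus_one = ($inf + 1    == $inf) ? 1 : 0;
my $inf_squared  = ($inf * $inf == $inf) ? 1 : 0;

# NaN does not equal any number, including itself.
my $nan_differs  = ($nan != $nan) ? 1 : 0;
my $nan_equals   = ($nan == $nan) ? 1 : 0;

print "$inf_plus_one $inf_squared $nan_differs $nan_equals\n";
```

Comparing a value with itself using C<!=> is therefore a common way to
detect C<NaN> without any extra modules.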
=head3 Version Strings
X<version string> X<vstring> X<v-string>

A literal of the form C<v1.20.300.4000> is parsed as a string composed
of characters with the specified ordinals.  This form, known as
v-strings, provides an alternative, more readable way to construct
strings, rather than use the somewhat less readable interpolation form
C<"\x{1}\x{14}\x{12c}\x{fa0}">.  This is useful for representing
Unicode strings, and for comparing version "numbers" using the string
comparison operators, C<cmp>, C<gt>, C<lt> etc.  If there are two or
more dots in the literal, the leading C<v> may be omitted.

    print v9786;              # prints SMILEY, "\x{263a}"
    print v102.111.111;       # prints "foo"
    print 102.111.111;        # same

Such literals are accepted by both C<require> and C<use> for
doing a version check.  Note that using the v-strings for IPv4
addresses is not portable unless you also use the
inet_aton()/inet_ntoa() routines of the Socket package.

Note that since Perl 5.8.1 the single-number v-strings (like C<v65>)
are not v-strings before the C<< => >> operator (which is usually used
to separate a hash key from a hash value); instead they are interpreted
as literal strings ('v65').  They were v-strings from Perl 5.6.0 to
Perl 5.8.0, but that caused more confusion and breakage than good.
Multi-number v-strings like C<v65.66> and C<65.66.67> continue to
be v-strings always.

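A brief sketch of how v-strings behave as ordinary strings follows.
The C<%vd> format of sprintf() prints the ordinal of each character
joined by dots; the variable names are illustrative:

```perl
use strict;
use warnings;

# A v-string is just a string built from the given ordinals.
my $is_foo = (v102.111.111 eq "foo") ? 1 : 0;

# The %vd format turns the characters back into dotted ordinals.
my $dotted = sprintf "%vd", v1.22.333;

# String comparison of v-strings compares component by component,
# so 1.10.0 sorts after 1.9.0 ...
my $vstring_newer = (v1.10.0 gt v1.9.0) ? 1 : 0;

# ... whereas comparing the plain text forms does not.
my $text_newer = ("1.10.0" gt "1.9.0") ? 1 : 0;

print "$is_foo $dotted $vstring_newer $text_newer\n";
# prints "1 1.22.333 1 0"
```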
=head3 Special Literals
X<special literal> X<__END__> X<__DATA__> X<END> X<DATA>
X<end> X<data> X<^D> X<^Z>

The special literals __FILE__, __LINE__, and __PACKAGE__
represent the current filename, line number, and package name at that
point in your program.  __SUB__ gives a reference to the current
subroutine.  They may be used only as separate tokens; they
will not be interpolated into strings.  If there is no current package
(due to an empty C<package;> directive), __PACKAGE__ is the undefined
value.  (But the empty C<package;> is no longer supported, as of version
5.10.)  Outside of a subroutine, __SUB__ is the undefined value.  __SUB__
is only available in 5.16 or higher, and only with a C<use v5.16> or
C<use feature "current_sub"> declaration.
X<__FILE__> X<__LINE__> X<__PACKAGE__> X<__SUB__>
X<line> X<file> X<package>

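The following sketch shows the special literals used as separate
tokens, including __SUB__ for anonymous recursion.  It assumes Perl
5.16 or later for the C<current_sub> feature; the variable names are
illustrative:

```perl
use strict;
use warnings;
use feature 'current_sub';

# The literals are tokens, so they are passed as ordinary arguments.
my $where = sprintf "%s line %d, package %s",
    __FILE__, __LINE__, __PACKAGE__;

# __SUB__ lets an anonymous subroutine call itself without needing
# a named variable that refers back to it.
my $factorial = sub {
    my ($n) = @_;
    return $n <= 1 ? 1 : $n * __SUB__->($n - 1);
};
my $result = $factorial->(5);    # 120

# Inside a string the token is plain text, not the package name.
my $not_interpolated = "__PACKAGE__";

print "$where\n$result\n$not_interpolated\n";
```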
The two control characters ^D and ^Z, and the tokens __END__ and __DATA__
may be used to indicate the logical end of the script before the actual
end of file.  Any following text is ignored by the interpreter unless
read by the program as described below.

Text after __DATA__ may be read via the filehandle C<PACKNAME::DATA>,
where C<PACKNAME> is the package that was current when the __DATA__
token was encountered.  The filehandle is left open pointing to the
line after __DATA__.  The program should C<close DATA> when it is done
reading from it.  (Leaving it open leaks filehandles if the module is
reloaded for any reason, so it's a safer practice to close it.)  For
compatibility with older scripts written before __DATA__ was
introduced, __END__ behaves like __DATA__ in the top level script (but
not in files loaded with C<require> or C<do>) and leaves the remaining
contents of the file accessible via C<main::DATA>.

    while (my $line = <DATA>) { print $line; }
    close DATA;