=encoding utf8
=head1 NAME
Synopsis_06 - Subroutines
=head1 AUTHOR
Damian Conway <damian@conway.org> and
Allison Randal <al@shadowed.net>
=head1 VERSION
Maintainer: Larry Wall <larry@wall.org>
Date: 21 Mar 2003
Last Modified: 26 Oct 2007
Number: 6
Version: 90
This document summarizes Apocalypse 6, which covers subroutines and the
new type system.
=head1 Subroutines and other code objects
B<Subroutines> (keyword: C<sub>) are non-inheritable routines with
parameter lists.
B<Methods> (keyword: C<method>) are inheritable routines which always
have an associated object (known as their invocant) and belong to a
particular kind or class.
B<Submethods> (keyword: C<submethod>) are non-inheritable methods, or
subroutines masquerading as methods. They have an invocant and belong to
a particular kind or class.
B<Regexes> (keyword: C<regex>) are methods (of a grammar) that perform
pattern matching. Their associated block has a special syntax (see
Synopsis 5). (We also use the term "regex" for anonymous patterns
of the traditional form.)

B<Tokens> (keyword: C<token>) are regexes that perform low-level
non-backtracking (by default) pattern matching.

B<Rules> (keyword: C<rule>) are regexes that perform non-backtracking
(by default) pattern matching (and also enable rules to do whitespace
dwimmery).

B<Macros> (keyword: C<macro>) are routines whose calls execute as soon
as they are parsed (i.e. at compile-time). Macros may return another
source code string or a parse-tree.
=head1 Routine modifiers
B<Multis> (keyword: C<multi>) are routines that can have multiple
variants that share the same name, selected by arity, types, or some
other constraints.
B<Prototypes> (keyword: C<proto>) specify the commonalities (such
as parameter names, fixity, and associativity) shared by all multis
of that name in the scope of the C<proto> declaration. A C<proto>
also adds an implicit C<multi> to all routines of the same short
name within its scope, unless they have an explicit modifier.
(This is particularly useful when adding to rule sets or when attempting
to compose conflicting methods from roles.)
B<Only> (keyword: C<only>) routines do not share their short names
with other routines. This is the default modifier for all routines,
unless a C<proto> of the same name was already in scope.

A modifier keyword may occur before the routine keyword in a named routine:

    only sub foo {...}
    proto sub foo {...}
    multi sub foo {...}

    only method bar {...}
    proto method bar {...}
    multi method bar {...}

If the routine keyword is omitted, it defaults to C<sub>.

Modifier keywords cannot apply to anonymous routines.
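
The C<multi> modifier described above can be illustrated with a minimal
sketch. It is written against present-day Raku (the language this draft
evolved into), so details may differ from the 2007 text; the routine name
C<describe> is hypothetical.

```raku
# Assumption: modern Raku syntax; "describe" is an illustrative name only.
# Two variants share one short name; the dispatcher selects by argument type.
multi sub describe(Int $n) { "an integer: $n" }
multi sub describe(Str $s) { "a string: $s" }

say describe(42);        # an integer: 42
say describe("forty");   # a string: forty
```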
=head2 Named subroutines
The general syntax for named subroutines is any of:

     my RETTYPE sub NAME ( PARAMS ) TRAITS {...}
    our RETTYPE sub NAME ( PARAMS ) TRAITS {...}
                sub NAME ( PARAMS ) TRAITS {...}

The return type may also be put inside the parentheses:

    sub NAME (PARAMS --> RETTYPE) {...}

Unlike in Perl 5, named subroutines are considered expressions,
so this is valid Perl 6:

    my @subs = (sub foo { ... }, sub bar { ... });
=head2 Anonymous subroutines
The general syntax for anonymous subroutines is:

    sub ( PARAMS ) TRAITS {...}

But one can also use a scope modifier to introduce the return type first:

     my RETTYPE sub ( PARAMS ) TRAITS {...}
    our RETTYPE sub ( PARAMS ) TRAITS {...}

In this case there is no effective difference, since the distinction between
C<my> and C<our> is only in the handling of the name, and in the case of
an anonymous sub, there isn't one.

B<Trait> is the name for a compile-time (C<is>) property.
See L<"Properties and traits">.
=head2 Perl5ish subroutine declarations
You can declare a sub without a parameter list, as in Perl 5:

    sub foo {...}

Arguments implicitly come in via the C<@_> array, but they are C<readonly>
aliases to actual arguments:

    sub say { print qq{"@_[]"\n}; }   # args appear in @_

    sub cap { $_ = uc $_ for @_ }     # error: elements of @_ are readonly

If you need to modify the elements of C<@_>, declare the array explicitly
with the C<is rw> trait:

    sub swap (*@_ is rw) { @_[0,1] = @_[1,0] }
=head2 Blocks
Raw blocks are also executable code structures in Perl 6.
Every block defines an object of type C<Code>, which may either be
executed immediately or passed on as a C<Code> object. How a block is
parsed is context dependent.
A bare block where an operator is expected terminates the current
expression and will presumably be parsed as a block by the current
statement-level construct, such as an C<if> or C<while>. (If no
statement construct is looking for a block there, it's a syntax error.)
This form of bare block requires leading whitespace because a bare
block where a postfix is expected is treated as a hash subscript.

A bare block where a term is expected merely produces a C<Code> object.
If the term bare block occurs in a list, it is considered the final
element of that list unless followed immediately by a comma or colon
(intervening C<\h*> or "unspace" is allowed).
=head2 "Pointy blocks"

Semantically the arrow operator C<< -> >> is almost a synonym for the
C<sub> keyword as used to declare an anonymous subroutine, insofar as
it allows you to declare a signature for a block of code. However,
the parameter list of a pointy block does not require parentheses,
and a pointy block may not be given traits. In most respects,
though, a pointy block is treated more like a bare block than like
an official subroutine. Syntactically, a pointy block may be used
anywhere a bare block could be used:

    my $sq = -> $val { $val**2 };
    say $sq(10);

    my @list = 1..3;
    for @list -> $elem {
        say $elem;
    }

It also behaves like a block with respect to control exceptions.
If you C<return> from within a pointy block, the block is transparent
to the return; it will return from the innermost enclosing C<sub> or
C<method>, not from the block itself. It is referenced by C<&?BLOCK>,
not C<&?ROUTINE>.

A normal pointy block's parameters default to C<readonly>, just like
parameters to a normal sub declaration. However, the double-pointy variant
defaults parameters to C<rw>:

    for @list <-> $elem {
        $elem++;
    }

This form applies C<rw> to all the arguments:

    for @kv <-> $key, $value {
        $key ~= ".jpg";
        $value *= 2 if $key ~~ :e;
    }
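
The three behaviours described above (a pointy block as a value, the
C<rw> default of the double-pointy form, and the transparency of a block
to C<return>) can be sketched together. This is an illustration in
present-day Raku rather than the exact 2007 dialect, and the name
C<first-even> is hypothetical.

```raku
# Assumption: modern Raku syntax; "first-even" is an illustrative name.
my $sq = -> $val { $val ** 2 };
say $sq(10);                      # 100

my @list = 1, 2, 3;
for @list <-> $elem { $elem++ }   # <-> makes $elem writable
say @list;                        # [2 3 4]

sub first-even(@nums) {
    for @nums -> $n {
        # return leaves first-even itself, not just the pointy block
        return $n if $n %% 2;
    }
    return Nil;
}
say first-even([3, 5, 8, 9]);     # 8
```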
=head2 Stub declarations
To predeclare a subroutine without actually defining it, use a "stub block":

    sub foo {...}     # Yes, those three dots are part of the actual syntax

The old Perl 5 form:

    sub foo;

is a compile-time error in Perl 6 (because it would imply that the body of the
subroutine extends from that statement to the end of the file, as C<class> and
C<module> declarations do). The only allowed use of the semicolon form is to
declare a C<MAIN> sub--see L</Declaring a MAIN subroutine> below.

Redefining a stub subroutine does not produce an error, but redefining
an already-defined subroutine does. If you wish to redefine a defined sub,
you must explicitly use the "C<is instead>" trait.

The C<...> is the "yadayadayada" operator, which is executable but returns
a failure. You can also use C<???> to produce a warning, or C<!!!> to
always die. These also officially define stub blocks if used as the
only expression in the block.

It has been argued that C<...> as literal syntax is confusing when
you might also want to use it for metasyntax within a document.
Generally this is not an issue in context; it's never an issue in the
program itself, and in the few places where it could be an issue in the
documentation, a comment will serve to clarify the intent, as above.
The rest of the time, it doesn't really matter whether the reader
takes C<...> as literal or not, since the purpose of C<...> is to
indicate that something is missing whichever way you take it.
=head2 Globally scoped subroutines
Subroutines and variables can be declared in the global namespace, and are
thereafter visible everywhere in a program.

Global subroutines and variables are normally referred to by prefixing
their identifiers with C<*> (short for "C<GLOBAL::>"). The C<*>
is required on the declaration unless the C<GLOBAL> namespace can be
inferred some other way, but the C<*> may be omitted on use if the
reference is unambiguous:

    $*next_id = 0;
    sub *saith($text) { print "Yea verily, $text" }

    module A {
        my $next_id = 2;    # lexical; hides the global $next_id
        saith($next_id);    # prints the lexical $next_id
        saith($*next_id);   # prints the global $next_id
    }

    module B {
        saith($next_id);    # unambiguously the global $next_id
    }

However, under stricture (the default for most code), the C<*> is required
on variable references. It's never required on sub calls, and in fact,
the syntax

    $x = *saith($y);

is illegal, because a C<*> where a term is expected is always parsed
as the "whatever" token. If you really want to use a C<*>, you must
also use the sigil along with the twigil:

    $x = &*saith($y);

Only the name is installed into the C<GLOBAL> package by C<*>. To define
subs completely within the scope of the C<GLOBAL> namespace you should
use "C<package GLOBAL {...}>" around the declaration.
=head2 Lvalue subroutines
Lvalue subroutines return a "proxy" object that can be assigned to.
It's known as a proxy because the object usually represents the
purpose or outcome of the subroutine call.

Subroutines are specified as being lvalue using the C<is rw> trait.

An lvalue subroutine may return a variable:

    my $lastval;
    sub lastval () is rw { return $lastval }

or the result of some nested call to an lvalue subroutine:

    sub prevval () is rw { return lastval() }

or a specially tied proxy object, with suitably programmed
C<FETCH> and C<STORE> methods:

    sub checklastval ($passwd) is rw {
        return new Proxy:
                FETCH => method {
                            return lastval();
                         },
                STORE => method ($val) {
                            die unless check($passwd);
                            lastval() = $val;
                         };
    }

Other methods may be defined for specialized purposes such as temporizing
the value of the proxy.
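
A runnable approximation of the lvalue examples above, written in
present-day Raku. It is a sketch under assumptions: modern Raku spells the
constructor C<Proxy.new>, and the sub here relies on its last-statement
value instead of an explicit C<return>. The name C<guarded> and the
password check are hypothetical.

```raku
# Assumption: modern Raku; "guarded" and the 'sesame' check are illustrative.
my $lastval = 0;

# Returning the variable itself from an "is rw" sub makes the call assignable.
sub lastval() is rw { $lastval }
lastval() = 42;
say $lastval;                      # 42

# A proxy intercepts reads (FETCH) and writes (STORE).
sub guarded(Str $passwd) is rw {
    Proxy.new(
        FETCH => method ()     { $lastval },
        STORE => method ($val) {
            die "wrong password" unless $passwd eq 'sesame';
            $lastval = $val;
        },
    );
}
guarded('sesame') = 7;
say $lastval;                      # 7
```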
=head2 Operator overloading
Operators are just subroutines with special names and scoping.
An operator name consists of a grammatical category name followed by
a single colon followed by an operator name specified as if it were
a hash subscript (but evaluated at compile time). So any of these
indicates the same binary addition operator:

    infix:<+>
    infix:«+»
    infix:<<+>>
    infix:{'+'}
    infix:{"+"}

Use the C<&> sigil just as you would on ordinary subs.

Unary operators are defined as C<prefix> or C<postfix>:

    sub prefix:<OPNAME>  ($operand) {...}
    sub postfix:<OPNAME> ($operand) {...}

Binary operators are defined as C<infix>:

    sub infix:<OPNAME> ($leftop, $rightop) {...}

Bracketing operators are defined as C<circumfix> where a term is expected
or C<postcircumfix> where a postfix is expected. A two-element slice
containing the leading and trailing delimiters is the name of the
operator.

    sub circumfix:<LEFTDELIM RIGHTDELIM> ($contents) {...}
    sub circumfix:{'LEFTDELIM','RIGHTDELIM'} ($contents) {...}

Contrary to Apocalypse 6, there is no longer any rule about splitting an even
number of characters. You must use a two-element slice. Such names
are canonicalized to a single form within the symbol table, so you
must use the canonical name if you wish to subscript the symbol table
directly (as in C<< PKG::{'infix:<+>'} >>). Otherwise any form will
do. (Symbolic references do not count as direct subscripts since they
go through a parsing process.) The canonical form always uses angle
brackets and a single space between slice elements. The elements
are not escaped, so C<< PKG::circumfix:{'<','>'} >> is canonicalized
to C<<< PKG::{'circumfix:<< >>'} >>>, and decanonicalizing always
involves stripping the outer angles and splitting on space, if any.
This works because a hash key knows how long it is, so there's no
ambiguity about where the final angle is. And space works because
operators are not allowed to contain spaces.
Operator names can be any sequence of non-whitespace characters
including Unicode characters. For example:

    sub infix:<(c)> ($text, $owner) { return $text but Copyright($owner) }
    method prefix:<±> (Num $x --> Num) { return +$x | -$x }
    multi sub postfix:<!> (Int $n) { $n < 2 ?? 1 !! $n*($n-1)! }
    macro circumfix:«<!-- -->» ($text) is parsed / .*? / { "" }

    my $document = $text (c) $me;

    my $tolerance = ±7!;

    <!-- This is now a comment -->

Whitespace may never be part of the name (except as separator
within a C<< <...> >> or C<«...»> slice subscript, as in the example above).

A null operator name does not define a null or whitespace operator, but
a default matching subrule for that syntactic category, which is useful
when there is no fixed string that can be recognized, such as tokens beginning
with digits. Such an operator I<must> supply an C<is parsed> trait.
The Perl grammar uses a default subrule for the C<:1st>, C<:2nd>, C<:3rd>,
etc. regex modifiers, something like this:

    sub regex_mod_external:<> ($x) is parsed(token { \d+[st|nd|rd|th] }) {...}

Such default rules are attempted in the order declared. (They always follow
any rules with a known prefix, by the longest-token-first rule.)

Although the name of an operator can be installed into any package or
lexical namespace, the syntactic effects of an operator declaration are
always lexically scoped. Operators other than the standard ones should
not be installed into the C<*> namespace. Always use exportation to make
non-standard syntax available to other scopes.
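
As a small illustration of the naming scheme above, the following sketch
defines one postfix and one infix operator in present-day Raku. The
factorial body uses a reduction instead of the recursive form shown
earlier, and the C<avg> operator is a hypothetical example.

```raku
# Assumption: modern Raku; the "avg" operator is illustrative only.
# category:<name> is the routine's name; the body is an ordinary sub.
sub postfix:<!>(Int $n) { [*] 1..$n }
say 5!;          # 120

sub infix:<avg>($a, $b) { ($a + $b) / 2 }
say 4 avg 8;     # 6
```

Both declarations are lexically scoped, which matches the rule that the
syntactic effect of an operator declaration never leaks out of its scope.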
=head1 Parameters and arguments
Perl 6 subroutines may be declared with parameter lists.

By default, all parameters are readonly aliases to their corresponding
arguments--the parameter is just another name for the original
argument, but the argument can't be modified through it. This is
vacuously true for value arguments, since they may not be modified in
any case. However, the default forces any container argument to also
be treated as an immutable value. This extends down only one level;
an immutable container may always return an element that is mutable if
it so chooses. (For this purpose a scalar variable is not considered
a container of its singular object, though, so the top-level object
within a scalar variable is considered immutable by default. Perl 6
does not have references in the same sense that Perl 5 does.)

To allow modification, use the C<is rw> trait. This requires a mutable
object or container as an argument (or some kind of protoobject that
can be converted to a mutable object, such as might be returned
by an array or hash that knows how to autovivify new elements).
Otherwise the signature fails to bind, and this candidate routine
cannot be considered for servicing this particular call. (Other multi
candidates, if any, may succeed if they don't require C<rw> for this
parameter.) In any case, failure to bind does not by itself cause
an exception to be thrown; that is completely up to the dispatcher.

To pass-by-copy, use the C<is copy> trait. An object container will
be cloned whether or not the original is mutable, while an (immutable)
value will be copied into a suitably mutable container. The parameter
may bind to any argument that meets the other typological constraints
of the parameter.

If you have a readonly parameter C<$ro>, it may never be passed on to
a C<rw> parameter of a subcall, whether or not C<$ro> is currently
bound to a mutable object. It may only be rebound to readonly or
copy parameters. It may also be rebound to a C<ref> parameter (see
"C<is ref>" below), but modification will fail as in the case where
an immutable value is bound to a C<ref> parameter.

Aliases of C<$ro> are also readonly, whether generated explicitly with C<:=>
or implicitly within a C<Capture> object (which are themselves immutable).

Also, C<$ro> may not be returned from an lvalue subroutine or method.

Parameters may be required or optional. They may be passed by position,
or by name. Individual parameters may confer a scalar or list context
on their corresponding arguments, but unlike in Perl 5, this is decided
lazily at parameter binding time.

Arguments destined for required positional parameters must come before
those bound to optional positional parameters. Arguments destined
for named parameters may come before and/or after the positional
parameters. (To avoid confusion it is highly recommended that all
positional parameters be kept contiguous in the call syntax, but
this is not enforced, and custom arg list processors are certainly
possible on those arguments that are bound to a final slurpy or
arglist variable.)
=head2 Named arguments
Named arguments are recognized syntactically at the "comma" level.
Since parameters are identified using identifiers, the recognized
syntaxes are those where the identifier in question is obvious.
You may use either the adverbial form, C<:name($value)>, or the
autoquoted arrow form, C<< name => $value >>. These must occur at
the top "comma" level, and no other forms are taken as named pairs
by default. Pairs intended as positional arguments rather than named
arguments may be indicated by extra parens or by explicitly quoting
the key to suppress autoquoting:

    doit :when<now>,1,2,3;      # always a named arg
    doit (:when<now>),1,2,3;    # always a positional arg

    doit when => 'now',1,2,3;   # always a named arg
    doit (when => 'now'),1,2,3; # always a positional arg
    doit 'when' => 'now',1,2,3; # always a positional arg

Only bare keys with valid identifier names are recognized as named arguments:

    doit when => 'now';         # named
    doit 'when' => 'now';       # positional pair
    doit 123 => 'now';          # positional pair
    doit :123<now>;             # positional pair
Going the other way, pairs intended as named arguments that don't look
like pairs must be introduced with the C<|> prefix operator:

    $pair = :when<now>;
    doit $pair,1,2,3;                # always a positional arg
    doit |$pair,1,2,3;               # always a named arg
    doit |get_pair(),1,2,3;          # always a named arg
    doit |('when' => 'now'),1,2,3;   # always a named arg

Note the parens are necessary on the last one due to precedence.

Likewise, if you wish to pass a hash and have its entries treated as
named arguments, you must dereference it with a C<|>:

    %pairs = (:when<now>, :what<any>);
    doit %pairs,1,2,3;               # always a positional arg
    doit |%pairs,1,2,3;              # always named args
    doit |%(get_pair()),1,2,3;       # always a named arg
    doit |%('when' => 'now'),1,2,3;  # always a named arg

Variables with a C<:> prefix in rvalue context autogenerate pairs, so you
can also say this:

    $when = 'now';
    doit $when,1,2,3;    # always a positional arg
    doit :$when,1,2,3;   # always a named arg

In other words C<:$when> is shorthand for C<:when($when)>. This works
for any sigil:

    :$what      :what($what)
    :@what      :what(@what)
    :%what      :what(%what)
    :&what      :what(&what)
Ordinary hash notation will just pass the value of the hash entry as a
positional argument regardless of whether it is a pair or not.
To pass both key and value out of a hash as a positional pair, use C<:p>
instead:

    doit %hash<a>:p,1,2,3;
    doit %hash{'b'}:p,1,2,3;

The C<:p> stands for "pairs", not "positional"--the C<:p> adverb may be
placed on any Hash access to make it mean "pairs" instead of "values".
If you want the pair (or pairs) to be interpreted as a named argument,
you may do so by prefixing with the C<< prefix:<|> >> operator:

    doit |%hash<a>:p,1,2,3;
    doit |%hash{'b'}:p,1,2,3;

Pair constructors are recognized syntactically at the call level and
put into the named slot of the C<Capture> structure. Hence they may be
bound to positionals only by name, not as ordinary positional C<Pair>
objects. Leftover named arguments can be slurped into a slurpy hash.

Because named and positional arguments can be freely mixed, the
programmer always needs to disambiguate pair literals from named
arguments with parentheses or quotes:

    push @array, 1, 2, :a<b>;        # named argument 'a'
    push @array, 1, 2, (:a<b>);      # positional pair
    push @array, 1, 2, 'a' => 'b';   # positional pair
Perl 6 allows multiple same-named arguments, and records the relative
order of arguments with the same name. When there is more than one
argument, the C<@> sigil in the parameter list causes the arguments
to be concatenated:

    sub fun (Int @x) { ... }
    fun( x => 1, x => 2 );              # @x := (1, 2)
    fun( x => (1, 2), x => (3, 4) );    # @x := (1, 2, 3, 4)

Other sigils bind only to the I<last> argument with that name:

    sub fun (Int $x) { ... }
    fun( x => 1, x => 2 );              # $x := 2
    fun( x => (1, 2), x => (3, 4) );    # $x := (3, 4)

This means a hash holding default values must come I<before> known named
parameters, similar to how hash constructors work:

    f( |%defaults, x => 1, y => 2 );
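
The named-versus-positional rules above can be observed with a routine
that simply reports what it received. This is a sketch in present-day
Raku, whose behaviour on these points appears to match the draft; the
body of C<doit> and the C<%defaults> contents are assumptions made for
illustration.

```raku
# Assumption: modern Raku; this body of "doit" is illustrative only.
sub doit(*@pos, :$when) {
    "pos={@pos.elems} when={$when // 'unset'}"
}

say doit(:when<now>, 1, 2, 3);        # pos=3 when=now    (named)
say doit((:when<now>), 1, 2, 3);      # pos=4 when=unset  (parens: positional)
say doit('when' => 'now', 1, 2, 3);   # pos=4 when=unset  (quoted key: positional)

my $pair = :when<now>;
say doit($pair, 1, 2, 3);             # pos=4 when=unset
say doit(|$pair, 1, 2, 3);            # pos=3 when=now    (| makes it named)

# Defaults come first so that an explicit argument of the same name wins.
my %defaults = when => 'later';
say doit(|%defaults, when => 'now');  # pos=0 when=now
```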
=head2 Invocant parameters
A method invocant may be specified as the first parameter in the parameter
list, with a colon (rather than a comma) immediately after it:

    method get_name ($self:) {...}
    method set_name ($_: $newname) {...}

The corresponding argument (the invocant) is evaluated in scalar context
and is passed as the left operand of the method call operator:

    print $obj.get_name();
    $obj.set_name("Sam");

For the purpose of matching positional arguments against invocant parameters,
the invocant argument passed via the method call syntax is considered the
first positional argument when failover happens from single dispatch to
multiple dispatch:

    handle_event($w, $e, $m);
    $w.handle_event($e, $m);

Invocants may also be passed using the indirect object syntax, with a colon
after them. The colon is just a special form of the comma, and has the
same precedence:

    set_name $obj: "Sam";
    $obj.set_name("Sam");

An invocant is the topic of the corresponding method if that formal
parameter is declared with the name C<$_>. A method's invocant
always has the alias C<self>. Other styles of self can be declared
with the C<self> pragma.
=head2 Longname parameters
A routine marked with C<multi> can mark part of its parameters to
be considered in the multi dispatch. These are called I<longnames>;
see S12 for more about the semantics of multiple dispatch.

You can choose part of a C<multi>'s parameters to be its longname,
by putting a double semicolon after the last one:

    multi sub handle_event ($window, $event;; $mode) {...}
    multi method set_name ($self: $name;; $nick) {...}

A parameter list may have at most one double semicolon; parameters
after it are never considered for multiple dispatch (except of course
that they can still "veto" if their number or types mismatch).

[Conjecture: It might be possible for a routine to advertise multiple
long names, delimited by single semicolons. See S12 for details.]

If the parameter list for a C<multi> contains no semicolons to delimit
the list of important parameters, then all positional parameters are
considered important. If it's a C<multi method> or C<multi submethod>,
an additional implicit unnamed C<self> invocant is added to the
signature list unless the first parameter is explicitly marked with a colon.
=head2 Required parameters
Required parameters are specified at the start of a subroutine's parameter
list:

    sub numcmp ($x, $y) { return $x <=> $y }

Required parameters may optionally be declared with a trailing C<!>,
though that's already the default for positional parameters:

    sub numcmp ($x!, $y!) { return $x <=> $y }

The corresponding arguments are evaluated in scalar context and may be
passed positionally or by name. To pass an argument by name,
specify it as a pair: C<< I<parameter_name> => I<argument_value> >>.

    $comparison = numcmp(2,7);
    $comparison = numcmp(x=>2, y=>7);
    $comparison = numcmp(y=>7, x=>2);

Pairs may also be passed in adverbial pair notation:

    $comparison = numcmp(:x(2), :y(7));
    $comparison = numcmp(:y(7), :x(2));

Passing the wrong number of required arguments to a normal subroutine
is a fatal error. Passing a named argument that cannot be bound to a normal
subroutine is also a fatal error. (Methods are different.)

The number of required parameters a subroutine has can be determined by
calling its C<.arity> method:

    $args_required = &foo.arity;
=head2 Optional parameters
Optional positional parameters are specified after all the required
parameters and each is marked with a C<?> after the parameter:

    sub my_substr ($str, $from?, $len?) {...}

Alternately, optional fields may be marked by supplying a default value.
The C<=> sign introduces a default value:

    sub my_substr ($str, $from = 0, $len = Inf) {...}

Default values can be calculated at run-time. They may even use the values of
preceding parameters:

    sub xml_tag ($tag, $endtag = matching_tag($tag) ) {...}

Arguments that correspond to optional parameters are evaluated in
scalar context. They can be omitted, passed positionally, or passed by
name:

    my_substr("foobar");            # $from is 0, $len is infinite
    my_substr("foobar",1);          # $from is 1, $len is infinite
    my_substr("foobar",1,3);        # $from is 1, $len is 3
    my_substr("foobar",len=>3);     # $from is 0, $len is 3

Missing optional arguments default to their default values, or to
an undefined value if they have no default. (A supplied argument that is
undefined is not considered to be missing, and hence does not trigger
the default. Use C<//=> within the body for that.)

(Conjectural: Within the body you may also use C<exists> on the
parameter name to determine whether it was passed. Maybe this will have to
be restricted to the C<?> form, unless we're willing to admit that a parameter
could be simultaneously defined and non-existent.)
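
A minimal sketch of defaults that depend on earlier parameters, written in
present-day Raku. One assumption to note: modern Raku does not bind
positional parameters by name, so only the positional calls are shown.
The name C<my-substr> and its third default are illustrative choices, not
taken from the text.

```raku
# Assumption: modern Raku; "my-substr" and its defaults are illustrative.
# The default for $len is computed at call time from $str and $from.
sub my-substr(Str $str, Int $from = 0, Int $len = $str.chars - $from) {
    $str.substr($from, $len)
}

say my-substr("foobar");         # foobar
say my-substr("foobar", 1);      # oobar
say my-substr("foobar", 1, 3);   # oob
```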
=head2 Named parameters
Named-only parameters follow any required or optional parameters in the
signature. They are marked by a prefix C<:>:

    sub formalize($text, :$case, :$justify) {...}

This is actually shorthand for:

    sub formalize($text, :case($case), :justify($justify)) {...}

If the longhand form is used, the label name and variable name can be
different:

    sub formalize($text, :case($required_case), :justify($justification)) {...}

so that you can use more descriptive internal parameter names without
imposing inconveniently long external labels on named arguments.
Multiple name wrappings may be given; this allows you to give both a
short and a long external name:

    sub globalize (:g(:global($gl))) {...}

Or equivalently:

    sub globalize (:g(:$global)) {...}

Arguments that correspond to named parameters are evaluated in scalar
context. They can only be passed by name, so it doesn't matter what
order you pass them in:

    $formal = formalize($title, case=>'upper');
    $formal = formalize($title, justify=>'left');
    $formal = formalize($title, :justify<right>, :case<title>);

See S02 for the correspondence between adverbial form and arrow notation.

While named and positional arguments may be intermixed, it is suggested
that you keep all the positionals in one place for clarity unless you
have a good reason not to. This is likely bad style:

    $formal = formalize(:justify<right>, $title, :case<title>, $date);
Named parameters are optional unless marked with a following C<!>.
Default values for optional named parameters are defined in the same
way as for positional parameters, but may depend only on existing
values, including the values of parameters that have already been
bound. Named optional parameters default to C<undef> if they have no
default. Named required parameters fail unless an argument pair
of that name is supplied.

Bindings happen in declaration order, not call order, so any default
may reliably depend on formal parameters to its left in the signature.
In other words, if the first parameter is C<$a>, it will bind to
a C<:a()> argument in preference to the first positional argument.
It might seem that performance of binding would suffer by requiring
a named lookup before a positional lookup, but the compiler is able
to guarantee that subs with known fixed signatures (both onlys and
multis with protos) translate named arguments to positional in the
first N positions. Also, purely positional calls may obviously omit any
named lookups, as may bindings that have already used up all the named
arguments. The compiler is also free to intuit proto signatures for a
given sub or method name as long as the candidate list is stable.
=head2 List parameters
List parameters capture a variable length list of data. They're used
in subroutines like C<print>, where the number of arguments needs to be
flexible. They're also called "variadic parameters", because they take a
I<variable> number of arguments. But generally we call them "slurpy"
parameters because they slurp up arguments.

Slurpy parameters follow any required or optional parameters. They are
marked by a C<*> before the parameter:

    sub duplicate($n, *%flag, *@data) {...}

Named arguments are bound to the slurpy hash (C<*%flag>
in the above example). Such arguments are evaluated in scalar context.
Any remaining variadic arguments at the end of the argument list
are bound to the slurpy array (C<*@data> above) and are evaluated
in list context.

For example:

    duplicate(3, reverse => 1, collate => 0, 2, 3, 5, 7, 11, 14);
    duplicate(3, :reverse, :!collate, 2, 3, 5, 7, 11, 14);  # same

Slurpy scalar parameters capture what would otherwise be the first
elements of the variadic array:

    sub head(*$head, *@tail)         { return $head }
    sub neck(*$head, *$neck, *@tail) { return $neck }
    sub tail(*$head, *@tail)         { return @tail }

    head(1, 2, 3, 4, 5);        # $head receives 1; @tail receives the rest
    neck(1, 2, 3, 4, 5);        # $head receives 1; $neck receives 2

Slurpy scalars still impose list context on their arguments.

Slurpy parameters are treated lazily -- the list is only flattened
into an array when individual elements are actually accessed:

    @fromtwo = tail(1..Inf);    # @fromtwo is lazy

You can't bind to the name of a slurpy parameter: the name is just there
so you can refer to it within the body.

    sub foo(*%flag, *@data) {...}

    foo(:flag{ a => 1 }, :data[ 1, 2, 3 ]);  # both pairs end up in %flag
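
How positional and named leftovers are divided between a slurpy array and
a slurpy hash can be checked with a routine that only counts what it
receives. A sketch in present-day Raku; C<tally> is a hypothetical name.

```raku
# Assumption: modern Raku; "tally" is an illustrative name.
sub tally($first, *@rest, *%opts) {
    "first=$first rest={@rest.elems} opts={%opts.keys.sort.join(',')}"
}

# Named arguments go to %opts wherever they appear in the call.
say tally(3, :reverse, 2, 3, 5, :!collate);
# first=3 rest=3 opts=collate,reverse

# A single-star slurpy flattens list arguments into one array.
say tally(1, (2, 3), [4, 5]);
# first=1 rest=4 opts=
```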
=head2 Slurpy block
It's also possible to declare a slurpy block: C<*&block>. It slurps
up any nameless block, specified by C<{...}>, at either the current positional
location or the end of the syntactic list. Put it first if you want the
option of putting a block either first or last in the arguments. Put it
last if you want to force it to come in as the last argument.
=head2 Argument list binding
The underlying C<Capture> object may be bound to a single scalar
parameter marked with a C<|>.

    sub bar ($a,$b,$c,:$mice) { say $mice }
    sub foo (|$args) { say $args.perl; &bar.callwith(|$args); }

This prints:

    foo 1,2,3,:mice<blind>;    # says "\(1,2,3,:mice<blind>)" then "blind"

As demonstrated above, the capture may be interpolated into another
call's arguments. (The C<|> prefix is described in the next section.)
Use of C<callwith> allows the routine to be called without introducing
an official C<CALLER> frame. For more see "Wrapping" below.

It is allowed to rebind the parameters within the signature, but
only as a subsignature of the capture argument:

    sub compare (|$args (Num $x, Num $y --> Bool)) { ... }

For all normal declarative purposes (invocants and multiple dispatch
types, for instance), the inner signature is treated as the entire
signature:

    method addto (|$args ($self: @x)) { trace($args); $self += [+] @x }

The inner signature is not required for non-multies since there can
only be one candidate, but for multiple dispatch the inner signature
is required at least for its types, or the declaration would not know
what signature to match against.

    multi foo (|$args (Int, Bool?, *@, *%)) { reallyintfoo($args) }
    multi foo (|$args (Str, Bool?, *@, *%)) { reallystrfoo($args) }
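
The forwarding pattern above can be tried in present-day Raku, with one
assumed difference from this draft: modern Raku writes the capture
parameter without a sigil (C<|args>). The routine bodies are illustrative.

```raku
# Assumption: modern Raku spells the capture parameter "|args" (no sigil).
sub bar($a, $b, $c, :$mice) { "mice are $mice" }

# foo accepts any argument list and passes it on unchanged.
sub foo(|args) { bar(|args) }

say foo(1, 2, 3, :mice<blind>);   # mice are blind
```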
=head2 Flattening argument lists
The unary C<|> operator casts its argument to a C<Capture>
object, then splices that capture into the argument list
it occurs in. To get the same effect on multiple arguments you
can use the C<< |« >> hyperoperator.

C<Pair> and C<Hash> become named arguments:

    |(x=>1);          # Pair, becomes \(x=>1)
    |{x=>1, y=>2};    # Hash, becomes \(x=>1, y=>2)

C<List> (also C<Seq>, C<Range>, etc.) are simply turned into
positional arguments:

    |(1,2,3);         # becomes \(1,2,3)
    |(1..3);          # becomes \(1,2,3)
    |(1..2, 3);       # becomes \(1,2,3)
    |([x=>1, x=>2]);  # becomes \((x=>1), (x=>2))

For example:

    sub foo($x, $y, $z) {...}    # expects three scalars
    @onetothree = 1..3;          # array stores three scalars

    foo(1,2,3);                  # okay: three args found
    foo(@onetothree);            # error: only one arg
    foo(|@onetothree);           # okay: @onetothree flattened to three args

The C<|> operator flattens lazily -- the array is flattened only if
flattening is actually required within the subroutine. To flatten before
the list is even passed into the subroutine, use the C<eager> list
operator:

    foo(|eager @onetothree);          # array flattened before &foo called
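
A short check of both flattening cases in present-day Raku: an array
becoming positional arguments and a hash becoming named arguments. The
routine names C<sum3> and C<point> are hypothetical.

```raku
# Assumption: modern Raku; "sum3" and "point" are illustrative names.
sub sum3($x, $y, $z) { $x + $y + $z }
my @onetothree = 1..3;
say sum3(|@onetothree);      # 6  (three positional arguments)

sub point(:$x, :$y) { "x=$x y=$y" }
my %coords = x => 1, y => 2;
say point(|%coords);         # x=1 y=2  (two named arguments)
```

Calling C<sum3(@onetothree)> without the C<|> would pass a single array
argument and fail to bind, as the text above describes.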
=head2 Multidimensional argument list binding
Some functions take more than one list of positional and/or named arguments,
that they wish not to be flattened into one list. For instance, C<zip()> wants
to iterate several lists in parallel, while array and hash subscripts want to
process a multidimensional slice. The set of underlying argument lists may be
bound to a single array parameter declared with a double C<@@> sigil:

    sub foo (*@@slice) { ... }

Note that this is different from

    sub foo (\$slice) { ... }

insofar as C<\$slice> is bound to a single argument-list object that
makes no commitment to processing its structure (and maybe doesn't
even know its own structure yet), while C<*@@slice> has to create
an array that binds the incoming dimensional lists to the array's
dimensions, and make that commitment visible to the rest of the scope
via the sigil so that constructs expecting multidimensional lists
know that multidimensionality is the intention.

It is allowed to specify a return type:

    sub foo (*@@slice --> Num) { ... }

The invocant does not participate in multi-dimensional argument lists,
so C<self> is not present in the C<@@slice> below:

    method foo (*@@slice) { ... }

The C<@@> sigil is just a variant of the C<@> sigil, so C<@@slice>
and C<@slice> are really the same array. In particular, C<@@_> is
really the good old C<@_> array viewed as multidimensional.
=head2 Zero-dimensional argument list
If you call a function without parens and supply no arguments, the
argument list becomes a zero-dimensional slice.  It differs from
C<\()> in several ways:

    sub foo (*@@slice) {...}
    foo;        # +@@slice == 0
    foo();      # +@@slice == 1

    sub bar (\$args = \(1,2,3)) {...}
    bar;        # $args === \(1,2,3)
    bar();      # $args === \()
=head2 Feed operators
The variadic list of a subroutine call can be passed in separately from
the normal argument list, by using either of the I<feed> operators:
C<< <== >> or C<< ==> >>. Syntactically, feed operators expect to find a
statement on either end. Any statement can occur on the source end;
however, not all statements are suitable for use on the sink end of a feed.

Each operator expects to find a call to a variadic receiver on its
"sharp" end, and a list of values on its "blunt" end:

    grep { $_ % 2 } <== @data;

    @data ==> grep { $_ % 2 };

A feed binds the (potentially lazy) list from the blunt end to the slurpy
parameter(s) of the receiver on the sharp end.  In the case of a receiver
that is a variadic function, the feed is received as part of its slurpy list.
So both of the calls above are equivalent to:

    grep { $_ % 2 }, @data;
Note that all such feeds (and indeed all lazy argument lists) supply
an implicit promise that the code producing the lists may execute
in parallel with the code receiving the lists.  (Feeds, hyperops,
and junctions all have this promise of parallelizability in common,
but differ in interface.  Code which violates these promises is
erroneous, and will produce undefined results when parallelized.)

However, feeds go a bit further than ordinary lazy lists in enforcing
the parallel discipline: they explicitly treat the blunt end as a
cloned closure that starts a subthread (presumably cooperative).  The
only variables shared by the inner scope with the outer scope are those
lexical variables declared in the outer scope that are visible at the
time the closure is cloned and the subthread spawned.  Use of such
shared variables will automatically be subject to transactional
protection (and associated overhead).  Package variables are not cloned
unless predeclared as lexical names with C<our>.  Variables declared
within the blunt end are not visible outside, and in fact it is illegal
to declare a lexical on the blunt end that is not enclosed in curlies
somehow.
Because feeds are defined as lazy pipes, a chain of feeds may not begin
and end with the same array without some kind of eager sequence point.
That is, this isn't guaranteed to work:

    @data <== grep { $_ % 2 } <== @data;

but either of these does:

    @data <== grep { $_ % 2 } <== eager @data;
    @data <== eager grep { $_ % 2 } <== @data;

Conjecture: if the cloning process eagerly duplicates C<@data>, it could
be forced to work.  Not clear if this is desirable, since ordinary clones
just clone the container, not the value.
Leftward feeds are a convenient way of explicitly indicating the typical
right-to-left flow of data through a chain of operations:

    @oddsquares = map { $_**2 }, sort grep { $_ % 2 }, @nums;

    # the same chain written with leftward feeds:

    @oddsquares = do {
        map { $_**2 } <== sort <== grep { $_ % 2 } <== @nums;
    }

Rightward feeds are a convenient way of reversing the normal data flow in a
chain of operations, to make it read left-to-right:

    @oddsquares = do {
        @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 };
    }

Note that something like the C<do> is necessary because feeds operate
at the statement level.  Parens would also work, since a statement is
expected inside:

    @oddsquares = (
        @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 };
    );

But as described below, you can also just write:

    @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 } ==> @oddsquares;
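For readers who find a lazy pipeline easier to follow in a language with generators, the C<@oddsquares> chain can be modeled roughly in Python. This is a sketch under assumptions: the helper names C<odd> and C<squares> and the sample data are invented for illustration, and Python generators do not carry the parallelism promise that feeds make.

```python
def odd(nums):
    """Lazily pass through only the odd numbers (like grep { $_ % 2 })."""
    return (n for n in nums if n % 2)

def squares(nums):
    """Lazily square each number (like map { $_**2 })."""
    return (n ** 2 for n in nums)

nums = [4, 7, 1, 8, 3, 6]   # sample data, assumed for the example

# Reads right to left, like the leftward-feed form:
#   map <== sort <== grep <== nums
# sorted() has to consume its whole input, so it is an eager stage here.
oddsquares = list(squares(sorted(odd(nums))))
```

Each stage pulls values from the one before it only on demand; nothing runs until C<list(...)> drains the final stage.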
If the operand on the sharp end of a feed is not a call to a variadic
operation, it must be something else that can be interpreted as a list
receiver, or a scalar expression that can be evaluated to produce an
object that does the C<KitchenSink> role, such as an C<IO> object.
Such an object provides C<.clear> and C<.push> methods that will
be called as appropriate to send data.  (Note that an C<IO> object
used as a sink will force eager evaluation on its pipeline, so the
next statement is guaranteed not to run till the file is closed.
In contrast, an C<Array> object used as a sink turns into a lazy
array.)
Any non-variadic object (such as an C<Array> or C<IO> object) used as a filter
between two feeds is treated specially as a I<tap> that merely captures
data I<en passant>. You can safely install such a tap in an extended pipeline
without changing the semantics. An C<IO> object used as a tap does not
force eager evaluation since the eagerness is controlled instead by the
downstream feed.
Any prefix list operator is considered a variadic operation, so ordinarily
a list operator adds any feed input to the end of its list.
But sometimes you want to interpolate elsewhere, so any contextualizer
with C<*> as an argument may be used to indicate the target of a
feed without the use of a temporary array:

    foo() ==> say @(*), " is what I meant";
    bar() ==> @@(*).baz();

Likewise, an C<Array> used as a tap may be distinguished from an C<Array> used
as a translation function:

    numbers() ==> @array ==> bar()          # tap
    numbers() ==> @array[@(*)] ==> bar()    # translation

Feeding into the C<*> "whatever" term sets the source for the next sink.
To append multiple sources to the next sink, double the angle:

    0..*       ==>  *;
    'a'..*     ==>> *;
    pidigits() ==>> *;

    for zip(@@(*)) { .perl.say }
You may use a variable (or variable declaration) as a receiver, in
which case the list value is bound as the "todo" of the variable.
(The append form binds additional todos to the receiver's todo list.)
Do not think of it as an assignment, nor as an ordinary binding.
Think of it as iterator creation.  In the case of a scalar variable,
that variable contains the newly created iterator itself.  In the case
of an array, the new iterator is installed as the method for extending
the array.  As with assignment, the old todo list is clobbered; use the
append form to avoid that and get push semantics.

In general you can simply think of a receiver array as representing
the results of the chain, so you can equivalently write any of:

    my @oddsquares <== map { $_**2 } <== sort <== grep { $_ % 2 } <== @nums;

    my @oddsquares
        <== map { $_**2 }
        <== sort
        <== grep { $_ % 2 }
        <== @nums;

    @nums ==> grep { $_ % 2 } ==> sort ==> map { $_**2 } ==> my @oddsquares;

    @nums
        ==> grep { $_ % 2 }
        ==> sort
        ==> map { $_**2 }
        ==> my @oddsquares;

Since the feed iterator is bound into the final variable, the variable
can be just as lazy as the feed that is producing the values.
When feeds are bound to arrays with "push" semantics, you can have
a receiver for multiple feeds:

    my @foo;
    0..2       ==>  @foo;
    'a'..'c'   ==>> @foo;
    say @foo;

Note how the feeds are concatenated in C<@foo> so that C<@foo>
is a list of 6 elements.  This is the default behavior.  However,
sometimes you want to capture the outputs as a list of two iterators,
namely the two iterators that represent the two input feeds.  You can
get at those two iterators by using the name C<@@foo> instead, where
the "slice" sigil marks a multidimensional array, that is, an
array of lists, each of which may be treated independently.

    0..*       ==>  @@foo;
    'a'..*     ==>> @@foo;
    pidigits() ==>> @@foo;

    for zip(@@foo) { .say }

        [0,'a',3]
        [1,'b',1]
        [2,'c',4]
        [3,'d',1]
        [4,'e',5]
        [5,'f',9]
        ...
Here C<@@foo> is an array of three iterators, so

    zip(@@foo)

is equivalent to

    zip(@@foo[0]; @@foo[1]; @@foo[2])

A semicolon inside brackets is equivalent to stacked feeds.  The code above
could be rewritten as:

    (0..*; 'a'..*; pidigits()) ==> my @@foo;
    for @@foo.zip { .say }

which is in turn equivalent to

    for zip(0..*; 'a'..*; pidigits()) { .say }
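The idea of keeping three lazy sources as separate dimensions and walking them in parallel can be modeled in Python with a list of iterators. This is only an analogy: the C<feeds> list stands in for C<@@foo>, and C<pidigits> is a hard-coded stand-in (an assumption made for brevity), not a real digit generator.

```python
from itertools import count, islice
from string import ascii_lowercase

def pidigits():
    """Stand-in source: the first six digits of pi, hard-coded for the example."""
    yield from (3, 1, 4, 1, 5, 9)

# Three independent lazy sources, kept as separate "dimensions".
feeds = [count(0), iter(ascii_lowercase), pidigits()]

# zip(*feeds) takes one element from each iterator per step.
rows = [list(row) for row in islice(zip(*feeds), 6)]
```

Only six rows are taken because two of the sources are unbounded (or long) and the stand-in C<pidigits> holds just six values.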
A named receiver array is useful when you wish to feed into an
expression that is not an ordinary list operator, and you wish to be
clear where the feed's destination is supposed to be:

    picklist() ==> my @baz;
    my @foo = @bar[@baz];
Various contexts may or may not be expecting multi-dimensional slices
or feeds.  By default, ordinary arrays are flattened, that is, they
have "list" semantics.  If you say

    (0..2; 'a'..'c') ==> my @tmp;
    for @tmp { .say }

then you get 0,1,2,'a','b','c'.  If you have a multidim array, you
can ask for list semantics explicitly with C<list()>:

    (0..2; 'a'..'c') ==> my @@tmp;
    for @@tmp.list { .say }

As we saw earlier, "zip" produces an interleaved result by taking one element
from each list in turn, so

    (0..2; 'a'..'c') ==> my @@tmp;
    for @@tmp.zip { .say }

produces 0,'a',1,'b',2,'c'.

If you want the result as a list of subarrays, then you need to put
the zip into a "chunky" context instead:

    (0..2; 'a'..'c') ==> my @@tmp;
    for @@tmp.zip.@@() { .say }

This produces [0,'a'],[1,'b'],[2,'c'].  But usually you want the flat
form so you can just bind it directly to a signature:

    for @@tmp.zip -> $i, $a { say "$i: $a" }

Otherwise you'd have to say this:

    for @@tmp.zip.@@() -> [$i, $a] { say "$i: $a" }
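The three views described above (plain list, flat zip, and "chunky" zip) can be compared side by side in a small Python model. This is a sketch by analogy, assuming a list of two sequences called C<dims> in place of C<@@tmp>; Python's C<zip> is chunky by default, so the flat form needs an explicit flattening step.

```python
from itertools import chain

dims = [range(0, 3), "abc"]   # two dimensions: 0..2 and 'a'..'c'

# Plain list semantics: each dimension in turn, like @@tmp.list
listed = list(chain.from_iterable(dims))

# Flat zip: elements interleaved into one list, like @@tmp.zip
flat = list(chain.from_iterable(zip(*dims)))

# "Chunky" zip: one sub-list per step, like @@tmp.zip.@@()
chunky = [list(pair) for pair in zip(*dims)]

# The flat form can be consumed two at a time, like -> $i, $a { ... }
it = iter(flat)
lines = [f"{i}: {a}" for i, a in zip(it, it)]
```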
In list context the C<@@foo> notation is really a shorthand
for C<[;](@@foo)>.  In particular, you can use C<@@foo> to interpolate
a multidimensional slice in an array or hash subscript.

If C<@@foo> is currently empty, then C<for zip(@@foo) {...}> acts on a
zero-dimensional slice (i.e. C<for (zip) {...}>), and outputs nothing
at all.
Note that with the current definition, the order of feeds is preserved
left to right in general regardless of the position of the receiver.

So

    ('a'..*; 0..*) ==> *;
    for zip(@@() <== @foo) -> $a, $i, $x { ... }
is the same as

    'a'..* ==> *;
    0..* ==>> *;
    for zip(@@() <== @foo) -> $a, $i, $x { ... }
which is the same as

    for zip('a'..*; 0..*; @foo) -> $a, $i, $x { ... }

Also note that these come out to be identical for ordinary arrays:

    @foo.zip
    @foo.cat

The C<@@($foo)> coercer can be used to pull a multidim out of some
object that contains one, such as a C<Capture> or C<Match> object.  Like
C<@()>, C<@@()> defaults to C<@@($/)>, and returns a multidimensional
view of any match that repeatedly applies itself with C<:g> and
the like.  In contrast, C<@()> would flatten those into one list.
=head2 Closure parameters
Parameters declared with the C<&> sigil take blocks, closures, or
subroutines as their arguments.  Closure parameters can be required,
optional, named, or slurpy.

    sub limited_grep (Int $count, &block, *@list) {...}

    @first_three = limited_grep 3, {$_<10}, @data;

(The comma is required after the closure.)

Within the subroutine, the closure parameter can be used like any other
lexically scoped subroutine:

    sub limited_grep (Int $count, &block, *@list) {
        ...
        if block($nextelem) {...}
        ...
    }
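Since the body of C<limited_grep> is elided above, here is one plausible reading of it, sketched in Python rather than Perl 6. The behavior (return at most C<count> matching elements) is an assumption inferred from the name and the C<@first_three> call, and the sample data is invented.

```python
from typing import Callable, List

def limited_grep(count: int, block: Callable[[int], bool], *items: int) -> List[int]:
    """Return at most `count` items for which `block` returns true."""
    kept: List[int] = []
    for item in items:
        if len(kept) >= count:
            break
        if block(item):        # the closure parameter is called like any function
            kept.append(item)
    return kept

data = [12, 3, 44, 7, 9, 1, 10]   # sample data, assumed for the example
first_three = limited_grep(3, lambda n: n < 10, *data)
```

As in the Perl 6 signature, the closure is an ordinary parameter that precedes the slurpy list.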
The closure parameter can have its own signature in a type specification written
with C<:(...)>:

    sub limited_Dog_grep ($count, &block:(Dog), Dog *@list) {...}

and even a return type:

    sub limited_Dog_grep ($count, &block:(Dog --> Bool), Dog *@list) {...}

When an argument is passed to a closure parameter that has this kind of
signature, the argument must be a C<Code> object with a compatible
parameter list and return type.
=head2 Type parameters
Unlike normal parameters, type parameters often come in piggybacked
on the actual value as "kind", and you'd like a way to capture both
the value and its kind at once.  (A "kind" is a class or type that
an object is allowed to be.  An object is not officially allowed
to take on a constrained or contravariant type.)  A type variable
can be used anywhere a type name can, but instead of asserting that
the value must conform to a particular type, it captures the
actual "kind" of the object and also declares a package/type name
by which you can refer to that kind later in the signature or body.
For instance, if you want to match any two Dogs as long as they
are of the same kind, you can say:

    sub matchedset (Dog ::T $fido, T $spot) {...}

(Note that C<::T> is not required to contain C<Dog>, only
a type that is compatible with C<Dog>.)

The C<::> sigil is short for "subset" in much the same way that C<&> is
short for "sub".  Just as C<&> can be used to name any kind of code,
so too C<::> can be used to name any kind of type.  Both of them insert
a bare identifier into the symbol table, though they fill different syntactic
spots.

Note that it is not required to capture the object associated with the
class unless you want it.  The sub above could be written as

    sub matchedset (Dog ::T, T) {...}

if we're not interested in C<$fido> or C<$spot>.  Or just

    sub matchedset (::T, T) {...}

if we don't care about anything but the matching.
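A run-time approximation of C<matchedset> in Python may help show what "capturing the kind" means. This is a sketch under assumptions: the classes C<Poodle> and C<Beagle> are hypothetical, the check happens inside the body rather than during signature binding, and C<T $spot> is read here as "conforms to the captured kind".

```python
class Dog:
    pass

class Poodle(Dog):      # hypothetical subclasses, for illustration only
    pass

class Beagle(Dog):
    pass

def matchedset(fido, spot):
    """Accept two Dogs only if spot conforms to fido's actual kind."""
    if not isinstance(fido, Dog):
        raise TypeError("fido must be a Dog")
    kind = type(fido)                  # plays the role of ::T
    if not isinstance(spot, kind):     # plays the role of the T constraint
        raise TypeError(f"spot must be a {kind.__name__}")
    return kind
```

Calling C<matchedset(Poodle(), Poodle())> returns the captured kind, while C<matchedset(Poodle(), Beagle())> raises C<TypeError>.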
=head2 Unpacking array parameters
Instead of specifying an array parameter as an array:

    sub quicksort (@data, $reverse?, $inplace?) {
        my $pivot := shift @data;
        ...
    }

it may be broken up into components in the signature, by
specifying the parameter as if it were an anonymous array of
parameters:

    sub quicksort ([$pivot, *@data], $reverse?, $inplace?) {
        ...
    }
This subroutine still expects an array as its first argument, just like
the first version.
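The same head-and-rest unpacking exists in Python as starred assignment, which makes for a compact model of the C<[$pivot, *@data]> form. The sort body below is an assumption (the original elides it), and the C<$inplace> option is left out.

```python
def quicksort(data, reverse=False):
    """Sort a list, unpacking its first element as the pivot."""
    if not data:
        return []
    pivot, *rest = data                 # like [$pivot, *@data] in the signature
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    result = quicksort(smaller) + [pivot] + quicksort(larger)
    return result[::-1] if reverse else result
```

The caller still passes one list, for example C<quicksort([3, 1, 2])>; only the function's view of that list is split into parts.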
=head2 Unpacking a single list argument
To match the first element of the slurpy list, use a "slurpy" scalar:

    sub quicksort (:$reverse, :$inplace, *$pivot, *@data)
=head2 Unpacking hash parameters
Likewise, a hash argument can be mapped to a hash of parameters, specified
as named parameters within curlies.  Instead of saying:

    sub register (%guest_data, $room_num) {
        my $name := delete %guest_data<name>;
        my $addr := delete %guest_data<addr>;
        ...
    }

you can get the same effect with:

    sub register ({:$name, :$addr, *%guest_data}, $room_num) {
        ...
    }
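Python has no hash-unpacking syntax inside a parameter list, but the effect can be approximated by forwarding the hash to a helper with keyword-only parameters. This is a sketch by analogy: the helper C<_unpacked>, the returned record, and the sample guest data are all hypothetical.

```python
def _unpacked(room_num, *, name, addr, **guest_data):
    """Receives name and addr by name; everything else lands in guest_data."""
    return {"room": room_num, "name": name, "addr": addr, "extra": guest_data}

def register(guest, room_num):
    """Public entry point: still takes one dict, like the %guest_data form."""
    return _unpacked(room_num, **guest)

record = register({"name": "Ann", "addr": "1 Elm St", "phone": "555-0100"}, 12)
```

As with C<{:$name, :$addr, *%guest_data}>, the named keys are pulled out and the remaining pairs stay together in a catch-all hash.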
=head2 Unpacking tree node parameters
You can unpack tree nodes in various dwimmy ways by enclosing the bindings
of child nodes and attributes in parentheses following the declaration of
the node itself:

    sub traverse ( BinTree $top ( $left, $right ) ) {
        traverse($left);
        traverse($right);
    }

In this, C<$left> and C<$right> are automatically bound to the left
and right nodes of the tree.  If C<$top> is an ordinary object, it binds
the C<$top.left> and C<$top.right> attributes.  If it's a hash,
it binds C<< $top<left> >> and C<< $top<right> >>.  If C<BinTree> is a
signature type and C<$top> is a List (argument list) object, the child types
of the signature are applied to the actual arguments in the argument
list object.  (Signature types have the benefit that you can view
them inside-out as constructors with positional arguments, such that
the transformations can be reversible.)

However, the full power of signatures can be applied to pattern match
just about any argument or set of arguments, even though in some cases
the reverse transformation is not derivable.  For instance, to bind to
an array of children named C<.kids> or C<< .<kids> >>, use something
like:

    multi traverse ( NAry $top ( :kids [$eldest, *@siblings] ) ) {
        traverse($eldest);
        traverse(:kids(@siblings));
    }

    multi traverse ( $leaf ) {...}

The second candidate is called only if the parameter cannot be bound to
both C<$top> and to the "kids" parsing subparameter.
Likewise, to bind to a hash element of the node and then bind to
keys in that hash by name:

    sub traverse ( AttrNode $top ( :%attr{ :$vocalic, :$tense } ) ) {
        say "Has {+%attr} attributes, of which";
        say "vocalic = $vocalic";
        say "tense = $tense";
    }

You may omit the top variable if you prefix the parentheses with a colon
to indicate a signature.  Otherwise you must at least put the sigil of
the variable, or we can't correctly differentiate:

    my Dog ($fido, $spot)   := twodogs();       # list of two dogs
    my Dog $ ($fido, $spot) := twodogs();       # one twodog object
    my Dog :($fido, $spot)  := twodogs();       # one twodog object
Sub signatures can be matched directly within regexes by using C<:(...)>
notation.

    push @a, "foo";
    push @a, \(1,2,3);
    push @a, "bar";
    ...
    my ($i, $j, $k);
    @a ~~ rx/
            <,>
            :(Int $i,Int $j,Int? $k)
            <,>
          /;
    say "i = $<i>";
    say "j = $<j>";
    say "k = $<k>" if defined $<k>;
If you want a parameter bound into C<$/>, you have to say C<< $<i> >>
within the signature.  Otherwise it will try to bind an external C<$i>
instead, and fail if no such variable is declared.

Note that unlike a sub declaration, a regex-embedded signature has no
associated "returns" syntactic slot, so you have to use C<< --> >>
within the signature to specify the C<of> type of the signature, or match as
an arglist:

    :(Num, Num --> Coord)
    :(\Coord(Num, Num))

A consequence of the latter form is that you can match the type of
an object with C<:(\Dog)> without actually breaking it into its components.
Note, however, that it's not equivalent to say

    :(--> Dog)

which would be equivalent to

    :(\Dog())

that is, match a nullary function of type C<Dog>.  Nor is it equivalent to

    :(Dog)

which would be equivalent to

    :(\Any(Dog))

and match a function taking a single parameter of type C<Dog>.

Note also that bare C<\(1,2,3)> is never legal in a regex since the
first (escaped) paren would try to match literally.
=head2 Attributive parameters
If a submethod's parameter is declared with a C<.> or C<!> after the
sigil (like an attribute):

    submethod initialize($.name, $!age) {}

then the argument is assigned directly to the object's attribute of the
same name.  This avoids the frequent need to write code like:

    submethod initialize($name, $age) {
        $.name = $name;
        $!age  = $age;
    }

To rename an attribute parameter you can use the explicit pair form:

    submethod initialize(:moniker($.name), :youth($!age)) {}

The C<:$name> shortcut may be combined with the C<$.name> shortcut,
but the twigil is ignored for the parameter name, so

    submethod initialize(:$.name, :$!age) {}

is the same as:

    submethod initialize(:name($.name), :age($!age)) {}

Note that C<$!age> actually refers to the private "C<has>" variable that
can be referred to as either C<$age> or C<$!age>.
=head2 Placeholder variables
Even though every bare block is a closure, bare blocks can't have
explicit parameter lists.  Instead, they use "placeholder" variables,
marked by a caret (C<^>) after their sigils.

Using placeholders in a block defines an implicit parameter list.  The
signature is the list of distinct placeholder names, sorted in Unicode order.
So:

    { $^y < $^z && $^x != 2 }

is a shorthand for:

    -> $x, $y, $z { $y < $z && $x != 2 }

Note that placeholder variables syntactically cannot have type constraints.
Also, it is illegal to use placeholder variables in a block that already
has a signature, because the autogenerated signature would conflict with that.

Placeholder names consisting of a single uppercase letter are disallowed,
not because we're mean, but because it helps us catch references to
obsolete Perl 5 variables such as C<$^O>.
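The rule for deriving the implicit signature (distinct names, sorted in Unicode order) is mechanical enough to model in a few lines of Python. This is a toy sketch, not a parser: it scans the block text with a regular expression, assumes only the sigils C<$ @ % &>, and the function name C<implicit_signature> is invented.

```python
import re

def implicit_signature(block_source):
    """Return the placeholder names of a block in implicit-parameter order."""
    names = re.findall(r"[$@%&]\^(\w+)", block_source)
    for name in names:
        # Single uppercase letters are rejected, mirroring the rule about $^O.
        if len(name) == 1 and name.isupper():
            raise ValueError("single uppercase placeholder not allowed: " + name)
    # sorted() on str compares by code point, which is Unicode order.
    return sorted(set(names))

params = implicit_signature("{ $^y < $^z && $^x != 2 }")
```

The order in which placeholders appear in the block does not matter; only their names determine the parameter positions.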
=head1 Properties and traits
Compile-time properties are called "traits".  The
C<is I<NAME> (I<DATA>)> syntax defines traits on containers and
subroutines, as part of their declaration:

    constant $pi is Approximated = 3;

    my $key is Persistent(:file<.key>);

    sub fib is cached {...}

The C<will I<NAME> I<BLOCK>> syntax is a synonym for C<is I<NAME> (I<BLOCK>)>:

    my $fh will undo { close $fh };

The C<but I<NAME> (I<DATA>)> syntax specifies run-time properties on values:

    constant $pi = 3 but Inexact;

    sub system {
        ...
        return $error but False if $error;
        return 0 but True;
    }
Properties are predeclared as roles and implemented as mixins--see S12.
=head2 Subroutine traits
These traits may be declared on the subroutine as a whole (individual
parameters take other traits). Trait syntax depends on the particular
auxiliary you use, but for C<is>, the subsequent syntax is identical to
adverbial syntax, except that the colon may be omitted or doubled depending
on the degree of ambiguity desired:

    is ::Foo[...]
    is :Foo[...]
    is Foo[...]
=over
=item C<is signature>
The signature of a subroutine. Normally declared implicitly, by providing a
parameter list and/or return type.

=item C<returns>/C<is returns>

The C<inner> type constraint that a routine imposes on its return value.

=item C<of>/C<is of>

The C<of> type that is the official return type of the routine.  Or you
can think of "of" as outer/formal.  If there is no inner type, the outer
type also serves as the inner type to constrain the return value.

=item C<will do>

The block of code executed when the subroutine is called.  Normally declared
implicitly, by providing a block after the subroutine's signature definition.
=item C<is rw>
Marks a subroutine as returning an lvalue.
=item C<is parsed>
Specifies the subrule by which a macro call is parsed. The parse
always starts after the macro's initial token.  If the operator has
two parts (circumfix or postcircumfix), the final token is also automatically
matched, and should not be matched by the supplied regex.

=item C<is reparsed>

Also specifies the subrule by which a macro call is parsed, but restarts
the parse before the macro's initial token, usually because you want
to parse using an existing rule that expects to traverse the initial
token.  If the operator has two parts (circumfix or postcircumfix), the
final token must also be explicitly matched by the supplied regex.
=item C<is cached>
Marks a subroutine as being memoized, or at least memoizable.
In the abstract, this cache is just a hash where incoming argument
C<Capture>s are mapped to return values.  If the C<Capture> is found in
the hash, the return value need not be recalculated.  If you use
this trait, the compiler will assume two things:

=over

=item *

A given C<Capture> would always calculate the same return value.  That is,
there is no state hidden within the dynamic scope of the call.

=item *

The cache lookup is likely to be more efficient than recalculating
the value in at least some cases, because either most uncached calls
would be slower (and reduce throughput), or you're trying to avoid a
significant number of pathological cases that are unacceptably slow
(and increase latency).

=back

This trait is a suggestion to the compiler that caching is okay.  The
compiler is free to choose any kind of caching algorithm (including
non-expiring, random, lru, pseudo-lru, or adaptive algorithms, or
even no caching algorithm at all).  The run-time system is free to
choose any kind of maximum cache size depending on the availability
of memory and trends in usage patterns.  You may suggest a particular
cache size by passing a numeric argument (representing the maximum number
of unique C<Capture> values allowed), and some of the possible
algorithms may pay attention to it.  You may also pass C<*> for the
size to request a non-expiring cache (complete memoization).  The
compiler is free to ignore this too.

The intent of this trait is to specify performance hints without
mandating any exact behavior.  Proper use of this trait should not
change semantics of the program; it functions as a kind of "pragma".
This trait will not be extended to reinvent other existing ways of
achieving the same effect.  To gain more control, write your own
trait handler to allow the use of a more specific trait, such as
"C<is lru(42)>".  Alternately, just use a state hash keyed on the
sub's argument capture to write your own memoization with complete
control from within the subroutine itself, or from within a wrapper
around your subroutine.
=item C<is inline>
I<Suggests> to the compiler that the subroutine is a candidate for
optimization via inlining.  Basically promises that nobody is going
to try to wrap this subroutine (or that if they do, you don't care).
=item C<is tighter>/C<is looser>/C<is equiv>
Specifies the precedence of an operator relative to an existing
operator.  C<tighter> and C<looser> operators default to being left
associative.

C<equiv> on the other hand also clones other traits, so it specifies
the default associativity to be the same as the operator to which
the new operator is equivalent.  The following are the default
equivalents for various syntactic categories if neither C<equiv> nor
C<assoc> is specified.  (Many of these have no need of precedence
or associativity because they are parsed specially.  Nevertheless,
C<equiv> may be useful for cloning other traits of these operators.)

    category:<prefix>
    circumfix:<( )>
    dotty:<.>
    infix:<+>
    infix_circumfix_meta_operator:{'»','«'}
    infix_postfix_meta_operator:<=>
    infix_prefix_meta_operator:<!>
    package_declarator:<class>
    postcircumfix:<( )>
    postfix:<++>
    postfix_prefix_meta_operator:{'»'}
    prefix:<++>
    prefix_circumfix_meta_operator:{'[',']'}
    prefix_postfix_meta_operator:{'«'}
    q_backslash:<\\>
    qq_backslash:<n>
    quote_mod:<c>
    quote:<q>
    regex_assertion:<?>
    regex_backslash:<w>
    regex_metachar:<.>
    regex_mod_internal:<i>
    routine_declarator:<sub>
    scope_declarator:<my>
    sigil:<$>
    special_variable:<$!>
    statement_control:<if>
    statement_mod_cond:<if>
    statement_mod_loop:<while>
    statement_prefix:<do>
    term:<*>
    trait_auxiliary:<is>
    trait_verb:<of>
    twigil:<?>
    type_declarator:<subset>
    version:<v>
The existing operator may be specified either as a function object
or as a string argument equivalent to the one that would be used in
the complete function name.  In string form the syntactic
category will be assumed to be the same as the new declaration.
Therefore these all have the same effect:

    sub postfix:<!> ($x) is equiv(&postfix:<++>) {...}
    sub postfix:<!> ($x) is equiv<++> {...}
    sub postfix:<!> ($x) {...}      # since equiv<++> is the default

Prefix operators that are identifiers are handled specially.  Both of

    sub foo ($) {...}
    sub prefix:<foo> ($) {...}

default to named unary precedence despite declaring a prefix operator.
Likewise postfix operators that look like method calls are forced to
default to the precedence of method calls.  Any prefix operator that
requires multiple arguments defaults to listop precedence, even if it
is not an identifier.
=item C<is assoc>
Specifies the associativity of an operator explicitly.  Valid values are:

    Tag         Examples        Meaning of $a op $b op $c       Default equiv
    ===         ========        =========================       =============
    left        + - * / x       ($a op $b) op $c                +
    right       ** =            $a op ($b op $c)                **
    non         cmp <=> ..      ILLEGAL                         cmp
    chain       == eq ~~        ($a op $b) and ($b op $c)       eqv
    list        | & ^ Z         op($a; $b; $c)                  |

Note that operators "C<equiv>" to relationals are automatically considered
chaining operators.  When creating a new precedence level, the chaining
is determined by the presence or absence of "C<< is assoc<chain> >>",
and other operators defined at that level are required to be the same.

Specifying an C<assoc> without an explicit C<equiv> substitutes a default
C<equiv> consistent with the associativity, as shown in the final column above.
=item C<PRE>/C<POST>
Mark blocks that are to be unconditionally executed before/after
the subroutine's C<do> block.  These blocks must return a true value,
otherwise an exception is thrown.

When applied to a method, the semantics provide support for the
"Design by Contract" style of OO programming: a precondition of
a particular method is met if all the C<PRE> blocks associated
with that method return true.  Otherwise, the precondition is met
if C<all> of the parent classes' preconditions are met (which may
include the preconditions of I<their> parent classes if they fail,
and so on recursively).

In contrast, a method's postcondition is met if all the method's
C<POST> blocks return true I<and> all its parents' postconditions are
also met recursively.

C<POST> blocks (and "C<will post>" block traits) declared within a C<PRE>
or C<ENTER> block are automatically hoisted outward to be called at the
same time as other C<POST> blocks.  This conveniently gives "circum"
semantics by virtue of wrapping the post lexical scope within the pre
lexical scope.

    method push ($new_item) {
        ENTER {
            my $old_height = self.height;
            POST { self.height == $old_height + 1 }
        }

        $new_item ==> push @.items;
    }

    method pop () {
        ENTER {
            my $old_height = self.height;
            POST { self.height == $old_height - 1 }
        }

        return pop @.items;
    }

[Conjecture: class and module invariants can similarly be supplied
by embedding C<POST>/C<post> declarations in a C<FOREIGN> block that
only runs when any routine of this module is called from "outside"
the current module or type, however that's defined.  The C<FOREIGN> block
itself could perhaps refine the concept of what is foreign, much like
an exception handler.]
=item C<ENTER>/C<LEAVE>/C<KEEP>/C<UNDO>/etc.
These supply closures that are to be conditionally executed before or
after the subroutine's C<do> block (only if used at the outermost level
within the subroutine; technically, these are block traits on the C<do>
block, not subroutine traits).  These blocks are generally used only
for their side effects, since most return values will be ignored.
=back
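The "state hash keyed on the sub's argument capture" mentioned under C<is cached> can be illustrated with a small hand-written memoizer. The sketch below is in Python and is an analogy only: the decorator name C<cached> and the C<calls> list are invented, the cache never expires, and the key is built from positional plus named arguments to stand in for a C<Capture>.

```python
def cached(func):
    """Memoize func on its argument 'capture' (positional plus named)."""
    cache = {}

    def wrapper(*args, **kwargs):
        # The key combines both halves of the argument list; named
        # arguments are sorted so their order does not affect the lookup.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper

calls = []   # records each uncached computation, for demonstration

@cached
def fib(n):
    calls.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(20)
```

This only works under the first assumption listed for the trait: the same arguments must always produce the same result. Python's standard library offers C<functools.lru_cache> for a bounded variant of the same idea.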
=head2 Parameter traits
The following traits can be applied to many types of parameters.
=over
=item C<is readonly>
Specifies that the parameter cannot be modified (e.g. assigned to,
incremented).  It is the default for parameters.  On arguments which
are already immutable values it is a no-op at run time; on mutable
containers it may need to create an immutable alias to the mutable object
if the constraint cannot be enforced entirely at compile time.  Binding
to a readonly parameter never triggers autovivification.
=item C<is rw>
Specifies that the parameter can be modified (assigned to, incremented,
etc). Requires that the corresponding argument is an lvalue or can be
converted to one.
When applied to a variadic parameter, the C<rw> trait applies to each
element of the list:

    sub incr (*@vars is rw) { $_++ for @vars }

(The variadic array as a whole is always modifiable, but such
modifications have no effect on the original argument list.)
=item C<is ref>

Specifies that the parameter is passed by reference.  Unlike C<is rw>, the
corresponding argument must already be a suitable lvalue.  No attempt at
coercion or autovivification is made, so unsuitable values throw an
exception if you try to modify them within the body of the routine.
=item C<is copy>
Specifies that the parameter receives a distinct, read-writable copy of the
original argument.  This is commonly known as "pass-by-value".

    sub reprint ($text, $count is copy) {
        print $text while $count-- > 0;
    }
Binding to a copy parameter never triggers autovivification.
=item C<is context(I<ACCESS>)>
Specifies that the parameter is to be treated as an "environmental"
variable, that is, a lexical that is accessible from the dynamic
scope (see S02). If I<ACCESS> is omitted, defaults to readonly in
any portions of the dynamic scope outside the current lexical scope.
=back
=head1 Advanced subroutine features
=head2 The C<return> function

The C<return> function notionally throws a control exception that is
caught by the current lexically enclosing C<Routine> to force a return
through the control logic code of any intermediate block constructs.
(That is, it must unwind the stack of dynamic scopes to the proper
lexical scope belonging to this routine.)  With normal blocks
(those that are autoexecuted in place because they're known to the
compiler) this unwinding can likely be optimized away to a "goto".
All C<Routine> declarations have an explicit declarator such as C<sub>
or C<method>; bare blocks and "pointy" blocks are never considered
to be routines in that sense.  To return from a block, use C<leave>
instead--see below.
The C<return> function preserves its argument list as a C<Capture> object, and
responds to the left-hand C<Signature> in a binding.  This allows named return
values if the caller expects one:

    sub f () { return :x<1> }
    sub g ($x) { print $x }

    my $x := |(f);  # binds 1 to $x, via a named argument
    g(|(f));        # prints 1, via a named argument
To return a literal C<Pair> object, always put it in an additional set of
parentheses:

    return( (:x<1>), (:y<2>) );     # two positional Pair objects

Note that the postfix parentheses on the function call don't count as
being "additional". However, as with any function, whitespace after the
C<return> keyword prevents that interpretation and turns it instead
into a list operator:

    return :x<1>, :y<2>;            # two named arguments (if caller uses |)
    return ( :x<1>, :y<2> );        # two positional Pair objects
If the function ends with an expression without an explicit C<return>,
that expression is also taken to be a C<Capture>, just as if the expression
were the argument to a C<return> list operator (with whitespace):

    sub f { :x<1> }     # named-argument binding (if caller uses |)
    sub f { (:x<1>) }   # always just binds a positional pair
On the caller's end, the C<Capture> is interpolated into any new argument list
much like an array would be, that is, as a scalar in scalar context, and as a
list in list context. This is the default behavior, but the caller may use
C<< prefix:<|> >> to inline the returned values as part of the
new argument list. The caller may also bind the returned C<Capture> directly.
If any function called as part of a return list asks what its context
is, it will be told it was called in list context regardless of the
eventual binding of the returned C<Capture>. (This is quite
different from Perl 5, where a C<return> statement always propagates its
caller's context to its own argument(s).) If that is not the
desired behavior you must coerce the call to an appropriate context
(or declare the return type of the function to perform such a coercion).
In any event, such a function is called only once at the time the
C<Capture> object is generated, not when it is later bound (which
could happen more than once).
=head2 The C<context> and C<caller> functions

The C<context> function takes a list of matchers and interprets them
as a navigation path from the current context to a location in the
dynamic scope, either the current context itself or some context
from which the current context was called. It returns an object
that describes that particular dynamic scope, or a false value if
there is no such scope. Numeric arguments are interpreted as the number
of contexts to skip, while non-numeric arguments scan outward for a
context matching the argument as a smartmatch.
The current context is accessed with a null argument list.

    say " file ", context().file,
        " line ", context().line;

which is equivalent to:

    say " file ", CONTEXT::<$?FILE>,
        " line ", CONTEXT::<$?LINE>;

The immediate caller of this context is accessed by skipping one level:

    say " file ", context(1).file,
        " line ", context(1).line;
You might think that that must be the current function's caller,
but that's not necessarily so. This might return an outer block in
our own routine, or even some function elsewhere that implements a
control operator on behalf of our block. To get outside your current
routine, see C<caller> below.
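The C<context> function may be given arguments
telling it which higher scope to look for. Each argument is processed
in order, left to right. Note that C<Any> and C<0> are no-ops:

    $ctx = context();        # currently running context
    $ctx = context(Any);     # currently running context
    $ctx = context(Any,Any); # currently running context
    $ctx = context(1);       # my context's context
    $ctx = context(2);       # my context's context's context
    $ctx = context(3);       # my context's context's context's context
    $ctx = context(1,0,1,1); # my context's context's context's context
    $ctx = context($i);      # $i'th context

Note also that negative numbers are allowed as long as you stay within
the existing context stack:

    $ctx = context(4,-1);    # my context's context's context's context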
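Repeating any smartmatch just matches the same context again unless you
intersperse a 1 to skip the current level:

    $ctx = context(Method);                     # nearest context that is a method
    $ctx = context(Method,Method);              # nearest context that is a method
    $ctx = context(Method,1,Method);            # 2nd nearest method context
    $ctx = context(Method,1,Method,1);          # caller of that 2nd nearest method
    $ctx = context(1,Block);                    # nearest outer context that is a block
    $ctx = context(Sub,1,Sub,1,Sub);            # 3rd nearest sub context
    $ctx = context({ .labels.any eq 'Foo' });   # nearest context labeled 'Foo'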
Note that this last potentially differs from the answer returned by

    Foo.context

which returns the context of the innermost C<Foo> block in the lexical scope
rather than the dynamic scope. A context also responds to the C<.context>
method, so a given context may be used as the basis for further navigation:

    $ctx = context(Method,1,Method);                    # 2nd nearest method context
    $ctx = context(Method).context(1).context(Method);  # same context
You must supply args to get anywhere else, since C<.context> is
the identity operator when called on something that is already
a C<Context>:

    $ctx = context;
    $ctx = context.context.context.context;
The C<caller> function is special-cased to go outward just far enough
to escape from the current routine scope, after first ignoring any
inner blocks that are embedded, or are otherwise pretending to be "inline":

    &caller ::= &context.assuming({ !.inline }, 1);

Note that this is usually the same as C<context(&?ROUTINE,1)>,
but not always. A call to a returned closure might not even have
C<&?ROUTINE> in its dynamic scope anymore, but it still has a caller.
So to find where the current routine was called you can say:

    say " file ", caller.file,
        " line ", caller.line;

which is equivalent to:

    say " file ", CALLER::<$?FILE>,
        " line ", CALLER::<$?LINE>;
Additional arguments to C<caller> are treated as navigational from the
calling context. One context out from your current routine is I<not>
guaranteed to be a C<Routine> context. You must say C<caller(Routine)>
to get to the next-most-inner routine.

Note that C<caller(Routine).line> is not necessarily going to give you the
line number that your current routine was called from; you're rather
likely to get the line number of the topmost block that is executing
within that outer routine, where that block contains the call to
your routine.
For either C<context> or C<caller>,
the returned context object supports at least the following methods:

    .context
    .caller
    .leave
    .want
    .inline
    .package
    .file
    .line
    .my
    .hints
The C<.context> and C<.caller> methods work the same as the functions
except that they are relative to the context supplied as invocant.
The C<.leave> method can force an immediate return from the
specified context. The C<.want> method returns known smart-matchable
characteristics of the specified context.
The C<.inline> method says whether this block was entered implicitly
by some surrounding control structure. Any time you invoke a block or
routine explicitly with C<.()> this is false. However, it is defined
to be true for any block entered using dispatcher-level primitives
such as C<.callwith>, C<.callsame>, C<.nextwith>, or C<.nextsame>.
The C<.my> method provides access to the lexical namespace in effect at
the given dynamic context's current position. It may be used to look
up ordinary lexical variables in that lexical scope. It must not be
used to change any lexical variable that is not marked as C<< context<rw> >>.
The C<.hints> method gives access to a snapshot of compiler symbols in
effect at the point of the call when the call was originally compiled.
(For instance, C<caller.hints('&?ROUTINE')> will give you the caller's
routine object.) Such values are always read-only, though some (like the
caller's routine above) may return a fixed object that is nevertheless
mutable.
=head2 The C<want> function
The C<want> function returns a C<Signature> object that contains
information about the context in which the current block, closure,
or subroutine was called. The C<want> function is really just
short for C<caller.want>. (Note that this is what your current
routine's caller wants from your routine, not necessarily the same as
C<context.want> when you are embedded in a block within a subroutine.
Use C<context.want> if that's what you want.)
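As with normal function signatures, you can test the result of C<want>
with a smart match (C<~~>) or a C<when>:

    given want {
        when :($)       {...}   # called in scalar context
        when :(*@)      {...}   # called in list context
        when :($ is rw) {...}   # expected to return an lvalue
        when :($,$)     {...}   # expected to return two values
        ...
    }

Or use its shorthand methods to reduce line noise:

    if    want.item {...}       # called in scalar context
    elsif want.list {...}       # called in list context
    elsif want.void {...}       # called in void context
    elsif want.rw   {...}       # expected to return an lvalue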
The C<.arity> and C<.count> methods also work here:

    if want.arity > 2 {...}     # must return more than two values
    if want.count > 2 {...}     # can return more than two values

Their difference is that C<.arity> considers only mandatory parts,
while C<.count> also considers optional ones, including C<*$>:

    ($x, $y) = f();
=head2 The C<leave> function
As mentioned above, a C<return> call causes the innermost surrounding
subroutine, method, rule, token, regex (as a keyword) or macro
to return. Only declarations with an explicit declarator keyword
(C<sub>, C<submethod>, C<method>, C<macro>, C<regex>, C<token>, and
C<rule>) may be returned from. Statement prefixes such as C<do> and
C<try> do not fall into that category.
You cannot use C<return> to escape directly into the surrounding
context from loops, bare blocks, pointy blocks, or quotelike operators
such as C<rx//>; a C<return> within one of those constructs will
continue searching outward for a "proper" routine to return from.
Nor may you return from property blocks such as C<BEGIN> or C<CATCH>
(though blocks executing within the lexical and dynamic scope of a
routine can of course return from that outer routine, which means
you can always return from a C<CATCH> or a C<FIRST>, but never from
a C<BEGIN> or C<INIT>.)
To return from blocks that aren't routines, the C<leave> method is used
instead. (It can be taken to mean either "go away from" or "bequeath
to your successor" as appropriate.) The object specifies the scope to exit,
and the method's arguments specify the return value. If the object
is omitted (by use of the function or listop forms), the innermost
block is exited. Otherwise you must use something like C<context>
or C<&?BLOCK> or a contextual variable to specify the scope you
want to exit. A label (such as a loop label) previously seen in
the lexical scope also works as a kind of singleton context object:
it names a statement that is serving both as an outer lexical scope
and as a context in the current dynamic scope.
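As with C<return>, the arguments are taken to be a C<Capture> holding the
return values.

    leave;                      # return from innermost block of any kind
    context(Method).leave;      # return from innermost calling method
    &?ROUTINE.leave(1,2,3);     # return from current sub; same as: return 1,2,3
    &?ROUTINE.leave <== 1,2,3;  # same thing, with the return values as a feed
    OUTER.leave;                # return from OUTER label in lexical scope
    &foo.leave: 1,2,3;          # return from innermost surrounding call to &foo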
Note that these are equivalent in terms of control flow:

    COUNT.leave;
    last COUNT;

However, the first form explicitly sets the return value for the
entire loop, while the second implicitly returns all the previous
successful loop iteration values as a list comprehension. (It may,
in fact, be too late to set a return value for the loop if it is
being evaluated lazily!) A C<leave>
from the inner loop block, however, merely specifies the return value for
that iteration:

    for 1..10 { leave $_ * 2 }
Note that this:

    leave COUNT;

will always be taken as the function, not the method, so it returns
the C<COUNT> object from the innermost block. The indirect object form
of the method always requires a colon:

    leave COUNT: ;
=head2 Temporization
The C<temp> macro temporarily replaces the value of an existing
variable, subroutine, context of a function call, or other object in a
given scope:

    {
       temp $*foo = 'foo';
       temp &bar := sub {...};
       ...
    }
C<temp> invokes its argument's C<.TEMP> method. The method is expected
to return a C<Code> object that can later restore the current
value of the object. At the end of the lexical scope in which the
C<temp> was applied, the subroutine returned by the C<.TEMP> method is
executed.

The default C<.TEMP> method for variables simply creates
a closure that assigns the variable's pre-C<temp> value
back to the variable.
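
The default behaviour for a plain variable can be sketched with the C<temp> prefix as it exists in present-day Raku. This is an assumption, since the synopsis describes a 2007 design and does not promise these exact semantics, and C<$level> and C<current-level> are hypothetical names.

```raku
my $level = 'outer';            # hypothetical variable
sub current-level { $level }    # hypothetical helper that reads it

my $inside;
{
    # The pre-temp value is saved here and put back when the block exits.
    temp $level = 'inner';
    $inside = current-level();
}
my $after = current-level();

say "$inside, then $after";
```

The helper sees the temporary value only while the enclosing block is still running, which is the "restore at end of scope" behaviour described above.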
New kinds of temporization can be created by writing storage classes
with their own C<.TEMP> methods:

    class LoudArray is Array {
        method TEMP {
            print "Replacing $.WHICH() at {caller.location}\n";
            my $restorer = $.SUPER::TEMP();
            return {
                print "Restoring $.WHICH() at {caller.location}\n";
                $restorer();
            };
        }
    }
You can also modify the behaviour of temporized code structures, by
giving them a C<TEMP> block. As with C<.TEMP> methods, this block is
expected to return a closure, which will be executed at the end of
the temporizing scope to restore the subroutine to its pre-C<temp> state:

    my $next = 0;
    sub next {
        my $curr = $next++;
        TEMP {{ $next = $curr }}
        return $curr;
    }

    say next();             # prints 0; $next == 1
    say next();             # prints 1; $next == 2
    say next();             # prints 2; $next == 3
    if ($hiccough) {
        say temp next();    # prints 3; closes $curr at 3; $next == 4
        say next();         # prints 4; $next == 5
        say next();         # prints 5; $next == 6
    }                       # $next is restored to 3
    say next();             # prints 3; $next == 4
    say next();             # prints 4; $next == 5
Note that C<temp> must be a macro rather than a function because the
temporization must be arranged before the function causes any state
changes, and if it were a normal argument to a normal function, the state
change would happen before C<temp> got control.

Hypothetical variables use the same mechanism, except that the restoring
closure is called only on failure.

Note that contextual variables may be a better solution than temporized
globals in the face of multithreading.
=head2 Wrapping
Every C<Routine> object has a C<.wrap> method. This method expects a
single C<Code> argument. Within the code, the special C<callsame>,
C<callwith>, C<nextsame> and C<nextwith> functions will invoke the
original routine, but do not introduce an official C<CALLER> frame:

    sub thermo ($t) {...}

    $handle = &thermo.wrap( { callwith( ($^t-32)/1.8 ) } );

The C<callwith> function lets you pass your own arguments to the wrapped
function. The C<callsame> function takes no argument; it
implicitly passes the original argument list through unchanged.
The call to C<.wrap> replaces the original C<Routine> with the C<Code>
argument, and arranges that any call to C<callsame>, C<callwith>,
C<nextsame> or C<nextwith> invokes the previous version of the
routine. In other words, the call to C<.wrap> has more or less the
same effect as:

    &old_thermo := &thermo;
    &thermo := sub ($t) { old_thermo( ($t-32)/1.8 ) }

except that C<&thermo> is mutated in-place, so C<&thermo.WHICH> stays the same
after the C<.wrap>.
The call to C<.wrap> returns a unique handle that can later be passed to
the C<.unwrap> method, to undo the wrapping:

    &thermo.unwrap($handle);

This does not affect any other wrappings placed on the routine.
A wrapping can also be restricted to a particular dynamic scope with
temporization:

    temp &thermo.wrap( { callwith($^t + 273.16) } );

The entire argument list may be captured by binding to a C<Capture> parameter.
It can then be passed to C<callwith> using that name:

    &thermo.wrap( -> |$args { callwith(|$args) * 2 } );

In this case only the return value is changed.
The wrapper is not required to call the original routine; it can call another
C<Code> object by passing the C<Capture> to its C<callwith> method:

    &thermo.wrap( sub (|$args) { &other_thermo.callwith(|$args) } );

or more briefly:

    &thermo.wrap( { &other_thermo.callsame } );
Since the method versions of C<callsame>, C<callwith>, C<nextsame>,
and C<nextwith> specify an explicit destination, their semantics do
not change outside of wrappers. However, the corresponding functions
have no explicit destination, so instead they implicitly call the
next-most-likely method or multi-sub; see S12 for details.
As with any return value, you may capture the returned C<Capture> of
C<callwith> by binding:

    my |$retval := callwith(|$args);
    ...
    return |$retval;
Alternately, you may prevent any return at all by using the variants
C<nextsame> and C<nextwith>. Arguments are passed just as with
C<callsame> and C<callwith>, but a tail call is explicitly enforced;
any code following the call will be unreached, as if a return had
been executed there before calling into the destination routine.
Within an ordinary method dispatch these functions treat the rest
of the dispatcher's candidate list as the wrapped function, which
generally works out to calling the same method in one of our parent
(or older sibling) classes. Likewise within a multiple dispatch the
current routine may defer to candidates further down the candidate
list. Although not necessarily related by a class hierarchy, such
later candidates are considered more generic and hence likelier
to be able to handle various unforeseen conditions (perhaps).
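
Deferral within a multiple dispatch might look like the following sketch, written in present-day Raku. That is an assumption, as the 2007 design described here may differ in detail, and the C<describe> candidates are hypothetical.

```raku
# Hypothetical multi candidates, for illustration only.
multi sub describe (Any $thing) { 'something' }
multi sub describe (Int $n)     { 'an integer, which is also ' ~ callsame() }

my $specific = describe(42);     # narrowest candidate runs, then defers
my $generic  = describe('text'); # only the generic candidate applies

say $specific;
say $generic;
```

The C<Int> candidate is tried first because it is narrower; C<callsame> then hands the same argument list to the next, more generic candidate and returns here with its result.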
=head2 The C<&?ROUTINE> object
C<&?ROUTINE> is always an alias for the lexically innermost C<Routine>
(which may be a C<Sub>, C<Method>, or C<Submethod>), so you can specify
tail-recursion on an anonymous sub:

    my $anonfactorial = sub (Int $n) {
        return 1 if $n < 2;
        return $n * &?ROUTINE($n-1);
    };
You can get the current routine name by calling C<&?ROUTINE.name>.
(The outermost routine at a file-scoped compilation unit is always
named C<&MAIN> in the file's package.)

Note that C<&?ROUTINE> refers to the current single sub, even if it is
declared "multi". To redispatch to the entire suite under a given short
name, just use the named form, since there are no anonymous multis.
=head2 The C<&?BLOCK> object
C<&?BLOCK> is always an alias for the current block, so you can
specify tail-recursion on an anonymous block:

    my $anonfactorial = -> Int $n { $n < 2
                                      ?? 1
                                      !! $n * &?BLOCK($n-1)
                                  };
C<&?BLOCK.labels> contains a list of all labels of the current block.
This is typically matched by saying

    if &?BLOCK.labels.any eq 'Foo' {...}

If the innermost lexical block happens to be the main block of a C<Routine>,
then C<&?BLOCK> just returns the C<Block> object, not the C<Routine> object
that contains it.

[Note: to refer to any C<$?> or C<&?> variable at the time the sub or
block is being compiled, use the C<< COMPILING:: >> pseudopackage.]
=head2 Currying
Every C<Code> object has a C<.assuming> method. This method does a partial
binding of a set of arguments to a signature and returns a new function
that takes only the remaining arguments.

    &textfrom := &substr.assuming(str=>$text, len=>Inf);

or equivalently:

    &textfrom := &substr.assuming(:str($text) :len(Inf));

or even:

    &textfrom := &substr.assuming:str($text):len(Inf);

It returns a C<Code> object that implements the same behaviour
as the original subroutine, but has the values passed to C<.assuming>
already bound to the corresponding parameters:

    $all  = textfrom(0);
    $some = textfrom(50);
    $last = textfrom(-1);
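
As a self-contained sketch of the same idea in present-day Raku (an assumption; the 2007 design above may differ), the hypothetical routine C<excerpt> below stands in for C<substr>. For simplicity it primes a positional argument rather than the named ones used above.

```raku
# Hypothetical routine standing in for the substr example.
sub excerpt (Str $str, Int $from, Int $len) {
    $str.substr($from, $len)
}

# Bind the first argument now; the rest are supplied at call time.
my &from-text = &excerpt.assuming('hello world');

my $head = from-text(0, 5);
my $tail = from-text(6, 5);
say "$head / $tail";
```

The primed routine takes only the parameters that were left unbound, so callers no longer need to pass the text at all.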
The result of a C<use> statement is a (compile-time) object that also has
a C<.assuming> method, allowing the user to bind parameters in all the
module's subroutines/methods/etc. simultaneously.
This special form should generally be restricted to named parameters.

To curry a particular multi variant, it may be necessary to specify the type
for one or more of its parameters:

    &woof ::= &bark:(Dog).assuming :pitch<low>;
    &pine ::= &bark:(Tree).assuming :pitch<yes>;
=head2 Macros
Macros are functions or operators that are called by the compiler as
soon as their arguments are parsed (if not sooner). The syntactic
effect of a macro declaration or importation is always lexically
scoped, even if the name of the macro is visible elsewhere. As with
ordinary operators, macros may be classified by their grammatical
category. For a given grammatical category, a default parsing rule or
set of rules is used, but those rules that have not yet been "used"
by the time the macro keyword or token is seen can be replaced by
use of the "is parsed" trait. (This means, for instance, that an infix
operator can change the parse rules for its right operand but not
its left operand.)
In the absence of a signature to the contrary, a macro is called as
if it were a method on the current match object returned from the
grammar rule being reduced; that is, all the current parse information
is available by treating C<self> as if it were a C<$/> object.
[Conjecture: alternate representations may be available if arguments
are declared with particular AST types.]
Macros may return either a string to be reparsed, or a syntax tree
that needs no further parsing. The textual form is handy, but the
syntax tree form is generally preferred because it allows the parser
and debugger to give better error messages. Textual substitution
on the other hand tends to yield error messages that are opaque to
the user. Syntax trees are also better in general because they are
reversible, so things like syntax highlighters can get back to the
original language and know which parts of the derived program come
from which parts of the user's view of the program. Nevertheless,
it's difficult to return a syntax tree for an unbalanced construct,
and in such cases a textual macro may be a clearer expression of the
evil thing you're trying to do.
If you call a macro at runtime, the result of the macro is automatically
evaluated again, so the two calls below print the same thing:

    macro f { '1 + 1' }
    say f();    # compile-time call to &f
    say &f();   # runtime call to &f
=head2 Quasiquoting
In aid of returning a syntax tree, Perl provides a "quasiquoting"
mechanism using the quote C<q:code>, followed by a block intended to
represent an AST:

    return q:code { say "foo" };

Modifiers to the C<:code> adverb can modify the operation:

    :ast(MyAst)
    :lang(Ruby)
    :unquote<[: :]>
Within a quasiquote, variable and function names resolve according
to the lexical scope of the macro definition. Unrecognized symbols raise
errors when the macro is being compiled, I<not> when it's being used.

To make a symbol resolve to the (partially compiled) scope of the macro
call, use the C<COMPILING::> pseudo-package:

    macro moose () { q:code { $COMPILING::x } }

    moose();    # macro-call-time error
    my $x;
    moose();    # resolves to 'my $x'
If you want to mention symbols from the scope of the macro call, use the
import syntax as modifiers to C<:code>:

    :COMPILING<$x>
    :COMPILING

If those symbols do not exist in the compiling scope, a
compile-time exception is thrown at macro call time.
Similarly, in the macro body you may either refer to the C<$x> declared in the
scope of the macro call as C<$COMPILING::x>, or bind to it explicitly:

    my $x := $COMPILING::x;

You may also use an import list to bind multiple symbols into the
macro's lexical scope.
Note that you need to use the run-time C<:=> and C<require> forms, not C<::=>
and C<use>, because the macro caller's compile-time is the macro's runtime.
=head2 Splicing
Bare AST variables (such as the arguments to the macro) may not be
spliced directly into a quasiquote because they would be taken as
normal bindings. Likewise, program text strings to be inserted need
to be specially marked or they will be bound normally. To insert an
"unquoted" expression of either type within a quasiquote, use the
quasiquote delimiter tripled, typically a bracketing quote of some sort:

    return q:code { say $a + {{{ $ast }}} }
    return q:code [ say $a + [[[ $ast ]]] ]
    return q:code < say $a + <<< $ast >>> >
    return q:code ( say $a + ((( $ast ))) )

The delimiters don't have to be bracketing quotes, but the following
is probably to be construed as Bad Style:

    return q:code / say $a + /// $ast /// /
(Note to implementors: this must not be implemented by finding
the final closing delimiter and preprocessing, or we'll violate our
one-pass parsing rule. Perl 6 parsing rules are parameterized to know
their closing delimiter, so adding the opening delimiter should not
be a hardship. Alternately the opening delimiter can be deduced from
the closing delimiter. Writing a rule that looks for three opening
delimiters in a row should not be a problem. It has to be a special
grammar rule, though, not a fixed token, since we need to be able to
nest code blocks with different delimiters. Likewise when parsing the
inner expression, the inner parser subrule is parameterized to know that
C<}}}> or whatever is its closing delimiter.)
Unquoted expressions are inserted appropriately depending on the
type of the variable, which may be either a syntax tree or a string.
(Again, a syntax tree is preferred.) The case is similar to that of a
macro called from within the quasiquote, insofar as reparsing only
happens with the string version of interpolation, except that such
a reparse happens at macro call time rather than macro definition
time, so its result cannot change the parser's expectations about
what follows the interpolated variable.

Hence, while the quasiquote itself is being parsed, the syntactic
interpolation of an unquoted expression into the quasiquote always
results in the expectation of an operator following the variable.
(You must use a call to a submacro if you want to expect something
else.) Of course, the macro definition as a whole can expect
whatever it likes afterwards, according to its syntactic category.
(Generally, a term expects a following postfix or infix operator,
and an operator expects a following term or prefix operator. This
does not matter for textual macros, however, since the reparse of
the text determines subsequent expectations.)
Quasiquotes default to hygienic lexical scoping, just like closures.
The visibility of lexical variables is limited to the q:code expression
by default. A variable declaration can be made externally visible using
the C<COMPILING::> pseudo-package. Individual variables can be made visible,
or all top-level variable declarations can be exposed using the
C<q:code(:COMPILING)> form.

Both examples below will add C<$new_variable> to the lexical scope of
the macro call:

    q:code { my $COMPILING::new_variable; my $private_var; ... }
    q:code(:COMPILING) { my $new_variable; { my $private_var; ... } }

(Note that C<:COMPILING> has additional effects described in L<Macros>.)
=head1 Other matters
=head2 Anonymous hashes vs blocks
C<{...}> is always a block. However, if it is completely empty or
consists of a single list, the first element of which is either a hash
or a pair, it is executed immediately to compose a Hash object.
The standard C<pair> list operator is equivalent to:

    sub pair (*@LIST) {
        my @pairs;
        for @LIST -> $key, $val {
            push @pairs, $key => $val;
        }
        return @pairs;
    }

or more succinctly (and lazily):

    sub pair (*@LIST) {
        gather for @LIST -> $key, $val {
            take $key => $val;
        }
    }

The standard C<hash> list operator is equivalent to:

    sub hash (*@LIST) {
        return { pair @LIST };
    }
So you may use C<sub> or C<hash> or C<pair> to disambiguate:

    $obj =  sub { 1, 2, 3, 4, 5, 6 };   # Anonymous sub returning list
    $obj =      { 1, 2, 3, 4, 5, 6 };   # Anonymous sub returning list
    $obj =      { 1=>2, 3=>4, 5=>6 };   # Anonymous hash
    $obj =      { 1=>2, 3, 4, 5, 6 };   # Anonymous hash
    $obj =  hash( 1, 2, 3, 4, 5, 6 );   # Anonymous hash
    $obj =  hash  1, 2, 3, 4, 5, 6  ;   # Anonymous hash
    $obj = { pair 1, 2, 3, 4, 5, 6 };   # Anonymous hash
=head2 Pairs as lvalues
Since they are immutable, Pair objects may not be directly assigned:

    (key => $var) = "value";    # ERROR

However, when binding pairs, names can be used to "match up" lvalues
and rvalues, provided you write the left side as a signature using
C<:(...)> notation:

    :(:who($name), :why($reason)) := (why => $because, who => "me");

(Otherwise the parser doesn't know it should parse the insides as a
signature and not as an ordinary expression until it gets to the C<:=>,
and that would be bad. Alternately, the C<my> declarator can also
force treatment of its argument as a signature.)
=head2 Out-of-scope names
C<< GLOBAL::<$varname> >> specifies the C<$varname> declared in the C<*>
namespace. Or maybe it's the other way around...

C<< CALLER::<$varname> >> specifies the C<$varname> visible in
the dynamic scope from which the current block/closure/subroutine
was called, provided that variable is declared with the
"C<is context>" trait. (Implicit lexicals such as C<$_> are automatically
assumed to be contextual.)

C<< CONTEXT::<$varname> >> specifies the C<$varname> visible in the
innermost dynamic scope that declares the variable with the
"C<is context>" trait.

C<< MY::<$varname> >> specifies the lexical C<$varname> declared in the current
lexical scope.

C<< OUR::<$varname> >> specifies the C<$varname> declared in the current
package's namespace.

C<< COMPILING::<$varname> >> specifies the C<$varname> declared (or about
to be declared) in the lexical scope currently being compiled.

C<< OUTER::<$varname> >> specifies the C<$varname> declared in the lexical
scope surrounding the current lexical scope (i.e. the scope in which
the current block was defined).
=head2 Declaring a C<MAIN> subroutine
Ordinarily a top-level Perl "script" just evaluates its anonymous
mainline code and exits. During the mainline code, the program's
arguments are available in raw form from the C<@ARGS> array. At the end of
the mainline code, however, a C<MAIN> subroutine will be called with
whatever command-line arguments remain in C<@ARGS>. This call is
performed if and only if:
=over

=item a)

the compilation unit was directly invoked rather than
by being required by another compilation unit, and

=item b)

the compilation unit declares a C<Routine> named "C<MAIN>", and

=item c)

the mainline code is not terminated prematurely, such as with an explicit call
to C<exit>, or an uncaught exception.

=back
The command line arguments (or what's left of them after mainline
processing) are magically converted into a C<Capture> and passed to
C<MAIN> as its arguments, so switches may be bound as named args and
other arguments to the program may be bound to positional parameters
or the slurpy array:

    sub MAIN ($directory, :$verbose, *%other, *@filenames) {
        for @filenames { ... }
    }

If C<MAIN> is declared as a set of multi subs, MMD dispatch is performed.
As with module and class declarations, a sub declaration
ending in semicolon is allowed at the outermost file scope if it is the
first such declaration, in which case the rest of the file is the body:

    sub MAIN ($directory, :$verbose, *%other, *@filenames);
    for @filenames { ... }

This form is allowed only for simple subs named C<MAIN> that are intended
to be run from the command line.
Proto or multi definitions may not be written in semicolon form,
nor may C<MAIN> subs within a module or class. (A C<MAIN> routine
is allowed in a module or class, but is not usually invoked unless
the file is run directly (see item a above). This corresponds to the
"unless caller" idiom of Perl 5.) In general, you may have only one
semicolon-style declaration that controls the whole file.
If an attempted dispatch to C<MAIN> fails, the C<USAGE> routine is called.
If there is no C<USAGE> routine, a default message is printed. This
usage message is automatically generated from the signature (or
signatures) of C<MAIN>. This message is generated at compile time,
and hence is available at any later time as C<$?USAGE>.
Common Unix command-line conventions are mapped onto the capture
as follows:

    On command line...         $ARGS capture gets...

    -name                      :name
    -name=value                :name<value>
    -name="spacy value"        :name«'spacy value'»
    -name='spacy value'        :name«'spacy value'»
    -name=val1,'val 2', etc    :name«val1 'val 2' etc»

    --name                     :name
    --name=value               :name<value>
    --name value               :name<value>
    --name="spacy value"       :name«'spacy value'»
    --name "spacy value"       :name«'spacy value'»
    --name='spacy value'       :name«'spacy value'»
    --name 'spacy value'       :name«'spacy value'»
    --name=val1,'val 2', etc   :name«val1 'val 2' etc»
    --name val1 'val 2' etc    :name«val1 'val 2' etc»

    --

    +name                      :!name
    +name=value                :name<value> but False
    +name="spacy value"        :name«'spacy value'» but False
    +name='spacy value'        :name«'spacy value'» but False
    +name=val1,'val 2', etc    :name«val1 'val 2' etc» but False

    :name                      :name
    :!name                     :!name
    :/name                     :!name
    :name=value                :name<value>
    :name="spacy value"        :name«'spacy value'»
    :name='spacy value'        :name«'spacy value'»
    :name=val1,'val 2', etc    :name«val1 'val 2' etc»
Exact Perl 6 forms are okay if quoted from shell processing:

    ':name<value>'             :name<value>
    ':name(42)'                :name(42)

For security reasons, only constants are allowed as arguments, however.
The default C<Capture> mapper pays attention to the declaration of C<MAIN>'s
parameters to resolve certain ambiguities. A C<--foo> switch needs to
know whether to treat the next word from the command line as an argument.
(Allowing the spacey form gives the shell room to do various things to
the argument.) The short C<-foo> form never assumes a separate argument,
and you must use C<=>. For the C<--foo> form, if there is a named parameter
corresponding to the switch name, and it is of type C<Bool>, then no argument
is expected. Otherwise an argument is expected. If the parameter is of
a non-slurpy array type, all subsequent words up to the next command-line
switch (or the end of the list) are bound to that parameter.
As usual, switches are assumed to be first, and everything after
the first non-switch, or any switches after a C<-->, are treated
as positionals or go into the slurpy array (even if they look like
switches). Other policies may easily be introduced by calling C<MAIN>
explicitly. For instance, you can parse your arguments with a grammar
and pass the resulting C<Match> object as a C<Capture> to C<MAIN>:

    @*ARGS ~~ /<MyGrammar::top>/;
    MAIN(|$/);
    exit;

    sub MAIN ($frompart, $topart, *@rest) {
        if $frompart<foo> { ... }
        if $topart<bar><baz> { ... }
    }

This will conveniently bind top-level named matches to named
parameters, but still give you access to nested matches through those
parameters, just as any C<Match> object would. Of course, in this example,
there's no particular reason the sub has to be named C<MAIN>.
To give both a long and a short switch name, you may use the pair
notation. The key will be considered the short switch name, while
the variable name will be considered the long switch name. So if
the previous declaration had been:

    sub MAIN (:f($frompart), :t($topart), *@rest)

then you could invoke the program with either C<-f> or C<--frompart>
to specify the first parameter. Likewise you could use either C<-t>
or C<--topart> for the second parameter.

If a switch of the form C<-abc> cannot be matched against any
particular parameter, an attempt will be made to match it as if it
had been written C<-a -b -c>.
=for vim:set expandtab sw=4: