=head1 TITLE

Apocalypse 12: Objects

=head1 AUTHOR

Larry Wall <larry@wall.org>

=head1 VERSION

  Maintainer: Larry Wall <larry@wall.org>
  Date: 13 Apr 2004
  Last Modified: 4 Dec 2004
  Number: 12
  Version: 5

The official unofficial slogan of Perl 6 is "Second System Syndrome
Done Right!". After you read this Apocalypse you will at least be
certain that we got the "Second System" part down pat. But we've
also put in a little bit of work on the "Done Right" part, which
we hope you'll recognize. The management of complexity is complex,
but only if you think about it. The goal of Perl 6 is to discourage
you from thinking about it unnecessarily.

Speaking of thinking unnecessarily, please don't think that everything
we write here is absolutely true. We expect some things to change
as people point out various difficulties. That's the way all the other
Apocalypses have worked, so why should this one be different?

When I say "we", I don't just mean "me". I mean everyone who has
participated in the design, including the Perl 6 cabal, er, design team,
the readers (and writers) of the perl6-language mailing list, and all
the participants who wrote or commented on the original RFCs. For this
Apocalypse we've directly considered the following RFCs:

    RFC  PSA  Title
    ===  ===  =====
    032  abb  A method of allowing foreign objects in perl
    067  abb  Deep Copying, aka, cloning around.
    092  abb  Extensible Meta-Object Protocol
    095  acc  Object Classes
    101  bcc  Apache-like Event and Dispatch Handlers
    126  aaa  Ensuring Perl's object-oriented future
    137  bdd  Overview: Perl OO should I<not> be fundamentally changed.
    147  rr   Split Scalars and Objects/References into Two Types
    152  bdd  Replace invocant in @_ with self() builtin
    163  bdd  Objects: Autoaccessors for object data structures
    171  rr   my Dog $spot should call a constructor implicitly
    174  bdd  Improved parsing and flexibility of indirect object syntax
    187  abb  Objects : Mandatory and enhanced second argument to C<bless>
    188  acc  Objects : Private keys and methods
    189  abb  Objects : Hierarchical calls to initializers and destructors
    190  acc  Objects : NEXT pseudoclass for method redispatch
    193  acc  Objects : Core support for method delegation
    224  bdd  Objects : Rationalizing C<ref>, C<attribute::reftype>, and
              C<builtin:blessed>
    244  cdr  Method calls should not suffer from the action on a distance
    254  abb  Class Collections: Provide the ability to overload classes
    256  abb  Objects : Native support for multimethods
    265  abc  Interface polymorphism considered lovely
    277  bbb  Method calls SHOULD suffer from ambiguity by default
    307  rr   PRAYER - what gets said when you C<bless> something
    335  acc  Class Methods Introspection: what methods does this object
              support?
    336  bbb  use strict 'objects': a new pragma for using Java-like
              objects in Perl

These RFCs contain many interesting ideas, and many more "cries for
help". Usually in these Apocalypses, I discuss the design with
respect to each of the RFCs. However, in this case I won't, because
most of these RFCs fail in exactly the same way--they assume the Perl
6 object model to be a set of extensions to the Perl 5 object model.
But as it turns out, that would have been a great way to end up with
Second System Syndrome Done Wrong. Perl 5's OO system is a
great workbench, but it has some issues that have to be dealt with
systematically rather than piecemeal.

=head2 Some of the Problems with Perl 5 OO

=head3 A little too orthogonal

It has often been claimed that Perl 5 OO was "bolted on", but that's
inaccurate. It was "bolted through", at right angles to all other
reference types, such that any reference could be blessed into being
an object. That's way cool, but it's often a little I<too> cool.

=head3 Not quite orthogonal enough

It's too hard to treat built-in types as objects when you want to.
Perl 5's C<tie> interface helps, but is suboptimal in several ways,
not the least of which is that it only works on variables, not values.

=head3 Forced non-encapsulation

Because of the ability to turn (almost) anything into an object, a
derived class had to be aware of the internal data type of its base
class. Even after convention settled on hashes as the appropriate
default data structure, one had to be careful not to stomp on the
attributes of one's base class.

=head3 A little too minimal

Some people will be surprised to hear it, but Perl is a minimalist
language at heart. It's just minimalistic about weird things
compared to your average language. Just as the binding of parameters
to C<@_> was a minimalistic approach, so too the entire Perl 5 object
system was an attempt to see how far you could drive a few features.
But many of the following difficulties stem from that.

=head3 Too much keyword reuse

In Perl 5, a class is just a package, a method is just a subroutine,
and an object is just a blessed referent. That's all well and good,
and it is still fundamentally true in Perl 6. However, Perl 5 made
the mistake of reusing the same keywords to express similar ideas.
That's not how natural languages work--we often use different words
to express similar ideas, the better to make subtle distinctions.

=head3 Too difficult to capture metadata

Because Perl 5 reused keywords and treated parameter binding as
something you do via a list assignment at run-time, it was next to
impossible for the compiler to tell which subroutines were methods
and which ones were really just subroutines. Because hashes are
mutable, it was difficult to tell at compile time what the attribute
names were going to be.

=head3 Inside-out interfaces

The Perl 5 solution to the previous problem was to declare more
things at compile time. Unfortunately, since the main way to do
things at compile time was to invoke C<use>, all the compile-time
interfaces were shoehorned into C<use>'s syntax, which, powerful
though it may be, is often completely inside-out from a reasonable
interface. For instance, overloading is done by passing a list of
pairs to C<use>, when it would be much more natural to simply declare
appropriate methods with appropriate names and traits. The C<base>
and C<fields> pragmas are also kludges.

=head3 Not enough convention

Because of the flexibility of the Perl 5 approach, there was never any
"obvious" way to do it. So best practices had to be developed by each
group, and of course everyone came up with a slightly different solution.

Now, we're not going to be like some folks and confuse "obvious" with
"the only way to do it". This is still Perl, after all, and the
flexibility will still be there if you need it. But by convention,
there needs to be a standard look to objects and classes so that they
can interoperate. There's more than one way to do it, but one of those
is the standard way.

=head3 Wrong conventions

The use of arrow where most of the rest of the world uses dot was confusing.

=head3 Everything possible, but difficult

The upshot of the previous problems was that, while Perl 5 made it
easy to I<use> objects and classes, it was difficult to try to define
classes or derive from them.

=head2 Perl 5 Non-Problems

While there are plenty of problems with Perl 5's OO system, there are
some things it did right.

=head3 Generating class by running code at compile time

One of the big advances in Perl 5 was that a program could be in charge
of its own compilation via C<use> statements and C<BEGIN> blocks.
A Perl program isn't a passive thing that a compiler has its way with,
willy nilly. It's an active thing that negotiates with the compiler
for a set of semantics. In Perl 6 we're not shying away from that,
but taking it further, and at the same time hiding it in a more
declarative style. So you need to be aware that, although many of
the things we'll be talking about here I<look> like declarations, they
trigger Perl code that runs during compilation. Of such methods are
metaclasses made. (While these methods are often triggered by grammar
rule reductions, remember from Apocalypse 5 that all these grammar
rules are also running under the user's control. You can tweak
the language without the crude ax of source filtering.)

=head3 There are many roads to polymorphism

In looking for an "obvious" way to conventionalize Perl's object
system, we shouldn't overlook the fact that there's more than one
obvious way, and different approaches work better in different
circumstances. Inheritance is one way (and typically the most
overused), but we also need good support for composition,
delegation, and parametric types. Cutting across those techniques
are issues of interface, implementation, and mixtures of interface
and implementation. There are multiple strategies for ambiguity
resolution as well, and no single strategy is always right. (Unless
the boss says so.)

=head3 People using a class shouldn't have to think hard

In making it easier to define and derive classes, we must be careful not
to make it harder to I<use> classes.

=head2 Trust in Convention, but Keep Your Powder Dry

So to summarize this summary, what we're proposing to develop is a
set of conventions for how object orientation ought to work in Perl
6--by default. But there should also be enough hooks to customize
things to your heart's content, hopefully without undue impact on the
sensibilities of others.

And in particular, there's enough flexibility in the new approach
that, if you want to, you can still program in a way much like the
old Perl 5 approach. There's still a C<bless> method, and you can
still pretend that an object is a hash--though it isn't anymore.

However, as with all the rest of the design of Perl 6, the overriding
concern has been that the language scale well. That means Perl has
to scale down as well as up. Perl has to work well both as a first
language and as a last language. We believe our design fulfills
this goal--though, of course, only time will tell.

One other note: if you haven't read the previous Apocalypses and
Exegeses, a lot of this is going to be complete gobbledygook to you.
(Of course, even if you I<have> read them, this might still be
gobbledygook. You take your chances in life...)

=head1 An Easy Example

Before we start talking about all the hard things that should be
possible, let's look at an example of some of the easy things that
should be easy. Suppose we define a Point object that (for some
strange reason) allows you to adjust the y-axis but not the x-axis.

    class Point {
        has $.x;
        has $.y is rw;

        method clear () { $.x = 0; $.y = 0; }
    }

    my $point = Point.new(x => 2, y => 3);

    $a = $point.y;      # okay
    $point.y = 42;      # okay, y is declared rw

    $b = $point.x;      # okay
    $point.x = -1;      # illegal, accessors are read-only by default

    $point.clear;       # resets both attributes to 0

If you compare that to how it would have to be written in Perl 5,
you'll note a number of differences:

=over 4

=item *

It uses the keywords C<class> and C<method> rather than C<package>
and C<sub>.

=item *

The attributes are named in explicit declarations rather than implicit
hash keys.

=item *

It is impossible to confuse the attribute variables with ordinary
variables because of the extra dot (which also associates the
attributes visually with method calls).

=item *

Perhaps most importantly, we did not have to commit to using a hash
(or any other external data structure) for the object's values.

=item *

We didn't have to write a constructor.

=item *

The implicit constructor automatically knows how to map named arguments
to the attribute names.

=item *

We didn't have to write the accessor methods.

=item *

The accessors are by default read-only outside the class, and you
can't get at the attributes from outside the class without an accessor.
(Inside the class you can use the attributes directly.)

=item *

The invocant of the C<clear> method is implicit.

=item *

And perhaps most obviously, Perl 6 uses C<.> instead of C<< -> >>
to dereference an object.

=back

Now suppose we want to derive from Point, and add a z-axis. That's just

    class Point3d is Point {
        has $:z = 123;
        method clear () { $:z = 0; next; }
    }

    my $point3d = Point3d.new(x => 2, y => 3, z => 4);
    $c = $point3d.z;    # illegal, $:z has no accessor outside the class

The implicit constructor automatically sorts out the named arguments
to the correct initializers for you. If you omit the z value,
it will default to 123. And the new C<clear> method calls the old
clear method merely by invoking C<next>, without the dodgy "super"
semantics that break down under MI. We also declared the C<$:z>
attribute to be completely private by using a colon instead of a dot.
No accessor for it is visible outside the class. (And yes, OO purists,
our other attributes should probably have been private in the first
place...that's why we're making it just as easy to write a private
attribute as a public one.)

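For readers who think more easily in a language that already runs, here is a rough Python analogy of the two classes above. It is only a sketch and not part of the Perl 6 design: the underscore-prefixed names (`_x`, `_y`, `_z`) are this sketch's own convention for "private", Python needs the constructor spelled out where Perl 6 supplies one, and `super().clear()` merely stands in for C<next>.

```python
class Point:
    def __init__(self, x=0, y=0):
        # Python needs an explicit constructor; Perl 6 would generate one.
        self._x = x
        self._y = y

    @property
    def x(self):
        # Read-only accessor, roughly like `has $.x`.
        return self._x

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        # Writable accessor, roughly like `has $.y is rw`.
        self._y = value

    def clear(self):
        self._x = 0
        self._y = 0


class Point3d(Point):
    def __init__(self, x=0, y=0, z=123):
        super().__init__(x=x, y=y)
        self._z = z  # no public accessor, roughly like `has $:z = 123`

    def clear(self):
        self._z = 0
        super().clear()  # redispatch to the next clear in line


point = Point(x=2, y=3)
point.y = 42             # allowed: y has a setter
point3d = Point3d(x=2, y=3)
point3d.clear()
```

Assigning to `point.x` in this sketch raises `AttributeError`, which is the closest Python gets to "illegal, default is read-only".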
If any of that makes your head spin, I'm sure the following will clear
it right up. C<:-)>

=head1 Classes

A class is what provides a name and a place for the abstract behavior
of a set of objects said to belong to the class.

As in Perl 5, a class is still "just a funny package", structurally
speaking. Syntactically, however, a class is now distinct from a
package or a module. And the body of a class definition now runs in
the context of a metaclass, which is just a way of saying that it has
a metaclass instance as its (undeclared) invocant. (An "invocant"
is what we call the object or class on behalf of which a method
is being called.) Hence class definitions, though apparently
declarative, are also executing code to build the class definition,
and the various declarations within the class are also running bits
of code. By convention classes will use a standard metaclass, but
that's just convention. (A very strong convention, we hope.)

The primary role of a class is to manage instances, that is, objects.
So a class must worry about object creation and destruction, and
everything that happens in between. Classes have a secondary
role as units of software reuse, in that they can be inherited
from or delegated to. However, because this is a secondary role,
and because of weaknesses in models of inheritance, composition,
and delegation, Perl 6 will split out the notion of software reuse
into a separate class-like entity called a "role". Roles are an
abstraction mechanism for use by classes that don't care about
the secondary aspects of software reuse, or that (looking at it
the other way) care so much about it that they want to encapsulate
any decisions about implementation, composition, delegation, and
maybe even inheritance. Sounds fancy, but just think of them as
includes of partial classes, with some safety checks. Roles don't
manage objects. They manage interfaces and other abstract behavior
(like default implementations), and they help classes manage objects.
As such, a role may only be composed into a class or into another role,
never inherited from or delegated to. That's what classes are for.

Classes are arranged in an inheritance hierarchy by their "isa"
relationships. Perl 6 supports multiple inheritance, but makes it
easy to program in a single-inheritance style, insofar as roles make
it easy to mix in (or delegate, or parameterize) private implementation
details that don't belong in the public inheritance tree.

In those cases where MI is used, there can be ambiguities in the
pecking order of classes in different branches. Perl 6 will have
a canonical way to disambiguate these, but by design the dispatch
policy is separable from inheritance, so that you can change the
rules for a given set of classes. (Certainly the rules can change
when we call into another language's class hierarchy, for instance.)

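To make the "pecking order" problem concrete, here is a small Python sketch of a diamond-shaped hierarchy. It shows how one existing language picks a canonical order; it does not claim that Perl 6 will use the same rule. The class names are borrowed from examples elsewhere in this document, and the `describe` method is purely hypothetical.

```python
class Animal:
    def describe(self):
        return "animal"


class Mammal(Animal):
    def describe(self):
        return "mammal"


class Pet(Animal):
    def describe(self):
        return "pet"


class Dog(Mammal, Pet):
    # Both branches define describe; the language must pick a winner.
    pass


# Python's canonical order for resolving the ambiguity.
order = [cls.__name__ for cls in Dog.__mro__]
winner = Dog().describe()
```

In this sketch the left-most branch wins, so `winner` is `"mammal"`, and `order` lists `Dog`, `Mammal`, `Pet`, `Animal`, then Python's root `object`.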
Where possible, class names are treated polymorphically, just
as method names are. This powerful feature makes it possible to
inherit systems of classes in parallel. (These classes might be
inner classes, or they might be inner aliases to outer classes.)
By making the class names "virtual", the base classes can refer to
the appropriate derived classes without knowing their full name.
That sounds complicated, but it just means that if you do the normal
thing, Perl will call the right class instead of the one you thought
it was going to call. C<< :-) >>

(As in C++ culture, we use the term "virtual" to denote a method that
is dispatched based on the actual run-time type of the object rather
than the declared type of the variable. C++ classes have to declare
their methods to be virtual explicitly. All of Perl's public methods
are virtual implicitly.)

You may derive from any built-in class. For high-level object classes
such as C<Int> or C<Num> there are no restrictions on how you derive.
For low-level representational classes like C<int> or C<num>, you may
not change the representation of the value; you may only add behaviors.
(If you want to change the representation, you should probably be
using composition instead of inheritance. Or define your own low-level
type.) Apart from this, you don't need to worry about the difference
between C<int> and C<Int>, or C<num> and C<Num>, since Perl 6 will do
autoboxing.

=head2 Declaration of Classes

Class declarations may be either file scoped or block scoped.
A file-scoped declaration must be the first thing in the file, and
looks like this:

    class Dog is Mammal;
    has Limb @.paws;
    method walk () { .paws».move() }

That has the advantage of avoiding the use of one set of braces,
letting you put everything up against the left margin. It is otherwise
identical to a block-scoped class, which looks like this:

    class Dog is Mammal {
        has Limb @.paws;
        method walk () { .paws».move() }
    }

An incomplete class definition makes use of the C<...> ("yada, yada,
yada") operator:

    class Dog is Mammal {...}

The declaration of a class name introduces the name as a valid
bare identifier or name. In the absence of such a declaration, the
name of a class in an expression must be introduced with the C<::>
class sigil, or it will be considered a bareword and rejected, since
Perl 6 doesn't allow barewords. Once the name is declared however,
it may be used as an ordinary term in an expression. Unlike in Perl
5, you should not view it as a bareword string. Rather, you should
view it as a parameterless subroutine that returns a class object,
which conveniently stringifies to the name of the class for Perl
5 compatibility. But when you say

    Dog.new()

the invocant of C<new> is an object of type C<Class>, not a string as
in Perl 5.

Unmodified, a class declaration always declares a global name.
But if you prefix it with C<our>, you're defining an inner class:

    class Cell {
        our class Golgi {...}
        ...
    }

The full name of the inner class is C<Cell::Golgi>, and that name can
be used outside of C<Cell>, since C<Golgi> is declared in the C<Cell>
package. (Classes may be declared private, however. More later.)

=head3 Class traits

A class declaration may apply various traits to the class. (A trait is
a property applied at compile time.) When you apply a trait, you're
accepting whatever it is that that trait does to your class, which could
be pretty much anything. Traits do things I<to> classes. Do not confuse
traits with roles, which are sworn to play a subservient role to the class.
Traits can do whatever they jolly well please to your class's metadata.

Now, the usual thing to do to a class's metadata is to insert another
class into its ISA metadata. So we use trait notation to install
a superclass:

    class Dog is Mammal {...}

To specify multiple inheritance, just add another trait:

    class Dog is Mammal is Pet {...}

But often you'll want a role instead, specified with C<does>:

    class Dog is Mammal does Pet {...}

More on that later. But remember that traits are evil. You can have
traits like:

    class Moose is Mammal is stuffed is really(Hatrack) is spy(Russian) {...}

So what if you actually want to derive from C<stuffed>? That's a good
question, which we will answer later. (The short answer is, you don't.)

Now as it happens, you can also use C<is> from within the class.
You can also put the C<does> inside to include various roles:

    class Dog {
        is Mammal;
        does Pet;
        does Servant;
        does Best::Friend[Man];
        does Drool;
        ...
    }

In fact, there's no particular reason to put any of these outside the
braces except to make them more obvious to the casual reader. If we
take the view that inheritance is just one form of implementation,
then a simple

    class Dog {...}

is sufficient to establish that there's a C<Dog> class defined out there
somewhere. We shouldn't really care about the implementation of C<Dog>,
only its interface--which is usually pretty slobbery.

That being said, you can know more about the interface at compile
time once you know the inheritance, so it's good to have pulled in
a definition of the class as well as a declaration. Since this
is typically done with C<use>, the inheritance tree is generally
available even if you don't mark your class declaration externally
with the inheritance. (But in any event, the actual inheritance tree
doesn't have to be available till run time, since that's when methods
are dispatched. (Though as is often the case, certain optimizations
work better when you give them more data earlier...))

=head2 Use of Classes

A class is used directly by calling class methods, and indirectly by
calling methods of an object of that class (or of a derived class that
doesn't override the methods in question).

Classes may also be used as objects in their own right, as instances
of a metaclass, the class C<MetaClass> by default. When you declare
class C<Dog>, you're actually calling a metaclass class method that
constructs a metaclass instance (i.e. the C<Dog> class) and then
calls the associated closure (i.e. the body of the class) as a method
on the instance. (With a little grammatical magic thrown in so that
C<Dog> isn't considered a bareword.)

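The "declaring a class is really a method call on a metaclass" idea can be imitated in Python, which also treats classes as instances of a metaclass. The sketch below is an analogy only: the `declare` class method and the `dog_body` function are hypothetical names invented here, and Python's own `class` statement actually runs the body before building the class object, so the order is arranged by hand to match the description above.

```python
class MetaClass(type):
    @classmethod
    def declare(meta, name, bases, body):
        # First construct the metaclass instance (the new class itself)...
        cls = meta(name, bases, {})
        # ...then run the class body with that instance as its invocant.
        body(cls)
        return cls


class Mammal:
    pass


def dog_body(cls):
    # Stands in for the block of the class declaration.
    def speak(self):
        return "Woof"

    cls.speak = speak


Dog = MetaClass.declare("Dog", (Mammal,), dog_body)
fido = Dog()
```

Here `Dog` is an ordinary class as far as `fido` is concerned, while `type(Dog)` is `MetaClass`, which is roughly the two-faced view of a class object that the text describes.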
The class C<Dog> is an instance of the class C<MetaClass>, but it's
also an instance of the type C<Class> when you're thinking of
it as a dispatcher. That is, a class object is really allomorphic.
If you treat one as an instance of Class, it behaves as if it were
the user's view of the class, and the user thinks the class is there only to
dispatch to the user's own class and instance methods. If, however,
you treat the object as an instance of C<MetaClass>, you get access
to all its metaclass methods rather than the user-defined methods.
Another way to look at it is that the metaclass object is a separate object
that manages the class object. In any event, you can get from the ordinary
class object to its corresponding metaclass object via the C<.meta>
method, which every object supports.

By the way, a C<Class> is a C<Module> which in turn is a C<Package> which in
turn is an C<Object>. Or something like that. So a class can always
be used as if it were a mere module or package. But modules and packages
don't have a C<.dispatch> method...

By default, classes in Perl are left open. That is, you can add
more methods later. (However, an application may close them.)
For discussion of this, see the section on "Open vs Closed Classes".

=head2 Class Name Semantics

Class names (and module names) are just package names.

Unlike in Perl 5, when you mention a package name in Perl 6 it doesn't
always mean a global name, since Perl 6 knows about inner classes
and lexically scoped packages and such. As with other entities in Perl
such as variables and methods, a scan is made for who thinks they have
the best definition of the name, going out from lexical scopes to package
scope to global scope in the case of static class names, and via method
inheritance rules in the case of virtual class names.

Note that C<::MyClass> and C<MyClass> mean the same thing. In Perl
6, an initial C<::> is merely an optional sigil for when the name of
the package would be misconstrued as something else. It specifically
does not mean (as it does in Perl 5) that it is a top-level package.
To refer to the top-level package, you would need to say something
like C<::*MyClass> (or just C<*MyClass> in places where the C<*>
unary operator would not be expected.) But also note that the C<*>
package in Perl is not the "C<main>" package in the Perl 5 sense.

Likewise, the presence of C<::> within a package name like
C<Fish::Carp> does not make it a global package name necessarily.
Again, it scans out through various scopes, and only if no local
scopes define package C<Fish::Carp> do you get the global definition.
And again, you can force it by saying C<::*Fish::Carp>. (Or just
C<*Fish::Carp> in places where the C<*> unary operator is not expected.)

You can interpolate a parenthesized expression within a package name
after any C<::>. So these are all legal package names (or module
names, or class names):

    ::($alice)
    ::($alice)::($bob)
    ::($alice::($bob))
    ::*::($alice)::Bob
    ::('*')::($alice ~ '_misc')::Bob
    ::(get_my_dir())
    ::(@multilevel)

And any of those package names could be part of a variable or sub name:

    $::($alice)::name
    @::($alice)::($bob)::elems[1,2,3]
    %::*::($alice)::Bob::map{'xyz'}
    &::('*')::($alice ~ '_misc')::Bob::doit(1,2,3)
    $::(get_my_dir())::x
    $::(@multilevel)

Note in the last example that the final element of C<@multilevel>
is taken to be the variable name. This may be illegal under C<use
strict refs>, since it amounts to a symbolic reference. (Not that the
others aren't symbolic, but the rules may be looser for package names
than for variable names, depending on how strict our strictures get.)

[Update: There is no "strict refs" anymore, since we have a separate
syntax for when we explicitly want symbolic references.]

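The interpolated names above amount to looking things up by strings computed at run time. As a loose illustration of that idea, and not of Perl 6's actual lookup machinery, the following Python sketch walks a nested namespace one name at a time. The `resolve` helper and the namespace tree are hypothetical, invented for this example.

```python
from functools import reduce
from types import SimpleNamespace


def resolve(root, *names):
    """Follow each name in turn, starting from root."""
    return reduce(getattr, names, root)


# Hypothetical namespace tree standing in for nested packages.
root = SimpleNamespace(
    Alice=SimpleNamespace(
        Bob=SimpleNamespace(name="found it"),
    ),
)

alice, bob = "Alice", "Bob"
multilevel = ["Alice", "Bob", "name"]

by_parts = resolve(root, alice, bob, "name")  # compare $::($alice)::($bob)::name
by_list = resolve(root, *multilevel)          # compare $::(@multilevel)
```

As with C<@multilevel> above, the final element of the list is treated as the name of the thing being fetched rather than as another package.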
=head2 Private Classes

A class named with a single initial colon is a private class name:

    class :MyPrivateClass {...}

It is completely ignored outside of the current class. Since the name
is useful only in the current package, it makes no sense to try to
qualify it with a package name. While it's an inner class of sorts,
it does not override any class name from any other class because it
lives in its own namespace (a subnamespace of the current package),
and there's no way to tell if the class you're deriving from declares
its own private class of the same name (apart from digging through
the reflection interfaces).

The colon is orthogonal to the scoping. What's actually going on in
this example is that the name is stored in the package with the leading
colon, because the colon is part of the name. But if you declared
"C<my class :Golgi>" the private name would go into the lexical
namespace with the colon. The colon functions a bit like a "private"
trait, but isn't really a trait. Wherever you might use a private
name, the colon in the name effectively creates a private subspace
of names, just as if you'd prefixed it with "_" in the good old days.

But if it were only that, it would just be encapsulation by convention.
We're trying to do a little better than that. So the language needs
to actively prevent people from accessing that private subspace from
outside the class. You might think that that's going to slow down
all the dispatchers, but probably not. The ordinary dispatch of
C<Class.method> and C<$obj.method> don't have to worry about it,
because they use bare identifiers. It's only when people start
doing C<::($class)> or C<$obj.$method> that we have to trap illegal
references to colonic names.

Even though the initial colon isn't really a trait, if you interrogate the
"C<.private>" property of the class, it will return true. You don't have
to parse the name to get that info.

We'll make more of this when we talk about private methods and
attributes. Speaking of methods...

=head1 Methods

Methods are the actions that a class knows how to invoke on behalf of
an object of that type (or on behalf of itself, as a class object).
But you knew that already.

As in Perl 5, a method is still "just a funny subroutine", but in
Perl 6 we use a different keyword to declare it, both because it's
better documentation, and because it captures the metadata for the
class at compile time. Ordinary methods may be declared only within
the scope of a class definition. (Multimethods are exempt from this
restriction, however.)

=head2 Declaration of Methods

To declare a method, use the C<method> keyword just as you would use
C<sub> for an ordinary subroutine. The declaration is otherwise
almost identical:

    method doit ($a, $b, $c) { ... }

The one other difference is that a method has an I<invocant> on
behalf of which the method is called. In the declaration above, that
invocant is implicit. (It is implicitly typed to be the same as the
current surrounding class definition.) You may, however, explicitly
declare the invocant as the first argument. The declaration knows
you're doing that because you put a colon between the invocant and
the rest of the arguments:

    method doit ($self: $a, $b, $c) { ... }

In this case, we didn't specify the type of C<$self>, so it's an
untyped variable. To make the exact equivalent of the implicit
declaration, put the current class:

    method doit (MyClass $self: $a, $b, $c) { ... }

or more generically using the C<::_> "current class" pronoun:

    method doit (::_ $self: $a, $b, $c) { ... }

[Update: The current class is now named via C<::?CLASS>.]

In any case, the method sets the current invocant as the topic, which is
also known as the C<$_> variable. However, the topic can change depending
on the code inside the method. So you might want to declare an explicit
invocant when the meaning of C<$_> might change. (For further discussion
of topics see Apocalypse 4. For a small writeup on sub signatures see
Apocalypse 6.)

A private method is declared with a colon on the front:

    method :think (Brain $self: $thought)

Private methods are callable only by the class itself, and by trusted
"friends". More about that when we talk about attributes.

=head2 Use of Methods

As in Perl 5, there are two notations for calling ordinary methods.
They are called the "dot" notation and the "indirect object" notation.

=head3 The dot notation

Perl 6's "dot" notation is just the industry-standard way to call
a method these days. (This used to be C<< -> >> in Perl 5.)

    $object.doit("a", "b", "c");

If the object in question is the current topic, C<$_>, then you can
use the unary form of the dot operator:

    for @objects {
        .doit("a", "b", "c");
    }

A simple variable may be used for an indirectly named method:

    my $dosomething = "doit";
    $object.$dosomething("a", "b", "c");

As in Perl 5, if you want to do anything fancier, use a temporary variable.

The parentheses may also be omitted when the following code is unambiguously
a term or operator, so you can write things like this:

    @thumbs.each { .twiddle }
    $thumb.twiddle + 1
    .mode 1

[Update: This is retracted. Parens are always required if there are
arguments.]

(Parens are always required around the argument list when a method call
with arguments is interpolated into a string.)

The parser I<will> make use of whitespace at this point to decide some things.
For instance

    $obj.method + 1

is obviously a method with no arguments, while

    $obj.method +1

is obviously a method with an argument. However, the dwimmery only goes
as far as the typical person's visual intuition. Any construct too ambiguous
is simply rejected. So

    $obj.method+1

produces a parse error.

[Update: The preceding is also retracted. None of those would be
interpreted as arguments now. However, the following is still true.]

In particular, curlies, brackets, or parens would be interpreted
as postfix subscripts or argument lists if you leave out the space.
In other words, Perl 6 distinguishes:

    $obj.method ($x + $y) + $z

from

    $obj.method($x + $y) + $z

Yes, this is different from Perl 5. And yes, I know certain people
hate it. They can write their own grammar.

While it's always possible to disambiguate with parentheses, sometimes
that is just too unsightly. Many methods want to be parsed as if
they were list operators. So as an alternative to parenthesizing
the entire argument list, you can disambiguate by putting a colon
between the method call and the argument list:

    @thumbs.each: { .twiddle }
    $thumb.twiddle: + 1
    .mode: 1
    $obj.for: 1,2,3 -> $i { ... }

[Update: There is no colon disambiguator any more. Use parens if there
are arguments. (However, you can pass an adverbial block using C<:{}>
notation with a null key. That does not count as an ordinary argument.)]

If a method is declared
with
the trait
"C<is rw>"
, it's an lvalue
method, and you can assign to it just as
if
it were an ordinary
variable:

    method mystate is rw () { return $:secretstate }

    $object.mystate = 42;
    print $object.mystate;

In fact, it's a general rule that you can
use
an argumentless
"C<rw>"
method
call anywhere you might
use
a variable:

    temp $state.pi = 3;
    $tailref = \$fido.tail;

(Though occasionally you might need to supply parentheses to
disambiguate, since the compiler can't always know at compile
time
whether the method
has
any arguments.)
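
As a rough analogy only, here is how an assignable, argumentless method looks in Python, where a property with a setter plays the part of an "is rw" method. The C<Widget> class and its C<_secretstate> slot are hypothetical stand-ins for the class and the private C<$:secretstate> attribute above.

```python
class Widget:
    def __init__(self):
        # Stands in for the private $:secretstate attribute.
        self._secretstate = 0

    @property
    def mystate(self):
        # Reading obj.mystate calls this, with no parentheses needed.
        return self._secretstate

    @mystate.setter
    def mystate(self, value):
        # Assigning to obj.mystate calls this.
        self._secretstate = value


obj = Widget()
obj.mystate = 42
print(obj.mystate)
```

The analogy is loose: a Python property is always written without parentheses, whereas the Perl 6 method remains an ordinary method that merely happens to return something assignable.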
Method calls on container objects are obviously directed to the
container object itself, not to the contents of the container:

    $elems = @array.elems;
    @keys  = %hash.keys;
    $sig   = &sub.signature;

However,
with
scalar
variables, methods are always directed to the
object pointed to by the reference contained in the
scalar
:

    $scalar = @array;
    $elems = $scalar.elems;

or
for
value types, the appropriate class is called as
if
the value
were a reference to a
"real"
object.

    $scalar = "foo";
    $chars = $scalar.chars;

In order to talk to the scalar container itself, use the C<tied()>
pseudo-function as you would in Perl 5:

    if tied($scalar).constant {...}

(You may recall, however, that in Perl 6 it's illegal to tie any
variable without first declaring it as tieable, or (preferably) tying
it directly in the variable's declaration. Otherwise the optimizer
would have to assume that every variable has semantics that are
unknowable in advance, and we would have to call it a pessimizer
rather than an optimizer.)
=head3 The "indirect object" notation
The other form of method call is known as the
"indirect object"
syntax,
although it differs from Perl 5's syntax in that a colon is required
between the indirect object (the invocant) and its arguments:

    doit $object: "a", "b", "c"

The colon may be omitted if there are no arguments (besides the invocant):

    twiddle $thumb;
    $x = new X;

Note that indirect object calls may not be directly interpolated into
a string, since they don't start
with
a sigil. You can always
use
the
C<$()> expression interpolater though:

    say "$(greet $lang), world!";

[Update: The C<$()> notation is gone. Use

    say "{greet $lang}, world!";

instead.]
As in Perl 5, the indirect object syntax is valid only
if
you haven't
declared a subroutine locally that overrides the method lookup.
That was a bit of a problem in Perl 5 since, if there happened to be
a C<new> constructor in your class, it would call that instead of
dispatching to the class you wanted it to. That's much less of a
problem in Perl 6, however, because Perl 6 cannot confuse a method
declaration
with
a subroutine declaration. (Which is yet another reason
for
giving methods their own keyword.)
Another factor that makes indirect objects work better in Perl 6
is that the class name in
"C<new X>"
is a predeclared object, not a
bare identifier. (Perl 5 just had to guess
when
it saw two bare
identifiers in a row that you were trying to call a class method.)
The indirect object syntax may not be used
with
a variable
for
the
methodname. You must
use
dot notation
for
that.
Because of precedence, the indirect object notation may not be used
as an lvalue
unless
you parenthesize it:

    (mystate $object) = 42;
    (findtail Dog: "fido") = Wagging::on;

You may parenthesize an argumentless indirect object method to make
it look like a function:

    mystate($object) = 42;
    twiddle($thumb);

The dispatch rules
for
methods and global multi subs conspire to keep
these unambiguous, so the user really doesn't have to worry about whether

    close($handle);

is implemented as a global multi sub or a method on a C<$handle> object.
In essence, the multimethod dispatching rules degenerate to ordinary
method dispatch
when
there are
no
extra arguments to consider (and
sometimes even
when
there are arguments). This is particularly important
because Perl uses these rules to
tell
the difference between

    print "Howdy, world!\n";

and

    print $*OUT;

However, you must still put the colon
after
the invocant
if
there are
other arguments. The colon tells the parser whether to look
for
the
arguments inside:

    doit($object: "a", "b", "c")

or outside:

    doit($object): "a", "b", "c"

If you do say

    doit($object, "a", "b", "c")

the first comma forces it to be interpreted as a
sub
call rather than
a method call.
(We could have decided to
say
that whenever Perl can't find a C<doit()>
sub
definition at run
time
, it should assume you meant the entire
parenthesized list to be the indirect object, which, since it's in
scalar
context would automatically generate a list reference and call
C<< [$object,"a","b","c"].doit() >>, which is unlikely to be what
you mean, or even work. (Unless, of course, that's how you really
meant it to work.) But I think it's much more straightforward to
simply disallow comma lists at the top level of an indirect object.
The old
"if it looks like a function"
rule applies here. Oddly,
though, function syntax is how you call multisubs in Perl 6. And as
it happens, the way the multisub/multimethod dispatch rules are
defined, it could still end up calling C<$object.doit("a","b","c")>
if that is deemed to be the best choice among all the candidates.
But syntactically, it's not an indirect object. More on dispatch
rule interactions later.)
The comma still doesn't work
if
you go the other way and leave out
the parens entirely, since

    doit $object, "a", "b", "c";

would always (in the absence of a prior sub declaration) be parsed as

    (doit $object:), "a", "b", "c";

So a
print
with
both an indirect object and arguments
has
to look
like one of these:

    print $*OUT: "Howdy, world!\n";
    print($*OUT: "Howdy, world!\n");
    print($*OUT): "Howdy, world!\n";

Note that the old Perl 5 form using curlies:

    print {some_hairy_expression()} "Howdy, world!\n";

should instead now be written with parentheses:

    print (some_hairy_expression()): "Howdy, world!\n";

though, in fact, in this case the parens are unnecessary:

    print some_hairy_expression(): "Howdy, world!\n";

You'd only need the parens
if
the invocant expression contained
operators lower in precedence than comma (comma itself not being
allowed). Basically,
if
it looks confusing to you, you can expect
it to look confusing to the compiler, and to make the compiler look
confused. But it's a feature
for
the compiler to look confused
when
it actually I<is> confused. (In Perl 5 this was not always so.)
Note that the disambiguating colon associates
with
the closest method
call, whether direct or indirect. So

    print $obj.meth: "Howdy, world!\n";

passes "C<Howdy, world!\n>" to C<$obj.meth> rather than to C<print>.
That's a case where you ought to have parenthesized the indirect object
for clarity anyway:

    print ($obj.meth): "Howdy, world!\n";

[Update: There is
no
disambiguating colon on dot calls anymore, so
a colon there can only indicate an indirect object to the
print
now.]
=head3 Calling private methods
A private method does not participate in normal method dispatch.
It is not listed in the class's public methods. The C<.can> method
does not see it. Calling it via normal dispatch raises a "
no
such
method" exception. It is, in essence, invisible to the
outside world. It does not hide a base class's method of the same
name--even in the current class! It's fair to ask
for
warnings about
name collisions, of course. But we're not following the C++ approach
of making private methods visible but uncallable.
Instead, we separate the namespaces completely by distinguishing the
public dot operator from the private dot-colon operator. That is:

    $mouth.say("Yes!")
    .say("Yes!")
    $brain.:think("No!")
    .:think("No!")

The inclusion of the colon prevents any kind of
"virtual"
behavior.
Calling a private method is illegal except under two very specific
conditions. You can call a private method C<:think> on an object
C<$brain> only if:

=over 4

=item 1.

The class of C<$brain> is explicitly declared, and the declared class
is either the class definition that we are in or a class that has
explicitly granted trust to our current class, and the declared
class contains a private C<:think> method. Or...

=item 2.

The class of the C<$brain> is not declared, and the current class contains
a private C<:think> method.

=back
The upshot of these rules is that a private method call is essentially
a subroutine call
with
a method-like syntax. But the private method
we're going to call can be determined at compile
time
, just like
a subroutine.
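
A loosely comparable mechanism exists in Python, and a small sketch may help fix the idea. Python rewrites a double-underscore name inside a class body to a class-specific name when the code is compiled, so the call below is bound to the defining class and is never overridden by a subclass. The class names are invented, and note that Python only obscures such names rather than enforcing the trust rules described above.

```python
class Brain:
    def __think(self, msg):
        # Compiled as _Brain__think: private to Brain's namespace.
        return "Brain thinks " + msg

    def ponder(self, msg):
        # Compiled as self._Brain__think(msg), so no virtual dispatch occurs.
        return self.__think(msg)


class SmartBrain(Brain):
    def __think(self, msg):
        # Compiled as _SmartBrain__think: a separate name, not an override.
        return "SmartBrain thinks " + msg


answer = SmartBrain().ponder("No!")
print(answer)
```

Even though the object is a C<SmartBrain>, C<ponder> reaches the C<Brain> version, which is the "subroutine call with a method-like syntax" flavor the text describes.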
=head1 Class Methods
Class methods are called on the class as a whole rather than on any
particular instance object of the class. They are distinguished
from ordinary methods only by the declared type of the invocant.
Since an implicit invocant would be typed as an object of the class
and not as the class itself, the invocant declaration is I<not>
optional in a class method declaration
if
you wish to specify the
type of the invocant. (Untyped explicit invocants are allowed to
"squint"
, however.)
=head2 Class Invocant
To declare an ordinary class method, such as a constructor, you
say
something like:

    method new (Class $class: *@args) { ... }

Such a method may only be called
with
an invocant that
"isa"
C<Class>,
that is, an object of type C<Class>, or derived from type C<Class>.
=head2 Class|object Invocant
It is possible to
write
a method that can be called
with
an invocant that
is either a C<Class> or an object of that current class. You can declare the
method
with
a type junction:

    method new (Class|Dog $classorobj: *@args) { ... }

Or to be completely non-specific, you can leave out the type entirely:

    method new ($something: *@args) { ... }

That's not as dangerous as it looks, since almost by definition
the dispatcher only calls methods that are consistent
with
the
inheritance tree. You just can't
say
:

    method new (*@args) { ... }

which would be the equivalent of

    method new (Dog $_: *@args) { ... }

Well, actually, you I<could>
say
that, but it would
require
that
you have an existing C<Dog>-compatible object in order to create a new one.
And that could present a little bootstrapping problem...
(Though it could certainly cure the boot chewing problem...)
But in fact, you'll rarely need to declare a C<new> method at all,
because Perl supplies a
default
constructor to go
with
your class.
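
To make the "class or object" invocant concrete, here is a small Python sketch (an analogy, not Perl 6) of a constructor-like function that accepts either the class itself or an existing instance. The C<Dog> class and the C<new_dog> function are hypothetical.

```python
class Dog:
    def __init__(self, name="unnamed"):
        self.name = name


def new_dog(invocant, **args):
    # Accept either the class or an instance of it, like a Class|Dog invocant.
    cls = invocant if isinstance(invocant, type) else type(invocant)
    if not issubclass(cls, Dog):
        raise TypeError("invocant must be Dog, a subclass, or an instance")
    # The instance is used only to find its class; its contents are not copied.
    return cls(**args)


fido = new_dog(Dog, name="Fido")
pup = new_dog(fido, name="Pup")
print(fido.name, pup.name)
```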
=head1 Submethods
Some methods are intended to be inherited by derived classes. Others
are intended to be reimplemented in every class, or in every class
that doesn't want the
default
method. We call these
"submethods"
,
because they work a little like subs, and a little like methods.
(You can also
read
the
"sub"
with
the meaning it
has
in words like
"subhuman"
.)
Typically these are (
sub
)methods related to the details of construction
and destruction of the object. So
when
you call a constructor,
for
instance, it ends up calling the C<BUILDALL> initialization routine
for
the class, which ends up calling the C<BUILD> submethod:

    submethod BUILD ($a, $b, $c) {
        $.a = $a;
        $.b = $b;
        $.c = $c;
    }

Since the submethod is doing things that make sense only in the
context of the current class (such as initializing attributes), it
makes
no
sense
for
C<BUILD> to be inherited. Likewise C<DESTROY>
is also a submethod.
Why not just make them ordinary subs, then? Ordinary subs can't
be called by method invocation, and we want to call these routines
that way. Furthermore,
if
your base class I<does> define an ordinary
method named C<BUILD> or C<DESTROY>, it can serve as the
default
C<BUILD> or C<DESTROY>
for
all derived classes that don't declare
their own submethods. (All public methods are virtual in Perl,
but some are more virtual than others.)
You might be saying to yourself, "Wait, private methods aren't virtual.
Why not just use a private method for this?" It's true that private
methods aren't virtual, because they aren't in fact methods at all.
They're just ordinary subroutines in disguise. They have nothing to
do
with
inheritance. By contrast, submethods are all about presenting
a unified inherited I<interface>
with
the option of either inheriting or
not inheriting the I<implementation> of that interface, at the discretion
of the class doing the implementing.
So the bottom line is that submethods allow you to
override
an
inherited implementation
for
the current class without overriding
the
default
implementation
for
other classes. But in any case, it's
still using a public interface, called as an ordinary method call,
from anywhere in your program that
has
an object of your type.
Or a class of your type. The
default
C<new> constructor is an ordinary
class method in class C<Object>, so it's inherited by all classes that
don't define their own C<new>. But
when
you
write
your own C<new>,
you need to decide whether your constructor should be inherited or not.
If so, that's good, and you should declare it as a method. But
if
not, you should declare it as a submethod so that derived classes
don't
try
to
use
it erroneously instead of the
default
C<Object.new()>.
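
The difference between a method and a submethod can be sketched in Python, purely as an analogy. An ordinary method is found by walking the inheritance tree, while the hypothetical C<call_submethod> helper below looks only in the object's own class, so a derived class that declares nothing does not see its parent's version. All names are invented for illustration.

```python
class Base:
    def BUILD(self):
        # Imagine this were declared with "submethod".
        return "Base.BUILD"

    def speak(self):
        # An ordinary method, inherited as usual.
        return "Base.speak"


class Derived(Base):
    pass


def call_submethod(obj, name, *args):
    # Consult only the object's own class, never its ancestors.
    func = vars(type(obj)).get(name)
    if func is None:
        raise AttributeError(f"no submethod {name!r} in {type(obj).__name__}")
    return func(obj, *args)


base_result = call_submethod(Base(), "BUILD")
try:
    call_submethod(Derived(), "BUILD")
    derived_sees_build = True
except AttributeError:
    derived_sees_build = False

print(base_result, derived_sees_build, Derived().speak())
```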
=head1 Attributes
In Perl 6,
"attributes"
are what we call the instance variables of
an object. (We used that word to mean something
else
in Perl 5--we're
now calling those things
"traits"
or
"properties"
.)
As
with
classes and methods, attribute declarations are apparently
declarative. Underneath they actually call a method in the metaclass
to install the new definition. The Perl 6 implementation of attributes
is not based on a hash, but on something more like a symbol table.
Attributes are stored in an opaque datatype rather like a struct in
C, or an array in Perl 5--but you don't know that. The datatype is
opaque in the sense that you shouldn't care how it's laid out in memory
(
unless
you have to interface
with
an outside data structure--like a
C struct). Do not confuse opacity
with
encapsulation. Encapsulation
only hides the object's implementation from the outside world. But
the object's structure is opaque even to the class that defines it.
One of the large benefits of this is that you can actually take a C or
C++ data structure and wrap accessor methods
around
it without having
to copy anything into a different data structure. This should speed
up things like XML parsing.
=head2 Declaration of Attributes
In order to provide this opaque abstraction layer, attributes are
not declared as a part of any other data structure. Instead, they
are modeled on real variables, whose storage details are implicitly
delegated to the scope in which they are declared. So attributes
are declared as
if
they were normal variables, but
with
a strange
scope and lifetime that is neither C<my> nor C<our>. (That scope is,
of course, the current object, and the variable lives as long as the
object lasts.) The class will implicitly store those attributes in a
location distinct from any other class's attributes of the same name,
including any base or derived class. To declare an attribute variable,
declare it within the class definition as you would a C<my> variable,
but use the C<has> declarator instead of C<my>:

    class Dog is Mammal {
        has $.tail;
        has @.legs;
        ...
    }

The C<has> declarator was chosen to remind people that attributes are
in a "HASA" relationship to the object rather than an "ISA" relationship.
The other difference from normal variables is that attributes have a
secondary sigil that indicates that they are associated
with
methods.
When you declare an attribute like C<$.tail>, you're also implicitly
declaring an accessor method of the same name, only without the C<$> on
the front. The dot is there to remind you that it's also a method call.
As
with
other declarations, you may add various traits to an attribute:

    has $.dogtag is rw;

If you want all your attributes to default to "C<rw>", you can put the
trait on the class itself:

    class Coordinates is rw {
        has int $.x;
        has int $.y;
        has int $.z;
    }

Essentially, it's now a C-style struct, without having to introduce
an ugly word like
"struct"
into the language. Take that, C++. C<:-)>
You can also assign to a declaration:

    has $.master = "TheDamian";

Well, actually, this looks like an assignment, but it isn't.
The effect of this is to establish a
default
; it is not executed at
run
time
. (Or more precisely, it runs
when
the class closure is
executed by the metaclass, so it gets evaluated only once and the
value is stored
for
later
use
by real instances. More below.)
=head2 Use of Attributes
The attribute behaves just like an ordinary variable within the
class's instance methods. You can
read
and
write
the attributes just
like ordinary variables. (It is, however, illegal to refer to an
instance attribute variable (that is, a
"C<has>"
variable) from within
a class method. Class methods may only access class attributes,
not instance attributes. See below.)
Bare attributes are automatically hidden from the outside world because
their sigiled names cannot be seen outside the class's
package
.
This is how Perl 6 enforces encapsulation. Outside the class the
I<only> way to talk about an attribute is through accessor methods.
Since public methods are always virtual in Perl, this makes attribute
access virtual outside the class. Always. (Unless you give the
optimizer enough hints to optimize the class to
"final"
. More on
that later.)
In other words, only the class itself is allowed to know whether this
attribute is, in fact, implemented by this class. The class may also
choose to ignore that fact, and call the abstract interface, that is,
the accessor method, in which case it might actually end up calling
some derived class's overriding method, which might in turn call
back to this class's accessor as a super method. (So in general,
an accessor method should always refer to its actual variable name
rather than the accessor method name to avoid infinite recursion.)
You may
write
your own accessor methods
around
the bare attributes,
but
if
you don't, Perl will generate them
for
you based on the
declaration of the attribute variable. The traits of the generated
method correspond directly to the traits on the variable.
By
default
, a generated accessor is
read
-only (because by
default
any
method is
read
-only). If you mark an attribute
with
the trait
"C<is rw>"
though, the corresponding generated accessor will also be
marked
"C<is rw>"
, meaning that it can be used as an lvalue.
In any event, even without
"C<is rw>"
the attribute variable is always
writable within the class itself (
unless
you apply the trait C<is
constant> to it).
As
with
private classes and methods, attributes are declared private
using a colon on the front of their names. As
with
any private
method, a private accessor is completely ignored outside its class
(or, by extension, the classes trusted by this class).
To carry the separate namespace idea through, we incorporate the
colon as the secondary sigil in declarations of private attributes:

    has $:x;

Then we can get rid of the verbose C<is private> altogether. (Well,
it's still there as a trait, but the colon implies it, and is required
anyway.) And we basically force people to document the private/public
distinction every place they reference C<$:x> instead of C<$.x>,
or C<$obj.:meth> instead of C<$obj.meth>.
We've seen secondary sigils
before
in earlier Apocalypses. In
each
case
they're associated
with
a bizarre usage of some
sort
. So far we
have:

    $*foo
    $?foo
    $^foo
    $.foo
    $:foo

[Update: A regex-scoped variable now looks like C<< $<foo> >> instead.
A C<$?foo> variable is now a compiler variable, and a C<$=foo>
variable is a POD variable. By the way, lately I've been calling
the secondary sigils
"twigils"
.]
As a form of the dreaded
"Hungarian notation"
, secondary sigils are
not introduced lightly. We define secondary sigils only where we
deem instant recognizability to be crucial
for
readability. Just as
you should never have to look at a variable and guess whether it's
a true global, you should never have to look at a method and guess
which variables are attributes and which ones are variables you just
happen to be in the lexical scope of. Or which attributes are public
and which are private. In Perl 6 it's always obvious--at the cost
of a secondary sigil.
We
do
hereby solemnly swear to never, never, ever add tertiary sigils.
You have been warned.
=head2 Default Values
You can set
default
values
on attributes by pseudo-assignment to
the attribute declaration:

    has Answer $.ans = 42;

These
default
values
are associated as
"C<build>"
traits of the attribute
declaration object. When the C<BUILD> submethod is initializing a new
object, these
prototype
values
are used
for
uninitialized attributes.
The expression on the right is evaluated immediately at the point of
declaration, but you can defer evaluation by passing a closure, which
will automatically be evaluated at the actual initialization
time
.
(Therefore, to initialize to a closure value, you have to put a closure
in a closure.)
Here's the difference between those three approaches. Suppose you
say
:

    class Hitchhiker {
        my $defaultanswer = 0;
        has $.ans1 = $defaultanswer;
        has $.ans2 = { $defaultanswer };
        has $.ans3 = { { $defaultanswer } };
        $defaultanswer = 42;
        ...
    }

When the object is eventually constructed, C<$.ans1> will be
initialized to C<0>,
while
C<$.ans2> will be initialized to 42.
(That's because the closure binds C<$defaultanswer> to the current
variable, which still presumably has the value 42 by the time the C<BUILD>
routine initializes the new object, even though the lexical variable
"C<$defaultanswer>" has supposedly gone out of scope by the time the
object is being constructed. That's just how closures work.)
And C<$.ans3> will be initialized not to 42, but to a closure that,
if you ever call it, will also return 42. So since the accessor
C<$obj.ans3()> returns that closure, C<$obj.ans3().()> will return 42.
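
The three cases are easy to replay in any language with closures. The following Python sketch mirrors the Hitchhiker example under one stated assumption: a hypothetical C<build_default> helper runs a default if it is callable and uses it as-is otherwise, which is roughly what the text says the initializer does with a closure.

```python
def build_default(trait):
    # A callable default is run at initialization time; any other value
    # was already evaluated at declaration time and is used as it stands.
    return trait() if callable(trait) else trait


defaultanswer = 0
ans1 = defaultanswer                      # evaluated immediately
ans2 = lambda: defaultanswer              # evaluation deferred
ans3 = lambda: (lambda: defaultanswer)    # deferred, and yields a closure
defaultanswer = 42

v1 = build_default(ans1)
v2 = build_default(ans2)
v3 = build_default(ans3)
print(v1, v2, v3())
```

The first value was captured before the variable changed, the second sees the later value because the closure refers to the variable rather than a copy, and the third is itself a closure that must be called once more.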
The
default
value is actually stored under the
"C<build>"
trait, so this:

    has $.x = calc($y);

is equivalent to this:

    has $.x is build( calc($y) );

and this:

    has $.x = { calc($y) };

is equivalent to either of these:

    has $.x is build( { calc($y) } );
    has $.x will build { calc($y) };

As
with
all closure-valued container traits, the container being
declared (the C<$.x> variable in this case) is passed as the topic to
the closure (in addition to being the target that will be initialized
with
the result of the closure, because that's what C<build> does).
In addition to the magical topic, these build traits are also magically
passed the same named arguments that are passed to the C<BUILD> routine.
So you could
say

    has $.x = { calc($^y) };

to
do
a calculation based on the C<:y(582)> parameter originally passed to
the constructor. Or rather, that will be passed to the constructor
someday
when
the object is eventually constructed. Remember we're
really still at class construction
time
here.
As
with
other initializers, you can be more specific about the
time
at which the
default
value is constructed, as long as that
time
is
earlier than class construction
time
:

    has $.x = BEGIN { calc() }
    has $.x = CHECK { calc() }
    has $.x = INIT  { calc() }
    has $.x = FIRST { calc() }
    has $.x = ENTER { calc() }

which are really just short for:

    has $.x is build( BEGIN { calc() } )
    has $.x is build( CHECK { calc() } )
    has $.x is build( INIT  { calc() } )
    has $.x is build( FIRST { calc() } )
    has $.x is build( ENTER { calc() } )

=head1 Class Attributes
In general, class attributes are just
package
or lexical variables.
If you define a
package
variable
with
a dot or colon, it autogenerates
an accessor
for
you just as it does
for
an ordinary attribute:

    our $.count;
    our %:cache is rw;

The implicit invocant of these implicit accessors
has
a
"squinting"
type--it can either be the class or an object of the class. (Declare
your own accessors
if
you have a philosophical reason
for
forcing
the type one way or the other.)
The disadvantage of using "C<our>" above is that both of these are
accessible from outside the class via their package name (though the
private one is Officially Ignored, and cannot be named simply by saying
C<%MyClass:::cache> because that syntax is specifically disallowed).
If on the other hand you declare your class variables lexically:

    my $.count;
    my %:cache is rw;

then the same pair of accessors are generated, but the variables
themselves are visible only within the class block. If you reopen
the class in another block, you can only see the accessors, not the
bare variables. This is probably a feature.
Generally speaking, though,
unless
you want to provide public
accessors
for
your class attributes, it's best to just declare
them as ordinary variables (either C<my> or C<state> variables)
to prevent confusion
with
instance attributes. It's a good policy
not to declare any public accessors
until
you know you need them.
They are,
after
all, part of your contract
with
the outside world,
and the outside world
has
a way of holding you to your contracts.
=head1 Object Construction
The basic idea here is to remove the drudgery of creating objects.
In addition we want object creation and cleanup to work right by
default. In Perl 5 it's possible to make recursive construction and
destruction work, but it's not the default, and it's not easy.
Perl 5 also confused the notions of constructor and initializer.
A constructor should create a new object once, then call all the
appropriate initializers in the inheritance tree without recreating
the object. The initializer
for
a base class should be called
before
the initializer
for
any class derived from it.
The initializer
for
a class is always named C<BUILD>. It's in uppercase
because it's usually called automatically
for
you at construction
time
.
As
with
Perl 5, a constructor is only named
"C<new>"
by convention,
and you can
write
a constructor
with
any name you like. However,
in Perl 6,
if
you
do
not supply a
"C<new>"
method, a generic one will
be provided (by inheritance from C<Object>, as it happens).
=head2 The Default Constructor
The
default
C<new> constructor looks like this:

    multi method new (Class $class: *%_) {
        return $class.bless(0, *%_);
    }

The arguments
for
the
default
constructor are always named arguments,
hence the C<*%_> declaration to collect all those pairs and pass them
on to bless.
You'll note also that C<bless> is no longer a subroutine but a
method call, so it's now impossible to omit the class specification.
This makes it easier to inherit constructors. You can still bless
any reference you could bless in Perl 5, but where you previously
used a function to do that:

    return bless( { attr => "hi" }, $class );

in Perl 6 you use a method call:

    return $class.bless( { attr => "hi" } );

However, if what you pass as the first argument isn't a reference,
C<bless> is going to construct an opaque object and initialize it. In a
sense, C<bless> is the only real constructor in Perl 6. It first makes
sure the data structure is created. If you don't supply a reference
to bless, it calls C<CREATE> to create the object. Then it calls
C<BUILDALL> to call all the initializers.

The signature of C<bless> is something like:

    method bless ($class: $candidate, *%_)

The C<0> candidate indicates the built-in opaque type. If you're really
strange in the head, you can think of the
"C<0>"
as standing
for
"C<0paque>"
. Or it's the
"zero"
object, about which we know zip.
Whatever tilts your windmill...
In any event, strings are reserved
for
other object layouts. We could
conceivably have things like:

    return $class.bless("Cstruct", *%_);

So as it happens, C<0> is short
for
the layout
"P6opaque"
.
[Update: There is
no
C<0> argument. Just leave it out
if
you wish to
declare a C<P6opaque> object. If you wish to pass some other representation
name to C<CREATE>,
use
a named argument like C<< :CREATE[:repr<Cstruct>] >>.]
Any additional arguments to C<.
bless
> are automatically passed on to
C<CREATE> and C<BUILDALL>. But note that these I<must> be named arguments.
It could be argued that the I<only> real purpose
for
writing a C<.new>
constructor in Perl 6 is to translate different positional argument
signatures into a unified set of named arguments. Any other
initialization common to all constructors should be done within C<BUILD>.
Oh, the invocant of C<.
bless
> is either a class or an object of the
class, but
if
you
use
an object of the class, the contents of that
object are I<not> automatically used to
prototype
the new object.
If you wish to
do
that, you have to
do
it explicitly by copying the
attributes:

    $obj.bless(0, *%$obj)

(That is just a specific application of the general principle that
if you treat any object like a hash, it will behave like one, to the
extent that it can. That is, C<%$obj> turns the attributes into key/value
pairs, and passes those as arguments to initialize the new object.
Note that C<%$obj> includes the private attributes when used inside
the class, but not outside.)
Just because C<.
bless
> allows an object to be used
for
a class doesn't
mean your C<new> constructor
has
to
do
the same. Some folks have
philosophical issues
with
mixing up classes and objects, and it's fine
to disallow that on the constructor level. In fact, you'll note that
the
default
C<.new> above requires a C<Class> as its invocant. Unless you
override
it, it doesn't allow an object
for
the constructor invocant.
Go thou and don't likewise.
=head3 The Default Cloner
Another good reason not to overload C<.new> to
do
cloning is that
Perl will also supply a
default
C<.clone> routine that works something
like this:

    multi method clone ($proto: *%_) {
        return $proto.bless(0, *%_, *%$proto);
    }

Note the order of the two hash arguments to C<bless>. This gives the
supplied attribute values precedence over the copied attribute
values, so that you can change some of the attributes I<en passant>
if you like. That's because we're passing the two flattened hashes
as arguments to C<.bless> and Perl 6's named argument binding mechanism
always picks the I<first> argument that matches, not the last. This
is the opposite of what happens when you use the Perl 5 idiom:

    %newvals = (%_, %$proto);

In that case, the I<last> value (the one in %$proto) would "win".
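
The two policies can be compared side by side in a short Python sketch. This is only a model of the binding rule described here: C<bind_named_first_wins> is a hypothetical helper that keeps the first value seen for each name, while ordinary dictionary merging keeps the last.

```python
def bind_named_first_wins(*sources):
    # Keep the first value seen for each name, ignoring later duplicates.
    bound = {}
    for source in sources:
        for key, value in source.items():
            bound.setdefault(key, value)
    return bound


proto = {"name": "Fido", "tail": "long"}   # attributes copied from the prototype
overrides = {"tail": "docked"}             # attributes supplied by the caller

first_wins = bind_named_first_wins(overrides, proto)
last_wins = {**overrides, **proto}         # plain merge: later entries win

print(first_wins["tail"], last_wins["tail"])
```

With the caller's arguments listed first, first-match binding lets the override survive, whereas the plain merge silently restores the prototype's value.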
=head2 CREATE

    submethod CREATE ($self: *%args) {...}

C<CREATE> is called
when
you don't want to
use
an existing data structure
as the candidate
for
your object. In general you won't define C<CREATE>
because the
default
C<CREATE> does all the heavy magic to bring an opaque
object into existence. But
if
you don't want an opaque object, and you
don't care to
write
all your constructors to create the data structure
before
calling C<.bless>, you can define your own C<CREATE> submethod, and
it will
override
the standard one
for
all constructors in the class.
=head2 BUILDALL

    submethod BUILDALL ($self: *%args) {...}

After the data structure is created, it must be populated by
each
of the participating classes (and roles) in the proper order.
The C<BUILDALL> method is called upon to
do
this. The
default
C<BUILDALL> is usually correct, so you don't generally have to
override
it. In essence, it delegates the initialization of parent
classes to the C<BUILDALL> of the parent classes, and then it calls
C<BUILD> on the current class. In this way the pieces of the object
are assembled in the correct order, from least derived to most derived.
For
each
class C<BUILDALL> calls on,
if
the arguments contain a pair
whose key is that class name, it passes the value of the pair as
its argument to that class's C<BUILDALL>. Otherwise it passes
the entire list. (There's not much ambiguity there--most classes
and roles will start
with
upper case,
while
most attribute names
start
with
lower case.)
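
As a rough model of that ordering and of the class-name-keyed arguments, here is a Python sketch. It is not how Perl 6 implements C<BUILDALL>; the C<buildall> and C<make> helpers and the C<log> attribute are invented for illustration, and the sketch assumes single inheritance (a diamond would visit the shared parent twice).

```python
class Animal:
    def BUILD(self, **args):
        self.log.append(("Animal", args))


class Dog(Animal):
    def BUILD(self, **args):
        self.log.append(("Dog", args))


def buildall(obj, cls, args):
    """Initialize parents first, then the current class (least to most derived)."""
    for parent in cls.__bases__:
        if parent is object:
            continue
        # A pair keyed by the parent's class name supplies that parent's
        # arguments; otherwise the parent sees the whole argument list.
        buildall(obj, parent, args.get(parent.__name__, args))
    build = vars(cls).get("BUILD")  # look only in this class, like a submethod
    if build is not None:
        build(obj, **args)


def make(cls, **args):
    """Hypothetical constructor: create the object, then populate it."""
    obj = cls.__new__(cls)
    obj.log = []
    buildall(obj, cls, args)
    return obj


spot = make(Dog, tail="long", Animal={"legs": 4})
rover = make(Dog, tail="short")
```

Here C<spot.log> records C<Animal> being built before C<Dog>, with C<Animal> receiving only the value of the pair keyed by its own name.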

=head2 BUILD

    submethod BUILD ($self: *%args) {...}

That is the generic signature of C<BUILD> from the viewpoint of the
caller, but the typical C<BUILD> routine declares explicit parameters
named after the attributes:

    submethod BUILD (+$tail, +@legs, *%extraargs) {
        $.tail = $tail;
        @:legs = @legs;
        ...
    }

That occurs so frequently that there's a shorthand available in the
signature declaration.  You can put the attributes (distinguished
by those secondary sigils, you'll recall) right into the signature.
The following means essentially the same thing, without repeating
the names:

    submethod BUILD (+$.tail, +@:legs, *%extraargs) {...}

It's actually unnecessary to declare the C<*%extraargs> parameter.
If you leave it out, it will default to C<*%_> (but only on methods
and submethods--see the section on Interface Consistency later).

You may use this special syntax only for instance attributes, not class
attributes.  Class attributes should generally not be reinitialized
every time you make a new object, after all.

If you do not declare a C<BUILD> routine, a default routine will be
supplied that initializes any attributes whose names correspond to
the keys of the argument pairs passed to it, and leaves the other
attributes to default to whatever the class supplied as the default,
or C<undef> otherwise.

In any event, the assignment of default attribute values happens
automatically.  For any attribute that is not otherwise initialized,
the attribute declaration's "C<build>" property is evaluated and the
resulting value copied into the newly created attribute slot.
This happens logically at the end of the C<BUILD> block, so we avoid
running initialization closures unnecessarily.  This implicit
initialization is based not on whether the attribute is undefined,
but on whether it was initialized earlier in C<BUILD>.  (Otherwise we
could never explicitly create an attribute with an undefined value.)
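
The distinction between "undefined" and "not yet initialized" is easy to get wrong, so here is a small Python model of it. Everything in it is an assumption made for illustration: the C<_UNINITIALIZED> sentinel, the C<closure_calls> record, and the two default closures are not part of any Perl 6 design, and C<None> merely plays the role of C<undef>.

```python
_UNINITIALIZED = object()  # distinct from None, which models an explicit undef


class Tail:
    closure_calls = []  # records which default closures actually ran

    @staticmethod
    def _default_length():
        Tail.closure_calls.append("length")
        return 10

    @staticmethod
    def _default_color():
        Tail.closure_calls.append("color")
        return "brown"

    def __init__(self, **args):
        defaults = {"length": Tail._default_length, "color": Tail._default_color}
        slots = dict.fromkeys(defaults, _UNINITIALIZED)
        # Body of BUILD: explicit initialization from the named arguments.
        for name, value in args.items():
            if name in slots:
                slots[name] = value
        # End of BUILD: run a closure only for slots nobody initialized.
        for name, closure in defaults.items():
            if slots[name] is _UNINITIALIZED:
                slots[name] = closure()
        self.__dict__.update(slots)


tail = Tail(length=None)
```

The explicitly undefined C<length> stays undefined, and its default closure never runs; only C<color> falls back to its default.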

=head2 Eliminating Redundancy in Constructor Calls

If you say:

    my Dog $spot = Dog.new(...)

you have to repeat the type.  That's not a big deal for a small
typename, but sometimes typenames are a lot longer.  Plus you'd like
to get rid of the redundancy, just because it's, like, redundant.
So there's a variant on the dot operator that looks a lot like a
dot assignment operator:

    my Dog $spot .= new(...)

It doesn't really quite fit the assignment operator rule though.  If
it did, it'd have to mean

    my Dog $spot = $spot.new(...)

which doesn't quite work, because C<$spot> is undefined.  What probably
happens is that the C<my> cheats and puts a version of C<undef> in
there that knows it should dispatch to the C<Dog> class if you call
C<.self:new()> on it.  Anyway, we'll make it work one way or another,
so that it becomes the equivalent of:

    my Dog $spot = Dog.new(...)

The alternative is to go the C++ route and make C<new> a reserved word.
We're just not gonna do that.

Note that an attribute declaration of the form

    has Tail $wagger .= new(...)

might not do what you want done when you want it done, if what you
want done is to create a new C<Tail> object each time an object
is built.  For that you'd have to say:

    has Tail $wagger = { .new(...) }

or equivalently,

    has Tail $wagger will build { .new(...) }

But leaving aside such timing issues, you should generally think of
the C<.=> operator more as a variant on C<.> than a variant on C<+=>.
It can, for instance, turn any non-mutating method call into a
mutating method:

    @array .= sort;
    .= lc;

This presumes, of course, that the method's invocant and return
value are of compatible types.  Some classes will wish to define special
in-place mutators.  The syntax for that is:

    method self:sort (Array @a is rw) {...}

[Update: That's now C<< self:<sort> >> instead.]

It is illegal to use C<return> from such a routine, since the invocant
is automatically returned.  If you do not declare the invocant, the
default invocant is automatically considered "C<rw>".  If you do not
supply a mutating version, one is autogenerated for you based on
the corresponding copy operator.

=head1 Object Deconstruction

Object destruction is no longer guaranteed to be "timely" in Perl 6.
It happens when the garbage collector gets around to it.  (Though there
will be ways to emulate Perl 5 end-of-scope cleanup.)

As with object creation, object destruction is recursive.  Unlike creation,
it must proceed in the opposite order.

=head2 DESTROYALL

The C<DESTROYALL> routine is the counterpart to the C<BUILDALL>
routine.  Similarly, the default definition is normally sufficient
for the needs of most classes.  C<DESTROYALL> first calls C<DESTROY>
on the current class, and then delegates to the C<DESTROYALL> of any
parent classes.  In this way the pieces of the object are disassembled
in the correct order, from most derived to least derived.
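
A minimal Python sketch of that teardown order follows. The C<destroyall> helper and the C<log> argument are hypothetical, and as with any such model it assumes single inheritance.

```python
class Base:
    def DESTROY(self, log):
        log.append("Base")


class Derived(Base):
    def DESTROY(self, log):
        log.append("Derived")


def destroyall(obj, cls, log):
    """Tear down the current class first, then delegate to its parents."""
    destroy = vars(cls).get("DESTROY")  # only this class's own DESTROY
    if destroy is not None:
        destroy(obj, log)
    for parent in cls.__bases__:
        if parent is not object:
            destroyall(obj, parent, log)


order = []
destroyall(Derived(), Derived, order)
```

The recorded order is the mirror image of construction: the most derived class goes first.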

=head2 DESTROY

As with Perl 5, all the memory deallocation is done for you, so you
really only need to define C<DESTROY> if you have to release external
resources such as files.

Since C<DESTROY> is the opposite of C<BUILD>, if any attribute
declaration has a "C<destroy>" property, that property (presumably
a closure) is evaluated before the main block of C<DESTROY>.
This happens even if you don't declare a C<DESTROY>.

(The "C<build>" and "C<destroy>" traits are the only way for roles to let
their preferences be made known at C<BUILD> and C<DESTROY> time.
It follows that any role that does not define an attribute cannot
participate in building and destroying except by defining a method
that C<BUILD> or C<DESTROY> might call.  In other words, stateless
roles aren't allowed to muck around with the object's state.  This is
construed as a feature.)

=head1 Dispatch Mechanisms

Perl 6 supports both single dispatch (traditional OO) and multiple dispatch
(also known as "multimethod dispatch", but we try to avoid that term).

=head2 Single Dispatch

Single dispatch looks up which method to run solely on the basis of
the type of the first argument, the invocant.  A single-dispatch call
distinguishes the invocant syntactically (unlike a multiple-dispatch
call, which looks like a subroutine call, or even an operator.)

Basically, anything can be an invocant as long as it fills the C<Dispatch>
role, which provides a C<.dispatcher> method.  This includes ordinary
objects, class objects, and (in some cases) even varieties of C<undef>
that happen to know what class of thing they aren't (yet).

Simple single dispatch is specified with the dot operator, or its
indirect object equivalent:

    $object.meth(@args)
    .meth(@args)
    meth $object: @args

There are variants on the dot form indicated by the character after
the dot.  (None of these variants allows indirect object syntax.)
The private dispatcher only ever dispatches to the current class or
its proxies, so it's really more like a subroutine call in disguise:

    $object.:meth(@args)
    .:meth(@args)

It is an error to use C<.:> unless there is a correspondingly named
"colon" method in the appropriate class, just as it is an error to
use C<.> when no method can be found of that name.  Unlike the C<.:>
operator, which can have only one candidate method, the C<.> operator
potentially generates a list of candidates, and allows methods in that
candidate list to defer to subsequent methods in other classes until
a candidate has been found that is willing to handle the dispatch.

In addition to the C<.:> and C<.=> operators, there are three other
dot variants that can be used if it's not known how many methods are
willing to handle the dispatch:

    $object.?meth(@args)
    .?meth(@args)
    $object.*meth(@args)
    .*meth(@args)
    $object.+meth(@args)
    .+meth(@args)

The C<.*> and C<.+> versions are generally only useful for calling
submethods, or methods that are otherwise expected to work like
submethods.  They return a list of all the successful return values.
The C<.?> operator either returns the one successful result, or
undef if no appropriate method is found.  Like the corresponding
regex quantifiers, C<?> means "0 or 1", while C<*> means "0 or more",
and C<+> means "1 or more".  Ordinary C<.> means "exactly one".
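
To make the four cardinalities concrete, here is a Python model. The function names (C<call_dot>, C<call_maybe>, C<call_star>, C<call_plus>) are invented for this sketch, Python's method resolution order stands in for the candidate list, and every candidate found is treated as a successful call.

```python
def candidates(obj, name):
    """All definitions of `name` along the object's class linearization."""
    return [vars(cls)[name] for cls in type(obj).__mro__ if name in vars(cls)]


def call_dot(obj, name, *args):
    """Model of `.meth`: exactly one method runs; finding none is an error."""
    found = candidates(obj, name)
    if not found:
        raise LookupError(f"no method {name!r}")
    return found[0](obj, *args)


def call_maybe(obj, name, *args):
    """Model of `.?meth`: zero or one method runs."""
    found = candidates(obj, name)
    return found[0](obj, *args) if found else None


def call_star(obj, name, *args):
    """Model of `.*meth`: zero or more methods run."""
    return [meth(obj, *args) for meth in candidates(obj, name)]


def call_plus(obj, name, *args):
    """Model of `.+meth`: one or more methods run; finding none is an error."""
    results = call_star(obj, name, *args)
    if not results:
        raise LookupError(f"no method {name!r}")
    return results


class Animal:
    def describe(self):
        return "animal"


class Dog(Animal):
    def describe(self):
        return "dog"


spot = Dog()
```

Calling C<call_star(spot, "describe")> visits both classes, while C<call_maybe(spot, "fetch")> quietly returns nothing at all.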

Here are some sample implementations, though of course these are
probably implemented in C for maximum efficiency:

    sub CALLONE ($obj, $methname, +$maybe, *%opt, *@args) {
        my $startclass = $obj.dispatcher() // fail "No dispatcher: $obj";
        METHOD:
        for WALKMETH($startclass, :method($methname), %opt) -> &meth {
            return meth($obj, @args);
        }
        fail qq(Can't locate method "$methname" via class "$startclass")
            unless $maybe;
        return;
    }

With this dispatcher you can continue by saying "C<next METHOD>".
This allows methods to "failover" to other methods if they choose
not to handle the request themselves.

    sub CALLALL ($obj, $methname, +$maybe, +$force, *%opt, *@args) {
        my $startclass = $obj.dispatcher() // fail "No dispatcher: $obj";
        my @results = gather {
            if $force {
                METHOD:
                for WALKCLASS($startclass, %opt) -> $class {
                    take $obj.::($class)::$methname(*@args)
                }
            }
            else {
                METHOD:
                for WALKMETH($startclass, :method($methname), %opt) -> &meth {
                    take meth($obj,*@args);
                }
            }
        }
        return @results if @results or $maybe;
        fail qq(Can't locate method "$methname" via class "$startclass");
    }

This one you can quit early by saying "C<last METHOD>".  Notice that
both of these dispatchers cheat by calling a method as if it were
a sub.  You may only do that by taking a reference to the method, and
calling it as a subroutine, passing the object as the first argument.
This is the only way to call a virtual method non-virtually in Perl.
If you try to call a method directly as a subroutine, Perl will ignore
the method, look for a subroutine of that name elsewhere, probably
not find it, and complain bitterly.  (Or find the wrong subroutine,
and execute it, after which you will complain bitterly.)

We snuck in an example of the new C<gather>/C<take> construct.  It is still
somewhat conjectural.

=head2 Calling Superclasses, and Not-So-Superclasses

Perl 5 supplies a pseudoclass, C<SUPER::>, that redirects dispatch to
a parent class's method.  That's often the wrong thing to do, though,
in part because under MI you may have more than one parent class, and
also because you might have sibling classes that also need to have
the given method triggered.  Even if C<SUPER> is smart enough to visit
multiple parent classes, and even if all your classes cooperate and
call C<SUPER> at the right time, the depth first order of visitation
might be the wrong order, especially under diamond inheritance.

Still, if you know that your parent classes use C<SUPER>, or you're
calling into a language with C<SUPER> semantics (such as Perl 5)
then you should probably use C<SUPER> semantics too, or you'll end up
calling your parent's parents in duplicate.  However, since use of
C<SUPER> is slightly discouraged, we Huffman code it a bit longer in
Perl 6.  Remember the C<*%opt> parameters to the dispatchers above?
That comes in as a parameterized pseudoclass called C<WALK>.

    $obj.*WALK[:super]::method(@args)

That limits the call to only those immediate super classes that define
the method.  Note the star in the example.  If you really want the Perl
5 semantics, leave the star out, and you'll only get the first existing
parent method of that name.  (Why you'd want that is beyond me.)

Actually, we'll probably still allow C<SUPER::> as a shorthand for
C<WALK[:super]::>, since people will just hack it in anyway if we
don't provide it...

If you think about it, every ordinary dispatch has an implicit
C<WALK> modifier on the front that just happens to default to
C<WALK[:canonical]>.  That is, the dispatcher looks for methods in
the canonical order.  But you could say C<WALK[:depth]> to get Perl
5's order, or you could say C<WALK[:descendant]> to get an order
approximating the order of construction, or C<WALK[:ascendant]> to
get an order approximating the order of destruction.  You could say
C<WALK[:omit(SomeClass)]> to call all classes not equivalent to or
derived from C<SomeClass>.  For instance, to call all super classes,
and not just your immediate parents, you could say C<WALK[:omit(::_)]>
to skip the current lexical class or anything derived from it.

[Update: The lexical class is now named C<::?CLASS>.]

But again, that's not usually the right thing to do.  If your base classes
are all willing to cooperate, it's much better to simply call

    $obj.method(@args)

and then let each of the implementations of the method defer to the
next one when they're done with their part of it.  If any method says
"C<next METHOD>", it automatically iterates the loop of the dispatcher
and finds the next method to dispatch to, even if that method comes
from a sibling class rather than a parent class.  The next method
is called with the same arguments as originally supplied.

That presupposes that the entire set of methods knows to call
"next" appropriately.  This is not always the case.  In fact, if
they don't all call next, it's likely that none of them does.
And maybe just knowing whether or not they do is considered a violation
of encapsulation.  In any case, if you still want to call all the
methods without their active cooperation, then use the star form:

    $obj.*method(@args)

Then the various methods don't have to do anything to call the next
method--it happens automatically by default.  In this case a method
has to do something special if it wants to I<stop> the dispatch.
Naturally, that something is to call "C<last METHOD>", which terminates
the dispatch loop early.

Now, sometimes you want to call the next method, but you want to
change the arguments so that the next method doesn't get the original
argument list.  This is done with deep magic.  If you use the C<call>
keyword in an ordinary (nonwrapper) method, it steals the rest of
the dispatch list from the outer loop and redispatches to the next
method with the new arguments:

    @retvals = call(@newargs)
    return @retvals;

And unlike with "C<next METHOD>", control returns to this method following
the call.  It returns the results of the subsequent method calls, which
you should return so that your outer dispatcher can add them to the
return values it already gathered.

Note that "C<next METHOD>" and "C<last METHOD>" can typically be spelt
"C<next>" and "C<last>" unless they are in an inner loop.
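
For readers who find loop labels easier to follow as control flow, here is a Python model of the two dispatch loops. It is an assumption-laden sketch: Python has no labeled loops, so the hypothetical C<NextMethod> and C<LastMethod> exceptions stand in for "C<next METHOD>" and "C<last METHOD>", and the candidate list is passed in explicitly rather than computed.

```python
class NextMethod(Exception):
    """Raised by a method to model `next METHOD`."""


class LastMethod(Exception):
    """Raised by a method to model `last METHOD`."""


def call_one(obj, methods, *args):
    """Ordinary dispatch: the first method that does not decline wins."""
    for meth in methods:
        try:
            return meth(obj, *args)
        except NextMethod:
            continue  # fail over to the next candidate
    raise LookupError("no candidate handled the call")


def call_all(obj, methods, *args):
    """Star dispatch: every method runs unless one stops the loop."""
    results = []
    for meth in methods:
        try:
            results.append(meth(obj, *args))
        except LastMethod:
            break  # stop the dispatch loop early
    return results


def decline(obj):
    raise NextMethod


def handle(obj):
    return "handled"


def stop(obj):
    raise LastMethod


def unreachable(obj):
    return "never called"


one = call_one(None, [decline, handle])
many = call_all(None, [handle, stop, unreachable])
```

In the first call the declining method hands over to its successor; in the second, the stopping method cuts the list short.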

=head2 Parallel Dispatch

By default the various dot operators call a method on a single
object, even if it ends up calling multiple methods for that object.
Since a method call is essentially a unary postfix operator, however,
you can use it as a hyper operator on a list of objects:

    @object».meth(@args)
    @object».?meth(@args)
    @object».*meth(@args)
    @object».+meth(@args)

Note that with the last two, if a method uses "C<last METHOD>",
it doesn't bomb out of the "hyper" loop, but just goes on to the next
entry.  One can always bomb out of the hyperloop with a real
exception, of course.  And maybe with "C<last HYPER>", depending on
how hyper's implicit iteration is implemented.

If you want to use an array for serial rather than parallel method
calling, see Delegation, which lets you set up cascading handlers.

=head2 WALKCLASS and WALKMETH Caching

C<WALKCLASS> generates a list of matching classes.  C<WALKMETH>
generates a list of method references from matching classes.

The C<WALKCLASS> and C<WALKMETH> routines used in the sample dispatch
code need to cache their results so that every dispatch doesn't
have to traverse the inheritance tree again, but just consult the
preconstructed list in order.  However, if there are changes to any
of the classes involved, then someone needs to call the appropriate
cache clear method to make sure that the inheritance is recalculated.

C<WALKCLASS>/C<WALKMETH> options include some that specify ordering:

    :canonical
    :ascendant
    :descendant
    :preorder
    :breadth

and some that specify selection criteria:

    :super
    :method(Str)
    :omit(Selector)
    :include(Selector)

Note that C<:method(Str)> selects classes that merely have methods
declared, not necessarily defined.  A declaration without a definition
probably implies that they intend to autoload a definition, so
we should call the stub anyway.  In fact, Perl 6 differentiates an
C<AUTOMETHDEF> from C<AUTOLOAD>.  C<AUTOLOAD> works as it does in
Perl 5.  C<AUTOMETHDEF> is never called unless there is already a
declaration of the stub (or equivalently, C<AUTOMETH> faked a stub.)

It would be possible to just define everything in terms of C<WALKCLASS>,
but that would imply looking up each method name twice, once inside
C<WALKCLASS> to see if the method exists in the current class, and
once again outside in order to call it.  Even if C<WALKCLASS> caches
the class list, it wouldn't cache the derived method list, so it's
better to have a separate cache for that, controlled by C<WALKMETH>,
since that's the common case and has to be fast.
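
The caching obligation can be illustrated with a short Python model. This is only a sketch: C<walkmeth> and C<clear_walk_cache> are hypothetical stand-ins, Python's method resolution order substitutes for the canonical ordering, and none of the options listed above are modeled.

```python
_method_cache = {}


def walkmeth(cls, name):
    """Return the cached list of methods called `name`, most derived first."""
    key = (cls, name)
    if key not in _method_cache:
        _method_cache[key] = [
            vars(c)[name] for c in cls.__mro__ if name in vars(c)
        ]
    return _method_cache[key]


def clear_walk_cache():
    """Whoever changes a class is responsible for calling this."""
    _method_cache.clear()


class Animal:
    def describe(self):
        return "animal"


class Dog(Animal):
    pass


before = walkmeth(Dog, "describe")
Dog.describe = lambda self: "dog"   # the class changes after caching
stale = walkmeth(Dog, "describe")   # still the old, now outdated list
clear_walk_cache()
fresh = [meth(Dog()) for meth in walkmeth(Dog, "describe")]
```

Until the cache is cleared, the dispatcher keeps consulting the list it built before C<Dog> gained its own method.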

(Again, this is all abstract, and is probably implemented in gloriously
grungy C code.  Nevertheless, you can probably call C<WALKCLASS> and
C<WALKMETH> yourself if you feel like writing your own dispatcher.)

=head1 Multiple Dispatch

Multiple dispatch is based on the notion that methods often mediate the
relationships of multiple objects of diverse types, and therefore the
first object in the argument list should not be privileged over other
objects in the argument list when it comes to selecting which method
to run.  In this view, methods aren't subservient to a particular
class, but are independent agents.  A set of independent-minded,
identically named methods use the class hierarchy to do pattern
matching on the argument list and decide among themselves which method
can best handle the given set of arguments.

The Perl approach is, of course, that sometimes you want to distinguish
the first invocant, and sometimes you don't.  The interaction of
these two approaches gets, um, interesting.  But the basic notion
is to let the caller specify which approach is expected, and then,
where it makes sense, fall back on the other approach when the first
one fails.  Underlying all this is the Principle of Least Surprise.
Do not confuse this with the Principle of Zero Surprise, which usually
means you've just swept the real surprises under someone else's carpet.
(There's a certain amount of surprise you can't go below--the
Heisenberg Uncertainty Principle applies to software too.)

With traditional multimethods, all methods live in the same global
namespace.  Perl 6 takes a different approach--we still keep all the
traditional Perl namespaces (lexical, package, global) and we still
search for names the same way (outward through the lexical scopes,
then the current package, then the global C<*> namespace; or upward in
the class hierarchy).  Then we simply claim that, under multiple
dispatch, the "long name" of any multi routine includes its signature,
and that visibility is based on the long name.  So an inner or derived
multi only hides an outer or base multi of the same name I<and> the
same signature.  (Routines not declared "C<multi>" still hide everything
in the traditional fashion.)

To put it another way, the multiple dispatch always works when both
the caller and the callee agree that that's how it should work.
(And in some cases it also works when it ought to work, even if they
don't agree--sort of a "common law" multimethod, as it were...)

=head2 Declaration of Multiple Dispatch Routines

A callee agrees to the multiple dispatch "contract" by including
the word "C<multi>" in the declaration of the routine in question.
It essentially says, "Ordinarily this would be a unique name, but
it's okay to have duplicates of this name (the short name) that are
differentiated by signatures (the long name)."

Looking at it from the other end, leaving the "C<multi>" out says "I am
a perfect match for any signature--don't bother looking any further
outward or upward."  In other words, the standard non-multi semantics.

You may not declare a multi in the same scope as a non-multi.
However, as long as they are in different scopes, you can have a
single non-multi inside a set of multis, or a set of multis inside
a single non-multi.  You can even have a set of multis inside a non-multi
inside a set of multis.  Indeed, this is how you hide all the outer
multis so that only the inner multi's long names are considered.
(And if no long name matches, you get the intermediate non-multi as
a kind of backstop.)  The same policy applies to both nested lexical
scopes and derived subclasses.

Actually, up till now we've been oversimplifying the concept of
"long name" slightly.  The long name includes only that part of
the signature up to the first colon.  If there is no colon, then the
entire signature is part of the long name.  (You can have more colons,
in which case the additional arguments function as tie breakers if
the original set of long names is insufficient to prevent a tie.)
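
One way to picture the long name is as a key built from the short name plus the parameter types before the first colon. The Python sketch below does exactly that with plain string handling; C<long_name> is a hypothetical helper, signatures are passed as strings, and the sketch assumes every parameter is written as a type followed by a variable.

```python
def long_name(short_name, signature):
    """Return (short name, types of the parameters before the first colon)."""
    dispatched = signature.split(":", 1)[0]
    types = tuple(
        param.split()[0] for param in dispatched.split(",") if param.strip()
    )
    return (short_name, types)


outer_scope = {
    long_name("foo", "Int $a"): "outer Int",
    long_name("foo", "Str $a"): "outer Str",
}
inner_scope = {
    long_name("foo", "Int $a: Str $tiebreak"): "inner Int",
}
# Inner declarations hide outer ones only when the long names are equal.
visible = {**outer_scope, **inner_scope}
range_name = long_name("infix:..", "Int $x, Int $y: Int ?$by")
```

The inner C<foo> hides only the outer C<foo> that shares its long name; the C<Str> variant remains visible.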

So sometimes we'll probably slip and say "signature" when we mean
"long name".  We pray your indulgence.

=head3 multi sub

A multi sub in any scope hides any multi sub with the same "long name"
in any outer scope.  It does not hide subs with the same short name
but a different signature.  Er, long name, I mean...

=head3 multi sub * (traditional multimethods)

If you want a multi that is visible in all namespaces (that don't
hide the long name), then declare the name in the global name space,
indicated in Perl 6 with a C<*>.  Most of the so-called "built-ins"
are declared this way:

    multi sub *push (Array $array, *@args) {...}
    multi sub *infix:+ (Num $x, Num $y) returns Num {...}
    multi sub *infix:.. (Int $x, Int $y: Int ?$by) returns Ranger {...}

[Update: Those are now named C<< infix:<+> >> and C<< infix:<..> >>.]

Note the use of colon in the last example to exclude C<$by> as part of
the long name.  The range operator is dispatched only on the types of
its two main arguments.

=head3 multi method

If you declare a method with C<multi>, then that method hides any base class
method with the same long name.  It does not hide methods with the same
short name but a different signature when called as a multimethod.
(It does hide methods when called under single dispatch, in which
case the first invocant is treated as the I<only> invocant regardless
of where you put the colon.  Just because a method is declared with
C<multi> doesn't make it invisible to single dispatch.)

Unlike a regular method declaration, there is no implied invocant in
the syntax of a multi method.  A method declared as multi I<must>
declare all its invocants so that there's no ambiguity as to the
meaning of the first colon.  With a multi method, it always means the end of
the long name.  (With a non-multi, it always means that the optional
invocant declaration is present.)

=head3 multi submethod

Submethods may be declared with C<multi>, in which case visibility
works the same as for ordinary methods.  However, a submethod has
the additional constraint that the first invocant must be an exact
class match.  Which effectively means that a submethod is first single
dispatched to the class, and then the appropriate submethod within
that class is selected, ignoring any other class's submethods of the
same name.

=head3 multi rule

Since rules are just methods in disguise, you can have multi rules as
well.  (Of course, that doesn't do you a lot of good unless you have
rules with different signatures, which is unusual.)

=head3 multi submethod BUILD

It is not likely that Perl 6.0.0 will support multiple dispatch on
named arguments, but only on positional arguments.  Since all the extra
arguments to a C<BUILD> routine come in as named arguments, you probably
can't usefully multi a C<BUILD> (yet).  However, we should not do anything
that precludes multiple C<BUILD> submethods in the future.  Which means we
should probably enforce the presence of a colon before the first named
argument declaration in any multi signature, so that the semantics
don't suddenly change if and when we start supporting multiple dispatch
that includes named arguments as part of the long name.

=head3 multi method constructors

To the extent that you declare constructors (such as C<.new>) with
positional arguments, you can use C<multi> on them in 6.0.0.

=head2 Calling via Multiple Dispatch

As we mentioned, multiple dispatch is enabled by agreement of both
caller and callee.  From the caller's point of view, you invoke
multiple dispatch simply by calling with subroutine call syntax instead
of method call syntax.  It's then up to the dispatcher to figure out
which of the arguments are invocants and which ones are just options.
(In the case where the innermost visible subroutine is declared non-multi,
this degenerates to the Perl 5 semantics of subroutine calls.)  This
approach lets you refactor a simple subroutine into a more nuanced
set of subroutines without changing how the subroutines are called
at all.  That makes this sort of refactoring drop-dead simple.  (Or
at least as simple as refactoring ever gets...)

It's a little harder to refactor between single dispatch and multiple
dispatch, but a good argument could be made that it I<should> be harder
to do that, because you're going to have to think through a lot more
things in that case anyway.

Anyway, here's the basic relationship between single dispatch and multiple
dispatch.  Single dispatch is more familiar, so we'll discuss multiple
dispatch first.

=head3 Multiple dispatch semantics

Whenever you make a call using subroutine call syntax, it's a candidate
for multiple dispatch.  A search is made for an appropriate subroutine
declaration.  As in Perl 5, this search goes outward through the
lexical scopes, then through the current package and on to the
global namespace (represented in Perl 6 with an initial * for the
"wildcard" package name).  If the name found is not a multi, then
it's a good old-fashioned sub call, and no multiple dispatch is done.
End of story.

However, if the first declaration we come to is a multi, then lots of
interesting stuff happens.  (Fortunately for our performance, most of
this interesting stuff can happen at compile time, or upon first use.)
The basic idea is that we will collect a complete list of candidates
before we decide which one to call.

So the search continues outward, collecting all sub declarations with
the same short name but different long names.  (We can ignore
outer declarations that are hidden by an inner declaration with the
same long name.)  If we run into a scope with a non-multi declaration,
then we're done generating our candidate list, and we can skip the
next paragraph.

After going all the way out to the global scope, we then examine the
type of the first argument as if we were about to do single dispatch on
it.  We then visit any classes that would have been single dispatched,
in most-derived to least-derived order, and for each of those classes
we add into our candidate list any methods declared multi, plus all
the single invocant methods, whether or not they were declared multi!
In other words, we just add in all the methods declared in the class
as a subset of the candidates.  (There are reasons for this that we'll
discuss below.)  Anyway, just as with nested lexical scopes, if two
methods have the same long name, the more derived one hides the less
derived one.  And if there's a class in which the method of the same
short name is not declared multi, it serves as a "stopper", just as a
non-multi sub does in a lexical scope.  (Though that "stopper" method
can of course redispatch further up the inheritance tree, just as a
"stopper" lexical sub can always call further outward if it wants to.)

Now we have our list of candidates, which may or may not include every
sub and method with the same short name, depending on whether we hit a
"stopper".  Anyway, once we know the candidate list, it is sorted into
order of distance from the actual argument types.  Any exact match on a
parameter type is distance 0.  Any miss by a single level of derivation
counts as a distance of 1.  Any violation of a hard constraint (such as
having too many arguments for the number of parameters, or violating
a subtype check on a type that does constraint checking, or missing
the exact type on a submethod) is effectively an infinite distance,
and disqualifies the candidate completely.

Once we have our list of candidates sorted, we simply call the first
one on the list, unless there's more than one "first one" on the
list, in which case we look to see if one of them is declared to be
the default.  If so, we call it.  If not, we die.
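
The sorting and tie-breaking steps can be modeled in a few lines of Python. Treat this strictly as a sketch under stated assumptions: the text does not say how per-argument distances combine, so the sketch simply sums them; it measures distance by position in Python's method resolution order, which matches "levels of derivation" only under single inheritance; and on a tie it looks for a default among all viable candidates, which is one reading of the rule. The C<dispatch> and C<distance> functions and the candidate tuples are invented for illustration.

```python
class BaseA:
    pass


class DerivedA(BaseA):
    pass


class BaseB:
    pass


class DerivedB(BaseB):
    pass


def distance(arg, param_type):
    """Levels of derivation between the argument's class and the parameter type."""
    linearization = type(arg).__mro__
    if param_type not in linearization:
        return None  # hard constraint violated: effectively infinite distance
    return linearization.index(param_type)


def dispatch(candidates, *args):
    """Call the closest candidate; on a tie, fall back to the marked default."""
    viable = []
    for param_types, func, is_default in candidates:
        if len(param_types) != len(args):
            continue  # wrong number of arguments disqualifies the candidate
        distances = [distance(a, t) for a, t in zip(args, param_types)]
        if None in distances:
            continue
        viable.append((sum(distances), func, is_default))
    if not viable:
        raise LookupError("no candidate matches")
    best = min(total for total, _, _ in viable)
    closest = [func for total, func, _ in viable if total == best]
    if len(closest) == 1:
        return closest[0](*args)
    defaults = [func for _, func, is_default in viable if is_default]
    if len(defaults) == 1:
        return defaults[0](*args)
    raise TypeError("ambiguous dispatch and no default")


foo = [
    ((DerivedA, BaseB), lambda a, b: "A-specific", False),
    ((BaseA, DerivedB), lambda a, b: "B-specific", False),
    ((BaseA, BaseB), lambda a, b: "generic", True),
]
```

Calling C<dispatch(foo, DerivedA(), DerivedB())> finds the two specific candidates tied at distance 1 and so hands control to the default, while C<dispatch(foo, DerivedA(), BaseB())> has a single exact match and calls it directly.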

So if there's a tie, the default routine is in charge of subsequent
behavior:

    multi sub foo (BaseA $a, BaseB $b) is default {
        next METHOD;
    }

    multi sub bar (BaseA $a, BaseB $b) is default {
        last METHOD;
    }

    multi sub baz (BaseA $a, BaseB $b) is default {
        my @ambiguities = WALKMETH($startclass, :method('baz'))
            or last METHOD;
        pop(@ambiguities).($a, $b);
    }

    multi sub baz (BaseA $a, BaseB $b) is default {
        my @ambiguities = @CALLER::methods or last METHOD;
        pop(@ambiguities).value.($a, $b);
    }

In many cases, of course, the default routine won't redispatch, but simply
do something generically appropriate.

=head3 Single dispatch semantics

If you use the dot notation, you are explicitly calling single dispatch.
By default, if single dispatch doesn't find a suitable method, it does
a "failsoft" to multiple dispatch, pretending that you called a subroutine
with the invocant passed as the first argument.  (Multiple dispatch doesn't
need to failsoft to single dispatch since all single dispatch methods are
included as a subset of the multiple dispatch candidates anyway.)

This failsoft behavior can be modified by a lexically scoped pragma.
If you say

then single dispatch will be totally unforgiving as it is in Perl 5.
Or you can tell single dispatch to go away:

in which case all your dot notation is treated as a sub call.  That is, any

    $obj.method(1,2,3)

in the lexical scope acts like you'd said:

    method($obj,1,2,3)

If single dispatch locates a class that defines the method, but the
method in question turns out to be a set of one or more multi methods,
then the single dispatch fails immediately and a multiple dispatch
is done, with the additional constraint that only multis within that
class are considered.
(If you wanted the first argument to do loose matching as well,
you should have called it as a multimethod in the first place.)
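
The failsoft rule itself is simple enough to model directly. In the Python sketch below, C<call_method>, the C<subs> table, and the C<failsoft> flag are all hypothetical: the flag merely imitates switching the fallback off and is not the name of any pragma, and a plain dictionary of functions stands in for the multiple dispatch candidates.

```python
def call_method(obj, name, args, subs, failsoft=True):
    """Try single dispatch; optionally fall back to a subroutine call."""
    method = getattr(type(obj), name, None)
    if method is not None:
        return method(obj, *args)        # ordinary single dispatch
    if failsoft and name in subs:
        return subs[name](obj, *args)    # invocant becomes the first argument
    raise AttributeError(f"no method {name!r}")


class Dog:
    def bark(self):
        return "Woof"


def describe(thing, adjective):
    return f"{adjective} {type(thing).__name__}"


subs = {"describe": describe}
spot = Dog()
```

With the fallback on, C<call_method(spot, "describe", ("good",), subs)> quietly becomes C<describe(spot, "good")>; with it off, the same call fails just as a missing method would.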

=head3 Indirect objects

If you use indirect object syntax with an explicit colon, it is
exactly equivalent to dot notation in its semantics.

However, one-argument subs are inherently ambiguous, because Perl 6
does not require the colon on indirect objects without arguments.
That is, if you say:

    print $fh

it's not clear whether you mean

    $fh.print

or

    print($fh)

As it happens, we've defined the semantics so that it doesn't matter.
Since all single invocant methods are included automatically in
multimethod dispatch, and since multiple dispatch degenerates to
single dispatch when there's only one invocant, it doesn't matter
which way you write it.  The effect is the same either way.  (Unless
you've defined your own non-multi print routine in a surrounding
lexical scope.  But then, if you've done that, you probably did it
on purpose I<precisely> because you wanted to disable the default
dispatch semantics.)

=head2 Meaning of "next METHOD"

Within the context of a multimethod dispatch, "C<next METHOD>" means to
try the next best match, if unambiguous, or else the marked default
method.  From within the default method it means just pick the next
in the list even if it's ambiguous.  The dispatch list is actually
kept in C<@CALLER::methods>, which is a list of pairs, the key of each
indicating the "distance" rating, and the value of each containing
a reference to the method to call (as a sub ref).
=head2 Making Fiends, er, Friends.
If you want to directly access the attributes of a class, your multi
must be declared within the scope of that class. Attributes are
never directly visible outside a class. This makes it difficult to
write an efficient multimethod that knows about the internals of two
different classes. However, it's possible for private accessors to
be visible outside your class under one condition. If your class
declares that another class is trusted, that other class can see the
private accessors of your class. If the other class declares that
you are trusted, then you can see its private accessor methods.
The trust relationship is not necessarily symmetrical.

This lets you have an architecture where classes by and large
don't trust each other, but they all trust a single well-guarded
"C<multi>-plexor" class that keeps everyone else in line.
The syntax for trusting another class is simply:

    class MyClass {
        trusts Yourclass;
        ...
    }
It's not clear whether roles should be allowed to grant trust.
In the absence of evidence to the contrary, I'm inclined to say not.
We can always relax that later if, after many large, longitudinal,
double-blind studies, it turns out to be both safe and effective.
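The one-way nature of trust described in this section can be illustrated with a small Python sketch. It is only a model: C<TrustRegistry>, C<trusts> and C<can_see_private> are hypothetical names, and real Perl 6 would enforce this in the compiler rather than in a lookup table.

```python
class TrustRegistry:
    """Records which classes each class has declared as trusted."""

    def __init__(self):
        self._trusted_by = {}  # grantor name -> set of trusted class names

    def trusts(self, grantor, trusted):
        # Models "class <grantor> { trusts <trusted>; }".
        self._trusted_by.setdefault(grantor, set()).add(trusted)

    def can_see_private(self, viewer, owner):
        # A class always sees its own private accessors; otherwise the
        # owner must have declared the viewer as trusted.
        return viewer == owner or viewer in self._trusted_by.get(owner, set())


registry = TrustRegistry()
registry.trusts("MyClass", "Yourclass")
```

Here C<Yourclass> may look at the private accessors of C<MyClass>, but not the other way around, because C<Yourclass> never declared any trust of its own.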
=head1 Overloading
In Perl 5 overloading was this big special deal that had to have
special hooks inserted all over the C code to catch various operations
on overloaded types and do something special with them. In Perl 6,
that just all falls out naturally from multiple dispatch. The only
other part of the trick is to consider operators to be function calls
in disguise. So in Perl 6 the real name of an operator is composed of
a grammatical context identifier, a colon, and then the name of the
operator as you usually see it. The common context identifiers are
"prefix", "infix", "postfix", "circumfix", and "term", but there are
others. So when you say something like

    $x = <$a++ * -@b.[...]>;

you're really saying something like this:

    $x = circumfix:<>(
            infix:*(
                postfix:++($a),
                prefix:-(
                    infix:.(
                        @b,
                        circumfix:[](
                            term:...();
                        )
                    )
                )
            )
         )
[Update: All the operator names need to be quoted now as a hash
subscript or slice. And instead of breaking C<.[]> into C<.> and
C<[]>, there is now a C<postcircumfix> grammatical category. So we
have:

    $x = circumfix:«< >»(
            infix:<*>(
                postfix:<++>($a),
                prefix:<->(
                    postcircumfix:<[ ]>(
                        @b,
                        term:<...>();
                    )
                )
            )
         )

Except that circumfix C<< <...> >> is a quoting operator these days.
Many of the examples using French quotes in this Apocalypse are now
written with regular angles.]
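The idea that operators are function calls in disguise can be sketched in Python by keeping a table keyed on the "category:name" strings used above. This is an illustration only: the C<OPS> table and the C<call> helper are hypothetical, and the sketch leaves out C<postfix:E<lt>++E<gt>> because it would need a mutable container.

```python
import operator

# Operator "real names" mapped to ordinary functions.
OPS = {
    "infix:<*>": operator.mul,
    "infix:<+>": operator.add,
    "prefix:<->": operator.neg,
    "postcircumfix:<[ ]>": operator.getitem,
}


def call(name, *args):
    """Invoke an operator through its category-qualified name."""
    return OPS[name](*args)


# Equivalent of: 3 * -b[1]
b = [10, 20, 30]
result = call(
    "infix:<*>",
    3,
    call("prefix:<->", call("postcircumfix:<[ ]>", b, 1)),
)
```

Once operators are just named entries in a table like this, overloading one amounts to registering another function under the same name, which is the point the section goes on to make with multiple dispatch.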
Perl 5 had special key names representing stringification and numification.
In Perl 6 these naturally fall out if you define:

    method prefix:+ () {...}
    method prefix:~ () {...}

[Update: Here and below, it's C<< prefix:<+> >> etc.]

Likewise you can define what to return in boolean context:

    method prefix:? () {...}

Integer context is, of course, just an ordinary method:

    method int () {...}
These can be defined as normal methods since single-invocant multi
subs degenerate to standard methods anyway. C++ programmers will tend
to feel comfy defining these as methods. But others may prefer to
declare them as multi subs for consistency with binary operators.
In which case they'd look more like this:

    multi sub *prefix:+ (Us $us) {...}
    multi sub *prefix:~ (Us $us) {...}
    multi sub *prefix:? (Us $us) {...}
    multi sub *prefix:int (Us $us) {...}
Coercions to other classes can also be defined:

    multi sub *coerce:as (Us $us, Them ::to) { to.transmogrify($us) }

Such coercions allow both explicit conversion:

    $them = $us as Them;

as well as implicit conversions:

    my Them $them = $us;
=head2 Binary Ops
Binary operators should generally be defined as multi subs:

    multi sub infix:+ (Us $us, Us $ustoo) {...}
    multi sub infix:+ (Us $us, Them $them) is commutative {...}
[Update: And these are C<< infix:<+> >> now.]
The "C<is commutative>" trait installs an additional autogenerated sub
with the invocant arguments reversed, but with the same semantics otherwise.
So the declaration above effectively autogenerates this:

    multi sub infix:+ (Them $them, Us $us) {...}
Of course, there's no need for that if the two arguments have the
same type. And there might not actually be an autogenerated other
subroutine in any case, if the implementation can be smart enough
to simply swap the two arguments when it needs to. However it gets
implemented, note that there's no need for Perl 5's "reversed arguments
flag" kludge, since we reverse the parameter name bindings along with
the types. Perl 5 couldn't do that because it had no control
of the signature from the compiler's point of view.
See Apocalypse 6 for much more on the definition of user-defined
operators, their precedence, and their associativity. Some of it
might even still be accurate.
=head1 Class Composition with Roles

Objects have many kinds of relationships with other objects. One of
the pitfalls of the early OO movement was to encourage people to
model many relationships with inheritance that weren't really
"isa" relationships. Various languages have sought to redress
this deficiency in various ways, with varying degrees of success.
With Perl 6 we'd like to back off a step and allow the user to
define abstract relationships between classes without committing to
a particular implementation.
More specifically, we buy the argument of the Traits paper (see
that classes should not be used both to manage objects and to manage
code reuse. It needs to be possible to separate those concerns.
Since a lot of the code that people want to reuse is that which
manages non-isa object relationships, that's what we should abstract
out from classes.
That abstraction we are calling a role. Roles can encompass both
interface and implementation of object relationships. A role without
implementation degenerates to an interface. A role without interface
degenerates to privately instantiated generics. But the typical role
will provide both interface and at least a default implementation.
Unlike the Traits paper, we will allow state as part of our
implementation. This is necessary if we are to abstract out the
delegation decision. We feel that the decision to delegate rather
than compose a sub-object is a matter of implementation, and therefore
that decision should be encapsulated (or at least be allowed to be
encapsulated) in a role. This allows you to refactor a problem by
redefining one or more roles without having to doctor all the classes
that make use of those roles. This is a great way to turn your huge,
glorious "god object" into a cooperating set of objects that know
how to delegate to each other.
As in the Traits paper, roles are composed at class construction
time, and the class composer does some work to make sure the composed
class is not unintentionally ambiguous. If two methods of the same
name are composed into the same class, the ambiguity will be caught.
The author of the class has various remedies for dealing with this
situation, which we'll go into below.

From the standpoint of the typical user, a role just looks like a
"smart" include of a "partial class". They're smart in that roles
have to be well behaved in certain respects, but most of the time
the naive user can ignore the power of the abstraction.
=head2 Declaration of Roles
A role is declared much like a class, but with a C<role> keyword
instead:

    role Pet {
        method feed ($food) {
            $food.open_can();
            $food.put_in_bowl();
            .call();
        }
    }
A role may not inherit from a class. It may be composed of other
roles, however. In essence, a role doesn't know its own type yet,
because it will be composed into another type. So if you happen
to make any mention of its main type (available as C<::_>),
that mention is in fact generic. Therefore the type of
C<$self> is generic. Likewise if you refer to C<SUPER>, the role doesn't
know what the parent classes are yet, so that's also generic.
The actual types are instantiated from the generic types when the
role is composed into the class. (You can use the role name ("C<Pet>")
directly, but only in places where a role name is allowed as a type
constraint, not in places that declare the type of an actual object.)

Just as the body of a class declaration is actually a method call
on an instance of the C<MetaClass> class, so too the body of a role
declaration is actually a method call on an instance of the C<MetaRole>
class, which is like the C<MetaClass> class, with some tweaks to manage
C<Role> objects instead of C<Class> objects. For instance, a C<Role> object
doesn't actually support a dispatcher like a C<Class> object.
C<MetaRole> and C<MetaClass> do not inherit from each other. More likely
they both inherit from C<MetaModule> or some such.
=head3 Parametric types
A role's main type is generic by default, but you can also parameterize
other types explicitly:

    role Pet[Type $petfood = TableScraps] {
        method feed (::($petfood) $food) {...}
    }
Unlike certain other languages you may be altogether too familiar with,
Perl uses square brackets for parametric types rather than angles.
Within those square brackets it uses standard signature notation, so
you can also use the arguments to pass initial values, for instance.
Just bear in mind that by default any parameters to a role or class
are considered part of the name of the class when instantiated.
Inasmuch as instantiated type names are reminiscent of multimethod
"long names", you may use a colon to separate those arguments that
are to be considered part of the name from those that are just options.

Please note that these types can be as latent (or as non-latent) as
you like. Remember that what looks like compile time to you is actually
run time to the compiler, so it's free to bind types as early or late
as you tell it to, including not at all.
=head3 Interfaces
If a role merely declares methods without defining them, it degenerates
to an interface:
    role Pet {
        method feed ($food) {...}
        method groom () {...}
        method scratch (+$where) {...}
    }
When such a role is included in a class, the methods then have to be
defined by the class that uses the role. Actually, each method is
on its own--a role is free to define default implementations for any
subset of the methods it declares.
=head3 Private interfaces
If a role declares private accessors, those accessors are private
to the class, not the role. The class must define any private
implementations that are not supplied by the role, just as with
public methods. But private method names are never visible outside
the class (except to its trusted proxy classes).
=head3 Encapsulated Attributes
Unlike in the Traits paper, we allow roles to have state. Which is a fancy
way of saying that the role can define attributes, and methods that act on
those attributes, not just methods that act only on other methods.

    role Pet {
        has $.collar = { Collar.new(Tag.new) };
        method id () { return $.collar.tag }
        method lose_collar () { undef $.collar }
    }
By the way, I think that when C<$.collar> is undefined, calling
C<.tag> on it should merely return C<undef> rather than throwing an
exception (in the same way that C<@foo[$x][$y][$z]> returns C<undef>
when C<@foo[$x]> is undefined, and for the same reason). The C<undef>
object returned should, of course, contain an unthrown exception
documenting the problem, so that if the C<undef> is ever asked to
provide a defined value, it can explain why it can't do so. Or if
the returned value is tested by C<//>, it can participate in the
resulting error message.
If you want to parameterize the initial value of a role attribute,
be sure to put a colon if you don't want the parameter to be considered
part of the long name:

    role Pet[IDholder $id: $tag] {
        has IDholder $.collar .= new($tag);
    }
    class Dog does Pet[Collar, DogLicense("fido")] {...}
    class Pigeon does Pet[LegBand, RacerId()] {...}
    my $dog = new Dog;
    my $pigeon = new Pigeon;
In which case the long names of the roles in question are C<Pet[Collar]>
and C<Pet[LegBand]>. In which case all of these are true:
    $dog.does(Dog)
    $dog.does(Pet)
    $dog.does(Pet[Collar])

but this is false:

    $dog.does(Pet[LegBand])
Anyway, where were we. Ah, yes, encapsulated attributes, which leads
us to...
=head3 Encapsulated private attributes
We can also have private attributes:
    has Nose $:sniffer .= new();
And encapsulated private attributes lead us to...
=head3 Encapsulated delegation
A role can abstract the decision to delegate:
    role Pet {
        has $:groomer handles «bathe groom trim» = hire_groomer();
    }

Now when the C<Dog> or C<Cat> class incorporates the C<Pet> role,
it doesn't even have to know that the C<.groom> method is delegated
to a professional groomer. (See section on Delegation below.)
=head3 Encapsulated Inheritance
It gets worse. Since you can specify inheritance with an "is" declaration
within a class, you can do the same with a role:

    role Pet {
        is Friend;
    }
Note carefully that this is not claiming that a C<Pet> ISA C<Friend> (though
that might be true enough). Roles never inherit. So this is only
saying that whatever animal takes on the role of C<Pet> gets some methods
from C<Friend> that just happen to be implemented by inheritance rather
than by composition. Probably C<Friend> I<should> have been written as
a role, but it wasn't (perhaps because it was written in Some Other
Language that runs on Parrot), and now you want to pretend that it
I<was> written as a role to get your project out the door. You don't
want to use delegation because there's only one animal involved,
and inheritance will work good enough till you can rewrite C<Friend> in
a language that supports role playing.
Of course, the really funny thing is that if you go across a language
barrier like that, Perl might just decide to emulate the inheritance
with delegation anyway. But that should be transparent to you. And if two
languages manage to unify their object models within the Parrot engine,
you don't want to suddenly have to rewrite your roles and classes.

And the really, really funny thing is that Parrot implements roles
internally with a funny form of multiple inheritance anyway...

Ain't abstraction wonderful.
=head2 Use of Roles at Compile Time
Roles are most useful at compile time, or more precisely, at class
composition time, the moment in which the C<MetaClass> class is figuring
out how to put together your C<Class> object. Essentially, that's
while the closure associated with your class is being executed, with
a little extra happening before and after.

A class incorporates a role with the verb "does", like this:

    class Dog is Mammal does Pet does Sentry {...}

or equivalently, within the body of the class closure:

    class Dog {
        is Mammal;
        does Pet;
        does Sentry;
        ...
    }
There is no ordering dependency among the roles, so it doesn't matter above
if C<Sentry> comes before C<Pet>. That is because the class just remembers
all the roles and then meshes them after the closure is done executing.

Each role's methods are incorporated into the class unless there
is already a method of that name defined in the class itself.
A class's method definition hides any role definition of the same
name, so role methods are second-class citizens. On the other hand,
role methods are still part of the class itself, so they hide any
methods inherited from other classes, which makes ordinary inherited
methods third-class citizens, as it were.
If there are no method name conflicts between roles (or with the
class), then each role's methods can be installed in the class,
and we're done. (Unless we wish to do further analysis of role
interrelationships to make sure that each role can find the methods
it depends on, in which case we can do that. But for 6.0.0 I'll be
happy if non-existent methods just fail at run time as they do now
in Perl 5.)
If, however, two roles try to introduce a method of the same name (for
some definition of name), then the composition of the class I<fails>,
and the compilation of the program blows sky high--we sincerely hope.
It's much better to catch this kind of error at compile time if you can.
And in this case, you can.
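The composition rules in this section (class methods beat role methods, role methods beat inherited methods, and two roles supplying the same name is an error) can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the C<compose> function is hypothetical, method tables are plain dictionaries, and strings stand in for method bodies.

```python
def compose(class_methods, roles, inherited=None):
    """Build a method table from a class, its roles, and inherited methods.

    class_methods: {name: impl} defined in the class body itself.
    roles:         {role name: {name: impl}} composed with "does".
    inherited:     {name: impl} visible through "is" (lowest priority).
    """
    from_roles = {}
    for role_name, methods in roles.items():
        for name, impl in methods.items():
            if name in class_methods:
                continue  # the class's own definition hides the role's
            if name in from_roles:
                other = from_roles[name][0]
                raise TypeError(
                    f"method {name!r} supplied by both {other} and {role_name}"
                )
            from_roles[name] = (role_name, impl)

    table = dict(inherited or {})  # third-class citizens
    table.update({n: impl for n, (_, impl) in from_roles.items()})  # second
    table.update(class_methods)  # first
    return table


pet = {"shake": "Pet.shake", "feed": "Pet.feed"}
sentry = {"shake": "Sentry.shake", "bark": "Sentry.bark"}
mammal = {"feed": "Mammal.feed", "sleep": "Mammal.sleep"}
```

Composing C<pet> and C<sentry> into a class with no C<shake> of its own raises an error, while a class that defines C<shake> itself composes cleanly, which mirrors the first remedy discussed under conflict resolution.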
=head3 Conflict Resolution
There are several ways to solve conflicts. The first is simply to
write a class method that overrides the conflicting role methods, perhaps
figuring out which role method to call. It is allowed to use the role
name to select one of the hidden role methods:

    method shake ($self: $arg) {
        given $arg {
            when Culprit { $self.Sentry::shake($arg) }
            when Paw     { $self.Pet::shake($arg) }
        }
    }

So even though the methods were not officially composed into the class,
they're still there--they're not thrown away.
That last example looks an awful lot like multiple dispatch, and in fact,
if you declare the roles' methods with C<multi>, they would be treated as
methods with different "long names", provided their signatures were
sufficiently different.

An interesting question, though, is whether the class can force two
role methods that weren't declared "multi" to behave as if they were.
Perhaps this can be forced if the class declares a signatureless
multi stub without defining it later in the class:

    multi shake {...}
The Traits paper recommends providing ways of renaming or excluding one
or the other of the conflicting methods. We don't recommend that, because
it's better if you can keep both contracts through multiple dispatch to
the role methods. However, you can force renaming or exclusion by
pretending the role is a delegation:

    does Pet handles [ :myshake«shake», Any ];
    does Pet handles { $^name !~ "shake" };

Or something like that. (See the section on Delegation below.) If we can't
get that to work right, you can always say something like:

    method shake { .Sentry::shake(@_) }
    method handshake { .Pet::shake(@_) }

In many ways that's clearer than trying to attach a selection syntax to
"does".
=head2 Use of Roles at Run Time (mixins)
While roles are at their most powerful at compile time, they can also
function as mixin classes at run time. The "does" binary operator
performs the feat of deriving a new class and binding the object to it:

    $fido does Sentry

Actually, it only does this if C<$fido> doesn't already do the C<Sentry>
role. If it does already, this is basically a no-op. The C<does> operator
works on the object in place. It would be illegal to say, for instance,

    0 does true

The C<does> operator returns the object so you can nest mixins:

    $fido does Sentry does Tricks does TailChasing does Scratch;
Unlike the compile-time role composition, each of these layers on a new
mixin with a new level of inheritance, creating a new anonymous class for
dear old Fido, so that a C<.chase> method from C<TailChasing> hides a
C<.chase> method from C<Sentry>.

(Do not confuse the binary C<does> with the unary C<does> that you use
inside a class definition to pull in a role.)
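The layering behaviour of the binary C<does> can be imitated in Python by rebinding an object to a freshly derived anonymous class. This is only an analogy: the function name C<does> and the classes below are hypothetical stand-ins, and Python's C<__class__> assignment is being used in place of whatever Perl 6 does internally.

```python
def does(obj, role):
    """Mix `role` into `obj` in place and return the same object."""
    if isinstance(obj, role):
        return obj  # already does the role: basically a no-op
    current = type(obj)
    # Derive a new anonymous class with the role layered on top, then
    # rebind the existing object to it.
    obj.__class__ = type(f"{current.__name__}+{role.__name__}", (role, current), {})
    return obj


class Dog:
    pass


class Sentry:
    def chase(self):
        return "intruder"


class TailChasing:
    def chase(self):
        return "own tail"


fido = Dog()
does(does(fido, Sentry), TailChasing)
```

Each call adds one more level of inheritance, so the C<chase> from the role mixed in last is the one that gets found first.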
In contrast to C<does>, the C<but> operator works on a copy. So you can
say:

    0 but true

and you get a mixin based on a copy of 0, not the original 0, which
everyone shares. One other wrinkle is that "true" isn't, in fact,
a class name. It's an enumerated value of a bit class. So what we
said was a shorthand for something like:

    0 but bit::true
In earlier Apocalypses we talked about applying properties with C<but>.
This has now been unified with mixins, so any time you say:

    $value but prop($x)

you're really doing something more like

    $tmp = $value;
    $tmp does SomeRole;
    $tmp.prop = $x;

And therefore a property is defined by a role like this:

    role SomeRole {
        has SomeType $.prop is rw = 1;
    }
This means that when you mention "C<prop>" in your program, something
has to know how to map that to the C<SomeRole> role. That would often be
something like an enum declaration. It's illegal to use an undeclared
property. But sometimes you just want a random old property for which
the role has the same name as the property. You can declare one with

    my property answer;

and that essentially declares a role that looks something like

    my role answer {
        has $.answer is rw = 1;
    }
Then you can say

    $a = 0 but answer(42)

and you have an object of an anonymous type that "does" C<answer>,
and that includes a C<.answer> accessor of the same name, so that if you
call C<$a.answer>, you'll get back C<42>. But C<$a> itself has the
value C<0>. Since the accessor is "C<rw>", you can also say

    $a.answer = 43;
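A loose Python analogue of C<0 but answer(42)> is to copy the value into an anonymous subclass of its own type and hang a writable attribute on the copy. The helper name C<but> and its keyword-argument interface are hypothetical, and the sketch only works for types whose constructor accepts an existing instance (such as C<int> or C<str>).

```python
def but(value, **properties):
    """Return a copy of `value` that also carries the given properties."""
    base = type(value)
    # Anonymous subclass: behaves like the base type but can hold attributes.
    anon = type(f"{base.__name__}_but_{'_'.join(properties)}", (base,), {})
    tmp = anon(value)  # a copy; the shared original is left untouched
    for name, prop_value in properties.items():
        setattr(tmp, name, prop_value)
    return tmp


a = but(0, answer=42)
```

The copy still compares equal to 0 and takes part in arithmetic as 0, while C<a.answer> can be read and reassigned independently of the value.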
There's a corresponding assignment operator:
    $a but= tainted;

That avoids copying C<$a> before tainting it. It basically means the
same thing as

    $a does taint::tainted
For more on enumerated types, see Enums below.
=head1 Traits
Here we're talking about Perl's traits (as in compile-time properties),
not Traits (as in the Traits paper).

Traits can be thought of as roles gone wrong. Like roles, they can
function as straightforward mixins on container objects at compile
time, but they can also cheat, and frequently do. Unlike roles,
traits are not constrained to play fair with each other. With traits,
it's both "first come, first served", and "he who laughs last laughs
best". Traits are applied one at a time to their container victim,
er, object, and an earlier trait can throw away information required
by a later trait. Contrariwise, a later trait can overrule anything
done by an earlier trait--except of course that it can't undestroy
information that has been totally forgotten by the earlier trait.
You might say that "role" is short for "role model", while "trait"
is short for "traitor". In a nutshell, roles are symbiotes, while
traits are parasites. Nevertheless, some parasites are symbiotic,
and some symbiotes are parasitic. Go figure...

All that being said, well-behaved traits are really just roles applied
to declared items like containers or classes. It's the declaration of
the item itself that makes traits seem more permanent than ordinary
properties. The only reason we call them "traits" rather than
"properties" is to continually remind people that they are, in fact,
applied at compile time. (Well, and so that we can make bad puns on
"traitor".)
Even ill-behaved traits should add an appropriately named role to
the container, however, in case someone wants to look at the metadata
properties of the container.
Traits are generally inflicted upon the "traitee" with the "is"
keyword, though other modalities are possible. When the compiler
sees words like "is" or "will" or "returns" or "handles", or
special constructs like signatures and body closures, it calls
into an associated trait handler which applies the role to the item
as a mixin, and also does any other traitorous magic that needs doing.
To define a trait handler for an "is xxx" trait, define one or
more multisubs into a property role like this:

    role xxx {
        has Int $.xxx;
        multi sub trait_auxiliary:is(xxx $trait, Class $container: ?$arg) {...}
        multi sub trait_auxiliary:is(xxx $trait, Any $container: ?$arg) {...}
    }
[Update: That's C<< trait_auxiliary:<is> >> now.]
Then it can function as a trait. A well-behaved trait handler will say

    $container does xxx($arg);

somewhere inside to set the metadata on the container correctly. Then
not only can you say

    class MyClass is xxx(123) {...}

but you'll also be able to say

    if MyClass.meta.xxx == 123 {...}
Since a class can function as a role when it comes to parameter type
matching, you can also say:

    class MyBase {
        multi sub trait_auxiliary:is(MyBase $base, Class $class: ?$arg) {...}
        multi sub trait_auxiliary:is(MyBase $tied, Any $container: ?$arg) {...}
    }
These capture control if C<MyBase> wants to capture control of how it gets
used by any class or container. But usually you can just let it call
the generic defaults:

    multi sub *trait_auxiliary:is(Class $base, Class $class: ?$arg) {...}

which adds C<$base> to the "isa" list of C<$class>, or

    multi sub *trait_auxiliary:is(Class $tied, Any $container: ?$arg) {...}

which sets the "tie" type of the container to the implementation type
in C<$tied>.
In any event, if the trait supplies the optional argument, that
comes in as C<$arg>. (It's probably something unimportant, like the
function body...) Note that unlike "pair options" such as "C<:wag>",
traits do not necessarily default to the value 1 if you don't supply
the argument. This is consistent with the notion that traits don't
generally do something passive like setting a value somewhere,
but something active like totally screwing up the structure of your
container.
Most traits are introduced by use of a "helping verb", which could
be something like "C<is>", or "C<will>", or "C<can>", or "C<might>",
or "C<should>", or "C<does>". We call these helping verbs "trait
auxiliaries". Here's "C<will>", which (being syntactic sugar) merely
delegates back to "is":

    multi sub *trait_auxiliary:will($trait, $container: &arg) {
        trait_auxiliary:is($trait, $container, &arg);
    }
Note the declaration of the argument as a non-optional reference to
a closure. This is what allows us to say:

    my $dog will eat { anything() };

rather than having to use parens:

    my $dog is eat({ anything() });
Other traits are applied with a single word, and we call one of those a
"trait verb". For instance, the "C<returns>" trait described in
Apocalypse 6 is defined something like this:

    role returns {
        has ReturnType $.returns;
        multi sub trait_verb:returns($container: ReturnType $arg) {
            $container does returns($arg);
        }
        ...
    }
[Update: Make that C<< trait_verb:<returns> >> now.]
Note that the argument is not optional on "C<returns>".

Earlier we defined the C<xxx> trait using multi sub definitions:

    role xxx {
        has Int $.xxx;
        multi sub trait_auxiliary:is(xxx $trait, Class $container: ?$arg) {...}
        multi sub trait_auxiliary:is(xxx $trait, Any $container: ?$arg) {...}
    }
This is one of those situations in which you may really want
single-dispatch methods:

    role xxx {
        has Int $.xxx;
        method trait_auxiliary:is(xxx $trait: Class $container, ?$arg) {...}
        method trait_auxiliary:is(xxx $trait: Any $container, ?$arg) {...}
    }
Some traits are control freaks, so they want to make sure that anything
mentioning them comes through their control. They don't want something
dispatching to another trait's C<trait_auxiliary:is> method just because
someone introduced a cute new container type they don't know about.
That other trait would just mess things up.
Of course, if a trait is feeling magnanimous, it should just go ahead
and use multi subs. Since the multi-dispatcher takes into account
single-dispatch methods, and the distance of an exact match on the
first argument is 0, the dispatcher will generally respect the wishes
of both the paranoid and the carefree.
Note that we included "does" in our list of "helping verbs".
Roles actually implement themselves using the trait interface, but the
generic version of C<trait_auxiliary:does> defaults to doing proper roley
things rather than proper classy things or improper traitorous things.
So yes, you could define your own C<trait_auxiliary:does> and turn your
nice role traitorous. That would be...naughty.
But apart from how you typically invoke them, traits and roles are
really the same thing. Just like the roles on which they're based,
you may neither instantiate nor inherit from a trait. You may,
however, use their names as type constraints on multimethod signatures
and such. As with well-behaved roles, they should define attributes or
methods that show up as metadata properties where that's appropriate.

Unlike compile-time roles, which all flatten out in the same class,
compile-time traits are applied one at a time, like mixin roles.
You can, in fact, apply a trait to a container at run time, but if you
do, it's just an ordinary mixin role. You have to call the
appropriate C<trait_auxiliary:is()> routine yourself if you want it to
do any extra shenanigans. The compiler won't call it for you at run
time like it would at compile time.
When you define a helping verb such as "is" or "does", it not only
makes it a postfix operator for declarations, but a unary operator
within class and role closures. Likewise, declarative closure blocks
like C<BEGIN> and C<INIT> are actually trait verbs, albeit ones that can
add multiple closures to a queue rather than adding a single property.
This implies that something like

    sub foo {
        LEAVE {...}
        ...
    }

could (except for scoping issues) equivalently be written:

    sub foo LEAVE {...} {
        ...
    }
Though why you'd want to do that, I don't know. Hmm, if we really
generalize trait verbs like that, then you could also write things like:

    sub foo {
        is signature ('int $x');
        is cached;
        returns Int;
        ...
    }

That's gettin' a little out there. Maybe we won't generalize it
quite that far...
=head1 Delegation
Delegation is the art of letting someone else do your work for you.
The fact that you consider it "your" work implies that delegation is
actually a means of taking credit in advance for what someone else is
going to do. In terms of objects, it means pretending that some other
object's methods are your own. Now, as it happens, you can always
do that by hand simply by writing your own methods that call out to
another object's methods of the same name. So any shorthand for doing
that is pure syntactic sugar. That's what we're talking about here.
Delegation in this sugary sense always requires there to be an
attribute to keep a reference to the object we're delegating to.
So our syntactic relief will come in the form of annotations on a
"C<has>" declaration. We could have decided to instead attach annotations
to each method declaration associated with the attribute, but by the
time you do this, you've repeated so much information that you almost
might as well have written the non-sugary version yourself. I know
that for a fact, because that's how I originally proposed it. C<:-)>
Delegation is specified by a "handles" trait verb with an argument
specifying one or more method names that the current object and the
delegated object will have in common:

    has $:tail handles 'wag';
Since the method name (but nothing else) is known at class construction
time, the following C<.wag> method is autogenerated for you:

    method wag (*@args is context(Lazy)) { $:tail.wag(*@args) }

(It's necessary to specify a C<Lazy> context for the arguments to such a
delegator method because the actual signature is supplied by the tail's
C<.wag> method, not your method.) So as you can see, the delegation
syntax already cuts our typing in half, not to mention the reading.
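The autogenerated forwarding method can be imitated in Python with a class decorator that installs one forwarder per delegated name. This is a sketch, not a description of how Perl 6 implements C<handles>: the decorator name C<handles>, the helper C<_forwarder>, and the C<Tail> and C<Dog> classes are all hypothetical.

```python
def _forwarder(attr, name):
    """Build a method that passes every argument through to self.<attr>.<name>."""
    def forward(self, *args, **kwargs):
        # The real signature belongs to the delegatee, so arguments are
        # passed along untouched.
        return getattr(getattr(self, attr), name)(*args, **kwargs)
    forward.__name__ = name
    return forward


def handles(attr, *names):
    """Class decorator: delegate the named methods to the attribute `attr`."""
    def decorate(cls):
        for name in names:
            setattr(cls, name, _forwarder(attr, name))
        return cls
    return decorate


class Tail:
    def wag(self, times=1):
        return "wag " * times


@handles("_tail", "wag")
class Dog:
    def __init__(self):
        self._tail = Tail()
```

As in the text, only the method name is known when the class is built; everything else about the call is decided by the object held in the attribute at run time.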
The win is even greater when you specify multiple methods to delegate:

    has $:legs handles «walk run lope shake pee»;

Or equivalently:

    has $:legs handles ['walk', 'run', 'lope', 'shake', 'pee'];

You can also say things like

    my @legmethods := «walk run lope shake pee»;
    has $:legs handles (@legmethods);

since the "C<has>" declaration is evaluated at class construction time.
Of course, it's illegal to call the outer method unless the attribute
has been initialized to an object of a type supporting the method.
So a declaration that makes a new delegatee at object build time
might be specified like this:

    has $:tail handles 'wag' will build { Tail.new(*%_) };

or, equivalently,

    has $:tail handles 'wag' = { Tail.new(*%_) };

This automatically performs

    $:tail = Tail.new(*%_);
when C<BUILD> is called on a new object of the current class (unless
C<BUILD> initializes C<$:tail> to some other value). Or, since you
might want to declare the type of the attribute without duplicating
it in the default value, you can also say

    has Tail $:tail handles 'wag' = { .new(*%_) };

or

    has Tail $:tail handles 'wag' will build { .new(*%_) };
Note that putting a C<Tail> type on the attribute does not necessarily
mean that the method is always delegated to the C<Tail> class.
The dispatch is still based on the I<run-time> type of the object,
not the declared type. So

    has Tail $:tail handles 'wag' = { LongTail.new(*%_) };

delegates to the C<LongTail> class, not the C<Tail> class. Of course, you'll
get an exception at build time if you try to say:

    has Tail $:tail handles 'wag' = { Dog.new(*%_) };

since C<Dog> is not derived from C<Tail> (whether or not the tail can wag
the dog).
We declare C<$:tail> as a private attribute here, but C<$.tail> would
have worked just as well. A C<Dog>'s tail does seem to be a public
interface, after all. Kind of a read-only accessor.
=head2 Wildcard Delegation
We've seen that the argument to "C<handles>" can be a string or a list
of strings. But any argument or subargument that is not a string is
considered to be a smartmatch selector for methods. So you can say:

    has $:fur handles /^get_/;

and then you can do the C<.get_wet> or C<.get_fleas> methods (presuming
there are such), but you can't call the C<.shake> or C<.roll_in_the_dirt>
methods. (Obviously you don't want to delegate the C<.shake> method since
that means something else when applied to the C<Dog> as a whole.)

If you say

    has $:fur handles Groomable;

then you get only those methods available via the C<Groomable> role
or class.
Wildcard matches are evaluated only after it has been determined that
there's no exact match to the method name. They therefore function
as a kind of autoloading in the overall pecking order. If the class
also has an C<AUTOLOAD>, it is called only if none of the wildcard
delegations match. (An C<AUTOMETHDEF> is called much earlier, since it
knows from the stub declarations whether there is supposed to be a
method of that name. So you can think of explicit delegation as a
kind of autodefine, and wildcard delegation as a kind of autoload.)

When you have multiple wildcard delegations to different objects,
it's possible to have a conflict of method names. Wildcard method
matches are evaluated in order, so the earliest one wins. (Non-wildcard
method conflicts can be caught at class composition time.)
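The pecking order just described can be modelled, very loosely, in Python. This is a sketch under assumptions: the C<Fur> and C<Collar> classes and their methods are made up, and Python's C<__getattr__> hook (which runs only after ordinary lookup fails) stands in for "wildcards are tried only when there's no exact match".

```python
import re


class Fur:
    def get_wet(self):
        return "fur is wet"

    def get_fleas(self):
        return "fur has fleas"

    def shake(self):
        return "fur shakes"


class Collar:
    def get_wet(self):
        return "collar is wet"

    def get_tag(self):
        return "collar tag"


class Dog:
    def __init__(self):
        # (selector, delegatee) pairs, tried in declaration order.
        self._wildcards = [
            (re.compile(r"^get_"), Fur()),
            (re.compile(r"^get_"), Collar()),
        ]

    def shake(self):
        return "whole dog shakes"  # exact match: never reaches the wildcards

    def __getattr__(self, name):
        # Reached only when normal attribute lookup found nothing.
        for selector, target in self._wildcards:
            if selector.search(name) and hasattr(target, name):
                return getattr(target, name)
        raise AttributeError(name)


dog = Dog()
```

Both delegatees could answer C<get_wet>, but the fur is listed first and so wins; C<get_tag> falls through to the collar; C<shake> never leaves the dog.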
=head2 Renaming Delegated Methods
If, where you would ordinarily specify a string, you put a pair, then
the pair maps the method name in this class to the method name in the
other class. If you put a hash, each key/value pair is treated as
such a mapping. Such mappings are not considered wildcards.

    has $:fur handles { :shakefur«shake» :scratch«get_fleas» };

Perhaps that reads better with the old pair notation:

    has $:fur handles { shakefur => 'shake', scratch => 'get_fleas' };

You I<can> do a wildcard renaming, but not with pairs. Instead do smartmatch
with a substitution:

    has $:fur handles (s/^furget_/get_/);

As always, the left-to-right mapping is from this class to the other one.
The pattern matching is working on the method name passed to us, and
the substituted method name is used on the class we delegate to.
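Here is a hedged Python sketch of both kinds of renaming. The class names and return strings are hypothetical, and for brevity the explicit renames are resolved in the same C<__getattr__> hook as the wildcard one, which glosses over the point that pair mappings are not wildcards.

```python
import re


class Fur:
    def shake(self):
        return "fur shakes"

    def get_fleas(self):
        return "scratch scratch"

    def get_wet(self):
        return "soggy"


class Dog:
    # Explicit renames: name in this class -> name in the delegatee.
    RENAMES = {"shakefur": "shake", "scratch": "get_fleas"}
    # Wildcard rename, modelling s/^furget_/get_/.
    PATTERN, REPLACEMENT = re.compile(r"^furget_"), "get_"

    def __init__(self):
        self._fur = Fur()

    def __getattr__(self, name):
        if name in self.RENAMES:
            return getattr(self._fur, self.RENAMES[name])
        new_name, count = self.PATTERN.subn(self.REPLACEMENT, name)
        if count:  # the substitution matched, so delegate under the new name
            return getattr(self._fur, new_name)
        raise AttributeError(name)


dog = Dog()
```

Note the direction: the pattern is applied to the name the caller used on the dog, and the result is the name looked up on the fur.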
=head2 Delegation without an Attribute
Ordinarily delegation is based on an attribute holding an object
reference, but there's no reason in principle why you have to use an
attribute. Suppose you had a C<Dog> with two tails. You can delegate
based on a method call:

    method select_tail handles «wag hang» {...}

The arguments are sent to both the delegator and delegatee method.
So when you call

    $dog.wag(:fast)

you're actually calling

    $dog.select_tail(:fast).wag(:fast)

If you use a wildcard delegation based on a method, you should be
aware that it has to call the method before it can even decide whether
there's a valid method call to the delegatee or not. So it behooves
you not to get too fancy with C<select_tail()>, since it might just have
to throw all that work away and go on to the next wildcard specification.
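A small Python sketch of method-based delegation follows. The policy inside C<select_tail> (fast wags go to the left tail) is purely an assumption made up for the example; what matters is that the same arguments reach both the selector and the selected object.

```python
class Tail:
    def __init__(self, name):
        self.name = name

    def wag(self, fast=False):
        return f"{self.name} tail wags {'fast' if fast else 'slowly'}"


class Dog:
    def __init__(self):
        self._tails = {"fast": Tail("left"), "slow": Tail("right")}

    def select_tail(self, fast=False):
        # Hypothetical policy: choose a tail from the call's own arguments.
        return self._tails["fast" if fast else "slow"]

    def wag(self, **kwargs):
        # The arguments go to the delegator and to the delegatee alike.
        return self.select_tail(**kwargs).wag(**kwargs)


dog = Dog()
```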
=head2 Delegation of Handlers
If your delegation object happens to be an array:

    has @:handlers handles 'foo';

then something cool happens. <cool rays> In this case Perl 6
assumes that your array contains a list of potential handlers, and you
just want to call the I<first> one that succeeds. This is not considered
a wildcard match unless the "handles" argument forces it to be.

Note that this is different from the semantics of a hyper method such
as C<@objects».foo()>, which will try to call the method on I<every>
object in C<@objects>. If you want to do that, you'll just have to
write your own method:

    has @:ears;
    method twitchears () { @:ears».twitch() }

Life is hard.
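The contrast between "first one that succeeds" and "every one" can be sketched in Python. The text above doesn't pin down what "succeeds" means, so this model simply assumes a handler succeeds when it has the method and returns something other than C<None>; the handler classes are invented.

```python
class Quiet:
    def foo(self):
        return None  # treated here as "did not handle it"


class Loud:
    def foo(self):
        return "loud handled foo"


class Spare:
    def foo(self):
        return "spare handled foo"


def first_success(handlers, name, *args):
    """Call `name` on each handler in order; stop at the first non-None result."""
    for handler in handlers:
        method = getattr(handler, name, None)
        if method is None:
            continue
        result = method(*args)
        if result is not None:
            return result
    raise LookupError(f"no handler succeeded for {name!r}")


def call_all(objects, name, *args):
    """Model of the hyper form: call the method on every object."""
    return [getattr(obj, name)(*args) for obj in objects]


handlers = [Quiet(), Loud(), Spare()]
```

With this list, C<first_success> stops at the second handler and never bothers the third, while C<call_all> visits all three.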
=head2 Hash-Based Redispatch
If your delegation object happens to be a hash:

    has %:objects handles 'foo';

then the hash provides a mapping from the string value of "self"
to the object that should be delegated to:

    has %:barkers handles "bark" = (
        Chihauhau => $yip,
        Beagle    => $yap,
        Terrier   => $arf,
        StBernard => $woof,
    );

    method prefix:~ () { return "$.breed" }

[Update: That's C<< prefix:<~> >> now.]

If the string is not found in the hash, a "C<next METHOD>" is
automatically performed.

Again, this construct is not necessarily considered a wildcard.
In the example above we know for a fact that there's supposed to
be a C<.bark> method somewhere, therefore a specific method can be
autogenerated in the current class.
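As a loose Python analogue of hash-based redispatch: the object's string value picks the delegatee, and a miss falls through to the next method in line. The C<Animal> base class and the sounds are assumptions for the sake of the example, with C<super()> standing in for "C<next METHOD>".

```python
class Barker:
    def __init__(self, sound):
        self.sound = sound

    def bark(self):
        return self.sound


class Animal:
    def bark(self):
        return "generic bark"  # whatever "next METHOD" would have found


class Dog(Animal):
    BARKERS = {"Beagle": Barker("yap"), "StBernard": Barker("woof")}

    def __init__(self, breed):
        self.breed = breed

    def __str__(self):
        return self.breed  # the string value of the object is the hash key

    def bark(self):
        target = self.BARKERS.get(str(self))
        if target is None:
            return super().bark()  # not in the hash: fall through
        return target.bark()
```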
=head2 Relationship to Roles
Delegation is a means of including a set of methods into your class.
Roles can also include a set of methods in your class, but the difference
is that what a role includes happens at class composition time, while
delegation is much more dynamic, depending on the current state of
the delegating attribute (or method).

But there's no reason you can't have your cake and eat it too, because
roles are specifically designed to allow you to pull in delegations
without the class even being aware of the fact that it's delegating.
When you include a role, you're just signing up for a set of methods,
with maybe a little state thrown in. You don't care whether those
methods are defined directly, or indirectly. The role manages that.

In fact, this is one of the primary motivators for including roles
in the design of Perl 6. As a named abstraction, a role lets you
refactor all the classes using that role without changing any of the
classes involved. You can turn your single "god" object into a set
of nicely cooperating objects transparently. Well, you have to do
the composition using roles first, and that's not transparent.
Note that all statically named methods are dispatched before any
wildcard methods, regardless of whether the methods came from a role
or the class itself. (Inherited methods also come before wildcard
methods because we order all the cachable method dispatches before
all the non-cachable ones. But see below.) So the lookup order is:
=over 4

=item 1.

This class's declared methods (including autodefs and delegations)

=item 2.

An included role's declared methods (including autodefs and delegations)

=item 3.

Normal inherited methods (including autodefs and delegations
of the parent class)

=item 4.

Wildcard delegated methods in this class (or failing that, from any
inherited class that does wildcard delegations)

=item 5.

Methods autoloaded by an autoloader defined in this class (or failing
that, an autoloader from any inherited class)

=back

Note that any method that is stubbed (declared but not yet defined)
in steps 1 or 2 skips straight to step 4, because it means this
class thinks it "owns" a method of that name. (At this point Perl 5
would skip straight to step 5, but Perl 6 still wants to do wildcard
delegation before falling back on inherited autoloading.)
=head2 Anonymous Delegation for ISA Emulation
When you inherit from a class with a different layout policy, Perl
has to emulate inheritance via anonymous delegation. In this case it
installs a wildcard delegation for you. According to the list above,
this gives precedence to all methods with the same layout policy over
all methods with a different layout policy. This might be a feature,
especially when calling cross-language. Then again, maybe it isn't.

There is no "C<has>" variable for such an anonymous delegation. Its
delegated object is stored as a property on the class's entry
in the ISA list, probably. (Or we could autogenerate an attribute
whose name is related to the class name, I suppose.)
Since one of the primary motivations for allowing this is to make it
possible to call back and forth between Perl 5 and Perl 6 objects,
we need to make that as transparent as possible. When a Perl 6
object inherits from a Perl 5 object, it is emulated with delegation.
The invocant passed into the Perl 5 (Ponie) object looks like a Perl
5 object to Perl 5. However, if the Perl 5 object passes that as an
invocant back into Perl 6, it has to go back to looking like a Perl
6 object to Perl 6, or our emulation of inheritance is suboptimal.

When a Ponie object accesses its attributes through what it I<thinks>
is a hash reference, it really has to call the appropriate Perl 6
accessor function if the object comes from Perl 6. Likewise, when
Perl 6 calls an accessor on a Perl 5 object, it has to translate that
method call into a hash lookup--presuming that the Perl 5 object is
implemented as a blessed hash.
Other language boundaries may or may not do similar tricks. Python's
attributes suffer from the same misdesign as Perl 5's attributes. (My
fault for copying Python's object model. C<:-)> So that'd be a good place
for a similar policy.
So we can almost certainly emulate inheritance with delegation,
albeit with some possible misordering of classes if there are duplicate
method names. However, the hard part is constructing objects. Perl 5
doesn't enforce a policy of named arguments for its constructors, so
it is difficult for a Perl 6 C<BUILDALL> routine to have any automatic
way to call a Perl 5 constructor. It's tempting to install glue code
into the Perl 6 class that will do the translation, but that's really
not a good idea, because someday the Perl 5 class may eventually get
translated to a Perl 6 class, and your glue code will be useless,
or worse.

So the right place to put the glue is actually back into the Perl
5 class. If a Perl 5 class defines a C<BUILD> subroutine, it will
be assumed that it properly handles named pairs in Perl 5's even/odd
list format. That will be used in lieu of any predefined constructor
named "C<new>" or anything else.

If there is no C<BUILD> routine in the Perl 5 package, but there is
a "C<use fields>" declaration, then we can autogenerate a rudimentary
C<BUILD> routine that should suffice for most scalar attributes.
=head1 Types and Subtypes
I've always really liked the Ada distinction between types and
subtypes. A type is something that adds capabilities, while a
subtype is something that takes away capabilities. Classes and
roles generally function as types in Perl 6. In general you don't
want to make a subclass that, say, restricts your integers to only
even numbers, because then you've violated Liskov substitutability.
In the same way that we force role composition to be "before" classes,
we will force subtyping constraints to be "after" classes. In both
cases we force it by a declarator change so that you are unlikely
to confuse a role with a class, or a class with a subtype. And just
as you aren't allowed to derive a role from a class, you aren't
allowed to derive a class from a constrained type.
On the other hand, a bit confusingly, it looks like subtyping will
be done with the "type" keyword, since we aren't using that word yet.
To remind people that a subtype of a class is just a constrained alias
for the class, we avoid the "is" word and declare a type using a C<::=>
compile-time alias, like this:

    type Str_not2b ::= Str where /^[isnt|arent|amnot|aint]$/;

The C<::=> doesn't create the type, nor in fact does the C<type>
keyword. It's actually the C<where> that creates the type. The C<type>
keyword just marks the name as "not really a classname" so that you
don't accidentally try to derive from it.
[Update: I decided I don't like the forced use of C<::=>, nor do I
like the confusion engendered by using the word "type" to mean "subtype",
so the syntax is now any of:

    my subtype Str_not2b of Str where /^[isnt|arent|amnot|aint]$/;
    my Str subtype Str_not2b where /^[isnt|arent|amnot|aint]$/;

.]
Since a type is "post-class-ical", there's really no such thing as an
object blessed into a type. If you try it, you'll just end up with
an object blessed into whatever the underlying unconstrained class
is, as far as inheritance is concerned. A type is not a subclass.
A type is primarily a handy way of sneaking smartmatching into
multiple dispatch. Just as a role allows you to specify something
more general than a class, a type allows you to specify something
more specific than a class.
While types are primarily intended for restricting parameter types
for multiple dispatch, they also let you impose preconditions on
assignment. Basically, if you declare any container with a subtype,
Perl will check the constraint against any value you might try to
bind or assign to the container.

    type Str_not2b ::= Str where /^[isnt|arent|amnot|aint]$/;
    type EvenNum   ::= Num where { $^n % 2 == 0 }

    my Str_not2b $hamlet;
    $hamlet = 'isnt';   # okay
    $hamlet = 'amnt';   # fails the constraint

    my EvenNum $n;
    $n = 2;             # okay
    $n = -2;            # okay
    $n = 0;             # okay
    $n = 3;             # fails the constraint

It's perfectly legal to base one subtype on another. It merely
adds an additional constraint.
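To make the "precondition on assignment" idea concrete, here is a Python sketch using a descriptor. Everything in it is an assumption for illustration: the C<Constrained> and C<Box> names are invented, C<int> stands in for C<Num>, and a C<TypeError> stands in for whatever Perl 6 would actually throw.

```python
import re


class Constrained:
    """Descriptor that checks a base type and a predicate on every assignment."""

    def __init__(self, base, predicate):
        self.base, self.predicate = base, predicate

    def __set_name__(self, owner, name):
        self.slot = "_" + name

    def __get__(self, obj, owner=None):
        return self if obj is None else getattr(obj, self.slot)

    def __set__(self, obj, value):
        if not (isinstance(value, self.base) and self.predicate(value)):
            raise TypeError(f"{value!r} fails the constraint")
        setattr(obj, self.slot, value)


class Box:
    # Rough counterparts of Str_not2b and EvenNum.
    hamlet = Constrained(str, re.compile(r"^(isnt|arent|amnot|aint)$").search)
    n = Constrained(int, lambda v: v % 2 == 0)


box = Box()
box.hamlet = "isnt"
box.n = -2
```

Assigning C<"amnt"> to C<box.hamlet> or C<3> to C<box.n> raises, while the stored value is an ordinary string or integer; the constraint lives on the container, not on the value's class.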
It's possible to use an anonymous subtype in a signature:

    multi sub mesg (Str where /<profanity>/ $mesg is copy) {
        $mesg ~~ s:g/<profanity>/[expletive deleted]/;
        print $MESG_LOG: $mesg;
    }

    multi sub mesg (Str $mesg) {
        print $MESG_LOG: $mesg;
    }

Given a set of multimethods that would "tie" on the actual classes
of the arguments, a multimethod with a matching constraint will be
preferred over an equivalent one with no constraint. So the first
C<mesg> above is preferred if the constraint matches, and otherwise
the second is preferred. However, if two multis with constraints
match (and are otherwise equivalent), it's just as if you'd called
any other set of ambiguous multimethods, and one of them had better
be marked as the default, or you die.
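The tie-breaking rule can be sketched in Python as follows. This is a toy dispatcher under stated assumptions: the C<PROFANITY> pattern is a placeholder for C<< <profanity> >>, the functions return the message instead of logging it, and the "marked as the default" escape hatch is left out, so any ambiguity simply raises.

```python
import re

PROFANITY = re.compile(r"\b(darn|heck)\b")  # placeholder for <profanity>


def mesg_censored(text):
    return PROFANITY.sub("[expletive deleted]", text)


def mesg_plain(text):
    return text


# Candidates that tie on class (both take a str).
# A constraint of None means "unconstrained".
CANDIDATES = [
    (lambda t: PROFANITY.search(t) is not None, mesg_censored),
    (None, mesg_plain),
]


def mesg(text):
    constrained = [f for c, f in CANDIDATES if c is not None and c(text)]
    if len(constrained) > 1:
        raise TypeError("ambiguous dispatch: more than one constraint matched")
    if constrained:
        return constrained[0](text)  # a matching constraint beats none
    return next(f for c, f in CANDIDATES if c is None)(text)
```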
We say that types are "post-class-ical", but since you can base them off
of any class including C<Any>, they are actually rather orthogonal to
the class system.

[Update: Everywhere the preceding section says "type", change to "subtype",
including the keyword. And change C<::=> to C<of>.]
=head1 Enums
An enum functions as a subtype that is constrained to a single value.
(When a subtype is constrained to a single value, it can be used for that
value.) But rather than declaring it as:

    type DayOfWeek ::= Int where 0..6;
    type DayOfWeek::Sunday ::= DayOfWeek where 0;
    type DayOfWeek::Monday ::= DayOfWeek where 1;
    type DayOfWeek::Tuesday ::= DayOfWeek where 2;
    type DayOfWeek::Wednesday ::= DayOfWeek where 3;
    type DayOfWeek::Thursday ::= DayOfWeek where 4;
    type DayOfWeek::Friday ::= DayOfWeek where 5;
    type DayOfWeek::Saturday ::= DayOfWeek where 6;

we allow a shorthand:

    type DayOfWeek ::= int enum
        «Sunday Monday Tuesday Wednesday Thursday Friday Saturday»;

[Update: The syntax is now more like existing declarations:

    our int enum DayOfWeek
        <Sunday Monday Tuesday Wednesday Thursday Friday Saturday>;

where C<int> is usually omitted.]

Type C<int> is the default enum type, so that can be:

    type DayOfWeek ::= enum
        «Sunday Monday Tuesday Wednesday Thursday Friday Saturday»;

[Update: Now just C<< enum DayOfWeek <...> >>.]

The enum installer inspects the strings you give it for things that
look like pairs, so to number your days from 1 to 7, you can say:

    type DayOfWeek ::= enum
        «:Sunday(1) Monday Tuesday Wednesday Thursday Friday Saturday»;
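For comparison only, Python's standard C<enum> module has a similar knob. The sketch below uses the functional C<IntEnum> API, where the C<start> argument plays roughly the part of the C<:Sunday(1)> pair; the name C<DayOfWeek1> is made up so both numberings can sit side by side.

```python
from enum import IntEnum

DAYS = "Sunday Monday Tuesday Wednesday Thursday Friday Saturday"

# Numbered from 0, like the plain enum declaration.
DayOfWeek = IntEnum("DayOfWeek", DAYS, start=0)

# Numbered from 1, like the declaration that begins with :Sunday(1).
DayOfWeek1 = IntEnum("DayOfWeek1", DAYS, start=1)
```

Each member is an integer constrained to a single value, so C<DayOfWeek.Saturday> compares equal to 6 and C<DayOfWeek1.Saturday> to 7.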
You can import individual enums into your scope where they will
function like argumentless constant subs. However, if there is a
name collision with a sub or other enum, you'll have to disambiguate.
Unambiguous enums may be used as a property on the right side of a "but",
and the enum type can be intuited from it to make sure the object in
question has the right semantics mixed in. Two builtin enums are:

    type bool ::= bit enum «false true»;
    type taint ::= bit enum «untainted tainted»;

[Update: Now just:

    our bit enum *bool is <false true>;
    our bit enum *taint is <untainted tainted>;

.]
=head1 Open vs Closed Classes
By default, classes in Perl are left open. That is, you can add
more methods to them, though you have to be explicit that that is
what you're doing:

    class Object is extended {
        method wow () { say "Wow, I'm an object." }
    }

Otherwise you'll get a class redefinition error.

Likewise, a "final" class (to use the Java term) is one that you know
will never be derived from, let alone mucked with internally.
Now, it so happens that leaving all your classes open is not terribly
conducive to certain kinds of optimization (let alone encapsulation).
From the standpoint of the compiler, you'd like to be able to say,
"I know this class will never be derived from or modified, so I can
do things like access my attributes directly without going through
virtual accessors." We were, in fact, tempted to make closed classes
the default. But this breaks in frameworks like mod_perl where you
cannot predict in advance which classes will want to be extended or
derived from.

Some languages solve this (or think they solve it) by letting classes
declare themselves to be closed and/or final. But that's actually
a bad violation of OO principles. It should be the I<users> of a class
that decide such things--and decide it for themselves, not for others.
As such, there has to be a consensus among all users of a class to
close or finalize it. And as we all know, consensus is difficult
to achieve.
Nevertheless, the Perl 6 approach is to give the top-level application
the right to close (and finalize) classes. But we don't do this by
simply listing the classes we want to close. Instead, we use the
sneaky strategy of switching the default to closed and then listing the
classes we want to stay open.

The benefit of this is that modules other than the top level can
simply list all the classes that they know should stay open. In an
open framework, these are, at worst, no-ops, and they don't cause
classes to close that other modules might want to remain open.
If I<any> module requests a class to stay open, it stays open.
If I<any> module requests that a class remain available as a base
class, it remains available.
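The consensus rule is simple enough to write down as a little Python function. This is only a model: the request dictionaries and their keys (C<close_by_default>, C<open>, C<base>) are invented, and one flag is used for both closing and finalizing, which is a simplification.

```python
def decide(all_classes, requests):
    """Combine every module's requests into a per-class verdict.

    Each request is a dict with optional keys (all hypothetical):
    'close_by_default' (bool), 'open' (class names), 'base' (class names).
    """
    close_default = any(r.get("close_by_default") for r in requests)
    keep_open = set().union(*(r.get("open", ()) for r in requests))
    keep_base = set().union(*(r.get("base", ()) for r in requests))
    return {
        name: {
            # Closed only if someone flipped the default and nobody objected.
            "closed": close_default and name not in keep_open,
            "final": close_default and name not in keep_base,
        }
        for name in all_classes
    }


requests = [
    {"close_by_default": True},   # the top-level application
    {"open": ["Mammal"]},         # some module that extends Mammal
    {"base": ["Insect"]},         # some module that derives from Insect
]
verdict = decide(["Mammal", "Insect", "Rock"], requests)
```

Drop the first request and every class stays open and derivable; the other two requests become harmless no-ops, which is the point of setting the default this way round.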
It has been speculated that optimizer technology in Parrot will
develop such that a class can conjecturally be compiled as closed,
and then recompiled as open should the need arise. (This is just
a specific case of the more general problem of what you do whenever
the assumptions of the optimizer are violated.) If we get such an
on-the-fly optimizer/pessimizer, then our open class declarations are
still not wasted--they will tell the optimizer which classes not to
bother trying to close or finalize in the first place. Setting the
default the other way wouldn't have the same benefit.
Syntax? You want syntax? Hmm.

    use classes :closed :open«Mammal Insect»;

[Update: Now more like "C<use opt :classes(:close :finalize);>", since
it's a direct instruction to the optimizer. It doesn't directly change
the meaning of the C<class> keyword, so it shouldn't use C<class>
as the pragma name.]

Or some such. Maybe certain kinds of class reference automatically request
the class to be open without a special pragma. A module could request
open classes without attempting to close everything with just:

[Update: Just "C<class>" to match the keyword.]

On the other hand, maybe that's another one of those inside-out interfaces,
and it should just be options on the classes whose declarations you have
to include anyway:

    class Mammal is open {...}
    class Insect is open {...}

Similarly, we can finalize classes by default and then "take it back"
for certain classes:

    class Mammal is base {...}
    class Insect is base {...}
In any event, even though the default is expressed at the top of the
main application, the final decision on each class is not made by
the compiler until C<CHECK> time, when all the compiled code has had
a chance to stake its claims. (A JIT compiler might well wait even
longer, in case run-time evaluated code wishes to express an opinion.)
=head1 Interface Consistency
In theory, a subclass should always act as a more specialized version
of a superclass. In terms of design-by-contract theory, a subclass
should OR in its preconditions and AND in its postconditions. In terms
of Liskov substitutability, you should always be able to substitute
a derived class object in where a base class object is expected, and
not have it blow up. In terms of Internet policy, a derived class
(compared to its base class) should be at least as lenient in what it
accepts, and at least as strict in what it emits.
So, while it would be lovely in a way to require that derived methods
of the same name as a base method must use the same signature, in
practice that doesn't work out. A derived class often has to be
able to add arguments to the signature of a method so that it can
"be more lenient" in what it accepts as input.
But this poses a problem, insofar as the user of the derived object
does not know whether all the methods of a given name support the
same interface. Under C<SUPER> semantics, one can at least assume
that the derived class will "weed out" any arguments that would be
detrimental to its superclass. But as we have already pointed out,
there isn't a single superclass under MI, and each superclass might
need to have different "detrimental arguments" weeded out. One could
say that in that case, you don't call C<SUPER> but rather call out to
each superclass explicitly. But then you're back to the problem that
C<SUPER> was designed to solve. And you haven't solved C<SUPER>'s
problem either.
Under C<NEXT> semantics, we assume that we are dispatching to a set
of methods with the same name, but potentially different signatures.
(Perl 6's C<SUPER> implementation is really a limited form of C<NEXT>,
insofar as C<SUPER> indicates a set of parent methods, unlike in
Perl 5 where it picks one.) We need a way of satisfying different
signatures with the same set of arguments.

There are, in fact, two ways to approach this. One way is to say,
okay everything is a multimethod, and we just won't call anything
whose signature is irreconcilably inconsistent with the arguments
presented. Plus there are varying degrees of consistency within the
set of "consistent" interfaces, so we try them in decreasing order
of consistency. A more consistent multi is allowed to fall back to
a less consistent multi with "C<next METHOD>".
But as a variant of the "pick one" mentality, that still doesn't
help the situation where you want to send a message to all your
ancestor classes (like "Please Mr. Base Class, help me initialize
this object."), but you want to be more specific with some classes
than others ("Please Miss Derived Class, set your C<$.prim> attribute
to 1."). So the other approach is to use named arguments that can
be ignored by any classes that don't grok the argument.

So what this essentially comes down to is the fact that all methods and
submethods of classes that might be derived from (which is essentially
all classes, but see the previous section) must have a C<*%> parameter,
either explicitly or implicitly, to collect up and render harmless
any unrecognized option pairs in the argument list. So the ruling
is that all methods and submethods that do not declare an explicit
C<*%> parameter will get an implicit C<*%_> parameter declared for them
whether they like it or not. (Subroutines are not granted this "favor".)
It might be objected that this will slow down the parameter binding
algorithm for all methods favored with an implicit C<*%_>, but I would
argue that the binding code doesn't have to do anything till it sees
a named parameter it doesn't recognize, and then it can figure out
whether the method even references C<%_>, and if not, simply throw the
unrecognized argument away instead of constructing a C<%_> that won't
be used. And most of this "figuring out" can be done at compile time.
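Python programmers will recognize the shape of this from cooperative C<super()> calls, so here is a sketch in that idiom. The classes and argument names are invented. One difference from the ruling above is worth flagging: in this model each method peels off the named arguments it knows and passes the rest along, whereas in Perl 6 every method would see the whole list and quietly ignore what it doesn't grok.

```python
class Base:
    def build(self, **_):
        # The catch-all: swallows whatever named arguments nobody claimed.
        return []


class Climber(Base):
    def build(self, height=None, **rest):
        return [("Climber", height)] + super().build(**rest)


class Swimmer(Base):
    def build(self, depth=None, **rest):
        return [("Swimmer", depth)] + super().build(**rest)


class Otter(Climber, Swimmer):
    def build(self, **named):
        return [("Otter", None)] + super().build(**named)


# `teh` is a deliberate typo: it is silently rendered harmless.
trace = Otter().build(height=2, depth=5, teh=1)
```

Every class in the chain gets its turn with one set of arguments, and the misspelled C<teh> vanishes without complaint, which is exactly the cost weighed in what follows.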
Another counterargument is that this prevents a class from recognizing
typos in argument names. That's true. It might be possible to ask
for a warning that checks globally (at class-finalization time in
the optimizer?) to see if there is any method of that name anywhere that
is interested in a parameter of that name. But any class that
gets its parameters out of a C<*%> hash at run time would cause false
positives, unless we assume that any C<*%> hash makes any argument name
legal, in which case we're pretty much back to where we started, unless
we do analysis of the usage of all C<*%> hashes in those methods, and count
things like C<%_«prim»> as proper parameter declarations. And that
can still be spoofed in any number of ways. Plus it's not a trivial
warning to calculate, so it probably wouldn't be the default in a
load-and-go interpreter.

So I think we basically have to live with possible typos to get proper
polymorphic dispatch. If something is frequently misspelled, then you
could always put in an explicit test against C<%_> for that argument:

    warn "Didn't you mean :the(%_«teh»)?" if %_«teh»;

And perhaps we could have a pragma:
But it's possible that the correct solution is to differentiate
two kinds of "isa", one that derives from "nextish" classes,
and one that derives from "superish" classes. A "C<next METHOD>"
traversal would assume that any delegation to a super class would
be handled explicitly by the current class's methods. That is, a
"superish" inheritance hides the base class from C<.*> and C<.+>, as well
as "C<next METHOD>".
On the other hand, if we marked the super class itself, we could
refrain from generating C<*%> parameters for its methods. Any "next"
dispatcher would then have to "look ahead" to see if the next class
was a "superish" class, and bypass it. I haven't a clue what the
syntax should be though. We could mark the class with a "superish"
trait, which wouldn't be inheritable. Or we could mark it with a
Superish role, which would be inheritable, and a base class would
have to override it to impose a Nextish role instead. (But then what
if one parent class is Superish and one is Nextish?) Or we could
even have two different metaclasses, if we decide the two kinds of
classes are fundamentally different beasts. In that case we'd declare
them differently using "class" and some other keyword. Of course,
people will want to use "class" for the type they prefer, and the
other keyword for the type they don't prefer. :-)

But since we're attempting to bias things in favor of nextish
semantics, that would be a "class", and the superish semantics might
be a "guthlophikralique" or some such. C<:-)>
Seriously, if we mark the class, "C<is hidden>" can hide the current
class from "C<next METHOD>" semantics. The problem with that is, how do
you apply the trait to a class in a different language? That argues
for marking the "isa" instead. So as usual when we can't make up our
minds, we'll just have it both ways. To mark the class itself, use
"C<is hidden>". To mark the "isa", use "C<hides Base>" instead of
"C<is Base>". In neither case will "C<next METHOD>" traverse to
such a class. (And no C<*%_> will be autogenerated.)
For example, here are two base classes that know about "C<next METHOD>":

    class Nextish1 { method dostuff () {...; next; } }
    class Nextish2 { method dostuff () {...; next; } }

    class MyClass is Nextish1 is Nextish2 {
        method dostuff () {...; next; }
    }

Since all the base classes are "next-aware", C<MyClass> knows it can
just defer to "next" and both parent classes' C<dostuff> methods will
be called. However, suppose one of our base classes is old-fashioned
and thinks it should call things with C<SUPER::> instead. (Or it's a
class off in Python or Ruby.) Then we have to write our classes
more like this:

    class Superish { method dostuff () {...; .*SUPER::dostuff(); } }
    class Nextish  { method dostuff () {...; next; } }

    class MyClass hides Superish is Nextish {
        method dostuff () {
            .Superish::dostuff();
            next;
        }
    }

Here, C<MyClass> knows that it has two very different base classes.
C<Nextish> knows about "C<next>", and C<Superish> doesn't. So it
delegates to C<Superish::dostuff()> differently than it delegates to
C<Nextish::dostuff()>. The fact that it declared "C<hides Superish>"
prevents C<next> from visiting the Superish class.
=head1 Collections of Classes
=head2 In Classes
We'd like to be able to support virtual inner classes. You can't
have virtual inner classes unless you have a way to dispatch to the
actual class of the invocant. That says to me that the solution is
bound up intimately with the method dispatcher, and the syntax of
naming an inner class has to know about the invocant in whose context
we have to start searching for the inner class. So we could have an
explicit syntax like:

    class Base {
        our class Inner { ... }
        has Inner $inner;
        submethod BUILD { .makeinner; }
        method makeinner ($self:) {
            my Inner $thing .= $self.Inner.new();
            return $thing;
        }
    }

    class Middle is Base {
        our class Inner is Base::Inner { ... }
    }

    class Derived is Middle {
    }

When you say C<Derived.new()>, it creates a C<Derived> object, calls
C<Derived::BUILDALL>, which eventually calls C<Base::BUILD>, which
makes a C<Middle::Inner> object (because that's what the virtual
method C<$self.Inner> returns) and puts it in a variable of
the C<Base::Inner> type (which is fine, since C<Middle::Inner>
ISA C<Base::Inner>). Whew!
The only extra magic here is that an inner class would have to
autogenerate an accessor method (of the same name) that returns
the class. A class could then choose to access an inner class name
directly, in which case it would get its own inner class of that
name, much like C<$.foo> always gets you your own attribute. But
if you called the inner class name as a method, it would automatically
virtualize the name, and you'd get the most derived existing version
of the class.

This would give us most of what RFC 254 is asking for, at the expense
of one more autogenerated method. Use of such inner classes would
take the connivance of a base class that doesn't mind if derived
classes redefine its inner class. Unfortunately, it would have to
express that approval by calling C<$self.Inner> explicitly. So this
solution does not go as far as letting you change classes that didn't
expect to be changed.
It would be possible to take it further, and I think we should.
If we say that whenever you use any global class, it makes an inner
class on your behalf that is merely an alias to the global class,
creating the accessor method as if it were an inner class, then it's
possible to virtualize the name of I<any> class, as long as you're
in a context that has an appropriate invocant. Then we'd make any
class name lookup assume C<$self.> on the front, basically.

This may seem like a wild idea, but interestingly, we're already
proposing to do a similar aliasing in order to have multiple versions
of a module running simultaneously. In the case of classes, it
seems perfectly natural that a new version might derive from an older
version rather than redefining everything.
The one fly in the ointment that I can see is that we might not
always have an appropriate invocant--for instance, outside any
method body, when we're declaring attributes. I guess when there's
no dynamic context indicating what an "inner" classname should mean,
it should default to the ordinary meaning in the current lexical
and/or package context. Within a class definition, for instance,
the invocant is the metaclass, which is unhelpful. So generally that
means that a declared attribute type will turn out to be a superclass
of the actual attribute type at run time. But that's fine, ain't it?
You can always store a C<Beagle> in a C<Dog> attribute.

So in essence, it boils down to this. Within a method, the invocant
is allowed to have opinions about the meanings of any class names, and
when there are multiple possible meanings, pick the most appropriate
one, where that amounts to the name you'd find if the class name were
a virtual method name.
Here's the example from RFC 254, translated to Perl 6 (
with
C<Frog> made
into an explicit inner class
for
clarity (though it should work
with
any class by the aliasing rule above)):

    class Forest {
        our class Frog {
            method speak () { say "ribbit ribbit"; }
            method jump () {...}
            method croak () {...}
        }
        has Frog $.frog;
        method new ($class) {
            my Frog $frog .= new;    # MAGIC
            return $class.bless( frog => $frog );
        }
        method make_noise {
            .frog.speak;
        }
    }

Now we derive from C<Forest>, producing C<Forest::Japanese>, with its own
kind of frogs:

    class Forest::Japanese is Forest {
        our class Frog is Forest::Frog {
            method speak () { say "kerokero"; }
        }
    }

And finally, we make a forest of that type, and tell it to make a noise:

    $forest = new Forest::Japanese;
    $forest.make_noise();

In the Perl 5 equivalent, that would have printed "ribbit ribbit" instead.
How did it do the right thing in Perl 6?

The difference is on the line marked "C<MAGIC>". Because C<Frog> was
mentioned in a method, and the invocant was of type C<Forest::Japanese>
rather than of type C<Forest>, the word "C<Frog>" figured out that
it was supposed to mean a C<Forest::Japanese::Frog> rather than a
C<Forest::Frog>. The name was "virtual". So we ended up creating
a forest with a frog of the appropriate type, even though it might
not have occurred to the writer of C<Forest> that a subclass would
override the meaning of C<Frog>.

So one object can think that its C<Frog> is Japanese, while another
thinks it's Russian, or Mexican, or even Antarctican (if you can
find any forests there). Base methods that talk about C<Frog> will
automatically find the C<Frog> appropriate to the current invocant.
This works even if C<Frog> is an outer class rather than an inner class,
because any outer class referenced by a base class is automatically
aliased into the class as a fake inner class. And the derived class
doesn't have to redefine its C<Frog> by declaring an inner class either.
It can just alias (or use) a different outer C<Frog> class in as its
fake inner class. Or even a different version of the same C<Frog> class,
if there are multiple versions of it in the library.

And it just works.
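
A rough analogy, in Python rather than Perl 6, may make the lookup rule
concrete. This is only a sketch under stated assumptions: the names
C<PlainFrog>, C<JapaneseFrog>, and C<JapaneseForest> are invented for
illustration, and looking the class up through C<type(self)> merely stands
in for the virtual class name lookup described above. It is not how Perl 6
would implement it.

```python
class PlainFrog:
    def speak(self):
        return "ribbit ribbit"


class JapaneseFrog(PlainFrog):
    def speak(self):
        return "kerokero"


class Forest:
    # The base class's idea of what "Frog" means.
    Frog = PlainFrog

    def __init__(self):
        # Resolve "Frog" through the invocant's class, not lexically,
        # so a subclass can change which class gets instantiated.
        self.frog = type(self).Frog()

    def make_noise(self):
        return self.frog.speak()


class JapaneseForest(Forest):
    # Override the meaning of "Frog" for this subclass only.
    Frog = JapaneseFrog
```

The base constructor never mentions C<JapaneseFrog>, yet a
C<JapaneseForest> ends up holding one, which is the behavior the section
is after.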
=head2 In Modules
It's also possible to put a collection of classes into a module,
but that doesn't buy you much except the ability to pull them all
in with one C<use>, and manage them all with one version number.
Which has a lot to be said for it--in the next section.
=head1 Versioning
Way back at the beginning, we claimed that a file-scoped class
declaration:

    class Dog;
    ...

is equivalent to the corresponding block-scoped declaration:

    class Dog {...}

While that's true, it isn't the whole truth. A file-scoped class
(or module, or package) is the carrier of more metadata than
a block-scoped declaration. Perl 6 supports a notion of versions
that is file based. But even a class name plus a version is not
sufficient to name a module--there also has to be a naming authority,
which could be a URI or a CPAN id. This will be discussed more
fully in Apocalypse 11, but for now we can make some predictions.

The extra metadata has to be associated with the file somehow. It
may be implicit in the filename, or in the directory path leading
to the file. If so, then Perl 6 has to collect up this information
as modules are loaded and associate it with the top level class or
module as a set of properties.

It's also possible that a module could declare properties explicitly
to define these and other bits of metadata:

    version     1.2.1
    creator     Joe Random
    description This class implements camera obscura.
    subject     optics, boxes
    language    ja_JP
    licensed    Artistic|GPL

Modules posted to CPAN or entered into any standard Perl 6 library are
required to declare some set of these properties so that installations
can know where to keep them, such that multiple versions by different
authors can coexist, all of them available to any installed version
of Perl. (This is a requirement for any Perl 6 installation. We're
tired of having to reinstall half of CPAN every time we patch Perl.
We also want to be able to run different versions of the C<Frog> module
simultaneously when the C<Frog> requirements of the modules we use are
contradictory.)

It's possible that the metadata is supplied by both the declarations
and by the file's name or location in the library, but if so, it's
a fatal error to use a module for which those two sources contradict
each other as to author or version. (In theory, it could also be a
fatal error to use modules with incompatible licensing, but a kind
warning might be more appreciated.) Likely there will also be some
kind of automatic checksumming going on as well to prevent fraudulent
distributions of code.

It might simplify things if we make an C<identifier> metadatum that
incorporates all of naming authority, package name, and version.
But the individual parts still have to be accessible, if only as
components of C<identifier>. However we structure it, we should make
the C<identifier> the actual declared full name of the class, yet another
one of those "long names" that include extra parameters.
=head2 Version Declarations
The syntax of a versioned class declaration looks like this:

    class Dog-1.2.1-cpan:JRANDOM;
    class Dog-1.2.1-mailto:jrandom@some.com;

Perhaps those could also have short forms, presuming we can distinguish
CPAN ids, web pages, and email addresses by their internal forms.

    class Dog-1.2.1-JRANDOM;
    class Dog-1.2.1-www.some.com/~jrandom;
    class Dog-1.2.1-jrandom@some.com;

Or maybe using email addresses is a bad idea now in the modern
Spam Age. Or maybe Spam Ages should be plural, like the Dark Ages...

In any event, such a declaration automatically aliases the full name
of the class (or module) to the short name. So for the rest of the
scope, C<Dog> refers to the longer name.

(Though if you refer to C<Dog> within a method, it's considered a
virtual class name, so Perl will search any derived classes for a
redefined inner C<Dog> class (or alias) before it settles on the
least-derived aliased C<Dog> class.)
We lied slightly when we said earlier that only the file-scoped class
carries extra metadata. In fact, all of the classes (or modules, or
packages) defined within your file carry metadata, but it so happens
that the version and author of all your extra classes (or modules, or
packages) are forced to be the same as the file's version and author.
This happens automatically, and you may not override the generation of
these long names, because if you did, different file versions could
and would have version collisions of their interior components, and
that would be catastrophic. In general you can ignore this, however,
since the long names of your extra classes are always automatically
aliased back down to the short names you thought you gave them in the
first place. The extra bookkeeping is in there only so that Perl can
keep your classes straight when multiple versions are running at the
same time. Just don't be surprised when you ask for the name of
the class and it tells you more than you expected.
=head2 Use of Version and Author Wildcards
Since these long names are the actual names of the classes, when you say:

    use Dog;

you're really asking for something like:

    use Dog-(Any)-(Any);

And when you say:

    use Dog-1.2.1;

you're really asking for:

    use Dog-1.2.1-(Any);

Note that the C<1.2.1> specifies an I<exact> match on the version
number. You might think that it should specify a minimum version.
However, people who want stable software will specify an exact version
and stick with it. They don't want C<1.2.1> to mean a minimum version.
They know C<1.2.1> works, so they want that version nailed down
forever--at least for now.
To match more than one version, put a range operator in parens:

    use Dog-(1.2.1..1.2.3);

What goes inside the parens is in fact any valid smartmatch selector:

    use Dog-(1.2.1 | 1.3.4)-(/:i jrandom/);
    use Dog-(Any)-(/^cpan\:/)

And in fact they could be closures too. These mean the same thing:

    use Dog-{$^ver ~~ 1.2.1 | 1.3.4}-{$^auth ~~ /:i jrandom/};
    use Dog-{$^ver ~~ Any}-{$^auth ~~ /^cpan\:/}

In any event, however you select the module, its full name is
automatically aliased to the short name for the rest of your lexical
scope. So you can just say

    my Dog $spot .= new("woof");

and it knows (even if you don't) that you mean

    my Dog-1.3.4-cpan:JRANDOM $spot .= new("woof");

(Again, if you refer to C<Dog> within a method, it's a virtual class
name, so Perl will search any derived classes for a redefined C<Dog> class
before it settles on the outermost aliased C<Dog> class.)
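
The selection rule can be modeled as a filter over installed long names,
with one predicate per component. The following Python sketch is only an
illustration: C<LongName>, C<select>, and C<any_value> are invented names,
the installed list is made up, and how Perl 6 would choose among several
matches is not specified here, so the sketch simply returns all of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LongName:
    """Hypothetical stand-in for a full class name: short name, version, authority."""
    short: str
    version: tuple
    authority: str


def parse_version(text):
    """Turn "1.2.1" into (1, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def any_value(_):
    """Wildcard selector, playing the role of (Any)."""
    return True


def select(installed, short, version_ok=any_value, authority_ok=any_value):
    """Return every installed long name accepted by both selectors."""
    return [
        name for name in installed
        if name.short == short
        and version_ok(name.version)
        and authority_ok(name.authority)
    ]


installed = [
    LongName("Dog", parse_version("1.2.1"), "cpan:JRANDOM"),
    LongName("Dog", parse_version("1.3.4"), "cpan:JRANDOM"),
    LongName("Dog", parse_version("2.0.0"), "mailto:someone@example.com"),
]

# Roughly the same request as: use Dog-(1.2.1 | 1.3.4)-(/^cpan\:/)
wanted = {parse_version("1.2.1"), parse_version("1.3.4")}
matches = select(
    installed,
    "Dog",
    version_ok=lambda v: v in wanted,
    authority_ok=lambda a: a.startswith("cpan:"),
)
```

Calling C<select(installed, "Dog")> with no selectors corresponds to the
fully wildcarded form, and accepts every installed C<Dog>.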
=head1 Introspection
It's easy to specify what Perl 6 will provide for introspection:
the union of what Perl 6 needs and whatever Parrot provides for
other languages. C<;-)>

In the particular case of class metadata, the interface should
generally be via the class's metaclass instance--the object of type
C<MetaClass> that was in charge of building the class in the first
place. The metamethods are in the metaobject, not in the class object.
(Well, actually, those are the same object, but a class object ignores
the fact that it's also a metaobject, and dispatches by default to its
own methods, not the ones defined by the metaclass.)

To get to the metamethods of an ordinary class object you have to
use the C<.meta> method:

    MyClass.getmethods()
    MyClass.meta.getmethods()

Unless C<MyClass> has defined or inherited a C<.getmethods> method, the
first call is an error. The second is guaranteed to work for Perl 6's
standard C<MetaClass> objects. You can also call C<.meta> on any ordinary
object:

    $obj.meta.getmethods();

That's equivalent to

    $obj.dispatcher.meta.getmethods();

As for which parts of a class are considered metadata--they all are,
if you scratch hard enough. Everything that is not stored directly as
a trait or property really ought to have some kind of trait-like method
to access it. Even the method body closures have to be accessible
as traits, since the C<.wrap> method needs to have something to put
its wrapper around.

Minimally, we'll have user-specified class traits that look like this:

    name        Dog
    version     1.2.1
    author      Joe Random
    description This class implements camera obscura.
    subject     optics, boxes
    language    ja_JP
    licensed    Artistic|GPL

And there may be internal traits like these:

    isa         list of parent classes
    roles       list of roles
    disambig    how to deal with ambiguous method names from roles
    layout      P6opaque, P6hash, P5hash, P5array, PyDict, Cstruct, etc.

The C<layout> determines whether one class can actually derive
from another or has to fake it. Any P6opaque class can compatibly
inherit from any other P6opaque class, but if it inherits from any
P5 class, it must use some form of delegation to another invocant.
(Hopefully with a smart enough invocant reference that, if the
delegated object unknowingly calls back into our layout system,
we can recover the original object reference and maintain some kind
of compositional integrity.)

The metaclass's C<.getmethods> method returns method-descriptor objects
with at least the following properties:

    name        the name of the method
    signature   the parameters of the method
    returns     the return type of the method
    multi       whether duplicate names are allowed
    do          the method body

The C<.getmethods> method has a selector parameter that lets you
specify whether you want to see a flattened or hierarchical view,
whether you're interested in private methods, and so forth. If you
want a hierarchical view, you only get the methods actually defined
in the class proper. To get at the others, you follow the "isa"
trait to find your parent classes' methods, and you follow the "roles"
trait to get to role methods, and from parents or roles you may also
find links to further parents or roles.

The C<.getattributes> method returns a list of attribute descriptors
that have traits like these:

    name
    type
    scope
    rw
    private
    accessor
    build

Additionally they can have any other variable traits that can reasonably be
applied to object attributes, such as C<constant>.

Strictly speaking, metamethods like C<.isa()>, C<.does()>, and C<.can()>
should be called through the meta object:

    $obj.meta.can("bark")
    $obj.meta.does(Dog)
    $obj.meta.isa(Mammal)

And they can always be called that way. For convenience you can often
omit the C<.meta> call because the base C<Object> type translates
any unrecognized C<.foo()> into C<.meta.foo()> if the meta class has
a method of that name. But if a derived class overrides such a
metamethod, you have to go through the C<.meta> call explicitly to
get the original call.

In previous Apocalypses we said that:

    $obj ~~ Dog

calls:

    $obj.isa(Dog)

That is no longer the case--you're actually calling:

    $obj.meta.does(Dog)

which is true if C<$obj> either "does" or "isa" C<Dog> (or "isa"
something that "does" C<Dog>). That is, it asks if C<$obj> is likely
to satisfy the interface that comes from the C<Dog> role or class.
The C<.isa> method, by contrast, is strictly asking if C<$obj> inherits
from the C<Dog> class. It's erroneous to call it on a role. Well,
okay, it's not strictly erroneous. It will just never return true.
The optimizer will love you, and remove half your code.
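
The difference between the two tests can be imitated in Python, with the
caveat that this is an analogy and not Perl 6 semantics. In the sketch
below, the role C<Barker> and the helper names C<isa> and C<does> are
assumptions made for illustration; an abstract base class with a
registered class plays the part of a role that is composed rather than
inherited.

```python
from abc import ABC


class Barker(ABC):
    """Hypothetical role: something that can bark."""


class Mammal:
    pass


class Dog(Mammal):
    pass


# Dog "does" Barker without inheriting from it.
Barker.register(Dog)


def isa(obj, cls):
    """Strict inheritance test: cls must appear in the object's class chain."""
    return cls in type(obj).__mro__


def does(obj, cls):
    """Interface test: true for inheritance or for a registered role."""
    return isinstance(obj, cls)
```

Here C<does> answers yes for both C<Mammal> and C<Barker>, while C<isa>
answers yes only for C<Mammal>: asking C<isa> about a role never returns
true.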
Note that either of C<.does> or C<.isa> can lie, insofar as you might
include an interface that you later override parts of. When in doubt,
rely on C<.can> instead. Better yet, rely on your dispatcher to
pick the right method without trying to second guess it. (And then
be prepared to catch the exception if the dispatcher throws up its
hands in disgust...)

By the way, unlike in Perl 5 where C<.can> returns a single routine
reference, Perl 6's version of C<.meta.can> returns a "WALK" iterator
for a set of routines that match the name. When dereferenced, the
iterator gets fed to a dispatcher as if the method had been called
in the first place. Note that any wildcard methods (via delegation
or C<AUTOLOAD>) are included by default in this list of potential
handlers, so there is no reason for subclasses to have to redefine
C<.can> to reflect the new names. This does potentially weaken the
meaning of C<.can> from "definitely has a method of this name" to
"definitely has one or more methods in one or more classes that will
try to handle this." But that's probably closer to what you want,
and the best we can do when people start fooling around with wildcard
methods under MI.
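
As a loose illustration of a C<.can> that returns a list of candidates
instead of a single routine, here is a Python sketch. It assumes that
Python's method resolution order is an acceptable stand-in for the "WALK"
order and that C<__getattr__> is an acceptable stand-in for a wildcard
method; the function name C<can> and the classes are invented for the
example.

```python
def can(obj, name):
    """Return every routine that might handle `name`, most specific class first.

    Ordinary methods come first; catch-all handlers go at the end of the list.
    """
    klasses = type(obj).__mro__
    exact = [k.__dict__[name] for k in klasses if callable(k.__dict__.get(name))]
    wildcards = [
        k.__dict__["__getattr__"] for k in klasses if "__getattr__" in k.__dict__
    ]
    return exact + wildcards


class Animal:
    def speak(self):
        return "..."


class Dog(Animal):
    def speak(self):
        return "woof"


class Proxy:
    def __getattr__(self, name):
        # Catch-all handler, loosely comparable to a wildcard delegation.
        return lambda *args: f"proxied {name}"
```

An empty list means that nothing will even try to handle the name, while
a non-empty list for C<Proxy> only promises that something will try.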
However, that being said, many classes may wish to dynamically
specify at the last moment which methods they can or cannot handle.
That is, they want a hook to allow a class to declare names even
while the C<.can> candidate list is being built. By default C<.meta.can>
includes all wildcard delegations and autoloads at the end of the list.
However, it will exclude from the list of candidates any class that
defines its own C<AUTOMETH> method, on the assumption that each
such C<AUTOMETH> method has already had its chance to add any
callable names to the list. If the class's C<AUTOMETH> wishes to
supply a method, it should return a reference to that method.

Do not confuse C<AUTOMETH> with C<AUTOMETHDEF>. The former is
equivalent to declaring a stub. The latter is equivalent
to supplying a body for an existing stub. Whether C<AUTOMETH>
actually creates a stub, or C<AUTOMETHDEF> actually creates a body,
is entirely up to those routines. If they wish to cache their results,
of course, then they should create the stub or body.

There are corresponding C<AUTOSUB> and C<AUTOSUBDEF> hooks.
And C<AUTOVAR> and C<AUTOVARDEF> hooks. These all pretty much
make C<AUTOLOAD> obsolete. But C<AUTOLOAD> is still there for old
times' sake.
=head1 Other Non-OO Decisions
A lot of time went by while I was in the hospital last year, so we
ended up polishing up the design of Perl 6 in a number of areas not
directly related to OO. Since I've already got your attention
(and we're already 90% of the way through this Apocalypse), I might
as well list these decisions here.
=head2 Exportation
The trait we'll use for exportation (typically from modules but also
from classes pretending to be modules) is C<export>:

    sub foo is export(:DEFAULT) {...}
    sub bar is export(:DEFAULT :others) {...}
    sub baz is export(:MANDATORY) {...}
    sub bop is export {...}
    sub qux is export(:others) {...}

Compared to Perl 5, we've basically made it easier to mark something
as exportable, but more difficult to export something by default.
You no longer have to declare your tagsets separately, since C<:foo>
parameters are self-declaring, and the module will automatically
build the tagsets for you from the export trait arguments.
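
The idea of tagsets that build themselves from the export markings can be
sketched with a Python decorator. This is an analogy only: the C<export>
decorator, the C<TAGSETS> table, and in particular the catch-all C<ALL>
set are assumptions of the sketch, not part of the design described here.

```python
from collections import defaultdict

# Tagsets are built as a side effect of marking functions, so nothing
# has to be declared separately. The "ALL" set is an assumption of this sketch.
TAGSETS = defaultdict(list)


def export(*tags):
    """Mark a function as exportable and file it under each named tag."""
    def mark(func):
        for tag in ("ALL",) + tags:
            TAGSETS[tag].append(func.__name__)
        return func
    return mark


@export("DEFAULT")
def foo(): ...


@export("DEFAULT", "others")
def bar(): ...


@export("MANDATORY")
def baz(): ...


@export()
def bop(): ...


@export("others")
def qux(): ...
```

Note that C<bop> is exportable but lands in no named tagset, which mirrors
the point that marking something exportable is easy while exporting it by
default takes an explicit tag.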
=head2 The gather/take Construct
We used one example of the conjectural gather/take construct. A gather
executes a closure, returning a list of all the values returned by
C<take> within its lexical scope. In a lazy context it might run as
a coroutine. There probably ought to be a dynamically scoped variant.
Unless it should be dynamic by default, in which case there probably
ought to be a lexically scoped variant...
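
Since the construct is still conjectural, the following Python sketch only
illustrates the general shape of the idea. The function names are
invented, and passing C<take> into the closure as an argument is this
sketch's way of making it lexically visible; the generator shows the lazy,
coroutine-like reading.

```python
def gather(block):
    """Run block(take) and return everything handed to take, in order."""
    taken = []
    block(taken.append)
    return taken


def small_squares(take):
    # Eager form: call take() for each value worth keeping.
    for n in range(6):
        if n % 2 == 0:
            take(n * n)


def lazy_small_squares():
    """Lazy variant: a generator, where `yield` plays the part of take."""
    for n in range(6):
        if n % 2 == 0:
            yield n * n
```

Calling C<gather(small_squares)> builds the whole list at once, while
C<lazy_small_squares()> produces the same values one at a time, only as
the consumer asks for them.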
=head2 :foo() Adverbs
There's a new pair syntax that is more conducive to use as option
arguments. This syntax is reminiscent of both the Unix command
line syntax and the I/O layers syntax of Perl 5. But unlike Unix
command-line options, we use colon to introduce the option rather than
the overly negative minus sign. And unlike Perl 5's layers options, you
can use these outside of a string.

We haven't discarded the old pair syntax. It's still more readable
for certain uses, and it allows the key to be a non-identifier.
Plus we can define the new syntax in terms of it:

    Old                             New
    ---                             ---
    foo => $bar                     :foo($bar)
    foo => [1,2,3,@many]            :foo[1,2,3,@many]
    foo => «alice bob charles»      :foo«alice bob charles»
    foo => 'alice'                  :foo«alice»
    foo => { a => 1, b => 2 }       :foo{ a => 1, b => 2 }
    foo => { dostuff() }            :foo{ dostuff() }
    foo => 0                        :foo(0)
    foo => 1                        :foo

It's that last one that's the real winner for passing boolean options.

One other nice thing is that if you have several options in a row you
don't have to put commas between:

    $handle = open $file, :chomp :encoding«guess» :ungzip or die "Oops";

It might be argued that this conflicts with the :foo notation for private
methods. I don't think it's a problem because method names never
occur in isolation.

Oh, one other feature of option pairs is that certain operations can
use them as adverbs. For instance, you often want to tell the range
operator how much to skip on each iteration. That looks like this:

    1..100 :by(3)

Note that this only works where an operator is expected rather than a term.
So there's no confusion between:

    randomlistop 1..100 :by(3)

and

    randomlistop 1..100, :by(3)

In the latter case, the option is being passed to C<randomlistop()>
rather than the C<infix:..> operator.

[Update: That's C<< infix:<..> >> now.]
=head2 Special Quoting of Identifiers Inside Curlies Going Away!
Novice Perl 5 programmers are continually getting trapped by
subscripts that autoquote unexpectedly. So in Perl 6, we'll remove
that special case. %hash{shift} now always calls the shift function,
because the inside of curlies is always an expression. Instead,
if you want to subscript a hash with a constant string, or a slice
of constant strings, use the new French qw//-ish brackets like this:

    %hash«alice»
    %hash«alice bob charlie»

Note in particular that, since slices in Perl 6 are determined by
the subscript only, not the sigil, this:

    %hash«alice» = @x;

evaluates the right side in scalar context, while

    %hash«alice bob charlie» = @x;

evaluates the right side in list context. As with all other uses of
the French quotes in Perl 6, you can always use:

    %hash<<alice>> = @x;

if you can't figure out how to type C<<< ^K<< or ^K>> >>> in vim.

On the other hand, if you've got a fully Unicode aware editor, you
could probably write some macros to use the big double angles from
Asian languages:

    %hash《alice》 = @x;

But by default we only provide the Latin-1 compatible versions.
It would be easy to overuse Unicode in Perl 6, so we're trying to
underuse Unicode for small values of 6. (Not to be confused with ⁶,
or ⅵ.)
=head2 Vector Operators Renamed Back to "hyper" Operators

The mathematicians got confused when we started talking about "vector"
operators, so these dimensionally dwimming versions of scalar operators
are now called hyper operators (again). Some folks see operations like

    @a »*« @b

as totally useless, and maybe they are--to a mathematician. But to
someone simply trying to calculate a bunch of things in parallel
(think cellular automata, or aerodynamic simulations, for instance), they
make a lot of sense. And don't restrict your thinking to math
operators. How about appending a newline to every string before
printing it out:

    print @strings »~« "\n";

Of course,

    for @strings { say }

is a shorter way to do the same thing. ("C<say>" is just Perl 6's version
of a printline function.)
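
For readers who want the "dimensional dwimmery" spelled out, here is a
small Python sketch of a binary hyper operator. It is a simplification
under stated assumptions: the function name C<hyper> is invented, only
flat lists are handled, a lone scalar is stretched to match the list, and
two lists must have equal length (the real rules are not given in this
section).

```python
import operator


def hyper(op, left, right):
    """Apply a binary operator elementwise, stretching a scalar to fit."""
    left_is_list = isinstance(left, list)
    right_is_list = isinstance(right, list)
    if left_is_list and right_is_list:
        if len(left) != len(right):
            raise ValueError("this sketch only pairs lists of equal length")
        return [op(a, b) for a, b in zip(left, right)]
    if left_is_list:
        return [op(a, right) for a in left]
    if right_is_list:
        return [op(left, b) for b in right]
    return op(left, right)
```

With this, C<hyper(operator.mul, [1, 2, 3], [4, 5, 6])> corresponds
roughly to C<@a »*« @b>, and C<hyper(operator.add, strings, "\n")> to the
newline-appending example.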
=head2 Unary Hyper Operators Now Use One Quote Rather Than Two
Unary operators read better if they only "hyper" on the side where
there's an actual argument:

    @neg = -« @pos;
    @indexes = @x »++;

And in particular, I consider a method spec like C<.bletch(1,2,3)> to
be a unary postfix operator, and it would be really ugly to say:

    @objects ».bletch(1,2,3)«

So that's just:

    @objects ».bletch(1,2,3)

In general, binary operators still take "hypers" on both sides, indicating
that both sides participate in the dwimmery.

    @a »+« @a

To indicate that one side or the other should be evaluated as a scalar
before participating in the hyperoperator, you can always put in a
context specifier:

    @a »+« +@a

=head2 C<$thumb.twiddle> No Longer Requires Parens When Interpolated

In Apocalypse 2 we said that any method interpolated into a double-quoted
string has to have parentheses. We're throwing out that special rule
in the interests of consistency. Now if you want to interpolate a
variable followed by an "accidental" dot, use one of these:

    $($var).twiddle
    $var\.twiddle

Yes, that will make it a little harder to translate Perl 5 to Perl 6.
(Parentheses are still required if there are any arguments, however.)

[Update: We're back to requiring parens on methods. In fact, we've gone
the other way--we now require square brackets on arrays and curlies on
hashes. And a bare closure also interpolates. The I<only> interpolator
that doesn't require some kind of bracketing terminator is a simple
scalar. See S2.]
=head2 The =:= Identity Operator
There is a new C<=:=> identity operator, which tests to see if
two objects are the same object. The association with the C<:=>
binding operator should be obvious. (Some classes such as integers
may consider all objects of the same value to be a single object,
in a Platonic sense.)

Hmm? No, there is no associated assignment operator. And if there were,
I wouldn't tell you about it. Sheesh, some people...

But there is, of course, a hyper version:

    @a »=:=« @b

=head2 New Grammatical Categories
The current set of grammatical categories for operator names is:

    Category                            Example of use
    --------                            --------------
    coerce:as                           123 as BigInt, BigInt(123)
    self:sort                           @array .= sort
    term:...                            $x = {...}
    prefix:+                            +$x
    infix:+                             $x + $y
    postfix:++                          $x++
    circumfix:[]                        [ @x ]
    postcircumfix:[]                    $x[$y] or $x.[$y]
    rule_modifier:p5                    m:p5//
    trait_verb:handles                  has $.tail handles «wag»
    trait_auxiliary:shall               my $x shall conform«TR123»
    scope_declarator:has                has $.x;
    statement_control:if                if $condition {...} else {...}
    infix_postfix_meta_operator:=       $x += 2;
    postfix_prefix_meta_operator:»      @array »++
    prefix_postfix_meta_operator:«      -« @magnitudes
    infix_circumfix_meta_operator:»«    @a »+« @b

Now, you may be thinking that some of these have long, unwieldy names.
You'd be right. The longer the name, the longer you should think
before adding a new operator of that category. (And the length of
time you should think probably scales exponentially with the length of
the name.)
[Update: The actual operator name must be quoted like a hash subscript:

    coerce:<as>                              123 as BigInt, BigInt(123)
    self:<sort>                              @array .= sort
    term:<...>                               $x = {...}
    prefix:<+>                               +$x
    infix:<+>                                $x + $y
    postfix:<++>                             $x++
    circumfix:<[ ]>                          [ @x ]
    postcircumfix:<[ ]>                      $x[$y] or $x.[$y]
    rule_modifier:<p5>                       m:p5//
    trait_verb:<handles>                     has $.tail handles <wag>
    trait_auxiliary:<shall>                  my $x shall conform<TR123>
    scope_declarator:<has>                   has $.x;
    statement_control:<if>                   if $condition {...} else {...}
    statement_modifier:<if>                  ... if $condition
    infix_postfix_meta_operator:<=>          $x += 2;
    postfix_prefix_meta_operator:{'»'}       @array »++
    prefix_postfix_meta_operator:{'«'}       -« @magnitudes
    infix_circumfix_meta_operator:{'»','«'}  @a »+« @b

Please note that the "hole" in circumfixes is now specified by
slice notation. There is no longer any special split-down-the-middle
rule.]
=head2 Assignment to C<state> variable declaration now does "first" semantics.

As we talked about earlier, assignment to a "C<has>" variable is really
pseudo-assignment representing a call to the "C<build>" trait. In the
same way, assignment to C<state> variables (Perl's version of lexically
scoped "static" variables), is taken as pseudo-assignment representing
a call to the "C<first>" trait. The first time through a piece of code
is when state variables typically like to be initialized. So saying:

    state $pc = $startpc;

is equivalent to

    state $pc is first($startpc);

which means that it will pay attention to the C<$startpc> variable
only the first time this block is ever executed. Note that any side
effects within the expression will only happen the first time through.
If you say

    state $x = $y++;

then that statement will only ever increment C<$y> once. If that's
not what you want, then use a real assignment as a separate statement:

    state $x;
    $x = $y++;
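
The difference between the two forms can be traced with a small Python
model. This is a sketch under stated assumptions, not a description of how
Perl 6 implements C<state>: a closure's dictionary stands in for the state
variable, a one-item list stands in for C<$y>, and the helper names are
invented.

```python
def make_block(y):
    """Build a block with one state variable; `y` is a one-item list standing in for $y."""
    state = {}

    def block():
        if "x" not in state:      # only true the first time through
            state["x"] = y[0]     # like: state $x = $y++
            y[0] += 1             # the side effect happens once
        return state["x"]

    return block


def make_reassigning_block(y):
    """Same block, but with a real assignment as a separate statement."""
    state = {"x": None}           # like: state $x;

    def block():
        state["x"] = y[0]         # like: $x = $y++
        y[0] += 1                 # the side effect happens on every call
        return state["x"]

    return block
```

Calling the first block three times leaves the counter incremented once;
calling the second three times increments it three times.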

The C<:=> and C<.=> operators also attempt to do what you mean, which
in the case of:

    state $x := $y++;

still probably doesn't do what you want. C<:-)>

In general, any "preset" trait is smart about when to apply its value
to the container it's being applied to, such that the value is set
statically if that's possible, and if that's not possible, it is set
dynamically at the "correct" moment.
For ordinary assignment to a "C<my>" variable, that correct moment just
happens to be every time it is executed, so C<=> represents ordinary
assignment. If you want to force an initial value at execution time
that was calculated earlier, however, then just use ordinary assignment
to assign the results of a precalculated block:

    my @canines = INIT { split slurp "%ENV«HOME»/.canines" };

It's only the C<has> and C<state> declarators that redefine assignment
to set defaults with traits. (For C<has>, that's because the
actual attribute variable won't exist until the object is created.
For C<state>, that's because we want the default to be "first time
through".) But you can use any of the traits on any variable for
which it makes sense. For instance, just because we invented the
"C<first>" initializer for state variables:

    state $lexstate is first(0);

doesn't mean you can't use it to initialize any variable only the
first time through a block of code:

    my $foo is first(0);

However, it probably doesn't make a lot of sense on a "C<my>" variable,
unless you really want it to be undefined the second time through.
It does make a little more sense on an "C<our>" variable that will hang
onto its value like a state variable:

    our $counter is first(0);

An assignment would often be wrong in this case. But generally,
the naive user can simply use assignment, and it will usually do what
they want (if occasionally more often than they want). But it does
exactly what they want on C<has> and C<state> variables--presuming
they are savvy enough to want what it actually does... C<:-)>
So as with C<has> variables, C<state> variables can be initialized with
precomputed values:

    state $x = BEGIN { calc() }
    state $x = CHECK { calc() }
    state $x = INIT { calc() }
    state $x = FIRST { calc() }
    state $x = ENTER { calc() }

which mean something like:

    state $x is first( BEGIN { calc() } )
    state $x is first( CHECK { calc() } )
    state $x is first( INIT { calc() } )
    state $x is first( FIRST { calc() } )
    state $x is first( ENTER { calc() } )

Note, however, that the last one doesn't in fact make much sense,
since C<ENTER> happens more frequently than C<FIRST>. Come to think of it,
doing C<FIRST> inside a C<first> doesn't buy you much either...
=head2 The length() function is gone

In Perl 6 you're not going to see

    my $sizeofstring = length($string);

That's because "length" has been deemed to be an insufficiently
specified concept, because it doesn't specify the units. Instead,
if you want the length of something in characters you use

    my $sizeinchars = chars($string);

and if you want the size in elements, you use

    my $sizeinelems = elems(@array);

This is more orthogonal in some ways, insofar as you can now ask for
the size in chars of an array, and it will add up all the lengths of
the strings in it for you:

    my $sizeinchars = chars(@array);

And if you ask for the number of elems of a scalar, it knows to
dereference it:

    my $ref = [1,2,3];
    my $sizeinelems = elems($ref);

These are, in fact, just generic object methods:

    @array.elems
    $string.chars
    @array.chars
    $ref.elems

And the functional forms are just multimethod calls. (Unless they're
indirect object calls...who knows?)
You can also use C<%hash.elems>, which returns the number of pairs in
the hash. I don't think C<%hash.chars> is terribly useful, but it will
tell you how many characters total there are in the values. (The key
lengths are ignored, just like the integer "keys" of an ordinary array.)

Actually, the meaning of C<.chars> varies depending on your current
level of Unicode support. To be more specific, there's also:

    $string.bytes
    $string.codepoints
    $string.graphemes
    $string.letters

[Update: Those are shortened to C<.codes>, C<.graphs>, and C<.langs> now.]

...none of which should be confused with:

    $string.columns

or its evil twin:

    $string.pixels

Those last two require knowledge of the current font and rendering
engine, in fact. Though C<.columns> is likely to be pretty much the
same for most Unicode fonts that restrict themselves to single and
double-wide characters.
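
To see why "length" needs units, it helps to measure one string three
ways. The Python sketch below is illustrative only: the function names are
invented, UTF-8 is assumed for the byte count, and the grapheme count is a
rough approximation that merely skips combining marks, not full Unicode
segmentation.

```python
import unicodedata

# "eleve" with two combining accents, written in decomposed form.
word = "e\u0301le\u0300ve"


def bytes_of(text, encoding="utf-8"):
    """Size in bytes under a given encoding (UTF-8 assumed by default)."""
    return len(text.encode(encoding))


def codepoints_of(text):
    """Size in Unicode codepoints."""
    return len(text)


def graphemes_of(text):
    """Rough grapheme count: skip combining marks (not full Unicode segmentation)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)
```

The same five-letter word measures 9 bytes, 7 codepoints, and 5
graphemes, so a bare "length" could defensibly mean any of the three.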
=head2 String positions
A corollary to the preceding is that string positions are not numbers.
If you say either

    $pos = index($string, "foo");

or

    $string ~~ /foo/;
    $pos = $string.pos;

then C<$pos> points to that location in that string. If you ask for
the numeric value of C<$pos>, you'll get a number, but which number you
get can vary depending on whether you're currently treating characters
as bytes, codepoints, graphemes, or letters. When you pass a C<$pos>
to C<substr($string, $pos, 3)>, you'll get back "C<foo>", but not
because it counted over some number of characters. If you use C<$pos>
on some other string, then it has to interpret the value numerically
in the current view of what "character" means. In a boolean context,
a position is true if the position is defined, even if that position
would evaluate to 0 numerically. (C<index> and C<rindex> return undef
when they "run out".)
And, in fact, when you say C<$len = .chars>, you're really getting
back the position of the end of the string, which just happens to
numerify to the number of characters in the string in the current view.
A consequence of the preceding rules is that C<"".chars> is true,
but C<+"".chars> is false. So Perl 5 code that says C<length($string)>
needs to be translated to C<+chars($string)> if used in a boolean
context.
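
The truth rule for positions is easy to get wrong, so here is a small
Python model of it. Everything in it is an assumption made for
illustration: the C<Position> class, its single C<offset> field, and the
C<index> helper are invented, and C<None> stands in for undef.

```python
class Position:
    """Hypothetical position object: true whenever it is defined."""

    def __init__(self, offset):
        self.offset = offset

    def __bool__(self):
        # A defined position is true even when it numifies to 0.
        return True

    def __int__(self):
        # Numeric view of the position (one fixed view in this sketch).
        return self.offset


def index(string, needle, start=0):
    """Return a Position for the first match, or None when the search runs out."""
    found = string.find(needle, int(start))
    return None if found < 0 else Position(found)
```

A match at the very start of the string yields a position that is true as
a boolean but 0 as a number, which is why the numeric form has to be
requested explicitly when translating Perl 5 tests on C<length>.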
Routines like C<substr> and C<index> take either positions or integers
for arguments. Integers will automatically be turned into positions
in the current view. This may involve traversing the string for
variable-width representations, especially when working with combining
characters as parts of graphemes. Once you're working with abstract
positions, however, they are efficient. So

    while $pos = index($string, "fido", $pos + 1) {...}

never has to rescan the string.
The other point of all this is that you can pass C<$pos> or C<$len> to
another module, and I<it doesn't matter> if you're doing offsets in
graphemes and they are doing offsets in codepoints. They get the
correct position by their lights, even though the number of characters
looks different. The main constraint on this is that if you pass
a position from a lower Unicode support level to a higher Unicode
support level, you can end up with a position that is inside what
you think of as a unitary character, whether that's a byte within a
codepoint, or a codepoint within a grapheme or letter. If you deref
such a position, an exception is thrown. But generally high-level
routines call into low-level routines, so the issue shouldn't arise
all that often in practice. However, low-level routines that want
to be called from high-level routines should strive not to return
positions inside high-level characters--the fly in the ointment being
that the low-level routine doesn't necessarily know the Unicode level
expected by the calling routine. But we have a solution for that...
High-level routines that suspect they may have a "partial position" can
call C<$pos.snap> (or C<$pos.=snap>) to round up to the next integral
position in the current view, or (much less commonly) C<$pos.snapback>
(or C<$pos.=snapback>) to round down to the previous integral position
in the current view.  This only biases the position rightward or
leftward.  It doesn't actually do any repositioning unless we're about
to throw an exception.  So this allows the low-level routine to return
C<$pos.snap> without knowing at the time how far forward to snap.
The actual snapping is done later when the high-level routine tries
to use the position, and at that point we know which semantics to
snap forward under.

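
Here is one way the lazy snapping might be modelled, again in Python
and again only as a sketch.  It handles just one pair of levels (a
byte offset used at the codepoint level in UTF-8), and the C<BytePos>
class, its C<bias> field, and the C<resolve> method are assumptions
of the example rather than anything specified above.

```python
def is_boundary(data: bytes, offset: int) -> bool:
    """True when offset does not fall inside a multi-byte UTF-8 codepoint."""
    return offset >= len(data) or data[offset] & 0xC0 != 0x80


class BytePos:
    """Hypothetical byte-level position carrying an optional snap bias."""

    def __init__(self, data: bytes, offset: int, bias: int = 0):
        self.data, self.offset, self.bias = data, offset, bias

    def snap(self):
        # Only record the preference; no repositioning happens yet.
        return BytePos(self.data, self.offset, +1)

    def snapback(self):
        return BytePos(self.data, self.offset, -1)

    def resolve(self) -> int:
        """Use the position at codepoint level, snapping only if it must."""
        offset = self.offset
        if is_boundary(self.data, offset):
            return offset
        if self.bias == 0:
            raise ValueError("offset %d is inside a codepoint" % offset)
        while not is_boundary(self.data, offset):
            offset += self.bias
        return offset


data = "a\u00e9b".encode("utf-8")            # b'a\xc3\xa9b'; offset 2 is mid-codepoint
print(BytePos(data, 2).snap().resolve())      # 3
print(BytePos(data, 2).snapback().resolve())  # 1
print(BytePos(data, 1).snap().resolve())      # 1
```

Calling C<resolve> on an unbiased C<BytePos(data, 2)> raises instead,
which corresponds to the exception on dereferencing a partial position.
A position that is already integral is left alone whichever way it
was biased.
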
By the way, if you bind to a position rather than assign, it tracks
the string in question:

    my $string = "xyz";
    my $endpos := $string.chars;    # $endpos == 3
    substr($string,0,0,"abc");      # $endpos == 6, $string = "abcxyz"

Deletions of string around a position cause the position to be reduced
to the beginning of the deletion.  Insertions at a position are assumed
to be after that position.  That is, the position stays pointing to
the beginning of the newly inserted string, like this:

    my $string = "xyz";
    my $endpos := $string.chars;    # $endpos == 3
    substr($string,2,1,"abc");      # $endpos == 2, $string = "xyabc"

Hence concatenation never updates any positions.  Which means that
sometimes you just have to call C<.chars> again...  (Perhaps we'll
provide a way to optionally insert before any matching position.)

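
The tracking rules are simple enough to model in a few lines of Python.
This is only a sketch: C<TrackedString>, C<BoundPos>, and C<splice>
are invented names, and it assumes that a position sitting exactly at
the end of a deleted range counts as being "around" the deletion.

```python
class BoundPos:
    """Hypothetical bound position: just a mutable offset."""

    def __init__(self, offset):
        self.offset = offset


class TrackedString:
    """Hypothetical string whose bound positions follow later edits."""

    def __init__(self, text):
        self.text = text
        self.bound = []               # every position bound to this string

    def chars(self):
        """Bind and return a position at the current end of the string."""
        pos = BoundPos(len(self.text))
        self.bound.append(pos)
        return pos

    def splice(self, start, length, replacement):
        """Rough analogue of four-argument substr."""
        end = start + length
        self.text = self.text[:start] + replacement + self.text[end:]
        for pos in self.bound:
            if pos.offset > end:
                # Entirely after the edit: shift by the change in length.
                pos.offset += len(replacement) - length
            elif pos.offset > start:
                # Deletion around the position: drop back to where it begins.
                pos.offset = start
            # Otherwise the position is at or before the edit and stays put,
            # so text inserted exactly at a position lands after it.


s = TrackedString("xyz")
endpos = s.chars()
s.splice(3, 0, "!")                   # concatenation
print(s.text, endpos.offset)          # xyz! 3
```

Under these assumptions the two C<substr> calls shown earlier leave
the bound position at 6 and at 2 respectively, and the concatenation
in the usage lines leaves it at 3, no longer the end of the string.
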
Note that positions try very hard not to get demoted to integers.
In particular, position objects overload addition and subtraction
such that

    $string.chars - 1
    index($string, "foo") + 2

are still position objects with an implicit reference into the string.
(Subtracting one position object from another results in an integer,
however.)

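
In Python terms the overloading might look roughly like this.  The
C<StrPos> class is hypothetical and works in plain codepoint offsets,
which glosses over the whole business of views.

```python
class StrPos:
    """Hypothetical position that keeps a reference to its string."""

    def __init__(self, string, offset):
        self.string = string
        self.offset = offset

    def __add__(self, n):
        if isinstance(n, int):
            return StrPos(self.string, self.offset + n)   # still a position
        return NotImplemented

    def __sub__(self, other):
        if isinstance(other, StrPos):
            return self.offset - other.offset             # a plain integer
        if isinstance(other, int):
            return StrPos(self.string, self.offset - other)
        return NotImplemented


s = "a foo b"
end = StrPos(s, len(s))
last = end - 1                          # position of the final character
foo = StrPos(s, s.find("foo")) + 2      # position two past the start of "foo"
print(type(last).__name__, last.offset) # StrPos 6
print(end - foo)                        # 3
```
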
=head2 The New "&" Separator in Regexen

Analogous to the disjunctional C<|> separator, we're also putting a
conjunctional C<&> separator into our regex syntax:

    "DOG" ~~ /D [ <vowel>+ & <upper>+ ] G/

The semantics of it are pretty straightforward, as long as you
realize that all of the ANDed assertions have to match I<with the
same length>.  More precisely, they have to start and stop matching
at the same location.  So the following is always going to be false:

    / . & .. /

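
Python's C<re> module has no conjunction operator, but the
same-start, same-stop rule can be emulated by hand, which may help
pin down what it means.  The helper C<conjunction_end> is invented for
this sketch, the character classes are crude ASCII stand-ins for
C<< <vowel> >> and C<< <upper> >>, and trying the longest span first
is an assumption about greediness, not something stated above.

```python
import re


def conjunction_end(patterns, text, start=0):
    """Return an end offset such that every pattern matches exactly
    text[start:end], trying the longest span first, or None."""
    compiled = [re.compile(p) for p in patterns]
    for end in range(len(text), start - 1, -1):
        if all(c.fullmatch(text, start, end) for c in compiled):
            return end
    return None


# D [ <vowel>+ & <upper>+ ] G against "DOG", starting just after the "D".
print(conjunction_end([r"[AEIOU]+", r"[A-Z]+"], "DOG", 1))   # 2
# . & .. can never agree on a length.
print(conjunction_end([r".", r".."], "DOG"))                 # None
```

In the first call both legs agree on the span "O", leaving "G" for
the rest of the pattern.  In the second, no span is both one and two
characters long.
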
It would be possible to have the other semantics where, as long as
the trailing assertion matches either way, it doesn't have to match
the trailing assertion the I<same> way.  But then tell me whether C<$1>
should return "C<O>" or "C<G>" after this:

    "DOG" ~~ /^[. & ..] (.)/

Besides, it's easy enough to get the other semantics with lookahead
assertions.  Autoanchoring all the legs of a conjunction to the same
spot adds much more value to it by differentiating it from lookahead.
You have to work pretty hard to make separate lookaheads match the
same length.  Plus doing that turns what should be a symmetric
operator into a non-symmetrical one, where the final lookahead
can't be a lookahead because someone has to "eat" the characters
that all the assertions have agreed on are the right number to eat.
So for all these reasons it's better to have a conjunction operator
with complicated enough start/stop semantics to be useful.

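
The asymmetry of the lookahead version is easy to see with an
existing engine.  The two patterns below are Python C<re> renderings
of C</^[. & ..] (.)/> under the "other" semantics, differing only in
which leg is demoted to a lookahead and which one gets to eat.

```python
import re

# Leg "." demoted to a lookahead; leg ".." eats two characters.
first = re.match(r"(?=.)..(.)", "DOG")
# Leg ".." demoted to a lookahead; leg "." eats one character.
second = re.match(r"(?=..).(.)", "DOG")

print(first.group(1))    # G
print(second.group(1))   # O
```

Swapping the legs changes the capture, which is just the ambiguity
that the same-length rule rules out.
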
Actually, this operator was originally suggested to me by a biologist.
Which leads us to our...

=head1 Optional Mandatory Cross-disciplinary Joke for People Tired of Dogs

Biologist: What's worse than being chased by a Velociraptor?

Physicist: Obviously, being chased by an Acceloraptor.

=head1 Future Directions

Away from Acceloraptors, obviously.

=head1 References...er, Reference...

Nathanael Schärli, Stéphane Ducasse, Oscar Nierstrasz and Andrew
Black. Traits: Composable Units of Behavior. European Conference on
Object-Oriented Programming (ECOOP), July 2003. Springer LNCS 2743,
Ed. Luca Cardelli.