=head1 TITLE

Apocalypse 3: Operators

=head1 AUTHOR

Larry Wall <larry@wall.org>

=head1 VERSION

    Maintainer: Larry Wall <larry@wall.org>
    Date: 2 Oct 2001
    Last Modified: 24 Sep 2004
    Number: 3
    Version: 2
To me, one of the most agonizing aspects of language design is coming
up with a useful system of operators. To other language designers, this
may seem like a silly thing to agonize over. After all, you can view
all operators as mere syntactic sugar -- operators are just funny
looking function calls. Some languages make a feature of leveling all
function calls into one syntax. As a result, the so-called functional
languages tend to wear out your parenthesis keys, while OO languages
tend to wear out your dot key.

But while your computer really likes it when everything looks the same,
most people don't think like computers. People prefer different things
to look different. They also prefer to have shortcuts for common tasks.
(Even the mathematicians don't go for complete orthogonality. Many of
the shortcuts we typically use for operators were, in fact, invented by
mathematicians in the first place.)

So let me enumerate some of the principles that I weigh against each
other when designing a system of operators.
=over

=item * Different classes of operators should look different. That's
why filetest operators look different from string or numeric operators.

=item * Similar classes of operators should look similar. That's why
the filetest operators look like each other.

=item * Common operations should be "Huffman coded." That is,
frequently used operators should be shorter than infrequently used
ones. For how often it's used, the C<scalar> operator of Perl 5 is too
long, in my estimation.

=item * Preserving your culture is important. So Perl borrowed many of
its operators from other familiar languages. For instance, we used
Fortran's C<**> operator for exponentiation. As we go on to Perl 6,
most of the operators will be "borrowed" directly from Perl 5.

=item * Breaking out of your culture is also important, because that is
how we understand other cultures. As an explicitly multicultural
language, Perl has generally done OK in this area, though we can always
do better. Examples of cross-cultural exchange among computer cultures
include XML and Unicode. (Not surprisingly, these features also enable
better cross-cultural exchange among human cultures -- we sincerely
hope.)

=item * Sometimes operators should respond to their context. Perl has
many operators that do different but related things in scalar versus
list context.

=item * Sometimes operators should propagate context to their
arguments. The C<x> operator currently does this for its left argument,
while the short-circuit operators do this for their right argument.

=item * Sometimes operators should force context on their arguments.
Historically, the scalar mathematical operators of Perl have forced
scalar context on their arguments. One of the RFCs discussed below
proposes to revise this.

=item * Sometimes operators should respond polymorphically to the types
of their arguments. Method calls and overloading work this way.

=item * Operator precedence should be designed to minimize the need for
parentheses. You can think of the precedence of operators as a partial
ordering of the operators such that it minimizes the number of
"unnatural" pairings that require parentheses in typical code.

=item * Operator precedence should be as simple as possible. Perl's
precedence table currently has 24 levels in it. This might or might not
be too many. We could probably reduce it to about 18 levels, if we
abandon strict C compatibility of the C-like operators.

=item * People don't actually want to think about precedence much, so
precedence should be designed to match expectations. Unfortunately, the
expectations of someone who knows the precedence table won't match the
expectations of someone who doesn't. And Perl has always catered to the
expectations of C programmers, at least up till now. There's not much
one can do up front about differing cultural expectations.

=back
It would be easy to drive any one of these principles into the ground,
at the expense of other principles. In fact, various languages have
done precisely that.

My overriding design principle has always been that the complexity of
the solution space should map well onto the complexity of the problem
space. Simplification good! Oversimplification bad! Placing artificial
constraints on the solution space produces an impedance mismatch with
the problem space, with the result that using a language that is
artificially simple induces artificial complexity in all solutions
written in that language.
One artificial constraint that all computer languages must deal with is
the number of symbols available on the keyboard, corresponding roughly
to the number of symbols in ASCII. Most computer languages have
compensated by defining systems of operators that include digraphs,
trigraphs, and worse. This works pretty well, up to a point. But it
means that certain common unary operators cannot be used as the end of
a digraph operator. Early versions of C had assignment operators in the
wrong order. For instance, there used to be a C<=-> operator. Nowadays
that's spelled C<-=>, to avoid conflict with unary minus.

By the same token (no pun intended), you can't easily define a unary
C<=> operator without requiring a space before it most of the time,
since so many binary operators end with the C<=> character.

Perl gets around some of these problems by keeping track of whether it
is expecting an operator or a term. As it happens, a unary operator is
simply one that occurs when Perl is expecting a term. So Perl could
keep track of a unary C<=> operator, even if the human programmer might
be confused. So I'd place a unary C<=> operator in the category of
"OK, but don't use it for anything that will cause widespread
confusion." Mind you, I'm not proposing a specific use for a unary
C<=> at this point. I'm just telling you how I think. If we ever do get
a unary C<=> operator, we will hopefully have taken these issues into
account.
While we can disambiguate operators based on whether an operator or a
term is expected, this implies some syntactic constraints as well. For
instance, you can't use the same symbol for both a postfix operator and
a binary operator. So you'll never see a binary C<++> operator in Perl,
because Perl wouldn't know whether to expect a term or operator after
that. It also implies that we can't use the "juxtaposition" operator.
That is, you can't just put two terms next to each other, and expect
something to happen (such as string concatenation, as in I<awk>). What
if the second term started with something that looked like an operator?
It would be misconstrued as a binary operator.
Well, enough of these vague generalities. On to the vague specifics.
The RFCs for this apocalypse are (as usual) all over the map, but don't
cover the map. I'll talk first about what the RFCs do cover, and then
about what they don't. Here are the RFCs that happened to get
themselves classified into chapter 3:

    RFC  PSA  Title
    ---  ---  -----
    024  rr   Data types: Semi-finite (lazy) lists
    025  dba  Operators: Multiway comparisons
    039  rr   Perl should have a print operator
    045  bbb  C<||> and C<&&> should propagate result context to both sides
    054  cdr  Operators: Polymorphic comparisons
    081  abc  Lazily evaluated list generation functions
    082  abc  Arrays: Apply operators element-wise in a list context
    084  abb  Replace => (stringifying comma) with => (pair constructor)
    104  ccr  Backtracking
    138  rr   Eliminate =~ operator.
    143  dcr  Case ignoring eq and cmp operators
    170  ccr  Generalize =~ to a special "apply-to" assignment operator
    283  ccc  C<tr///> in array context should return a histogram
    285  acb  Lazy Input / Context-sensitive Input
    290  bbc  Better english names for -X
    320  ccc  Allow grouping of -X file tests and add C<filetest> builtin

Note that you can click on the following RFC titles to view a copy of
the RFC in question. The discussion sometimes assumes that you've read
the RFC.
=head2 RFC 025: Operators: Multiway comparisons

This RFC proposes that expressions involving multiple chained
comparisons should act like a mathematician would expect. That is, if
you say this:

    0 <= $x < 10

it really means something like:

    0 <= $x && $x < 10

The C<$x> would only be evaluated once, however. (This is very much
like the rewrite rule we use to explain assignment operators such as
C<$x += 3>.)
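For readers who want to see the rewrite spelled out, here is a short
Perl 5 sketch of the intended semantics (the chained syntax itself is
not valid Perl 5, so a temporary holds the middle term to show the
"evaluated once" behavior):

    use strict;
    use warnings;

    # 0 <= EXPR < 10  behaves like  0 <= $tmp && $tmp < 10,
    # where $tmp holds EXPR, evaluated exactly once.
    for my $v (-3, 0, 5, 10, 42) {
        my $tmp = $v;    # the middle term, evaluated once
        my $in_range = (0 <= $tmp && $tmp < 10);
        printf "%3d => %s\n", $v, $in_range ? "in range" : "out of range";
    }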
I started with this RFC simply because it's not of any earthshaking
importance whether I accept it or not. The tradeoff is whether to put
some slight complexity into the grammar in order to save some slight
complexity in some Perl programs. The complexity in the grammar is not
much of a problem here, since it's amortized over all possible uses of
it, and it already matches the known psychology of a great number of
people.
There is a potential interaction with precedence levels, however. If we
choose to allow an expression like:

    0 <= $x == $y < 20

then we'll have to unify the precedence levels of the comparison
operators with the equality operators. I don't see a great problem with
this, since the main reason for having them different was (I believe)
so that you could write an exclusive or of two comparisons, like this:

    $x < 10 != $y < 10

However, Perl has a built-in C<xor> operator, so this isn't really much
of an issue. And there's a lot to be said for forcing parentheses in
that last expression anyway, just for clarity. So unless anyone comes
up with a large objection that I'm not seeing, this RFC is accepted.
=head2 RFC 320: Allow grouping of -X file tests and add C<filetest> builtin

This RFC proposes to allow clustering of file test operators much like
some Unix utilities allow bundling of single character switches. That
is, if you say:

    -drwx $file

it really means something like:

    -d $file && -r $file && -w $file && -x $file

Unfortunately, as proposed, this syntax will simply be too confusing.
We have to be able to negate named operators and subroutines. The
proposed workaround of putting a space after a unary minus is much too
onerous and counterintuitive, or at least countercultural.
The only way to rescue the proposal would be to say that such operators
are autoloaded in some fashion; any negated but I<unrecognized>
operator would then be assumed to be a clustered filetest. This would
be risky in that it would prevent Perl from catching misspelled
subroutine names at compile time when negated, and the error might well
not get caught at run time either, if all the characters in the name
are valid filetests, and if the argument can be interpreted as a
filename or filehandle (which it usually can be). Perhaps it would be
naturally disallowed under C<use strict>, since we'd basically be
treating C<-xyz> as a bareword. On the other hand, in Perl 5, I<all>
method names are essentially in the unrecognized category until run
time, so it would be impossible to tell whether to parse the minus sign
as a real negation. Optional type declarations in Perl 6 would only
help the compiler with variables that are actually declared to have a
type. Fortunately, a negated 1 is still true, so even if we parsed the
negation as a real negation, it might still end up doing the right
thing. But it's all very tacky.
So I'm thinking of a different tack. Instead of bundling the letters:

    -drwx $file

let's think about the trick of returning the value of C<$file> for a
true value. Then we'd write nested unary operators like this:

    -d -r -w -x $file

One tricky thing about that is that the operators are applied right to
left. And they don't really short circuit the way stacked C<&&> would
(though the optimizer could probably fix that). So I expect we could do
this for the default, and if you want the C<-drwx> as an autoloaded
backstop, you can explicitly declare that.
In any event, the proposed C<filetest> built-in need not be built in.
It can just be a universal method. (Or maybe just common to strings and
filehandles?)

My one hesitation in making cascading operators work like that is that
people might be tempted to get cute with the returned filename:

    $handle = open -r -w -x $file or die;

That might be terribly confusing to a lot of people. The solution to
this conundrum is presented at the end of the next section.
=head2 RFC 290: Better english names for -X

This RFC proposes long names as aliases for the various filetest
operators, so that instead of saying:

    -r $file

you might say something like:

    freadable($file)

Actually, there's no need for the C<use english>, I expect. These names
could merely be universal (or nearly universal) methods. In any case,
we should start getting used to the idea that C<mumble($foo)> is
equivalent to C<$foo.mumble()>, at least in the absence of a local
subroutine definition to the contrary. So I expect that we'll see both:

    is_readable($file)

and:

    $file.is_readable
Similar to the cascaded filetest ops in the previous section, one
approach might be that the boolean methods return the object in
question for success so that method calls could be stacked without
repeating the object:

    if ($file.is_dir.is_readable.is_writable.is_executable) {

But C<-drwx $file> could still be construed as more readable, for some
definition of readability. And cascading methods aren't really
short-circuited. Plus, the value returned would have to be something
like "$file is true," to prevent confusion over filename "0."
There is also the question of whether this really saves us anything
other than a little notational convenience. If each of those methods
has to do a I<stat> on the filename, it will be rather slow. To fix
that, what we'd actually have to return would be not the filename, but
some object containing the stat buffer (represented in Perl 5 by the
C<_> character). If we did that, we wouldn't have to play C<$file is
true> games, because a valid stat buffer object would (presumably)
always be true (at least until it's false).
The same argument would apply to cascaded filetest operators we talked
about earlier. An autoloaded C<-drwx> handler would presumably be smart
enough to do a single stat. But we'd likely lose the speed gain by
invoking the autoload mechanism. So cascaded operators (either C<-X>
style or C<.is_XXX> style) are the way to go. They just return objects
that know how to be either boolean or stat buffer objects in context.
This implies you could even say

    $statbuf = -f $file or die "Not a regular file: $file";
    if (-r -w $statbuf) { ... }

This allows us to simplify the special case in Perl 5 represented by
the C<_> token, which was always rather difficult to explain. And
returning a stat buffer instead of C<$file> prevents the confusing:

    $handle = open -r -w -x $file or die;

Unless, of course, we decide to make a stat buffer object return the
filename in a string context. C<:-)>
=head2 RFC 283: C<tr///> in array context should return a histogram

Yes, but ...

While it's true that I put that item into the Todo list ages ago, I
think that histograms should probably have their own interface, since
the histogram should probably be returned as a complete hash in scalar
context, but we can't guess that they'll want a histogram for an
ordinary scalar C<tr///>. On the other hand, it could just be a C</h>
modifier. But we've already done violence to C<tr///> to make it do
character counting without transliterating, so maybe this isn't so far
fetched.
One problem with this RFC is that it does the histogram over the input
rather than the output string. The original Todo entry did not specify
this, but it was what I had intended. But it's more useful to do it on
the resulting characters because then you can use the C<tr///> itself
to categorize characters into, say, vowels and consonants, and then
count the resulting V's and C's.
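To make the idea concrete, here is a small Perl 5 sketch that counts
the I<output> of a transliteration by hand (the proposed histogram
feature would do this for you; the code below is just ordinary Perl 5):

    use strict;
    use warnings;

    # Categorize characters with tr///, then count the categories.
    my $text = "histogram";
    (my $cats = lc $text) =~ tr/aeiou/V/;   # vowels become V
    $cats =~ tr/a-z/C/;                     # remaining letters become C

    my %count;
    $count{$_}++ for split //, $cats;
    print "$_ => $count{$_}\n" for sort keys %count;   # C => 6, V => 3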
On the other hand, I'm thinking that the C<tr///> interface is really
rather lousy, and getting lousier every day. The whole C<tr///>
interface is kind of sucky for any sort of dynamically generated data.
But even without dynamic data, there are serious problems. It was bad
enough when the character set was just ASCII. The basic problem is that
the notation is inside out from what it should be, in the sense that it
doesn't actually show which characters correspond, so you have to count
characters. We made some progress on that in Perl 5 when, instead of:

    tr/abcdefghijklmnopqrstuvwxyz/VCCCVCCCVCCCCCVCCCCCVCCCCC/

we allowed you to say:

    tr[abcdefghijklmnopqrstuvwxyz]
      [VCCCVCCCVCCCCCVCCCCCVCCCCC]

There are also shenanigans you can play if you know that duplicates on
the left side prefer the first mention to subsequent mentions:

    tr/aeioua-z/VVVVVC/

But you're still working against the notation. We need a more explicit
way to put character classes into correspondence.
More problems show up when we extend the character set beyond ASCII.
The use of C<tr///> for case translations has long been
semi-deprecated, because a range like C<tr/a-z/A-Z/> leaves out
characters with diacritics. And now with Unicode, the whole notion of
what is a character is becoming more susceptible to interpretation, and
the C<tr///> interface doesn't tell Perl whether to treat character
modifiers as part of the base character. For some of the double-wide
characters it's even hard to just I<look> at the character and tell if
it's one character or two. Counted character lists are about as modern
as Hollerith strings in Fortran.

So I suspect the C<tr///> syntax will be relegated to being just one
quote-like interface to the actual transliteration module, whose main
interface will be specified in terms of translation pairs, the left
side of which will give a pattern to match (typically a character
class), and the right side will say what to translate anything matching
to. Think of it as a series of coordinated parallel C<s///> operations.
Syntax is still open for negotiation till apocalypse 5. But there can
certainly be a histogram option in there somewhere.
=head2 RFC 084: Replace C<< => >> (stringifying comma) with C<< => >> (pair constructor)

I like the basic idea of pairs because it generalizes to more than just
hash values. Named parameters will almost certainly be implemented
using pairs as well.

I do have some quibbles with the RFC. The proposed C<key> and C<value>
built-ins should simply be lvalue methods on pair objects. And if we
use pair objects to implement entries in hashes, the key must be
immutable, or there must be some way of re-hashing the key if it
changes.

The stuff about using pairs for mumble-but-false is bogus. We'll use
properties for that sort of chicanery. (And multiway comparisons won't
rely on such chicanery in any event. See above.)
=head2 RFC 081: Lazily evaluated list generation functions

Sorry, you can't have the colon--at least, not without sharing it.
Colon will be a kind of "supercomma" that supplies an adverbial list
to some previous operator, which in this case would be the prior colon
or dotdot.

(We can't quite implement C<?:> as a C<:> modifier on C<?>, because the
precedence would be screwy, unless we limit C<:> to a single argument,
which would preclude its being used to disambiguate indirect objects.
More on that later.)

The RFC's proposal concerning C<attributes::get(@a)> stuff is
superseded by value properties. So, C<@a.method()> should just pull out
the variable's properties directly, if the variable is of a type that
supports the methods in question. A lazy list object should certainly
have such methods.
Assignment of a lazy list to a tied array is a problem unless the tie
implementation handles laziness. By default a tied array is likely to
enforce immediate list evaluation. Immediate list evaluation doesn't
work on infinite lists. That means it's gonna fill up your disk drive
if you try to say something like:

    @my_tied_file = 1..Inf;
Laziness should be possible, but not necessarily the norm. It's all
very well to delay the evaluation of "pure" functions in the realm of
math, since presumably you get the same result no matter when you
evaluate. But a lot of Perl programming is done with real world data
that changes over time. Saying C<somefunc($a..$b)> can get terribly
fouled up if C<$b> can change, and the lazy function still refers to
the variable rather than its instantaneous value. On the other hand,
there is overhead in taking snapshots of the current state.

On the gripping hand, the lazy list object I<is> the snapshot of the
values, so that's not a problem in this case. Forget I mentioned it.
The tricky thing about lazy lists is not the lazy lists themselves, but
how they interact with the rest of the language. For instance, what
happens if you say:

    @lazy = 1..Inf;
    @lazy[5] = 42;

Is C<@lazy> still lazy after it is modified? Do we remember that
C<@lazy[5]> is an "exception", and continue to generate the rest of the
values by the original rule? What if C<@lazy> is going to be generated
by a recursive function? Does it matter whether we've already generated
C<@lazy[5]>?
And how do we explain this simply to people so that they can
understand? We will have to be very clear about the distinction between
the abstraction and the concrete value. I'm of the opinion that a lazy
list is a definition of the I<default> values of an array, and that the
actual values of the array override any default values. Assigning to a
previously memoized element overrides the memoized value.
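As an illustration of that "defaults overridden by actual values"
model, here is a Perl 5 sketch using a closure as the generator and a
hash of explicit assignments as the overrides (none of this is Perl 6
syntax; the names are made up for the example):

    use strict;
    use warnings;

    my %override;
    my $generator = sub { my $i = shift; $i + 1 };    # stands in for 1..Inf

    sub lazy_get {
        my $i = shift;
        return exists $override{$i} ? $override{$i} : $generator->($i);
    }

    $override{5} = 42;    # like @lazy[5] = 42
    print join(" ", map { lazy_get($_) } 0 .. 9), "\n";   # 1 2 3 4 5 42 7 8 9 10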
It would help the optimizer to have a way to declare "pure" array
definitions that can't be overridden.

Also consider this:

    @array = (1..100, 100..10000:100);

A single flat array can have multiple lazy lists as part of its default
definition. We'll have to keep track of that, which could get
especially tricky if the definitions start overlapping via slice
definitions.

In practice, people will treat the default values as real values. If
you pass a lazy list into a function as an array argument, the function
will probably not know or care whether the values it's getting from the
array are being generated on the fly or were there in the first place.

I can think of other cans of worms this opens, and I'm quite certain
I'm too stupid to think of them all. Nevertheless, my gut feeling is
that we can make things work more like people expect rather than less.
And I was always a little bit jealous that REXX could have arrays with
default values. C<:-)>

[Update: Turns out that all lists are lazy by default. Use unary C<**>
to force a non-lazy list evaluation immediately.]
=head2 RFC 285: Lazy Input / Context-sensitive Input

Solving this with C<want()> is the wrong approach, but I think the
basic idea is sound because it's what people expect. And the C<want()>
should in fact be unnecessary. Essentially, if the right side of a list
assignment produces a lazy list, and the left side requests a finite
number of elements, the list generator will only produce enough to
satisfy the demand. It doesn't need to know how many in advance. It
just produces another scalar value when requested. The generator
doesn't have to be smart about its context. The motto of a lazy list
generator should be, "Ours is not to question why, ours is but to do
(the next one) or die."

It will be tricky to make this one work right:

    ($first, @rest) = 1 .. Inf;
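The demand-driven behavior itself is easy to picture in Perl 5 terms:
a generator only ever has to hand out "the next one." Here is a small
sketch with a closure standing in for the lazy C<1 .. Inf> (the helper
names are invented for the example):

    use strict;
    use warnings;

    sub counter_from {
        my $n = shift;
        return sub { return $n++ };    # produce one more value on demand
    }

    my $gen   = counter_from(1);             # stands in for 1 .. Inf
    my $first = $gen->();                    # one element was requested
    my @some  = map { $gen->() } 1 .. 4;     # ...and a few more
    print "$first; @some\n";                 # 1; 2 3 4 5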
=head2 RFC 082: Arrays: Apply operators element-wise in a list context

APL, here we come... :-)

This is by far the most difficult of these RFCs to decide, so I'm going
to be doing a lot of thinking out loud here. This is research--or at
least, a search. Please bear with me.

I expect that there are two classes of Perl programmers--those that
would find these "hyper" operators natural, and those that wouldn't.
Turning this feature on by default would cause a lot of heartburn for
people who (from Perl 5 experience) expect arrays to always return
their length under scalar operators even in list context. It can
reasonably be argued that we need to make the scalar operators default,
but make it easy to turn on hyper operators within a lexical scope. In
any event, both sets of operators need to be visible from
anywhere--we're just arguing over who gets the short, traditional
names. All operators will presumably have longer names for use as
function calls anyway. Instead of just naming an operator with long
names like:

    operator:+
    operator:/

the longer names could distinguish "hyperness" like this:

    @a scalar:+ @b
    @a list:/ @b
That implies they could also be called like this:

    scalar:+(@a, @b)
    list:/(@a, @b)

We might find some short prefix character that stands in for "list" or
"scalar". The obvious candidates are C<@> and C<$>:

    @a $+ @b
    @a @/ @b

Unfortunately, in this case, "obvious" is synonymous with "wrong".
These operators would be completely confusing from a visual point of
view. If the main psychological point of putting noun markers on the
nouns is so that they stand out from the verbs, then you don't want to
put the same markers on the verbs. It would be like the Germans
starting to capitalize all their words instead of just their nouns.

Instead, we could borrow a singular/plural memelet from shell globbing,
where C<*> means multiple characters, and C<?> means one character:

    @a ?+ @b
    @a */ @b

But that has a bad ambiguity. How do you tell whether C<**> is an
exponentiation or a list multiplication? So if we went that route, we'd
probably have to say:

    @a ?:+ @b
    @a *:/ @b

Or some such. But if we're going that far in the direction of
gobbledygook, perhaps there are prefix characters that wouldn't be so
ambiguous. The colon and the dot also have a visual singular/plural
value:

    @a .+ @b
    @a :/ @b
We're already changing the old meaning of dot (and I'm planning to
rescue colon from the C<?:> operator), so perhaps that could be made to
work. You could almost think of dot and colon as complementary method
calls, where you could say:

    $len = @a . length;
    @len = @a : length;

But that would interfere with other desirable uses of colon. Plus, it's
actually going to be confusing to think of these as singular and plural
operators because, while we're specifying that we want a "plural"
operator, we're not specifying how to treat the plurality. Consider
this:

    @len = list:length(@a);
Anyone would naively think that returns the length of the list, not the
length of each element of the list. To make it work in English, we'd
actually have to say something like this:

    @len = each:length(@a);
    $len = the:length(@a);

That would be equivalent to the method calls:

    @len = @a.each:length;
    $len = @a.the:length;
But does this really mean that there are two array methods with those
weird names? I don't think so. We've reached a result here that is
spectacularly close to a I<reductio ad absurdum>. It seems to me that
the whole point of this RFC is that the "eachness" is most simply
specified by the list context, together with the knowledge that
C<length()> is a function/method that maps one scalar value to another.
The distribution of that function over an array value is not something
the scalar function should be concerned with, except insofar as it must
make sure its type signature is correct.

And there's the rub. We're really talking about enforced strong typing
for this to work right. When we say:

    @foo = @bar.mumble

How do we know whether C<mumble> has the type signature that magically
enables iteration over C<@bar>? That definition is off in some other
file that we may not have memorized quite yet. We need some more
explicit syntax that says that auto-iteration is expected, regardless
of whether the definition of the operator is well specified. Magical
auto-iteration is not going to work well in a language with optional
typing.
So the resolution of this is that the unmarked forms of operators will
force scalar context as they do in Perl 5, and we'll need a special
marker that says an operator is to be auto-iterated. That special
marker turns out to be an uparrow, with a tip o' the hat to
higher-order functions. That is, the hyper-operator:

    @a ^* @b

is equivalent to this:

    parallel { $^a * $^b } @a, @b

(where C<parallel> is a hypothetical function that iterates through
multiple arrays in parallel.)
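If it helps to see the element-wise idea in familiar terms, here is a
Perl 5 sketch of what such a hyper multiplication would compute
(assuming, for simplicity, two arrays of the same length):

    use strict;
    use warnings;

    my @a = (1, 2, 3);
    my @b = (10, 20, 30);

    # Apply the scalar * operator element-wise, in parallel.
    my @c = map { $a[$_] * $b[$_] } 0 .. $#a;
    print "@c\n";    # 10 40 90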
[Update: These days hyper operators are marked with German quotes:
C<»*«>. We stole C<^> for exclusive-or junctions.]

Hyper operators will also intuit where a dimension is missing from one
of its arguments, and replicate a scalar value to a list value in that
dimension. That means you can say:

    @a ^+ 1

to get a value with one added to each element of C<@a>. (C<@a> is
unchanged.)
I don't believe there are any insurmountable ambiguities with the
uparrow notation. There is currently an uparrow operator meaning
exclusive-or, but that is rarely used in practice, and is not typically
followed by other operators when it is used. We can represent
exclusive-or with C<~> instead. (I like that idea anyway, because the
unary C<~> is a 1's complement, and the binary C<~> would simply be
doing a 1's complement on the second argument of the set bits in the
first argument. On the other hand, there's destructive interference
with other cultural meanings of tilde, so it's not completely obvious
that it's the right thing to do. Nevertheless, that's what we're
doing.)

[Update: Except we're not. Unary and binary C<~> are now string
operators, and C's bitwise ops have been demoted to longer operators
with a prefix.]

Anyway, in essence, I'm rejecting the underlying premise of this RFC,
that we'll have strong enough typing to intuit the right behavior
without confusing people. Nevertheless, we'll still have easy-to-use
(and more importantly, easy-to-recognize) hyper-operators.
This RFC also asks about how return values for functions like C<abs()>
might be specified. I expect sub declarations to (optionally) include a
return type, so this would be sufficient to figure out which functions
would know how to map a scalar to a scalar. And we should point out
again that even though the base language will not try to intuit which
operators should be hyperoperators, there's no reason in principle that
someone couldn't invent a dialect that does. All is fair if you
predeclare.
=head2 RFC 045: C<||> and C<&&> should propagate result context to both sides

Yes. The thing that makes this work in Perl 6, where it was almost
impossible in Perl 5, is that in Perl 6, list context doesn't imply
immediate list flattening. More precisely, it specifies immediate list
flattening in a notional sense, but the implementation is free to delay
that flattening until it's actually required. Internally, a flattened
list is still an object. So when C<@a || @b> evaluates the arrays,
they're evaluated as objects that can return either a boolean value or
a list, depending on the context. And it will be possible to apply both
contexts to the first argument simultaneously. (Of course, the computer
actually looks at it in the boolean context first.)

There is no conflict with RFC 81 because the hyper versions of these
operators will be spelled:

    @a ^|| @b
    @a ^&& @b
[Update: That'd be C<»||«> and C<»&&«> now.]
=head2 RFC 054: Operators: Polymorphic comparisons

I'm not sure of the performance hit of backstopping numeric equality
with string equality. Maybe vtables help with this. But I think this
RFC is proposing something that is too specific. The more general
problem is how you allow variants of built-ins, not just for C<==>, but
for other operators like C<< <=> >> and C<cmp>, not to mention all the
other operators that have scalar and list variants.

A generic equality operator could potentially be supplied by operator
definition. I expect that a similar mechanism would allow us to define
how abstract a comparison C<cmp> would do, so we could sort and collate
according to the various defined levels of Unicode.
The argument that you can't do generic programming is somewhat
specious. The problem in Perl 5 is that you can't name operators, so
you couldn't pass in a generic operator in place of a specific one even
if you wanted to. I think it's more important to make sure all
operators have real function names in Perl 6:

    operator:+($a, $b);
    operator:^+(@a, @b);

    my sub operator:<?> ($a, $b) { ... }
    if ($a <?> $b) { ... }
    @sorted = collate \&operator:<?>, @unicode;
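The Perl 5 workaround for this, of course, is to wrap the operator in a
named subroutine and pass a reference to that. A quick sketch of the
kind of C<collate> call shown above, written in plain Perl 5 (the sub
names are invented for the example):

    use strict;
    use warnings;

    sub caselessly { lc($_[0]) cmp lc($_[1]) }   # a named comparison "operator"

    sub collate {
        my ($cmp, @list) = @_;
        return sort { $cmp->($a, $b) } @list;
    }

    my @sorted = collate(\&caselessly, qw(banana Apple cherry));
    print "@sorted\n";    # Apple banana cherry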
[Update: This role is now filled in part by the C<~~> smartmatch
operator. Also, there's no need to name hyper operators--they're always
constructed artificially.]
=head2 RFC 104: Backtracking

As proposed, this can easily be done with an operator definition to
call a sequence of closures. I wonder whether the proposal is complete,
however. There should probably be more make-it-didn't-happen semantics
to a backtracking engine. If Prolog unification is emulated with an
assignment, how do you later unassign a variable if you backtrack past
it?

Ordinarily, temporary values are scoped to a block, but we're using
blocks differently here, much like parens are used in a regex. Later
parens don't undo the "unifications" of earlier parens.

In normal imperative programming these temporary determinations are
remembered in ordinary scoped variables and the current hypothesis is
extended via recursion. An C<andthen> operator would need to have a way
of keeping BLOCK1's scope around until BLOCK2 succeeds or fails. That
is, in terms of lexical scoping:

    {BLOCK1} andthen {BLOCK2}

needs to work more like

    {BLOCK1 andthen {BLOCK2}}

This might be difficult to arrange as a mere module. However, with
rewriting rules it might be possible to install the requisite scoping
semantics within BLOCK1 to make it work like that. So I don't think
this is a primitive in the same sense that continuations would be. For
now let's assume we can build backtracking operators from
continuations. Those will be covered in a future apocalypse.

[Update: Also, the fact that Perl 6 regexes can call closures with
backtracking covers most of this functionality. See A5 and S5.]
=head2 RFC 143: Case ignoring C<eq> and C<cmp> operators

This is another RFC that proposes a specific feature that can be
handled by a more generic feature, in this case, an operator
definition:

    my sub operator:EQ { lc($^a) eq lc($^b) }

Incidentally, I notice that the RFC normalizes to uppercase. I suspect
it's better these days to normalize to lowercase, because Unicode
distinguishes titlecase from uppercase, and provides mappings for both
to lowercase.
=head2 RFC 170: Generalize C<=~> to a special "apply-to" assignment operator

I don't think the argument should come in on the right. I think it
would be more natural to treat it as an object, since all Perl
variables will essentially be objects anyway, if you scratch them
right. Er, left.

I do wonder whether we could generalize C<=~> to a list operator that
calls a given method on multiple objects, so that

    ($a, $b) =~ s/foo/bar/;

would be equivalent to

    for ($a, $b) { s/foo/bar/ }
But then maybe it's redundant, except that you could say

    @foo =~ s/foo/bar/

in the middle of an expression. But by and large, I think I'd rather
see:

    @foo.grep {!m/\s/}

instead of using C<=~> for what is essentially a method call. In line
with what we discussed before, the list version could be a
hyperoperator:

    @foo . ^s/foo/bar/;

or possibly:

    @foo ^. s/foo/bar/;

Note that in the general case this all implies that there is some
interplay between how you declare method calls and how you declare
quote-like operators. It seems as though it would be dangerous to let a
quote-like declaration out of a lexical scope, but then it's also not
clear how a method call declaration could be lexically scoped. So we
probably can't do away with C<=~> as an explicit marker that the thing
on the left is a string, and the thing on the right is a quoted
construct. That means that a hypersubstitution is really spelled:

    @foo ^=~ s/foo/bar/;

Admittedly, that's not the prettiest thing in the world.
[Update: The C<~~> smartmatch operator subsumes all C<=~> functionality.]
=head1 Non-RFC considerations

The RFCs propose various specific features, but don't give a systematic
view of the operators as a whole. In this section I'll try to give a
more cohesive picture of where I see things going.
=head2 Binary C<.> (dot)

This is now the method call operator, in line with industry-wide
practice. It also has ramifications for how we declare object attribute
variables. I'm anticipating that, within a class module, saying

    my int $.counter;

would declare both a C<$.counter> instance variable and a C<counter>
accessor method for use within the class. (If marked as public, it
would also declare a C<counter> accessor method for use outside the
class.)

[Update: The keyword is C<has> rather than C<my>, and a read-only
public accessor is generated by default. See A12.]
=head2 Unary C<.> (dot)

It's possible that a unary C<.> would call a method on the current
object within a class. That is, it would be the same as a binary C<.>
with C<$self> (or equivalent) on the left:

    method foowrapper ($a, $b) {
        .reallyfoo($a, $b, $c)
    }

On the other hand, it might be considered better style to be explicit:

    method foowrapper ($self: $a, $b) {
        $self.reallyfoo($a, $b, $c)
    }

(Don't take that declaration syntax as final just yet, however.)

[Update: Unary dot turns out to be a method call on the current topic.
See A4 and S4.]
=head2 Binary C<_>

Since C<.> is taken for method calls, we need a new way to concatenate
strings. We'll use a solitary underscore for that. So, instead of:

    $a . $b . $c

you'll say:

    $a _ $b _ $c

The only downside to that is that the space between a variable name and
the operator is required. This is to be construed as a feature.
[Update: Nowadays concatenation is C<~>.]
=head2 Unary C<_>

Since the C<_> token indicating the stat buffer is going away, a unary
underscore operator will force stringification, just as interpolation
does, only without the quotes.
[Update: That's unary C<~> now.]
=head2 Unary C<+>
Similarly, a unary C<+> will force numification in Perl 6, unlike in
Perl 5. If that fails, NaN (not a number) is returned.
=head2 Binary C<:=>

We need to distinguish two different forms of assignment. The standard
assignment operator, C<=>, works just as it does in Perl 5, as much as
possible. That is, it tries to make it look like a value assignment.
This is our cultural heritage.

But we also need an operator that works like assignment but is more
definitional. If you're familiar with Prolog, you can think of it as a
sort of unification operator (though without the implicit backtracking
semantics). In human terms, it treats the left side as a set of formal
arguments exactly as if they were in the declaration of a function, and
binds a set of arguments on the right hand side as though they were
being passed to a function. This is what the new C<:=> operator does.
More below.
=head2 Unary C<*>

Unary C<*> is the list flattening operator. (See Ruby for prior art.)
When used on an rvalue, it turns off function signature matching for
the rest of the arguments, so that, for instance:

    @args = (\@foo, @bar);
    push *@args;

would be equivalent to:

    push @foo, @bar;

In this respect, it serves as a replacement for the prototype-disabling
C<&foo(@bar)> syntax of Perl 5. That would be translated to:

    foo(*@bar)
In an lvalue, the unary C<*> indicates that subsequent array names
slurp all the rest of the values. So this would swap two arrays:

    (@a, @b) := (@b, @a);

whereas this would assign all the array elements of C<@c> and C<@d> to
C<@a>.

    (*@a, @b) := (@c, @d);

An ordinary flattening list assignment:

    @a = (@b, @c);

is equivalent to:

    *@a := (@b, @c);

That's not the same as

    @a := *(@b, @c);

which would take the first element of C<@b> as the new definition of
C<@a>, and throw away the rest, exactly as if you passed too many
arguments to a function. It could optionally be made to blow up at run
time. (It can't be made to blow up at compile time, since we don't know
how many elements are in C<@b> and C<@c> combined. There could be
exactly one element, which is what the left side wants.)
=head2 List context

The whole notion of list context is somewhat modified in Perl 6. Since
lists can be lazy, the interpretation of list flattening is also by
necessity lazy. This means that, in the absence of the C<*> list
flattening operator (or an equivalent old-fashioned list assignment),
lists in Perl 6 are object lists. That is to say, they are parsed as if
they were a list of objects in scalar context. When you see a function
call like:

    foo @a, @b, @c;

you should generally assume that three discrete arrays are being passed
to the function, unless you happen to know that the signature of C<foo>
includes a list flattening C<*>. (If a subroutine doesn't have a
signature, it is assumed to have a signature of C<(*@_)> for old times'
sake.) Note that this is really nothing new to Perl, which has always
made this distinction for builtins, and extended it to user-defined
functions in Perl 5 via prototypes like C<\@> and C<\%>. We're just
changing the syntax in Perl 6 so that the unmarked form of formal
argument expects a scalar value, and you optionally declare the final
formal argument to expect a list. It's a matter of Huffman coding
again, not to mention saving wear and tear on the backslash key.
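For readers who haven't used them, here is the Perl 5 prototype
distinction being referred to, in a small self-contained sketch: a
C<\@> prototype hands the sub a discrete array (by reference), while an
unprototyped sub sees one flattened list.

    use strict;
    use warnings;

    sub discrete (\@\@) {    # receives references to the two arrays
        my ($x, $y) = @_;
        return scalar(@$x) . " and " . scalar(@$y) . " elements";
    }

    sub flattened {          # receives one flat list
        return scalar(@_) . " elements total";
    }

    my @a = (1, 2, 3);
    my @b = (4, 5);
    print discrete(@a, @b), "\n";     # 3 and 2 elements
    print flattened(@a, @b), "\n";    # 5 elements total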
=head2 Binary C<:>

As I pointed out in an earlier apocalypse, the first rule of computer
language design is that everybody wants the colon. I think that means
that we should do our best to give the colon to as many features as
possible.

Hence, this operator modifies a preceding operator adverbially. That
is, it can turn any operator into a trinary operator (provided a
suitable definition is declared). It can be used to supply a "step"
to a range operator, for instance. It can also be used as a kind of
super-comma separating an indirect object from the subsequent argument
list:

    print $handle[2]: @args;

[Update: binary C<:> as an invocant separator is now distinguished from
adverbs that start with C<:>, so the "step" of a range is specified
using C<:by($x)> rather than a bare colon.]
Of course, this conflicts with the old definition of the C<?:>
operator. See below.

In a method type signature, this operator indicates that a previous
argument (or arguments) is to be considered the "self" of a method
call. (Putting it after multiple arguments could indicate a desire for
multimethod dispatch!)
=head2 Trinary C<??::>

The old C<?:> operator is now spelled C<??::>. That is to say, since
it's really a kind of short-circuit operator, we just double both
characters like the C<&&> and C<||> operator. This makes it easy to
remember for C programmers. Just change:

    $a ? $b : $c

to

    $a ?? $b :: $c

The basic problem is that the old C<?:> operator wastes two very useful
single characters for an operator that is not used often enough to
justify the waste of two characters. It's bad Huffman coding, in other
words. Every proposed use of colon in the RFCs conflicted with the
C<?:> operator. I think that says something.
I can't list here all the possible spellings of C<?:> that I
considered. I just think C<??::> is the most visually appealing and
mnemonic of the lot of them.
=head2 Binary C<//>

A binary C<//> operator is the defaulting operator. That is:

    $a // $b

is short for:

    defined($a) ?? $a :: $b

except that the left side is evaluated only once. It will work on
arrays and hashes as well as scalars. It also has a corresponding
assignment operator, which only does the assignment if the left side is
undefined:

    $pi //= 3;
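As it happens, C<//> and C<//=> later made it back into Perl 5 itself
(in 5.10), so the behavior described here can be tried directly:

    use strict;
    use warnings;
    use feature 'say';    # requires Perl 5.10 or later

    my $zero  = 0;        # false, but defined
    my $undef = undef;

    say $zero  // "default";    # 0 -- defined, so the left side wins
    say $undef // "default";    # default

    my $pi;
    $pi //= 3;                  # assigns only because $pi was undefined
    say $pi;                    # 3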
=head2 Binary C<;>

The binary C<;> operator separates two expressions in a list, much like
the expressions within a C-style C<for> loop. Obviously the expressions
need to be in some kind of bracketing structure to avoid ambiguity with
the end of the statement. Depending on the context, these expressions
may be interpreted as arguments to a C<for> loop, or slices of a
multi-dimensional array, or whatever. In the absence of other context,
the default is simply to make a list of lists. That is,

    [1,2,3;4,5,6]

is a shorthand for:

    [[1,2,3],[4,5,6]]

But usually there will be other context, such as a multidimensional
array that wants to be sliced, or a syntactic construct that wants to
emulate some kind of control structure. A construct emulating a
3-argument C<for> loop might force all the expressions to be closures,
for instance, so that they can be evaluated each time through the loop.
User-defined syntax will be discussed in apocalypse 18, if not sooner.
=head2 Unary C<^>

Unary C<^> is now reserved for hyper operators. Note that it works on
assignment operators as well:

    @a ^+= 1;
[Update: That'd be C<»+=«> now.]
=head2 Unary C<?>

Reserved for future use.

[Update: This is now the boolean context operator, the opposite of C<!>.]

=head2 Binary C<?>

Reserved for future use.
=head2 Binary C<~>

This is now the bitwise XOR operator. Recall that unary C<~> (1's
complement) is simply an XOR with a value containing all 1 bits.

[Update: C<~> is now string concatenation. Bitwise XOR is C<+^> or
C<~^> depending on whether you're doing numeric xor or stringwise.]
=head2 Binary C<~~>

This is a logical XOR operator. It's a high precedence version of the
low precedence C<xor> operator.
[Update: C<~~> is now the smartmatch operator. Logical XOR is C<^^>.
Junctive XOR is C<^>.]
=head2 User-defined operators

The declaration syntax of user-defined operators is still up for grabs,
but we can say a few things about it. First, we can differentiate unary
from binary declarations simply by the number of arguments.
(Declaration of a return type may also be useful for disambiguating
subsequent parsing. One place it won't be needed is for operators
wanting to know whether they should behave as hyperoperators. The
pressure to do that is relieved by the explicit C<^> hypermarker.)
We also need to think how these operator definitions relate to
overloading. We can treat an operator as a method on the first object,
but sometimes it's the second object that should control the action.
(Or with multimethod dispatch, both objects.) These will have to be
thrashed out under ordinary method dispatch policy. The important thing
is to realize that an operator is just a funny looking method call.
When you say:

    $man bites $dog

The infrastructure will need to untangle whether the man is biting the
dog or the dog is getting bitten by the man. The actual biting could be
implemented in either the C<Man> class or the C<Dog> class, or even
somewhere else, in the case of multimethods.

[Update: Unary and binary operators are now distinguished by prefixing
with either C<prefix:> or C<infix:>. There are many other syntactic
categories as well.]
=head2 Unicode operators

Rather than using longer and longer strings of ASCII characters to
represent user-defined operators, it will be much more readable to
allow the (judicious) use of Unicode operators.

In the short term, we won't see much of this. As screen resolutions
increase over the next 20 years, we'll all become much more comfortable
with the richer symbol set. I see no reason (other than fear of
obfuscation (and fear of fear of obfuscation)) why Unicode operators
should not be allowed.

Note that, unlike APL, we won't be hardware dependent, in the sense
that any Perl implementation will always be able to parse Unicode, even
if you can't display it very well. (But note that Vim 6.0 just came out
with Unicode support.)
=head2 Precedence

We will at least unify the precedence levels of the equality and
relational operators. Other unifications are possible. For instance,
the C<not> logical operator could be combined with list operators in
precedence. There's only so much simplification that you can do,
however, since you can't mix right association with left association.

By and large, the precedence table will be what you expect, if you
expect it to remain largely the same.

[Update: We also got rid of the special levels for bitwise operators,
shifts, binding operators, and range operators. On the other hand,
we added levels for junctive operators and non-chaining binaries.
Still, we managed to reduce it from 24 to 22 precedence levels. See S3.]

And that still goes for Perl 6 in general. We talk a lot here about
what we're changing, but there's a lot more that we're not changing.
Perl 5 does a lot of things right, and we're not terribly interested in
"fixing" that.