=encoding utf-8
=head1 NAME
Apocalypse_2 - Bits and Pieces
=head1 AUTHOR
Larry Wall <larry@wall.org>
=head1 VERSION
  Maintainer: Larry Wall <larry@wall.org>
  Date: 3 May 2001
  Last Modified: 18 May 2006
  Number: 2
  Version: 6
Here's Apocalypse 2, meant to be read in conjunction with Chapter 2 of
the Camel Book. The basic assumption is that if Chapter 2 talks about
something that I don't discuss here, it doesn't change in Perl 6. (Of
course, it could always just be an oversight. One might say that people
who oversee things have a gift of oversight.)
Before I go further, I would like to thank all the victims, er,
participants in the RFC process. (I beg special forgiveness from those
whose brains I haven't been able to get inside well enough to
incorporate their ideas). I would also like to particularly thank
Damian Conway, who will recognize many of his systematic ideas here,
including some that have been less than improved by my meddling.
Here are the RFCs covered:

    RFC  PSA  Title
    ---  ---  -----
    Textual
    005  cdr  Multiline Comments for Perl
    102  dcr  Inline Comments for Perl
    Types
    161  adb  Everything in Perl Becomes an Object
    038  bdb  Standardise Handling of Abnormal Numbers Like Infinities and NaNs
    043  bcb  Integrate BigInts (and BigRats) Support Tightly With the Basic Scalars
    192  ddr  Undef Values ne Value
    212  rrb  Make Length(@array) Work
    218  bcc  C<my Dog $spot> Is Just an Assertion
    Variables
    071  aaa  Legacy Perl $pkg'var Should Die
    009  bfr  Highlander Variable Types
    133  bcr  Alternate Syntax for Variable Names
    134  bcc  Alternative Array and Hash Slicing
    196  bcb  More Direct Syntax for Hashes
    201  bcr  Hash Slicing
    Strings
    105  aaa  Remove "In string @ must be \@" Fatal Error
    111  aaa  Here Docs Terminators (Was Whitespace and Here Docs)
    162  abb  Heredoc Contents
    139  cfr  Allow Calling Any Function With a Syntax Like s///
    222  abb  Interpolation of Object Method Calls
    226  acr  Selective Interpolation in Single Quotish Context
    237  adc  Hashes Should Interpolate in Double-Quoted Strings
    251  acr  Interpolation of Class Method Calls
    252  abb  Interpolation of Subroutines
    327  dbr  C<\v> for Vertical Tab
    328  bcr  Single Quotes Don't Interpolate \' and \\
    Files
    034  aaa  Angle Brackets Should Not Be Used for File Globbing
    051  ccr  Angle Brackets Should Accept Filenames and Lists
    Lists
    175  rrb  Add C<list> Keyword to Force List Context (like C<scalar>)
    Retracted
    010  rr   Filehandles Should Use C<*> as a Type Prefix If Typeglobs Are Eliminated
    103  rr   Fix C<$pkg::$var> Precedence Issues With Parsing of C<::>
    109  rr   Less Line Noise - Let's Get Rid of @%
    245  rr   Add New C<empty> Keyword to DWIM for Clearing Values
    263  rr   Add Null() Keyword and Fundamental Data Type

=head1 Atoms
Perl 6 programs are notionally written in Unicode, and assume Unicode
semantics by default even when they happen to be processing other
character sets behind the scenes. Note that when we say that Perl is
written in Unicode, we're speaking of an abstract character set, not
any particular encoding. (The typical program will likely be written in
UTF-8 in the West, and in some 16-bit character set in the East.)
=head1 Molecules
=head2 RFC 005: Multiline Comments for Perl
I admit to being prejudiced on this one -- I was unduly influenced at a
tender age by the rationale for the design of Ada, which made a good
case, I thought, for leaving multiline comments out of the language.
But even if I weren't blindly prejudiced, I suspect I'd look at the
psychology of the thing, and notice that much of the time, even in
languages that have multiline comments, people nevertheless tend to
use them like this:

    /*
     * Natter, natter, natter.
     * Gromish, gromish, gromish.
     */

The counterargument to that is, of course, that people don't I<always>
do that in C, so why should they have to do it in Perl? And if there
were no other way to do multiline comments in Perl, they'd have a
stronger case. But there already is another way, albeit one rejected by
this RFC as "a workaround."
But it seems to me that, rather than adding another kind of comment or
trying to make something that looks like code behave like a comment,
the solution is simply to fix whatever is wrong with POD so that its
use for commenting can no longer be considered a workaround. Actual
design of POD can be put off till Apocalypse 26, but we can speculate
at this point that the rules for switching back and forth between POD
and Perl are suboptimal for use in comments. If so, then it's likely
that in Perl 6 we'll have a rule like this: If a C<=begin MUMBLE>
transitions from Perl to POD mode then the corresponding C<=end MUMBLE>
should transition back (without a C<=cut> directive).
Note that we haven't defined our C<MUMBLE>s yet, but they can be set up
to let our program have any sort of programmatic access to the data
that we desire. For instance, it is likely that comments of this kind
could be tied in with some sort of literate (or at least, semiliterate)
programming framework.
=head2 RFC 102: Inline Comments for Perl
I have never much liked inline comments -- as commonly practiced they
tend to obfuscate the code as much as they clarify it. That being said,
"All is fair if you predeclare." So there should be nothing
preventing someone from writing a lexer regex that handles them,
provided we make the lexer sufficiently mutable. Which we will. (As it
happens, the character sequence "C</*>" will be unlikely to occur in
standard Perl 6. Which I guess means it I<is> likely to occur in
nonstandard Perl 6. C<:-)>
A pragma declaring nonstandard commenting would also allow people to
use C</* */> for multiline comments, if they like. (But I still think
it'd be better to use POD directives for that, just to keep the text
accessible to the program.)
[Update: It eventually became apparent (after five years!) that we
could simplify the distinction between postfix and infix operators if
we had a general way to embed comments, so we now have a general
quote-like mechanism for embedded comments such that you can say
C<$foo\#[ this is a comment ].bar>. If the C<#>
character is immediately followed by a bracket, that bracket pair
determines the scope of the comment. (If you're wondering how the
backslash/dot become one dot in the example, see the explanation of the
"long dot" in S02.)]
=head1 Built-In Data Types
The basic change here is that, rather than just supporting scalars,
arrays and hashes, Perl 6 supports opaque objects as a fourth
fundamental data type. (You might think of them as pseudo-hashes done
right.) While a class can access its object attributes any way it
likes, all external access to opaque objects occurs through methods,
even for attributes. (This guarantees that attribute inheritance works
correctly.)
While Perl 6 still defaults to typeless scalars, Perl will be able to
give you more performance and safety as you give it more type
information to work with. The basic assumption is that homogenous data
structures will be in arrays and hashes, so you can declare the type of
the scalars held in an array or hash. Heterogenous structures can still
be put into typeless arrays and hashes, but in general Perl 6 will
encourage you to use classes for such data, much as C encourages you to
use structs rather than arrays for such data.
One thing we'll be mentioning before we discuss it in detail is the
notion of "properties." (In Perl 5, we called these "attributes,"
but we're reserving that term for actual object attributes these days,
so we'll call these things "properties.") Variables and values can
have additional data associated with them that is "out of band" with
respect to the ordinary typology of the variable or value. For now,
just think of properties as a way of adding ad hoc attributes to a
class that doesn't support them. You could also think of it as a form
of class derivation at the granularity of the individual object,
without having to declare a complete new class.
[Update: We're now calling compile-time properties "traits". And objects
don't really have properties separate from their attributes--this is now
handled with a mixin mechanism.]
=head2 RFC 161: Everything in Perl Becomes an Object.
This is essentially a philosophical RFC that is rather short on detail.
Nonetheless, I agree with the premise that all Perl objects should act
like objects if you choose to treat them that way. If you choose not to
treat them as objects, then Perl will try to go along with that, too.
(You may use hash subscripting and slicing syntax to call attribute
accessors, for instance, even if the attributes themselves are not
stored in a hash.) Just because Perl 6 is more object-oriented
internally, does not mean you'll be forced to think in object-oriented
terms when you don't want to. (By and large, there will be a few places
where OO-think is more required in Perl 6 than in Perl 5. Filehandles
are more object-oriented in Perl 6, for instance, and the special
variables that used to be magically associated with the currently
selected output handle are better specified by association with a
specific filehandle.)
=head2 RFC 038: Standardise Handling Of Abnormal Numbers Like
Infinities and NaNs
This is likely to slow down numeric processing in some locations.
Perhaps it could be turned off when desirable. We need to be careful
not to invent something that is guaranteed to run slower than IEEE
floating point. We should also try to avoid defining a type system that
makes translation of numeric types to Java or C# types difficult.
That being said, standard semantics are a good thing, and should be the
default behavior.
=head2 RFC 043: Integrate BigInts (and BigRats) Support Tightly With
the Basic Scalars
This RFC suggests that a pragma enables the feature, but I think it
should probably be tied to the run-time type system, which means it's
driven more by how the data is created than by where it happens to be
stored or processed. I don't see how we can make it a pragma, except
perhaps to influence the meaning of "int" and "num" in actual
declarations further on in the lexical scope:

    my int $i;

might really mean

    my bigint $i;

or maybe just

    my int $i is bigint;

since representation specifications might just be considered part of
the "fine print." But the whole subject of lexically scoped variable
properties specifying the nature of the objects they contain is a bit
problematic. A variable is a sort of mini-interface, a contract if you
will, between the program and the object in question. Properties that
merely influence how the program sees the object are not a problem --
when you declare a variable to be constant, you're promising not to
modify the object through that variable, rather than saying something
intrinsically true about the object. (Not that there aren't objects
that are intrinsically constant.)
Other property declarations might need to have some say in how
constructors are called in order to guarantee consistency between the
variable's view of the object, and the nature of the object itself. In
the worst case we could try to enforce consistency at run time, but
that's apt to be slow. If every assignment of a C<Dog> object to a
C<Mammal> variable has to check to see whether C<Dog> is a C<Mammal>,
then the assignment is going to be a dog.
So we'll have to revisit this when we're defining the relationship
between variable declarations and constructors. In any event, if we
don't make Perl's numeric types automatically promote to big
representations, we should at least make it easy to specify it when you
I<want> that to happen.
[Update: The C<Int> type automatically upgrades to arbitrary precision
internally. The C<int> type does not.]
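
The distinction drawn in that update can be sketched as follows. This is
only an illustration, assuming a present-day Raku compiler (Raku being the
name Perl 6 later took); the variable names are made up for the example.

```raku
# Int is the boxed object type: it grows to arbitrary precision as needed.
my Int $big = 2 ** 100;
say $big;            # 1267650600228229401496703205376

# int is the native type: a fixed-width machine integer that does not upgrade.
my int $small = 42;
say $small + 1;      # 43
```

The native form trades range for compact storage, which is the trade the
lowercase type names are meant to signal.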
=head2 RFC 192: Undef Values ne Value
I've rejected this one, because I think something that's undefined
should be considered just that, undefined. I think the standard
semantics are useful for catching many kinds of errors.
That being said, it'll hopefully be easy to modify the standard
operators within a particular scope, so I don't think we need to think
that our way to think is the only way to think, I think.
=head2 RFC 212: Make C<length(@array)> Work
Here's an oddity, an RFC that the author retracted, but that I accept,
more or less. I think C<length(@array)> should be equivalent to
C<@array.length()>, so if there's a C<length> method available, it
should be called.
The question is whether there should be a C<length> method at all, for
strings or arrays. It almost makes more sense for arrays than it does
for strings these days, because when you talk about the length of a
string, you need to know whether you're talking about byte length or
character length. So we may split up the traditional length function
into two, in which case we might end up with:

    $foo.chars
    $foo.bytes
    @foo.elems

Or some such. Whatever the method names we choose, differentiating them
would be more powerful in supplying context. For instance, one could
envision calling C<@foo.bytes> to return the byte length of all the
strings. That wouldn't fly if we overloaded the method name.
Even C<chars($foo)> might not be sufficiently precise, since, depending
on how you're processing Unicode, you might want to know how long the
string is in actual characters, not counting combining characters that
don't take extra space. But that's a topic for later.
[Update: There is no C<length> function. There are C<bytes>,
C<codes>, C<graphs>, and C<langs> methods for the various Unicode
support levels. (The C<chars> method returns one of those values
depending on the current Unicode support level.) Arrays and hashes
report number of elements with the C<elems> method.]
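
To see why one overloaded "length" would be ambiguous, here is a minimal
sketch assuming a present-day Raku compiler (Raku is the later name of
Perl 6). It relies only on C<chars>, C<codes> and C<elems>, takes the byte
count from an explicitly encoded buffer, and does not assume the C<graphs>
or C<langs> names from the update. The variable names are illustrative.

```raku
# A base letter followed by a combining acute accent (no precomposed form exists).
my $foo = "x\x[301]";
say $foo.chars;                  # 1  (what a reader sees as one character)
say $foo.codes;                  # 2  (Unicode codepoints)
say $foo.encode('UTF-8').bytes;  # 3  (one byte for "x", two for U+0301)

my @foo = <alpha beta gamma>;
say @foo.elems;                  # 3
```

Three different answers for one short string is the whole argument for
separate method names.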
=head2 RFC 218: C<my Dog $spot> Is Just an Assertion
I expect that a declaration of the form:

    my Dog $spot;

is merely an assertion that you will not use C<$spot> inconsistently
with it being a C<Dog>. (But I mean something different by "assertion"
than this RFC does.) This assertion may or may not be
tested at every assignment to C<$spot>, depending on pragmatic context.
This bare declaration does not call a constructor; however, there may
be forms of declaration that do. This may be necessary so that the
variable and the object can pass properties back and forth, and in
general, make sure they're consistent with each other. For example, you
might declare an array with a multidimensional shape, and this shape
property needs to be visible to the constructor, if we don't want to
have to specify it redundantly.
On the other hand, we might be able to get assignment sufficiently
overloaded to accomplish the same goal, so I'm deferring judgment on
that. All I'm deciding here is that a bare declaration without
arguments as above does not invoke a constructor, but merely tells the
compiler something.
[Update: The constructor may be called using the C<.=new()> construct.]
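
A small sketch of the difference between the bare declaration and explicit
construction, assuming a present-day Raku compiler. The C<Dog> and C<Cat>
classes and the C<name> attribute are hypothetical, invented for the
example.

```raku
class Dog { has Str $.name = 'Spot' }
class Cat { }

my Dog $spot;            # a type constraint only; no constructor is called
say $spot.defined;       # False

$spot .= new;            # explicit construction, equivalent to $spot = Dog.new
say $spot.name;          # Spot

# Assigning something that is not a Dog violates the declared constraint.
try { $spot = Cat.new }
say "rejected: { $!.^name }" if $!;
```

In this compiler the check happens at every assignment; the text above
leaves open whether that is always so.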
=head2 Other Decisions About Types
Built-in object types will be in all uppercase: C<INTEGER>, C<NUMBER>,
C<STRING>, C<REF>, C<SCALAR>, C<ARRAY>, C<HASH>, C<REGEX> and C<CODE>.
Corresponding to at least some of these, there will also be lowercase
intrinsic types, such as C<int>, C<num>, C<str> and C<ref>. Use of the
lowercase typename implies you aren't intending to do anything fancy
OO-wise with the values, or store any run-time properties, and thus
Perl should feel free to store them compactly. (As a limiting case,
objects of type C<bit> can be stored in one bit.) This distinction
corresponds roughly to the boxed/unboxed distinction of other computer
languages, but it is likely that Perl 6 will attempt to erase the
distinction for you to the extent possible. So, for instance, an C<int>
may still be used in a string context, and Perl will convert it for
you, but it won't cache it, so the next time you use it as a string, it
will have to convert again.
[Update: The object types are no longer all caps, but C<Int>, C<Num>,
C<Str>, etc.]
The declared type of an array or hash specifies the type of each
element, not the type of an array or hash as a whole. This is justified
by the notion that an array or hash is really just a strange kind of
function that (typically) takes a subscript as an argument and returns
a value of a particular type. If you wish to associate a type with the
array or hash as a whole, that involves setting a C<tie> property. If
you find yourself wishing to declare different types on different
elements, it probably means that you should either be using a class for
the whole heterogenous thing, or at least declare the type of array or
hash that will be a base class of all the objects it will contain.
Of course, untyped arrays and hashes will be just as acceptable as they
are currently. But a language can only run so fast when you force it to
defer all type checking and method lookup till run time.
The intent is to make use of type information where it's useful, and
not require it where it's not. Besides performance and safety, one
other place where type information is useful is in writing interfaces
to other languages. It is postulated that Perl 6 will provide enough
optional type declaration syntax that it will be unnecessary to write
XS-style glue in most cases.
[Update: Turns out one of the most important reasons for adding type
information is that it allows for multimethod dispatch.]
=head1 Variables
=head2 RFC 071: Legacy Perl $pkg'var Should Die
I agree. I was unduly influenced by Ada syntax here, and it was a
mistake. And although we're adding a properties feature into Perl 6
that is much like Ada's attribute feature, we won't make the mistake of
reintroducing a syntax that drives highlighting editors nuts. We'll try
to make different mistakes this time.
=head2 RFC 009: Highlander Variable Types
I basically agree with the problem this RFC is trying to solve, but I
disagree with the proposed solution. The basic problem is that, while
the idiomatic association of C<$foo[$bar]> with C<@foo> rather than
C<$foo> worked fine in Perl 4, when we added recursive data structures
to Perl 5, it started getting in the way notationally, so that initial
funny character was trying to do too much in both introducing the
"root" of the reference, as well as the context to apply to the final
subscript. This necessitated odd looking constructions like:

    $foo->[1][2][3]

This RFC proposes to solve the dilemma by unifying scalar variables
with arrays and hashes at the name level. But I think people like to
think of C<$foo>, C<@foo> and C<%foo> as separate variables, so I don't
want to break that. Plus, the RFC doesn't unify C<&foo>, while it's
perfectly possible to have a reference to a function as well as a
reference to the more ordinary data structures.
So rather than unifying the names, I believe all we have to do is unify
the treatment of variables with respect to references. That is, all
variables may be thought of as references, not just scalars. And in
that case, subscripts always dereference the reference implicit in the
array or hash named on the left.
This has two major implications, however. It means that Perl
programmers must learn to write C<@foo[1]> where they used to write
C<$foo[1]>. I think most Perl 5 people will be able to get used to
this, since many of them found the current syntax a bit weird in the
first place.
The second implication is that slicing needs a new notation, because
subscripts no longer have their scalar/list context controlled by the
initial funny character. Instead, the context of the subscript will
need to be controlled by some combination of:
=over
=item 1. Context of the entire term.
=item 2. Appearance of known list operators in the subscript, such as
comma or range.
=item 3. Explicit syntax casting the inside of the subscript to list or
scalar context.
=item 4. Explicit declaration of default behavior.
=back
One thing that probably shouldn't enter into it is the run-time type of
the array object, because context really needs to be calculated at
compile time if at all possible.
In any event, it's likely that some people will want subscripts to
default to scalars, and other people will want them to default to
lists. There are good arguments for either default, depending on
whether you think more like an APL programmer or a mere mortal.
[Update: Rvalue subscripts are always list context, but it's trivial to
force scalar context with either of the C<+> or C<~> unary operators.
Lvalue subscripts are scalar context unless the lvalue is in parentheses.]
There are other larger implications. If composite variables are thought
of as scalar references, then the names C<@foo> and C<%foo> are really
scalar variables unless explicitly dereferenced. That means that when
you mention them in a scalar context, you get the equivalent of Perl
5's C<\@foo> and C<\%foo>. This simplifies the prototyping system
greatly, in that an operator like C<push> no longer needs to specify
some kind of special reference context for its first argument -- it can
merely specify a scalar context, and that's good enough to assume the
reference generation on its first argument. (Of course, the function
signature can always be more specific if it wants to. More about that
in future installments.)
There are also implications for the assignment operator, in that it has
to be possible to assign array references to array variables without
accidentally invoking list context and copying the list instead of the
reference to the list. We could invent another assignment operator to
distinguish the two cases, but at the moment it looks as though bare
variables and slices will behave as lvalues just as they do in Perl 5,
while lists in parentheses will change to a binding of the right-hand
arguments more closely resembling the way Perl 6 will bind formal
arguments to actual arguments for function calls. That is to say,

    @foo = (1,2,3);

will supply an unbounded list context to the right side, but

    (@foo, @bar) = (@bar, @foo)

will supply a context to the right side that requests two scalar values
that are array references. This will be the default for unmarked
variables in an lvalue list, but there will be an easy way to mark
formal array and hash parameters to slurp the rest of the arguments
with list context, as they do by default in Perl 5.
(Alternately, we might end up leaving the ordinary list assignment
operator with Perl 5 semantics, and define a new assignment operator
such as C<:=> that does signatured assignment. I can argue that one
both ways.)
[Update: We ended up with a C<:=> binding operator.]
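
The difference between copying a list and binding to the same array can be
sketched like this, assuming a present-day Raku compiler; the variable
names are illustrative only.

```raku
my @foo = 1, 2, 3;

my @copy = @foo;       # assignment: list context, elements are copied
my @alias := @foo;     # binding: @alias is another name for the same array

@foo.push(4);
say @copy.elems;       # 3
say @alias.elems;      # 4
```

Ordinary assignment keeps its copying behavior here, and only the separate
operator shares the container.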
Just as arrays and hashes are explicitly dereferenced via subscripting
(or implicitly dereferenced in list context), so too functions are
merely named but not called by C<&foo>, and explicitly dereferenced
with parentheses (or by use as a bare name without the ampersand (or
both)). The Perl 5 meanings of the ampersand are no longer in effect,
in that ampersand will no longer imply that signature matching is
suppressed -- there will be a different mechanism for that. And since
C<&foo> without parens doesn't do a call, it is no longer possible to
use that syntax to automatically pass the C<@_> array -- you'll have to
do that explicitly now with C<foo(@_)>.
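
A minimal sketch of naming a routine versus calling it, assuming a
present-day Raku compiler. The routine C<foo> and its slurpy parameter are
made up for the example.

```raku
sub foo(*@args) { @args.elems }

my $code = &foo;        # names the routine; nothing is called here
say $code.^name;        # Sub
say $code(1, 2, 3);     # 3
say &foo(1, 2);         # 2, explicit call through the sigiled name
say foo(1);             # 1, the usual bare-name call
```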
Scalar variables are special, in that they may hold either references
or actual "native" values, and there is no special dereference syntax
as there is for other types. Perl 6 will attempt to hide the
distinction as much as possible. That is, if C<$foo> contains a native
integer, calling the C<$foo.bar> method will call a method on the
built-in type. But if C<$foo> contains a reference to some other
object, it will call the method on that object. This is consistent with
the way we think about overloading in Perl 5, so you shouldn't find
this behavior surprising. It may take special syntax to get at any
methods of the reference variable itself in this case, but it's OK if
special cases are special.
[Update: The C<variable($foo)> pseudo-function allows you to specify the
container rather than the contained object.]
=head2 RFC 133: Alternate Syntax for Variable Names
This RFC has a valid point, but in fact we're going to do just the
opposite of what it suggests. That is, we'll consider the funny
characters to be part of the name, and use the subscripts for context.
This works out better, because there's only one funny character, but
many possible forms of dereferencing.
[Update: Nowadays we call those funny characters I<sigils>. And for
weirdly scoped variables there's a second character called a I<twigil>.]
=head2 RFC 134: Alternative Array and Hash Slicing
We're definitely killing Perl 5's slice syntax, at least as far as
relying on the initial character to determine the context of the
subscript. There are many ways we could reintroduce a slicing syntax,
some of which are mentioned in this RFC, but we'll defer the decision
on that till Apocalypse 9 on Data Structures, since the interesting
parts of designing slice syntax will be driven by the need to slice
multidimensional arrays.
[Update: There is no Apocalypse 9, but there is a Synopsis 9 that
covers these matters.]
For now we'll just say that arrays can have subscript signatures much
like functions have parameter signatures. Ordinary one-dimensional
arrays (and hashes) can then support some kind of simple slicing syntax
that can be extended for more complicated arrays, while allowing
multidimensional arrays to distinguish between simple slicing and
complicated mappings of lists and functions onto subscripts in a manner
more conducive to numerical programming.
On the subject of hash slices returning pairs rather than values, we
could distinguish this with special slice syntax, or we could establish
the notion of a hashlist context that tells the slice to return pairs
rather than just values. (We may not need a special slice syntax for
that if it's possible to typecast back and forth between pair lists and
ordinary lists.)
[Update: Slicing to get a pairlist can be done by attaching a C<:p>
modifier to the subscript. In general though there's no such thing
as a hashlist context. It's just that the list context supplied by
assignment to a hash happens to know how to deal with pairs.]
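
As a rough illustration of the C<:p> modifier mentioned in the update,
assuming a present-day Raku compiler; the hash contents and variable names
are invented for the example.

```raku
my %h = a => 1, b => 2, c => 3;

my @values = %h<a b>;       # plain slice: values only
my @pairs  = %h<a b>:p;     # :p modifier: key/value pairs

say @values;                # [1 2]
say @pairs[0].key;          # a
say @pairs[0].value;        # 1

my %subset = %h<a b>:p;     # list assignment to a hash accepts the pairs
say %subset.elems;          # 2
```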
=head2 RFC 196: More Direct Syntax for Hashes
This RFC makes three proposals, which we'll consider separately.
Proposal 1 is "that a hash in scalar context evaluate to the number of
keys in the hash." (You can find that out now, but only by using the
C<keys()> function in scalar context.) Proposal 1 is OK if we change
"scalar context" to "numeric context," since in scalar context a
hash will produce a reference to the hash, which just happens to numify
to the number of entries.
We must also realize that some implementations of hash might have to go
through and count all the entries to
return
the actual number.
Fortunately, in boolean context, it suffices to find a single entry to
determine whether the hash contains anything. However, on hashes that
don't keep track of the number of entries, finding even one entry might
reset any active iterator on the hash, since some implementations of
hash (in particular, the ones that don't keep track of the number of
entries) may only supply a single iterator.
[Update: You may also call C<.elems> to be more explicit.]
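
The numeric and boolean behavior described for Proposal 1 can be sketched
as follows, assuming a present-day Raku compiler; the two hashes are
illustrative.

```raku
my %h = one => 1, two => 2;
my %empty;

say +%h;          # 2      (numeric context)
say %h.elems;     # 2      (the explicit method)
say ?%h;          # True   (boolean context)
say ?%empty;      # False
```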
Proposal 2 is "that the iterator in a hash be reset through an
explicit call to the C<reset()> function." That's fine, with the
proviso that it won't be a function, but rather a I<method> on the HASH
class.
[Update: all list contexts in Perl 6 are lazy by default, and different
list contexts generate their own iterators, so all you have to do to
"reset" an iterator is stop reading from the list in question.]
Proposal 3 is really about C<sort> recognizing pairs and doing the
right thing. Defaulting to sorting on C<$^a[0] cmp $^b[0]> is likely to
be reasonable, and that's where a pair's key would be found. However,
it's probable that the correct solution is simply to provide a default
string method for anonymous lists that happens to produce a decent key
to sort on when C<cmp> requests a string representation of either of
its arguments. The C<sort> itself should probably just concentrate on
memoizing the returned strings so they don't have to be recalculated.
[Update: The C<sort> interface has been completely revamped since this
was written. This will eventually appear in S29, but as of now it's
just in the perl6-language archives.]
=head2 RFC 201: Hash Slicing
This RFC proposes to use C<%> as a marker for special hash slicing in
the subscript. Unfortunately, the C<%> funny character will not be
available for this use, since all hash refs will start with C<%>.
Concise list comprehensions will require some other syntax within the
subscript, which will hopefully generalize to arrays as well.
=head2 Other Decisions About Variables
Various special punctuation variables are gone in Perl 6, including all
the deprecated ones. (Non-deprecated variables will be replaced by some
kind of similar functionality that is likely to be invoked through some
kind of method call on the appropriate object. If there is no
appropriate object, then a named global variable might provide similar
functionality.)
Freeing up the various bracketing characters allows us to use them for
other purposes, such as interpolation of expressions:

    "$(expr)"
    "@(expr)"

[Update: Those forms mean something else now (casting). Expression
interpolation is normally done via closure.]
C<$#foo> is gone. If you want the final subscript of an array, and
C<[-1]> isn't good enough, use C<@foo.end> instead.
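
A minimal sketch of finding the last subscript, assuming a present-day Raku
compiler. Note one assumption about the released language that goes beyond
the text above: there a literal negative subscript is rejected, and the
C<*-1> form counts back from the end instead.

```raku
my @foo = <a b c>;

say @foo.end;         # 2, the index of the last element
say @foo[@foo.end];   # c
say @foo[*-1];        # c, counting back from the end
```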
Other special variables (such as the regex variables) will change from
dynamic scoping to lexical scoping. It is likely that even C<$_> and
C<@_> will be lexically scoped in Perl 6.
[Update: And indeed they are. But they happen to be a special kind of
lexical variable called an "environment" variable, modeled on Unix
environment variables. This allows subroutines to get at them and use
them as defaults, in a pronominal sort of way.]
=head1 Names
In Perl 5, lexical scopes are unnamed and unnameable. In Perl 6, the
current lexical scope will have a name that is visible within the
lexical scope as the pseudo class C<MY>, so that such a scope can, if
it so chooses, delegate management of its lexical scope to some other
module at compile time. In normal terms, that means that when you use a
module, you can let it import things lexically as well as packagely.
[Update: The currently compiling lexical scope may also be named from anywhere
as the C<COMPILING> pseudopackage these days.]
Typeglobs are gone. Instead, you can get at a variable object through
the symbol table hashes that are structured much like Perl 5's. The
variable object for C<$MyPackage::foo> is stored in:

    %MyPackage::{'$foo'}

Note that the funny character is part of the name. There is no longer
any structure in Perl that associates everything with the name "C<foo>".
[Update: The right way to say that now is "C<< MyPackage::<$foo> >>".
Hence the C<$foo> variable in the scope currently being compiled is
known as C<< COMPILING::<$foo> >>.]
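
The lookup form given in that update can be tried out as below. This is a
sketch assuming a present-day Raku compiler; the package contents are
hypothetical.

```raku
package MyPackage {
    our $foo = 42;
}

say $MyPackage::foo;        # 42, the ordinary qualified name
say MyPackage::<$foo>;      # 42, symbol-table lookup; the sigil is part of the key
```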
Perl's special global names are stored in a special package named
"C<*>" because they're logically in every scope that does not hide
them. So the unambiguous name of the standard input filehandle is
C<$*STDIN>, but a package may just refer to C<$STDIN>, and it will
default to C<$*STDIN> if no package or lexical variable of that name
has been declared.
[Update: We did s/STD// on those, so standard input is now just C<$*IN>.]
Some of these special variables may actually be cloned for each lexical
scope or each thread, so just because a name is in the special global
symbol table doesn't mean it always behaves as a global across all
modules. In particular, changes to the symbol table that affect how the
parser works must be lexically scoped. Just because I install a special
rule for my cool new hyperquoting construct doesn't mean everyone else
should have to put up with it. In the limiting case, just because I
install a Python parser, it shouldn't force other modules into a maze
of twisty little whitespace, all alike.
Another way to look at it is that all names in the "C<*>" package are
automatically exported to every package and/or outer lexical scope.
[Update: The names are no longer automatically exported, but you can
import them from the global namespace via
"C<use GLOBALS '$IN', '$OUT';>" and such.]
=head1 Literals
=head2 Underscores in Numeric Literals
Underscores will be allowed between any two digits within a number.
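Perl 5 already accepts underscores between the digits of a numeric
literal, so a short Perl 5 sketch can show what the rule looks like in
practice. The variable names here are purely illustrative.

```perl
use strict;
use warnings;

my $million = 1_000_000;      # same value as 1000000
my $mask    = 0xFF_FF;        # underscores also work in hex literals
my $pi      = 3.141_592_653;  # and between digits of a fraction

print "$million $mask $pi\n";
```

The underscores are purely a reading aid in the source text and do not
change the value of the literal.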
=head2 RFC 105: Remove "In string @ must be \@" Fatal Error
Fine.
[Update: The interpolation rules for arrays have been completely revised.
A bare array name no longer interpolates--you have to say C<@foo[]>.]
=head2 RFC 111: Here Docs Terminators (Was Whitespace and Here Docs)
Fine.
=head2 RFC 162: Heredoc contents
I think I like option (e) the best: remove whitespace equivalent to the
terminator.
By default, if it has to dwim, it should dwim assuming that hard tabs
are 8 spaces wide. This should not generally pose a problem, since most
of the time the tabbing will be consistent throughout anyway, and no
dwimming will be necessary. This puts the onus on people using
nonstandard tabs to make sure they're consistent so that Perl doesn't
have to guess.
Any additional mangling can easily be accomplished by a user-defined
operator.
[Update: Here docs are now just a :to variant on extensible quotes, so
any customization you can do to C<q/foo/> you can also do to
C<q:to/END/>.]
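The whitespace rule of option (e) can be sketched in Perl 5 as follows.
This is only an illustration under stated assumptions: the helper names
C<expand_tabs> and C<strip_heredoc_indent> are hypothetical, the sketch
operates on lines that have already been read, and it converts any
leading tabs to spaces as a side effect.

```perl
use strict;
use warnings;

# Expand hard tabs to spaces, assuming tab stops every 8 columns
# (the default width the text proposes for dwimming).
sub expand_tabs {
    my ($text) = @_;
    1 while $text =~ s/^([^\t]*)\t/$1 . ' ' x (8 - length($1) % 8)/e;
    return $text;
}

# Hypothetical helper: remove from each body line as many columns of
# leading whitespace as the terminator line was indented by.
sub strip_heredoc_indent {
    my ($terminator_indent, @lines) = @_;
    my $width = length expand_tabs($terminator_indent);
    my @out;
    for my $line (@lines) {
        my ($lead, $rest) = $line =~ /^([ \t]*)(.*)$/s;
        $lead = expand_tabs($lead);
        # Lines indented less than the terminator simply lose all indent.
        $lead = length($lead) > $width ? substr($lead, $width) : '';
        push @out, $lead . $rest;
    }
    return @out;
}

my @body = strip_heredoc_indent(
    "    ",              # the terminator was indented four spaces
    "    first line",
    "\tsecond line",     # a hard tab counts as 8 columns
    "  short",
);
print "$_\n" for @body;
```

With consistent tabbing, the tab expansion never changes the outcome;
it only matters when tabs and spaces are mixed, which is the case where
Perl would otherwise have to guess.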
=head2 RFC 139: Allow Calling Any Function With a Syntax Like s///
Creative quoting will be allowed with lexical mutation, but we can't
parse C<foo(bar)> two different ways simultaneously, and I'm unwilling
to prevent people from using parens as quote characters. I don't see
how we can reasonably have new quote operators without explicit
declaration. And if the utility of a quote-like operator is sufficient,
there should be little relative burden in requiring such a declaration.
The form of such a declaration is left to the reader as an exercise in
function property definition. We may revisit the question later in this
series. It's also possible that a quote operator such as C<qx//> could
have a corresponding function name like C<quote:qx> that could be
invoked as a function.
=head2 RFC 222: Interpolation of Object Method Calls
I've been hankering for methods to interpolate for a long time, so I'm
in favor of this RFC. And it'll become doubly important as we move
toward encouraging people to use accessor methods to refer to object
attributes outside the class itself.
I have one "but," however. Since we'll switch to using C<.> instead
of C<< -> >>, I think for sanity's sake we may have to require the
parentheses, or "C<$file.$ext>" is going to give people fits. Not to
mention "C<$file.ext>".
[Update: Nowadays we also require brackets on array interpolations
and braces on hash interpolations. See S03 for more.]
=head2 RFC 226: Selective Interpolation in Single Quotish Context.
This proposal has much going for it, but there are also difficulties,
and I've come close to rejecting it outright simply because the
single-quoting policy of Perl 5 has been successful. And I think the
proposal in this RFC for
C<\I>...C<\E> is ugly. (And I'd like to kill C<\E> anyway, and use
bracketed scopes.)
However, I think there is a major "can't get there from here" that we
could solve by treating interpolation into single quotes as something
hard, not something easy. The basic problem is that it's too easy to
run into a C<\$> or C<\@> (or a C<\I> for that matter) that wants to be
taken literally. I think we could allow the interpolation of arbitrary
expressions into single-quoted strings, but only if we limit it to an
unlikely sequence where three or more characters are necessary for
recognition. The most efficient mental model would seem to be the idea
of embedding one kind of quote in another, so I think this:

    \q{stuff}

will embed single-quoted stuff, while this:

    \qq{stuff}

will embed double-quoted stuff. A variable could then be interpolated
into a single-quoted string by saying:

    \qq{$foo}

=head2 RFC 237: Hashes Should Interpolate in Double-Quoted Strings
I agree with this RFC in principle, but we can't define the default
hash stringifier in terms of variables that are going away in Perl 6,
so the RFC's proposal of using C<$"> is right out.
All objects should have a method by which they produce readable output.
How this may be overridden by user preference is open to debate.
Certainly, dynamic scoping has its problems. But lexical override of an
object's preferences is also problematic. Individual object properties
appear to give a decent way out of this. More on that below.
[Update: Hash values by default interpolate with tabs between key
and value, and with newline between pairs. But you can give it a specific
format with the C<.as> method.]
On C<printf> formats, I don't see any way to dwim that C<%d> isn't a
hash, so we'll just have to put formats into single quotes in general.
Those format strings that also interpolate variables will be able to
use the new C<\qq{$var}> feature.
[Update: Since hash interpolations require braces now, C<printf> formats
are safe again (unless they happen to be followed by curlies).]
Note for those who are thinking we should just stick with Perl 5
interpolation rules: We have to allow C<%> to introduce interpolation
now because individual hash values are no longer named with
C<$foo{$bar}>, but rather C<%foo{$bar}>. So we might as well allow
interpolation of complete hashes.
=head2 RFC 251: Interpolation of Class Method Calls
Class method calls are relatively rare (except for constructors, which
will be rarely interpolated). So rather than scanning for identifiers
that might introduce a class, I think we should just depend on
expression interpolation instead:

    "There are $(Dog.numdogs) dogs."

[Update: That's now done with closure interpolation.]
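For comparison, Perl 5 has no direct syntax for interpolating a method
call, so the usual workaround is to dereference an anonymous array
inside the string. The sketch below shows that idiom; the C<Dog> class
and its C<numdogs> method are hypothetical stand-ins for the names used
in the example above.

```perl
use strict;
use warnings;

# Hypothetical class standing in for the Dog of the example above.
package Dog;
my $count = 3;
sub numdogs { return $count }

package main;

# Perl 5 workaround: build an anonymous array from the call's result
# and interpolate its contents with @{[ ... ]}.
my $message = "There are @{[ Dog->numdogs ]} dogs.";
print "$message\n";
```

Expression interpolation removes the need for the anonymous array: the
expression is written directly inside the string.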
=head2 RFC 252: Interpolation of Subroutines
I think subroutines should interpolate, provided they're introduced
with the funny character. (On the other hand, how hard is
C<$(sunset $date)> or C<@(sunset $date)>? On the gripping hand, I
like the consistency of C<&> with C<$>, C<@> and C<%>.)
I think the parens are required, since in Perl 6, scalar C<&sub> will
just return a reference, and require parens if you really want to deref
the sub ref. (It's true that a subroutine can be called without parens
when used as a list operator, but you can't interpolate those without a
funny character.)
For those worried about the use of C<&> for signature checking
suppression, we should point out that C<&> will no longer be the way to
suppress signature checking in Perl 6, so it doesn't matter.
=head2 RFC 327: C<\v> for Vertical Tab
I think the opportunity cost of not reserving C<\v> for future use is
too high to justify the small utility of retaining compatibility with a
feature virtually nobody uses anymore. For instance, I almost used
C<\v> and C<\V> for switching into and out of verbatim (single-quote)
mode, until I decided to unify that with quoting syntax and use
C<\qq{}> and C<\q{}> instead.
[Update: Turns out that C<\v> matches vertical whitespace in patterns,
which conveniently includes vertical tab--whatever that is... Also we
now have C<\h> for horizontal whitespace.]
=head2 RFC 328: Single quotes don't interpolate \' and \\
I think hyperquotes will be possible with a declaration of your quoting
rules, so we're not going to change the basic single-quote rules
(except for supporting C<\q>).
[Update: There are adverbial modifiers now that can do hyperquoting. See S02.]
=head2 Other Decisions About Literals
=head3 Scoping of \L et al.
I'd like to get rid of the gratuitously ugly C<\E> as an end-of-scope
marker. Instead, if any sequence such as C<\L>, C<\U> or C<\Q> wishes
to impose a scope, then it must use curlies around that scope:
C<\L{I<stuff>}>, C<\U{I<stuff>}> or C<\Q{I<stuff>}>. Any literal
curlies contained in I<stuff> must be backslashed. (Curlies as syntax
(such as for subscripts) should nest correctly.)
[Update: Those constructs are now gone entirely. Use closure interpolation
to interpolate the value of an expression.]
=head3 Bareword Policy
There will be no barewords in Perl 6. Any bare name that is a declared
package name will be interpreted as a class object that happens to
stringify to the package name. All other bare names will be interpreted
as subroutine or method calls. For nonstrict applications, undefined
subroutines will autodefine themselves to return their own name. Note
that in C<${name}> and friends, the name is considered autoquoted, not
a bareword.
[Update: The C<${name}> construct is gone. Use closure interpolation
to disambiguate expression interpolations: C<"{$name}text">.
Use C<$($ref)> or C<$$ref> for hard dereferences.
Use C<$::($name)> for symbolic dereferences.]
=head3 Weird brackets
Use of brackets to disambiguate

    "${foo[bar]}"

from

    "${foo}[bar]"

will no longer be supported. Instead, the expression parser will always
grab as much as it can, and you can make it quit at a particular point
by interpolating a null string, specified by C<\Q>:

    "$foo\Q[bar]"

[Update: That's gone too. Just use closure interpolation to disambiguate.]
=head3 Special tokens
Special tokens will turn into either POD directives or lexically scoped
OO methods under the C<MY> pseudo-package:

    Old             New
    ---             ---
    __LINE__        MY.line
    __FILE__        MY.file
    __PACKAGE__     MY.package