Frozen Perl 2010

The Perl Compiler

rurban - Reini Urban, Graz, Austria

What's new?

Fixed most bugs (in work). Known bugs: bytecode 12 => 0, c 6 => 1, cc 9 => 5
Support for 5.10 and 5.12; non-threaded perls are favored (faster)
.plc files are platform compatible and almost version compatible (.plc header change)
Added a testsuite
More and better optimisations (in work)
Removed the B::Stash bloat from perlcc; -stash is now optional

Who am I

rurban has maintained cygwin perl since 5.8.8, plus 3-4 modules; works on the guts and ported the B::* modules to 5.10

In real life mostly doing LISP, Perl and PHP, and support for custom hardware on Windows, Linux and real-time systems. Coding in winter, surfing in summer.

First on CPAN in 1995, with the perl5.hlp file and its converter for Windows.

Contents

Started 1995 by Malcolm Beattie, abandoned 2007 by p5p, revived 2008 by me.

Perl is a very dynamic language: after eval "require $foo;", which packages must be compiled in?

Overview
Status
Plans

Why use B::C / perlcc?

Improved startup time, especially significant with larger code.
Reduced memory usage (9% less memory with 25000 lines).
Distribute binary-only versions
No need to ship an entire perl install
Self-contained applications
But you could also use a "packager", like perl2exe, perlapp or PAR. These are not compilers, and they have slower startup.
And with B::CC: improved run-time

Overview

The Perl compiler suite B::C contains three separate compilers:

B::Bytecode / ByteLoader (freeze/thaw to .plc + .pmc)
B::C (freeze/thaw to .c)
B::CC (optimising to .c)

perl toke.c/op.c (parse + compile) => B::C (freeze) => perl op walker run.c (run)

This eliminates the whole parsing and dynamic allocation time.

The Walker

After compilation, perl walks the "op tree" in run.c.

Observation

1. The op tree is not a "tree": at run time it is reduced to a simple linked list of ops. Every op (a pp_<opname> function) returns the next op.

2. PERL_ASYNC_CHECK is called after every single op.

Perl Phases - the "Perl Compiler"

=> Parse + compile to op tree (in three phases, see perlguts and perloptree)
BEGIN (use ...)
CHECK (O modules)
INIT (main phase)
END (cleanup, perl destructors)

Normal Perl execution starts at INIT, after BEGIN and CHECK. The O modules run at CHECK and skip INIT.

Perl Phases - the "B Compilers"

Parse + Compile to op tree (in three phases)
BEGIN (use ...)
=> CHECK (O) => freeze
compiled INIT (main phase)
compiled END (cleanup, perl destructors)

Perl Phases - the "B Compilers"

The B compilers, invoked via O, freeze the state in CHECK; the compiled program then invokes the walker.

$ perl -MO=C,-omyprog.c myprog.pl
$ cc_harness -o myprog myprog.c
$ ./myprog

B::CC - Unoptimised / the walker

B::CC - The optimiser / unrolled

no CALL_FPTR: no indirect call through a function pointer
static direct function calls
prefetched into the CPU cache!
no unneeded stack handling
PERL_ASYNC_CHECK only after every basic block

Status

5.6.2 and 5.8.9 non-threaded B::C are quite usable and have the fewest known bugs, but 5.10 and 5.12 have now also become pretty stable.

Targets:

Bugfixes for B::C
Test top100 CPAN modules (3-4 fail)
Isolate bugs into simple tests (35 cases)
Test the perl core test suite (~20 fails). Estimated 3-4 more open bugs.

Status

5.6.2 + 5.8.9 are almost bug free, with B::Bytecode and B::C
B::C >= 5.10 threaded (pads) in work: 2-3 minor bugs with certain modules
With debugging perls there seem to be fewer bugs than with release builds (normally it's the other way round).
B::CC has some limitations and some known bugs

See testsuite and STATUS

Projects

Which software is compiler critical?

Execution time is the same (sans B::CC)

Startup time is radically faster.

Web apps that need fast response times:

1 sec more or less => good or bad software


New Optimisations

Optimise static initialization - strings and arrays

Non-threaded perls: +10-20% performance

ltrace reveals Gthr_key_ptr, gv_fetchpv, savepvn, av_extend and safesysmalloc as the major culprits, the latter three at startup time.

common constant strings with gcc -Os => automatically optimised

av_extend - run-time malloc => static arrays ?

Static arrays are impossible unless the array is Readonly:

a static array cannot be extended at run-time; it would first need to be reallocated into the heap.

Pre-allocate faster with -fav-init or -O3.

At least this is the idea. The same for hashes is not yet implemented (nyi).

Real Life Applications

cPanel has used B::C-compiled 5.6 code for a decade and wants to switch to 5.8.9 (or later).

cPanel offers web hosting automation software that manages provider data, domains, email and webspace: a typical large web app. Perl startup time can be too slow for the many AJAX calls that need fast initial response times.

Benchmarks (by cPanel)

Larger code base => more significant startup improvements

18.78x faster startup for large production applications (~70000 lines)
3.52x faster startup on smaller applications (~8000 lines)
3x faster startup on tiny applications (< 1024 lines of code)
2x faster startup for very tiny applications
Guessed: 2x-10x faster run-time for CC-optimised code, especially arithmetic.

Benchmarks (by cPanel)

Web Service Daemon

Resident size (perl)    9756
Resident size (perlcc)  9072

DNS Settings Client

Startup time (perl)    0.074
Startup time (perlcc)  0.021

HTML Template Processor

Startup time (perl)    0.695
Startup time (perlcc)  0.037

Plans

2010: Find and fix all remaining bugs

2010: Faster testsuite (now 8 min / 40 min / 2 days)

2011: CC type and sub optimisations

2012: CC unrolling => JIT within perl (perl -j)

Emit Parrot PIR.

B::CC Limitations

run-time ops vs compile-time ...

dynamic range 1..$foo

goto/next/last $label

Undetected modules behind eval "require": use -uModule to force scanning them

B::CC Limitations

Run-time ops vs compile-time: BEGIN blocks only have compile-time side-effects.

BEGIN {
    use Package;   # okay
    chdir "dir";   # not okay:
                   # only done at compile-time, not when the user runs the program
    print "stuff"; # okay, only at compile-time
    eval "what";   # hmm; depends
}

Move eval "require Package;" to BEGIN

B::CC Bugs

Custom sort BLOCKs are buggy: a wrong queue implementation causes an endless loop.

sort { $a <=> $b }
is optimised away, ok

sort { $hash{$a} <=> $hash{$b} }
maybe?

sort { $hash{$a}->{field} <=> $hash{$b}->{field} }
for sure not

Testsuite

user make test (via cpan):

35 test cases, each with bytecode, c -O0 to -O4 and cc -O0 to -O2

=> 8 min

Testsuite

author make test:

35 test cases, each with bytecode, c -O0 to -O4 and cc -O0 to -O2 (8 min)

modules.t top100 (16 min)

+ testcore.t (16 min)

=> ~40 min

Testsuite

author make test: 40 min

for 5-10 perls (5.6, 5.8, 5.10, 5.11; threaded + non-threaded: 4*2 = 8)

on 5 platforms (cygwin, debian, centos, solaris, freebsd)

=> 8*5*40 = 1600 min, ~27 h = 1-2 days, similar to the gcc testsuite.

Testsuite

top100 modules?

See the webpage or the svn repo for the results of all tested perls and modules.

With 5.8 non-threaded, 3 fails: Attribute::Handlers, B::Hooks::EndOfScope, YAML, MooseX::Types

With blead non-threaded, 4 fails: Attribute::Handlers, File::Temp, ExtUtils::Install

Unpredictable results: e.g. threaded 5.10 fails 39/98 (cygwin release) vs 3/80 (a test version). Innocent change => fatal consequences.

CC

Sub calls - Opcodes

What can we statically leave out per pp_?

Now: argument passing and return values, for 50% of the ops

Planned: more of this, plus direct XSUB calls.

Types - understand declarations

Now: for known static types, unroll pp_<opname> completely into simple arithmetic.

Known static types at compile-time? User declarations or Devel::TypeCheck

CC - Type declarations

Currently:

my $<name>_i;   IV integer
my $<name>_ir;  IV integer in a pseudo register
my $<name>_d;   NV double

Future ideas are type qualifiers such as: my (int $foo, double $foo_d);

or attributes such as: my ($foo:Cint, $foo:Cintr, $foo:Cdouble);

or MooseX::Types

Code

http://search.cpan.org/dist/B-C/

http://code.google.com/p/perl-compiler/

Planned:

http://compiler.perl.org/

mailto:compiler@perl.org

Questions?