TITLE
A high-level overview of the perl system
VERSION
CURRENT
Maintainer: Dan Sugalski
Class: Meta
PDD Number: 1
Version: 1
Status: Developing
Last Modified: 02 January 2001
PDD Format: 1
Language: English
HISTORY
None--this is the first version
CHANGES
None. (Yet...)
ABSTRACT
This PDD provides a high-level overview of the perl system.
DESCRIPTION
Major components
The perl system generally looks like this:
+----------------------------------------------------+
|                    Embedding App                   |
+----------+------------+-------------+--------------+
|          |            |             |              |
|  parser <-> compiler <-> optimizer <-> interpreter |
|          |            |             |              |
+----------+------------+-------------+--------------+
|                 Extensions to perl                 |
+----------------------------------------------------+
- Parser
The parser takes source code of some sort (presumably perl source, but we're not picky--if you want to write a parser module that takes C, Python, or Klingon, that's OK with us) and creates a syntax tree from that source.
The parser module is designed to be extended with both perl and compiled languages, and much of the parser is written in perl. (This is the plan, at least.) Generally there will be one parser, though there's no reason there can't be multiple independent parsers.
- Bytecode compiler
The bytecode compiler module takes a syntax tree from the parser and emits an unoptimized stream of bytecode. This code is suitable for passing straight to the interpreter, though it is probably not going to be very fast.
- Optimizer
The optimizer module takes the bytecode stream from the compiler and optionally the syntax tree the bytecode was generated from, and optimizes the bytecode.
- Interpreter
The interpreter module takes the bytecode stream from either the optimizer or the bytecode compiler and executes it. There must always be at least one interpreter module available for any program that can handle all of perl, since it's required for use statements and BEGIN blocks.
While there must be at least one interpreter, there may be multiple interpreter modules linked into an executable. This would be the case, for example, for programs that produce Java bytecode, where one of the interpreter modules would take the bytecode stream and spit out Java bytecode instead of interpreting it.
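The data flow between the four modules can be sketched with a toy example. This is only an illustration: the tuple-based syntax tree, the PUSH and ADD opcodes, and every function name below are assumptions made up for the sketch, not part of any perl interface. The last function stands in for a second "interpreter" module that translates the bytecode rather than running it.

```python
"""Toy model of the parser -> compiler -> optimizer -> interpreter flow.

All names, the tree shape, and the opcodes are hypothetical.
"""


def parse(source):
    """Parser: turn source like '1 + 2 + 3' into a nested syntax tree."""
    tokens = source.split()
    tree = ("num", int(tokens[0]))
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+":
            raise SyntaxError(f"unexpected token {tokens[i]!r}")
        tree = ("add", tree, ("num", int(tokens[i + 1])))
    return tree


def compile_tree(tree):
    """Bytecode compiler: emit an unoptimized stack-machine op stream."""
    if tree[0] == "num":
        return [("PUSH", tree[1])]
    return compile_tree(tree[1]) + compile_tree(tree[2]) + [("ADD", None)]


def optimize(bytecode):
    """Optimizer: fold PUSH a, PUSH b, ADD into a single PUSH a+b."""
    out = []
    for op in bytecode:
        if (op[0] == "ADD" and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("PUSH", a + b))
        else:
            out.append(op)
    return out


def interpret(bytecode):
    """Interpreter: execute the op stream and return the final value."""
    stack = []
    for name, arg in bytecode:
        if name == "PUSH":
            stack.append(arg)
        elif name == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown op {name!r}")
    return stack.pop()


def emit_listing(bytecode):
    """Alternative 'interpreter' module: translate instead of executing."""
    return "\n".join(
        name if arg is None else f"{name} {arg}" for name, arg in bytecode
    )


raw = compile_tree(parse("1 + 2 + 3"))
fast = optimize(raw)
```

Note that `interpret` accepts either `raw` or `fast`, mirroring the point that the interpreter can take its bytecode from the compiler or from the optimizer.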
Independent subsystems
Perl also has a number of subsystems that are independent of any single module.
- PerlIO subsystem
The PerlIO subsystem provides source- and platform-independent asynchronous I/O to perl. With this, perl 6 is independent of C's stdio system. (And good riddance--it sucks.) How this maps to an OS's underlying I/O code is not generally perl's concern, and a platform isn't obligated to provide asynchronous I/O.
Additionally, the PerlIO subsystem allows a program to push filters onto an input stream if necessary, to manipulate the data before it is presented to a perl program.
- Regex engine
The regular expression engine is somewhat decoupled from the guts of perl. Its job is to turn regexes into objects and apply those regex objects to strings.
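Both ideas above, filters pushed onto an input stream and regexes turned into reusable objects, can be sketched briefly. The `FilteredInput` class and its `push_filter` method are hypothetical names invented for this illustration, and the sketch is synchronous, so it says nothing about the asynchronous side of PerlIO. Python's standard `re` module stands in for the regex engine.

```python
import io
import re


class FilteredInput:
    """Hypothetical stand-in for an input stream with pushable filters."""

    def __init__(self, stream):
        self._stream = stream
        self._filters = []  # applied in the order they were pushed

    def push_filter(self, func):
        """Push a filter: a function taking one line and returning one line."""
        self._filters.append(func)

    def __iter__(self):
        # The consumer only ever sees data after every filter has run.
        for line in self._stream:
            for func in self._filters:
                line = func(line)
            yield line


def strip_newline(line):
    return line.rstrip("\n")


source = io.StringIO("Alpha=1\nbeta=22\n# comment\n")
stream = FilteredInput(source)
stream.push_filter(strip_newline)
stream.push_filter(str.lower)

# The regex is compiled once into an object, then applied to many strings.
pair = re.compile(r"(\w+)=(\d+)")
settings = {}
for line in stream:
    match = pair.fullmatch(line)
    if match:
        settings[match.group(1)] = int(match.group(2))
```

The program reading from `stream` never touches the raw data, which is the point of doing the manipulation in the I/O layer.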
API levels
- Embedding
The embedding API is the set of calls exported to the embedding application. This is a small, simple set of calls, requiring minimum effort to use.
The goal is to provide an interface that a competent programmer who is uninterested in perl can use to provide access to a perl interpreter within another application with very little programming or intellectual effort. Generally a simple interface should take less than thirty minutes to write, though more complete integration will take longer.
Backwards binary compatibility at this level is guaranteed across the life of perl 6.
- Extensions
The extension API is the set of calls exported to perl extensions. They provide access to most of the things an extension needs to do, while hiding the implementation details. (So that, for example, we can change the way scalars are stored without having to rewrite, or even recompile, an extension.)
Binary compatibility is a serious goal, though it may be broken if absolutely necessary.
- Guts
The guts-level APIs are the routines used within a component. These aren't guaranteed to be stable, and shouldn't be used outside a component. (For example, an extension to the interpreter shouldn't call any of the parser's internal routines.)
No binary compatibility is guaranteed, and routines here may be changed without notice.
VARIATIONS ON A THEME
One of the explicit goals of perl 6 is to generate Java bytecode and .NET code, as well as to run on small devices such as the Palm. The modular nature of perl 6 makes this reasonably straightforward.
- Perl for small platforms
For small platforms, the parser, compiler, and optimizer modules are replaced with a small bytecode loader module which reads in perl bytecode and passes it to the interpreter for execution. No string eval, do, use, or require is available, though loading of precompiled modules via do, use, or require may be supported.
- Bytecode compilation
One straightforward use of modular perl is to precompile perl source into bytecode and save it for later use. This is easily done by having a second interpreter module. The standard perl interpreter is used during compilation to evaluate BEGIN blocks and such-like things, but a simple freeze-to-disk module takes over when mainline execution begins. Rather than being executed, the bytecode is then frozen to disk for later loading.
- Perl in, Java (or whatever) out
This is a variant of bytecode compilation. Instead of being frozen to disk, the bytecode is translated to something else. That something could be Java bytecode or .NET code, or an executable of some sort. Perl could also be a front end to other modular compilers such as gcc or Compaq's GEM compiler system.
- Standalone pieces
Each piece of perl can, with enough support hidden away (in the form of an interpreter for the parsing module, for example), stand on its own. This means it's feasible to have separate executables that parse perl to a syntax tree, turn a syntax tree into bytecode, optimize the bytecode, and execute the bytecode.
This allows us to develop pieces independently--the first version of the parser, for example, can be written mainly in perl 5 using an embedded interpreter. It also means we can have a standalone optimizer which can spend a lot of time groveling over bytecode, far more than you might want to devote to optimizing one-liners or code that'll run only once or twice.
- The perl assembler
The parser and bytecode compiler can be replaced with a unit that will eat a textual representation of the bytecode--essentially a perl assembler. This can be useful in a number of ways, allowing programs to emit perl bytecode without having to know the gory details of the binary interface, or even having perl immediately available at all. (It also means we can cobble up real perl programs without having a full parser built yet, though that's more an issue of initial implementation than anything else.)
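Several of the variations above fit together in one small sketch: an assembler produces bytecode from text, a freeze step saves it instead of running it, and a bare loader plus interpreter runs it later with no parser or compiler present. The instruction names, the opcode numbers, and the use of JSON as the frozen format are all assumptions made for the illustration. The real bytecode format is not defined in this document.

```python
import json

OPCODES = {"PUSH": 1, "ADD": 2, "MUL": 3, "PRINT": 4}  # hypothetical numbering
NAMES = {code: name for name, code in OPCODES.items()}


def assemble(text):
    """Assembler: turn a textual listing into [opcode, operand] pairs."""
    program = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        name = parts[0].upper()
        if name not in OPCODES:
            raise ValueError(f"unknown instruction {name!r}")
        operand = int(parts[1]) if len(parts) > 1 else None
        program.append([OPCODES[name], operand])
    return program


def freeze(program):
    """Freeze step: serialize the bytecode instead of executing it."""
    return json.dumps(program).encode("ascii")


def load(blob):
    """Small-platform loader: decoding only, no parser or compiler."""
    return json.loads(blob.decode("ascii"))


def run(program):
    """Minimal interpreter; returns the values that PRINT produced."""
    stack, output = [], []
    for code, operand in program:
        name = NAMES[code]
        if name == "PUSH":
            stack.append(operand)
        elif name == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif name == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif name == "PRINT":
            output.append(stack.pop())
    return output


listing = """
    push 6      # operands first
    push 7
    mul
    print
"""
frozen = freeze(assemble(listing))  # done ahead of time, on a full system
result = run(load(frozen))          # done later, possibly on a small device
```

Swapping `freeze` for a function that emits some other target format would give the "perl in, something else out" arrangement without touching the assembler or the loader.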