HLLs and Interoperation

Parrot HLL Environment

In the earliest days Parrot was designed to be the single-purpose backend for the Perl 6 language. It quickly blossomed beyond that, and now has a much grander purpose: to host all dynamic languages, and to host them together on a single platform. If we look back through the history of dynamic programming languages, they've had a more difficult time interoperating with each other then compiled languages have because compiled languages operate at the same machine-code level and typically can make use of the same application binary interface (ABI). With the right compiler settings, programs written in Visual Basic can interoperate with programs written in C On some systems anyway, which can call functions written in C++, in Ada, Fortran, Pascal and so on. To try to mix two common dynamic languages, like Perl and Python, or Ruby and PHP, you would need to write some kind of custom "glue" function to try to include an interpreter object from one language as a library for another language, and then write code to try and get the parser for one to interact nicely with the parser for the other. It's a nightmare, frankly, and you don't see it happen too often.

In Parrot, the situation is different because high level languages (HLL) are almost all written with the PCT tools, and are compiled to the same PIR and PBC code. Once compiled into PBC, a library written in any HLL language can be loaded and called by any other HLL Well, any HLL which supports loading libraries. A language can have a syntax to include code snippets from other languages inline in the same file. We can write a binding for a popular library such as opengl or xlib once, and include that library into any language that needs it. Compare this to the current situation where a library like Gtk2 needs to have bindings for every language that wants to use it. In short, Parrot should make interoperation easier for everybody.

This chapter is going to talk about HLLs, the way they operate, and the way they interoperate on Parrot.

HLLs on Parrot

Working with HLLs

Fakecutables

It's possible to turn compilers created with PCT into stand-alone executables that run without the Parrot executable. To do this, the compiler bytecode is linked together with a small driver program in C and the Parrot library, libparrot . These programs have been given a special name by the Parrot development community: fakecutables . They're called fake because the PBC is not converted to native machine code like in a normal binary executable file, but instead is left in original PBC format.

Compiler Objects

The compreg opcode has two forms that are used with HLL compilers. The first form stores an object as a compiler object to be retrieved later, and the second form retrieves a stored compiler object for a given language. The exact type of compiler object stored with compreg can vary for each different language implementation, although most of the languages using PCT will have a common form. If a compiler object is in register $P0, it can be stored using the following compreg syntax:

compreg 'MyCompiler', $P0

There are two built-in compiler objects: One for PIR and one for PASM. These two don't need to be stored first, they can simply be retrieved and used. The PIR and PASM compiler objects are Sub PMCs that take a single string argument and return an array PMC containing a list of all the compiled subroutines from the string. Other compiler objects might be different entirely, and may need to be used in different ways. A common convention is for a compiler to be an object with a compile method. This is done with PCT-based compilers and for languages who use a stateful compiler.

Compiler objects allow programs in Parrot to compile arbitrary code strings at runtime and execute them. This ability, to dynamically compile code that is represented in a string variable at runtime, is of fundamental importance to many modern dynamic languages. Here's an example using the PIR compiler:

$P0 = compreg 'PIR'      # Get the compiler object
$P1 = $P0(code)          # Compile the string variable "code"

The returned value from invoking the compiler object is an array of PMCs that contains the various executable subroutines from the compiled source. Here's a more verbose example of this:

$P0 = compreg 'PIR'
$S0 = << "END_OF_CODE"

  .sub 'hello'
     say 'hello world!'
  .end

  .sub 'goodbye'
     say 'goodbye world!'
  .end

END_OF_CODE

$P1 = $P0($S0)
$P2 = $P1[0]      # The sub "hello"
$P3 = $P1[0]      # The sub "goodbye"

$P2()             # "hello world!"
$P3()             # "goodbye world!"

Here's an example of a Read-Eval-Print-Loop (REPL) in PIR:

.sub main
  $P0 = getstdin
  $P1 = compreg 'PIR'
  
  loop_top:
    $S0 = readline $P0
    $S0 = ".sub '' :anon\n" . $S0
    $S0 = $S0 . "\n.end\n"
    $P2 = $P1($S0)
    $P2()
    
    goto loop_top
.end

The exact list of HLL packages installed on your system may vary. Some language compiler packages will exist as part of the Parrot source code repository, but many will be developed and maintained separately. In any case, these compilers will typically need to be loaded into your program first, before a compiler object for them can be retrieved and used.

HLL Namespaces

Let's take a closer look at namespaces then we have in previous chapters. Namespaces, as we mentioned before can be nested to an arbitrary depth starting with the root namespace. In practice, the root namespace is not used often, and is typically left for use by the Parrot internals. Directly beneth the root namespace are the HLL Namespaces, named after the HLLs that the application software is written in. HLL namespaces are all lower-case, such as "perl6", or "cardinal", or "pynie". By sticking to this convention, multiple HLL compilers can operate on Parrot simultaneously while staying completely oblivious to each other.

HLL Mapping

HLL mapping enables Parrot to use a custom data type for internal operations instead of using the normal built-in types. Mappings can be created with the "hll_map" method of the interpreter PMC.

$P0 = newclass "MyNewClass"         # New type
$P1 = getclass "ResizablePMCArray"  # Built-in type
$P2 = getinterp
$P2.'hll_map'($P1, $P0)

With the mapping in place, anywhere that Parrot would have used a ResizablePMCArray it now uses a MyNewClass object instead. Here's one example of this:

.sub 'MyTestSub'
    .param pmc arglist :slurpy   # A MyNewClass array of args
    .return(arglist)
.end

Interoperability Guidelines

Libraries and APIs

As a thought experiment, imagine a library written in Common Lisp that uses Common Lisp data types. We like this library, so we want to include it in our Ruby project and call the functions from Ruby. Immediately we might think about writing a wrapper to convert parameters from Ruby types into Common Lisp types, and then to convert the Common Lisp return values back into Ruby types. This seems sane, and it would probably even work well. Now, expand this to all the languages on Parrot. We would need wrappers or converters to allow every pair of languages to communicate, which requires N^2 libraries to make it work! As the number of languages hosted on the platform increases, this clearly becomes an untennable solution.

So, what do we do? How do we make very different languages like Common Lisp, Ruby, Scheme, PHP, Perl and Python to interoperate with each other at the data level? There are two ways:

  • VTable methods

    VTable methods are the standard interface for PMC data types, and all PMCs have them. If the PMCs were written properly to satisfy this interface all the necessary information from those PMCs. Operate on the PMCs at the VTable level, and we can safely ignore the implementation details of them.

  • Class Methods

    If a library returns data in a particular format, the library reuser should know, understand, and make use of that format. Classes written in other languages will have a whole set of documented methods to be interfaced with and the reuser of those classes should use those methods. This only works, of course, in HLLs that allow object orientation and classes and methods, so for languages that don't have this the vtable interface should be used instead.

Mixing and Matching Datatypes

Linking and Embedding

Not strictly a topic about HLLs and their interoperation, but it's important for us to also mention another interesting aspect of Parrot: Linking and embedding. We've touched on one related topic above, that of creating the compiler fakecutables. The fakecutables contain a link to libparrot, which contains all the necessary guts of Parrot. When the fakecutable is executed, a small driver program loads the PBC data into libparrot through its API functions. The Parrot executable is just one small example of how Parrot's functionality can be implemented, and we will talk about a few other ways here too.

Embedding Parrot

libparrot is a library that can be statically or dynamically linked to any other executable program that wants to use it. This linking process is known as embedding parrot, and is a great way to interoperate

Creating and Interoperating Interpreters

Parrot's executable, which is the interface which most users are going to be familiar with, uses a single interpreter structure to perform a single execution task. However, this isn't the only supported structural model that Parrot supports. In fact, the interpreter structure is not a singleton, and multiple interpreters can be created by a single program. This allows separate tasks to be run in separate environments, which can be very helpful if we are writing programs and libraries in multiple languages. Interpreters can communicate and share data between each other, and can run independently from others in the same process.

Small, Toy, and Domain-Specific Languages

How many programs are out there with some sort of scripting capability? You can probably name a few off the top of your head with at least some amount of scripting or text-based commands. In developing programs like this, typically it's necessary to write a custom parser for the input commands, and a custom execution engine to make the instructions do what they are intended to do. Instead of doing all this, why not embed an instance of Parrot in the program, and let Parrot handle the parsing and executing details?

Small scripting components which are not useful in a general sense like most programming languages, and are typically limited to use in very specific environments (such as within a single program) are called Domain-Specific Languages (DSL). DSLs are a very popular topic because a DSL allows developers to create a custom language that makes dealing with a given problem space or data set very easy. Parrot and its suite of compiler tools in turn make creating the DSLs very easy. It's all about ease of use.

Parrot API

3 POD Errors

The following errors were encountered while parsing the POD:

Around line 5:

A non-empty Z<>

Around line 9:

Deleting unknown formatting code N<>

Around line 27:

Deleting unknown formatting code N<>