PDD 18: Security Model
Abstract
This PDD describes the security infrastructure of Parrot.
Description
Parrot will be used in a variety of different application contexts, each with its own unique security needs.
Small devices such as cell phones need tight control over resource usage (CPU, memory, etc).
Web applications need filtering and validation of incoming data and blocks to prevent the use of unfiltered data in execution contexts (SQL, system calls, runtime eval, etc).
Web browser embedding, i.e. client-side execution of high-level languages, needs control over resource access on the client machine (local disk access, local network connections), sandboxing for downloaded code, limits on what code can be loaded and executed, and limits on certain dynamic features (runtime eval of code, modification of global namespaces).
Database engine embedding, i.e. server-side execution of high-level languages as stored procedures, also needs control over resource access (disk access, network connections), and limits on loaded code, but additionally needs administrator configured lists of allowed libraries and library paths.
Security auditing tools need hooks in the compilation process for static analysis.
Implementation
Parrot's security infrastructure is not an independent, encapsulated subsystem. It is a series of related features and functionality spread throughout the virtual machine.
Resource Quotas
Resource quotas ensure that an interpreter doesn't use more CPU time, memory, or system resources than are allowed. Quotas are most useful when running code in a managed environment such as a web, database, or game server where no one interpreter is allowed to consume too many resources and impact the system too badly. CPU time is managed by the runloop. The memory system handles memory quotas, the I/O system handles file open and pending I/O count quotas, and so on.
Privileges
A privilege system is used to restrict code from performing certain actions. When privilege checking is in force the code may need a particular privilege to load a library, or open a file.
Privileges can be quite broad, on the order of "allow file I/O", or as fine-grained as allowing/denying the right to run one particular opcode. Privileges are discrete entities, They are also hierarchical, one privilege can be specified to follow from another privilege (the privilege FOO may be automatically granted to anything with the privilege BAR). Anything with the ALL privilege is automatically granted all other privileges in the system. Privileges are user-definable, but user-defined privileges can only give grants of rights, they cannot take them (BAZ may grant its privileges to any user with FOO privileges, but it can't automatically grant itself all the FOO privileges).
A few example privileges:
- ALL
-
Granted all privileges.
- IO
-
May run I/O operations.
- INVOKE
-
May invoke subroutines, methods, or return continuations.
- RETURN
-
May invoke return continuations (not subroutines or methods). (RETURN privileges are granted to anyone with INVOKE privileges.)
- SYSCALL
-
May run a system call.
- COMPILE
-
May compile code from string source at runtime (eval).
- LOAD
-
May load libraries at runtime.
Users
For the most part, a "user" in the Parrot privilege system doesn't correspond to a literal user (though it may, if Parrot is running embedded in a database engine or multi-user gaming system). A user is a bundle of privileges, identified by a user ID, and authenticated with a pass key. The privilege ID can be cheaply passed around, and validated whenever a restricted action is performed.
Opcode Disabling
All opcodes in Parrot can be selectively disabled, by short name (print
), long name (including signature, print_sc
), or by group (io
, net
, load
, compile
). They can also be selectively enabled, by defining a privilege with "disable all", and then allowing only specific opcodes.
When running in a secured mode, all dynamically loaded opcode libraries are disabled by default, and have to be explicitly enabled (individually, as a group, or by a system-wide configuration).
Opcodes are tagged with their group in their definition, and may be tagged in multiple groups, as in:
inline op print(in INT) :base_io,io {
...
}
Library Loading
In certain environments, it's desirable to be able to restrict what libraries may be loaded by code running on the virtual machine. The allowed library list is defined in a system-wide configuration file, or set at runtime by a user with administrative privileges. Libraries may be signed, with the key specified in the library list and verified on loading. Libraries that can't be signed (C libraries), can be check-summed to ensure that the library you load is the exact file you expect.
Generally, library loading restrictions are useful in an embedded environment like a database engine or web browser, or a multi-user environment like a web hosting server, where arbitrarily extended behavior is a security risk. It can also be a useful development tool, as running your daily development environment with library loading restrictions turned on means you always know exactly what dependencies the code base has.
Resource Access
Access to resources such as the local disk, network, are controlled through the privilege system. Resource access limitations are a combination of disabled opcodes, blocked library loading, and privilege checks within standard libraries for I/O, network, etc.
Sandboxing
A sandbox is a virtual machine within the virtual machine. It's a safe zone to contain code from an untrusted source. In the extreme case, a sandbox is completely isolated from all outside code, with no access to read or write to the surrounding environment. In the general case, a sandbox will have the ability to read from, but not write to the surrounding environment (global namespaces, for example), with a very narrow and carefully filtered route to send some data back to the code that called it. The sandbox system works together with the privileges system, in that by default code in the sandbox has no privileges outside the sandbox, but may be granted privileges.
Data Firewall
Any data that originates from user input (command-line, user prompt, web form, file access, network operation) is a potential security risk. The best place to trap bad data is at the point of entry, before it touches a single line of code. When the data firewall is enabled, all data entering from an external source or crossing a sandbox barrier is subjected to filter rules. The filter rules in force are configurable, and filters can be selectively enabled and disabled for particular types and sources of data.
Data filters can be sanitizing or validating. Sanitizing data filters modify the data as it passes (escaping quotes, encoding entities, etc). Validating filters check that the data meets certain conditions (the presence or absence of specific features), and when it fails to meet those conditions, the data is blocked from passing the firewall (returning an empty string or PMCNULL instead of the expected data). Data filters can also be user-defined, as a regular expression (PGE rule) or subroutine.
The same filter rules applied within the data firewall can be called explicitly on any data.
Bytecode Validation
In normal operation the interpreter assumes that the bytecode that it executes is valid -- that is, any parameters to opcodes are sane, data structures are intact. When bytecode validation is enabled, however, Parrot assumes that bytecode is not necessarily valid. The interpreter then, at runtime, makes sure that all specified register numbers are within valid range, and string and PMC structures used are valid.
Auditing Hooks
Even in dynamic languages, it's possible to perform a degree of static analysis for security risks. The opcode syntax tree (OST) produced by the language compiler is a good data source for static analysis checks, because it's low-level enough that you can check for individual opcodes that will be called (checking for I/O, networking, and other similar operations, or unknown dynamically-loaded opcodes), and high-level enough that you still have access to substantial metadata from the parse. The standard compiler tools already have the ability to add and remove stages from the compilation process. Static analysis tools can be implemented by stopping the standard compilation at the OST phase, and inserting an additional phase to scan the OST. Because the OST form is standard across high-level languages running on Parrot, the tools can be written once and applied to many languages.
References
"Exploring the Broken Web": http://talks.php.net/show/osdc07
"Safe ERB": http://agilewebdevelopment.com/plugins/safe_erb
pecl/filter: http://us2.php.net/filter
Rasmus Lerdorf for the term "data firewall".