NAME

CORBA::IDLtree - OMG IDL to symbol tree translator

VERSION

Version 2.05

SYNOPSIS

Subroutine Parse_File is the universal entry point (to be called by the main program). It takes an IDL file name as the input parameter and parses that file, constructing one or more symbol trees for the outermost declarations encountered. It returns a reference to an array containing references to those trees. In case of errors during parsing, Parse_File returns 0.

Usage:

use CORBA::IDLtree;

my $ref_to_array_of_outermost_declarations = CORBA::IDLtree::Parse_File("myfile.idl");

$ref_to_array_of_outermost_declarations or die "File had syntax errors\n";
foreach my $node (@$ref_to_array_of_outermost_declarations) {
    # Query $node->[TYPE] to find out what each node is;
    # use $node->[SUBORDINATES] according to the $node->[TYPE].
    # For example:
    if ($node->[CORBA::IDLtree::TYPE] == CORBA::IDLtree::MODULE) {
        foreach my $subnode (@{$node->[CORBA::IDLtree::SUBORDINATES]}) {
            # Assuming your "sub process" codes your business logic:
            &process($subnode);
        }
    } elsif ($node->[CORBA::IDLtree::TYPE] == CORBA::IDLtree::...) {
        # And so on, decode and process all the types you need ...
        # For further details see the demo application in subdir demoapp.
    }
}

STRUCTURE OF THE SYMBOL TREE

A "thing" in the symbol tree can be either a reference to a node, or a reference to an array of references to nodes.

Each node is a six element array with the elements

[0] => TYPE (MODULE|INTERFACE|STRUCT|UNION|ENUM|TYPEDEF|CHAR|...)
[1] => NAME
[2] => SUBORDINATES
[3] => ANNOTATIONS
[4] => COMMENT
[5] => SCOPEREF

The TYPE element, instead of holding a type ID number (see the following list under SUBORDINATES), can also be a reference to the node defining the type. When the TYPE element can contain either a type ID or a reference to the defining node, we will call it a type descriptor. Which of the two alternatives is in effect can be determined via the isnode function.

The NAME element, unless specified otherwise, simply holds the name string of the respective IDL syntactic item.
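
As a rough illustration of the node layout and of type descriptors, the sketch below builds one node by hand and classifies two descriptors. It defines local copies of the documented constants so that it runs without the module installed. The node content and the helper describe_descriptor are hypothetical, and the plain ref() check is only a stand-in for isnode.

```perl
use strict;
use warnings;

# Local copies of the documented constants (see CONSTANTS), defined here
# only so the sketch runs without CORBA::IDLtree installed.
use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2,
               ANNOTATIONS => 3, COMMENT => 4, SCOPEREF => 5 };
use constant { LONG => 6, ENUM => 23 };

# Hand-built six-element node (hypothetical data, not parser output):
# an enum "Color" with no literals, annotations, comment, or scope.
my $color = [ ENUM, 'Color', [], 0, 0, 0 ];

# Hypothetical helper: a type descriptor is either a numeric type ID
# or a reference to the defining node.
sub describe_descriptor {
    my ($desc) = @_;
    return ref($desc) eq 'ARRAY'
        ? sprintf("node '%s' (type ID %d)", $desc->[NAME], $desc->[TYPE])
        : sprintf("type ID %d", $desc);
}

my @described = map { describe_descriptor($_) } (LONG, $color);
print "$_\n" for @described;
```

In real client code the constants would come from CORBA::IDLtree itself, and isnode would replace the ref() check.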

The SUBORDINATES element depends on the type ID:

MODULE or INTERFACE

Reference to an array of nodes (symbols) which are defined within the module or interface. In the case of INTERFACE, element [0] in this array will contain a reference to a further array which in turn contains references to the parent interface(s) if inheritance is used, or the null value if the current interface is not derived by inheritance. Element [1] is the "local/abstract" flag which is ABSTRACT for abstract interfaces, or LOCAL for interfaces declared local.

INTERFACE_FWD

Reference to the node of the full interface declaration.

STRUCT or EXCEPTION

Reference to an array of node references representing the member components of the struct or exception. Each member representative node is a quintuplet consisting of (TYPE, NAME, <dimref>, ANNOTATIONS, COMMENT). The <dimref> is a reference to a list of dimension numbers, or is 0 if no dimensions were given. In case of STRUCT, the first element may be a reference to a further STRUCT node instead of a reference to a quintuplet. In this case, the first element indicates the IDL4 parent struct type of the current struct. The function isnode() can be used for detecting this case.
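
The member layout can be sketched as follows. This is a minimal example under stated assumptions: the STRUCT node is built by hand rather than produced by the parser, it has no IDL4 parent struct, the constants are local copies of the documented values, and the helper member_decl and the %idl_name table are hypothetical.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { LONG => 6, DOUBLE => 12, STRUCT => 26 };

# Spelling of the two elementary types used below.
my %idl_name = (LONG() => 'long', DOUBLE() => 'double');

# Hand-built STRUCT node; each member is (TYPE, NAME, <dimref>,
# ANNOTATIONS, COMMENT), with 0 standing for "not present".
my $point = [ STRUCT, 'Point', [
    [ LONG,   'id',     0,     0, 0 ],
    [ DOUBLE, 'coords', [ 3 ], 0, 0 ],
], 0, 0, 0 ];

# Hypothetical helper: render one member as an IDL declaration.
sub member_decl {
    my ($member) = @_;
    my ($type, $name, $dimref) = @$member;
    my $type_str = ref($type) ? $type->[NAME] : $idl_name{$type};
    my $dims = $dimref
        ? join('', map { sprintf('[%s]', $_) } @$dimref)
        : '';
    return "$type_str $name$dims;";
}

my @decls = map { member_decl($_) } @{ $point->[SUBORDINATES] };
print "$_\n" for @decls;
```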

UNION

Similar to STRUCT/EXCEPTION, reference to an array of nodes. For union members, the member node has the same structure as for STRUCT/EXCEPTION. However, the first node contains a type descriptor for the discriminant type. The switch node does not follow the usual quintuplet structure of members; it is a single item. The TYPE of a member node may also be CASE or DEFAULT. When the TYPE is CASE or DEFAULT, this means that the following member node will be the union branch controlled by the CASE or DEFAULT. For CASE, the NAME is unused, and the SUBORDINATES contains a reference to a list of the case values for the following member node. For DEFAULT, both the NAME and the SUBORDINATES are unused.
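
One way to walk this layout is sketched below, assuming a hand-built UNION node and local copies of the documented constants. The node content is hypothetical; in particular, whether the parser emits one CASE entry per label or one per group of labels is not assumed, so the loop accumulates labels until an ordinary member appears.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { LONG => 6, STRING => 14, UNION => 27,
               CASE => 28, DEFAULT => 29 };

# Hand-built UNION node (hypothetical data). The first subordinate is
# the discriminant type descriptor; CASE and DEFAULT entries precede
# the member they control.
my $value = [ UNION, 'Value', [
    LONG,
    [ CASE,    '',       [ 1, 2 ], 0, 0 ],
    [ LONG,    'number', 0,        0, 0 ],
    [ DEFAULT, '',       0,        0, 0 ],
    [ STRING,  'text',   0,        0, 0 ],
], 0, 0, 0 ];

my ($switch_type, @entries) = @{ $value->[SUBORDINATES] };

my (@labels, @branches);
foreach my $entry (@entries) {
    my $type = $entry->[TYPE];
    if (!ref($type) && $type == CASE) {
        # For CASE, SUBORDINATES lists the case values.
        push @labels, map { 'case ' . $_ } @{ $entry->[SUBORDINATES] };
    } elsif (!ref($type) && $type == DEFAULT) {
        push @labels, 'default';
    } else {
        # An ordinary member closes the pending group of labels.
        push @branches, join(', ', @labels) . ' => ' . $entry->[NAME];
        @labels = ();
    }
}
print "$_\n" for @branches;
```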

ENUM

Reference to an array describing the enum value literals. Each element in the array is a reference to a triplet (three element array): The first element in the triplet is the enum literal value. The second element is a reference to an array of annotations as described in the ANNOTATIONS documentation (see below). The third element is a reference to the trailing comment list.
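
A minimal sketch of reading the triplets, assuming a hand-built ENUM node with local copies of the documented constants. The 0 placeholders for annotations and comments are an assumption of this example; the module's own enum_literals subroutine serves the same purpose on real parser output.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { ENUM => 23 };

# Hand-built ENUM node (hypothetical data). Each triplet is
# (literal, annotations, trailing comment); 0 is used here as a
# placeholder for the two parts this sketch does not read.
my $color = [ ENUM, 'Color', [
    [ 'RED',   0, 0 ],
    [ 'GREEN', 0, 0 ],
    [ 'BLUE',  0, 0 ],
], 0, 0, 0 ];

# Keep only the first element of each triplet.
my @literals = map { $_->[0] } @{ $color->[SUBORDINATES] };
print join(', ', @literals), "\n";
```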

TYPEDEF

Reference to a two-element array: element 0 contains a reference to the type descriptor of the original type; element 1 contains a reference to an array of dimension expressions, or the null value if no dimensions are given. When given, the dimension expressions are plain strings.

SEQUENCE

As a special case, the NAME element of a SEQUENCE node does not contain a name (as sequences are anonymous types), but instead is used to hold the bound number. If the bound number is 0 then it is an unbounded sequence. The SUBORDINATES element contains the type descriptor of the base type of the sequence. This descriptor could itself be a reference to a SEQUENCE defining node (that is, a nested sequence definition).
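
The nesting can be illustrated with a short recursive sketch. It assumes hand-built SEQUENCE nodes and local copies of the documented constants; the helper descriptor_to_idl and the %idl_name table are hypothetical, and on real parser output the module's typeof subroutine would normally be used instead.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { LONG => 6, SEQUENCE => 22 };

my %idl_name = (LONG() => 'long');

# Hand-built nodes for sequence<sequence<long>, 10> (hypothetical data).
# NAME holds the bound, SUBORDINATES the element type descriptor.
my $inner = [ SEQUENCE, 0,  LONG,   0, 0, 0 ];
my $outer = [ SEQUENCE, 10, $inner, 0, 0, 0 ];

# Hypothetical helper: render a type descriptor in IDL notation.
sub descriptor_to_idl {
    my ($desc) = @_;
    return $idl_name{$desc} // "type ID $desc" unless ref($desc);
    return $desc->[NAME] unless $desc->[TYPE] == SEQUENCE;
    my $element = descriptor_to_idl($desc->[SUBORDINATES]);
    my $bound   = $desc->[NAME];
    return $bound ? "sequence<$element, $bound>" : "sequence<$element>";
}

my $idl = descriptor_to_idl($outer);
print "$idl\n";
```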

BOUNDED_STRING

Bounded strings are treated as a special case of sequence. They are represented as references to a node that has BOUNDED_STRING or BOUNDED_WSTRING as the type ID and the bound number in the NAME; the SUBORDINATES element is unused.

CONST

Reference to a two-element array. Element 0 is a type descriptor of the const's type; element 1 is a reference to an array containing the RHS expression symbols.

FIXED

Reference to a two-element array. Element 0 contains the digit number and element 1 contains the scale factor. The NAME component in a FIXED node is unused.

VALUETYPE

Uses the following structure:

[0] => $is_abstract (boolean)
[1] => reference to a tuple (two-element list) containing
       inheritance related information:
       [0] => $is_truncatable (boolean)
       [1] => \@ancestors (reference to array containing
              references to ancestor nodes)
[2] => \@members: reference to array containing references
       to tuples (two-element lists) of the form:
       [0] => 0|PRIVATE|PUBLIC
              A zero for this value means the element [1]
              contains a reference to a declaration, such
              as a METHOD or ATTRIBUTE.
              In case of METHOD, the first element in the
              method node subordinates (i.e., the return
              type) may be FACTORY.
              However, unlike interface methods, the last
              element is _not_ a reference to the 'raises'
              list.  Support for 'raises' of valuetype
              methods may be added in a future version.
       [1] => reference to the defining node.
              In case of PRIVATE or PUBLIC state member,
              the SUBORDINATES of the defining node
              contains a dimref (reference to dimensions
              list, see STRUCT.)

VALUETYPE_BOX

Reference to the defining type node.

VALUETYPE_FWD

Reference to the node of the full valuetype declaration.

NATIVE

Subordinates unused.

ATTRIBUTE

Reference to a two-element array; element 0 is the read-only flag (0 for read/write attributes), element 1 is a type descriptor of the attribute's type.

METHOD

Reference to a variable length array; element 0 is a type descriptor for the return type. Elements 1 and following are references to parameter descriptor nodes with the following structure:

elem. 0 => parameter type descriptor
elem. 1 => parameter name
elem. 2 => parameter mode (IN, OUT, or INOUT)

The last element in the variable-length array is a reference to the "raises" list. This list contains references to the declaration nodes of exceptions raised, or is empty if there is no "raises" clause.
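
The following sketch splits a METHOD node's SUBORDINATES into return type, parameters, and "raises" list. It assumes hand-built nodes and local copies of the documented constants; the three-element parameter descriptors, the helper type_str, and the lookup tables are assumptions made for this example.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2, MODE => 2 };
use constant { IN => 1, OUT => 2, INOUT => 3 };
use constant { LONG => 6, STRING => 14, EXCEPTION => 30, METHOD => 42 };

my %idl_name  = (LONG() => 'long', STRING() => 'string');
my %mode_name = (IN() => 'in', OUT() => 'out', INOUT() => 'inout');

# Hand-built nodes (hypothetical data) for
#   long lookup(in string key, out long hits) raises (NotFound);
my $not_found = [ EXCEPTION, 'NotFound', [], 0, 0, 0 ];
my $lookup    = [ METHOD, 'lookup', [
    LONG,                       # element 0: return type descriptor
    [ STRING, 'key',  IN  ],    # parameter descriptors
    [ LONG,   'hits', OUT ],
    [ $not_found ],             # last element: "raises" list
], 0, 0, 0 ];

# Hypothetical helper: name of a type descriptor.
sub type_str {
    my ($desc) = @_;
    return ref($desc) ? $desc->[NAME] : $idl_name{$desc};
}

# Split the variable-length array into its three parts.
my @parts       = @{ $lookup->[SUBORDINATES] };
my $return_type = shift @parts;
my $raises      = pop @parts;
my @params      = map {
    join(' ', $mode_name{ $_->[MODE] }, type_str($_->[TYPE]), $_->[NAME])
} @parts;

my $signature = sprintf('%s %s(%s)',
    type_str($return_type), $lookup->[NAME], join(', ', @params));
$signature .= ' raises (' . join(', ', map { $_->[NAME] } @$raises) . ')'
    if @$raises;
print "$signature\n";
```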

INCFILE

Reference to an array of nodes (symbols) which are defined within the include file. The NAME element of this node contains the include file name.

PRAGMA_PREFIX

Subordinates unused.

PRAGMA_VERSION

Version string.

PRAGMA_ID

ID string.

PRAGMA

This is for the general case of pragmas that are none of the above, i.e. pragmas unknown to IDLtree. The NAME holds the pragma name, and SUBORDINATES holds all further text appearing after the pragma name.

REMARK

The NAME of the node contains the starting line number of the comment text. The SUBORDINATES component contains a reference to a list of comment lines. The comment lines are not newline terminated. The source line number of each comment line can be computed by adding the starting line number and the array index of the comment line. By default, REMARK nodes will not be generated; generation of REMARK nodes can be enabled by setting the $enable_comments global variable to non zero.
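
The line number arithmetic can be sketched as follows, assuming a hand-built REMARK node and local copies of the documented constants; the comment text is hypothetical.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { REMARK => 48 };

# Hand-built REMARK node (hypothetical data): a three-line comment
# starting at source line 17.
my $remark = [ REMARK, 17,
    [ 'first comment line', 'second comment line', 'third comment line' ],
    0, 0, 0 ];

my $start = $remark->[NAME];
my $lines = $remark->[SUBORDINATES];

# Source line of each comment line = starting line + array index.
my @numbered = map { sprintf('%d: %s', $start + $_, $lines->[$_]) }
               0 .. $#{$lines};
print "$_\n" for @numbered;
```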

The ANNOTATIONS element holds the reference to an array of annotation nodes if IDL4 style annotations are present (if no annotations are present then the ANNOTATIONS element holds 0). Each entry in this array is an array reference. The first element in the array referenced is a reference to an entry in @annoDefs (see comments at declaration of @annoDefs). The following elements contain the concrete values for the parameters, in the order as defined by the entry in @annoDefs. If the user omitted the value of the parameter then the default as specified by the entry in @annoDefs is filled in.

The COMMENT element holds the comment text that follows the IDL declaration on the same line. Usually this is just a single line. However, if a multi-line comment is started on the same line after a declaration, the multi-line comment may extend to further lines; therefore we use a list of lines. The lines in this list are not newline terminated. The COMMENT field is a reference to a tuple of starting line number and reference to the line list, or contains 0 if no trailing comment is present at the IDL item.

The SCOPEREF element is a reference back to the node of the module or interface enclosing the current node. If the current node is already at the global scope level then the SCOPEREF is 0. Special case: For a reopened module, the SCOPEREF points to the previous opening of the same module. In case of multiple reopenings, each reopening points to the previous opening. The SCOPEREF of the initial module finally points to the enclosing scope. All nodes have this element except for the parameter nodes of methods and the component nodes of structs/unions/exceptions.

CLASS VARIABLES

Variables that can be set by client code

@CORBA::IDLtree::include_path

Paths where to look for included IDL files.

%CORBA::IDLtree::defines

Symbol definitions for preprocessor.

$CORBA::IDLtree::cache_trees

Values 0 or 1, default 0. By default, do not cache trees of #included files.

$CORBA::IDLtree::enable_comments

Values 0 or 1, default 0. By default, do not generate REMARK nodes.

$CORBA::IDLtree::struct2vt

Values 0 or 1, default 0. Change struct into equivalent valuetype.

$CORBA::IDLtree::vt2struct

Values 0 or 1, default 0. Change valuetype into equivalent struct.

$CORBA::IDLtree::cache_statistics

Values 0 or 1, default 0. Print cache statistics.

$CORBA::IDLtree::long_double_supported

Values 0 or 1, default 0. Switch on support for IDL long double.

$CORBA::IDLtree::union_default_null_allowed

Values 0 or 1, default 1. By default, a union's default branch may be empty; set to 0 to switch off this permission.

$CORBA::IDLtree::leading_underscore_allowed

Value 1 will remove the leading underscore. Value 2 will preserve the leading underscore.

$CORBA::IDLtree::permissive

Values 0 or 1, default 0. By default, misuse of IDL keywords as identifiers is a hard error.

Variables written by CORBA::IDLtree

These are to be considered read-only from outside:

$CORBA::IDLtree::n_errors

Cumulative number of errors for a Parse_File call.

$CORBA::IDLtree::global_idlfile

Copy of the file name passed into the most recent call of sub Parse_File.

CONSTANTS

Constants for accessing the elements of a node

Constants for indexing the elements of a node

As explained in STRUCTURE OF THE SYMBOL TREE, each node is represented as a six element array. These constants are intended for indexing the array:

sub TYPE ()         { 0 }
sub NAME ()         { 1 }
sub SUBORDINATES () { 2 }
sub MODE ()         { 2 }
sub ANNOTATIONS ()  { 3 }
sub COMMENT ()      { 4 }
sub SCOPEREF ()     { 5 }

The constant MODE is an alias of SUBORDINATES for method parameter nodes.

Method parameter modes

sub IN ()    { 1 }
sub OUT ()   { 2 }
sub INOUT () { 3 }

Meanings of the TYPE entry in the symbol node

sub NONE ()            { 0 }   # error/illegality value
sub BOOLEAN ()         { 1 }
sub OCTET ()           { 2 }
sub CHAR ()            { 3 }
sub WCHAR ()           { 4 }
sub SHORT ()           { 5 }
sub LONG ()            { 6 }
sub LONGLONG ()        { 7 }
sub USHORT ()          { 8 }
sub ULONG ()           { 9 }
sub ULONGLONG ()       { 10 }
sub FLOAT ()           { 11 }
sub DOUBLE ()          { 12 }
sub LONGDOUBLE ()      { 13 }
sub STRING ()          { 14 }
sub WSTRING ()         { 15 }
sub OBJECT ()          { 16 }
sub TYPECODE ()        { 17 }
sub ANY ()             { 18 }
sub FIXED ()           { 19 }  # node
sub BOUNDED_STRING ()  { 20 }  # node
sub BOUNDED_WSTRING () { 21 }  # node
sub SEQUENCE ()        { 22 }  # node
sub ENUM ()            { 23 }  # node
sub TYPEDEF ()         { 24 }  # node
sub NATIVE ()          { 25 }  # node
sub STRUCT ()          { 26 }  # node
sub UNION ()           { 27 }  # node
sub CASE ()            { 28 }
sub DEFAULT ()         { 29 }
sub EXCEPTION ()       { 30 }  # node
sub CONST ()           { 31 }  # node
sub MODULE ()          { 32 }  # node
sub INTERFACE ()       { 33 }  # node
sub INTERFACE_FWD ()   { 34 }  # node
sub VALUETYPE ()       { 35 }  # node
sub VALUETYPE_FWD ()   { 36 }  # node
sub VALUETYPE_BOX ()   { 37 }  # node
sub ATTRIBUTE ()       { 38 }  # node
sub ONEWAY ()          { 39 }  # implies "void" as the return type
sub VOID ()            { 40 }
sub FACTORY ()         { 41 }
sub METHOD ()          { 42 }  # node
sub INCFILE ()         { 43 }  # node
sub PRAGMA_PREFIX ()   { 44 }  # node
sub PRAGMA_VERSION ()  { 45 }  # node
sub PRAGMA_ID ()       { 46 }  # node
sub PRAGMA ()          { 47 }  # node
sub REMARK ()          { 48 }  # node
sub NUMBER_OF_TYPES () { 49 }

The constant FACTORY can only occur as the return type of a method in a valuetype.

Interface/valuetype flag values

sub ABSTRACT      { 1 }
sub LOCAL         { 2 }
sub TRUNCATABLE   { 2 }
sub CUSTOM        { 3 }

Valuetype member flags

sub PRIVATE       { 1 }
sub PUBLIC        { 2 }

SUBROUTINES

Parse_File

Parses the file name given as argument. Returns reference to array of nodes representing the top level (global) declarations in the file. Returns 0 if the file had syntax errors. Parse_File writes the error messages to STDERR.

Dump_Symbols

Symbol tree dumper (for debugging etc.). Reconstructs the IDL source notation from the parsed symbol tree. Parameters:

  1. Reference to a symbol array (return value from a previous call to Parse_File).

  2. Optional parameter controlling the output:

    • If given as string then it is the name of a file into which to dump the IDL source.

    • If given as array reference then the IDL source will be placed in the referenced array, one line per element, where each line is not newline terminated.

    • If the optional parameter is not given or is given as undef then the IDL source will be dumped to STDOUT.

is_elementary_type

Given a node reference, returns the type constant if the node represents an elementary type. Returns 0 if the type is not elementary.

predef_type

Given a type name (as string), returns the type constant if the type name is that of an elementary type. Returns 0 if the type is not elementary.

isnode

Given a "thing", returns 1 if it is a reference to a node, 0 otherwise.

is_scope

Given a "thing", returns 1 if it's a ref to a MODULE, INTERFACE, or INCFILE node.

find_node

Looks up a name in the symbol tree(s) constructed so far. Returns the node ref if found, else 0.

typeof

Given a type descriptor, returns the type as a string in IDL syntax.

set_verbose

Call this to make the parser tell us what it's doing.

is_a

Determine if typeid is of given type, recursing through TYPEDEFs.

root_type

Get the original type of a TYPEDEF, i.e. recurse through all non array TYPEDEFs until the original type is reached.
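
The idea of following non-array typedefs can be sketched as below. This is not the module's implementation: the nodes are built by hand, the constants are local copies of the documented values, element 0 of the TYPEDEF SUBORDINATES is assumed to hold the type descriptor directly, and the helper resolve_typedef is hypothetical. It stops at the first typedef that carries dimensions.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { LONG => 6, TYPEDEF => 24 };

# Hand-built TYPEDEF nodes (hypothetical data):
#   typedef long  Count;
#   typedef Count Total;
#   typedef Total Row[4];
my $count = [ TYPEDEF, 'Count', [ LONG,   0       ], 0, 0, 0 ];
my $total = [ TYPEDEF, 'Total', [ $count, 0       ], 0, 0, 0 ];
my $row   = [ TYPEDEF, 'Row',   [ $total, [ '4' ] ], 0, 0, 0 ];

# Hypothetical helper: follow non-array typedefs to the original type.
sub resolve_typedef {
    my ($desc) = @_;
    while (ref($desc) && $desc->[TYPE] == TYPEDEF) {
        my ($original, $dims) = @{ $desc->[SUBORDINATES] };
        last if $dims && @$dims;    # array typedef: stop here
        $desc = $original;
    }
    return $desc;
}

my $root = resolve_typedef($total);   # numeric type ID of long
my $stop = resolve_typedef($row);     # the array typedef node itself
print "$root\n";
print $stop->[NAME], "\n";
```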

is_pragma

Return 1 if the given type constant or node is a pragma.

files_included

Returns an array with the names of files #included.

get_scalar_default

Get default value for type. Uses comment directives object if available.

idlsplit

Splits a given IDL expression into its individual tokens. Returns the tokens as a list. Example: The call

idlsplit("(m_a::myconst+1.0) / scale")

returns the list

"(", "m_a::myconst", "+", "1.0", ")", "/", "scale"

is_valid_identifier

Returns 1 if the argument is a valid IDL identifier.

scoped_name

Expects a symbol node as the input argument and returns its fully qualified name in IDL syntax.

collect_includes

Utility for collecting #included files. Parameters:

  1. Reference to node list to analyze.

  2. Reference to hash in which to add the includefile names encountered. The includefile names are added as key fields of the hash. The value fields are not used.

get_numeric

Computes numeric value of expression.

enum_literals

The SUBORDINATES of ENUM contains more than just the actual enum literal values (the additional data are: annotations, trailing comments). This is a convenience subroutine which returns the net literals of the given $enumnode->[SUBORDINATES].

AUTHOR

Oliver M. Kellogg, <okellogg at users.sourceforge.net>

BUGS

Please report any bugs or feature requests to bug-corba-idltree at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=CORBA-IDLtree. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc CORBA::IDLtree

ACKNOWLEDGEMENTS

Thanks to Heiko Schroeder for contributing.

LICENSE AND COPYRIGHT

Copyright (C) 1998-2020, Oliver M. Kellogg

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.