logCroak("[%s] Lexeme value \"%s\" cannot be associated to lexeme name %s at position %d:%d.\n\nLast position:\n\n%s%s", whoami(__PACKAGE__), $found, $newlexeme, $lexemeHashp->{line}, $lexemeHashp->{column}, showLineAndCol(@{$line_columnp}, $self->{_sourcep}), $self->_context());
}
#
# A lexeme_read() can generate an event
#
$self->_getLexeme($lexemeHashp);
$self->_doEvents();
}
} elsif(${$self->{_sourcep}} =~ /\G[^\n]*/) {
#
# Could be an opaque ASM on a single line. If we are wrong, BNF will take over this wrong assumption
# by invalidating the tree. Please note that this will handle eventual multiple __asm statements, all
# Hack for the Callback framework: store in advance the IDENTIFIER, preventing
# a call to lastCompleted
#
$self->{_lastIdentifier} = $lexemeHashp->{value};
}
}
if(! @alternatives) {
my $line_columnp = lineAndCol($self->{_impl});
logCroak("[%s] Lexeme value \"%s\" cannot be associated to TYPEDEF_NAME, ENUMERATION_CONSTANT nor IDENTIFIER at line %d, column %d.\n\nLast position:\n\n%s%s", whoami(__PACKAGE__), $lexemeHashp->{value}, $lexemeHashp->{line}, $lexemeHashp->{column}, showLineAndCol($lexemeHashp->{line}, $lexemeHashp->{column}, $self->{_sourcep}), $self->_context());
}
#
# Push the alternatives, more than one possible only if lazy mode is turned on
$log->debugf('[%s] Pushed alternative %s "%s"', whoami(__PACKAGE__), $_, $lexemeHashp->{value});
}
if($_ eq 'IDENTIFIER') {
$self->{_lastIdentifier} = $lexemeHashp->{value};
}
} else {
if($is_debug) {
$log->debugf('[%s] Failed alternative %s "%s"', whoami(__PACKAGE__), $_, $lexemeHashp->{value});
}
}
}
if(! @alternativesOk) {
my $line_columnp = lineAndCol($self->{_impl});
logCroak("[%s] Lexeme value \"%s\" cannot be associated to %s at position %d:%d.\n\nLast position:\n\n%s%s", whoami(__PACKAGE__), $lexemeHashp->{value}, \@alternatives, $lexemeHashp->{line}, $lexemeHashp->{column}, showLineAndCol(@{$line_columnp}, $self->{_sourcep}), $self->_context());
logCroak("[%s] Lexeme value \"%s\" cannot be completed at position %d:%d.\n\nLast position:\n\n%s%s", whoami(__PACKAGE__), $lexemeHashp->{value}, $lexemeHashp->{line}, $lexemeHashp->{column}, showLineAndCol(@{$line_columnp}, $self->{_sourcep}), $self->_context());
}
$lexemeHashp->{name} = $alternativesOk[0];
$delta = $lexemeHashp->{length};
#
# A lexeme_read() can generate an event
#
$self->_doEvents();
}
}
return $delta;
}
1;
__END__
=pod

=encoding utf-8

=head1 NAME

MarpaX::Languages::C::AST - Translate a C source to an AST

=head1 DESCRIPTION

This module translates C source into an AST. To assist further processing of the AST, its nodes are blessed according to the C grammar you have selected (the default is 'ISO-ANSI-C-2011'). Logging goes through Log::Any: if you want to see log output, configure a Log::Any adapter, since Log::Any discards all messages by default.

This module implements the full syntax, as well as those specification constraints which are syntactic in nature: the associativity of nested if-then-else statements follows the C standards, as does the treatment of names as typedefs, enums, or variable identifiers.

The C standards contain many constraints that are non-syntactic. MarpaX::Languages::C::AST does not implement these, leaving them for AST post-processing. One example of a non-syntactic constraint is the requirement that labels within a function be unique. Another is the requirement that declarations include at most one storage class specifier.

=head1 SUBROUTINES/METHODS

=head2 new($class, %options)

Instantiates a new object. Takes an optional hash of options, which can be:

=over

=item grammarName

Name of a grammar. Default is 'ISO-ANSI-C-2011'.

=item typedef

An array reference to a list of known typedefs, injected at the top scope before parsing starts. This option should I<not> be used unless the C source is incomplete, typically because it has not gone through a preprocessor. Default is [], i.e. an empty list.

=item enum

An array reference to a list of known enums, injected at the top scope before parsing starts. As with typedef, this option should I<not> be used unless the C source is incomplete, typically because it has not gone through a preprocessor. Default is [], i.e. an empty list.

=item lazy

A flag telling the parser to automatically inject all allowed alternatives when the grammar reaches a TYPEDEF_NAME/ENUMERATION_CONSTANT/IDENTIFIER ambiguity. In practice, this option should be used only when parsing source code that has not been preprocessed. Please note that I<if> lazy mode is on, there might be several parse tree values. In such a case, unless the $optionalArrayOfValuesb option of the value() method is true, the first of the parse tree values will be returned. If more than one alternative is accepted, the lexeme name passed to lexemeCallback (see below) will be, in order of preference, TYPEDEF_NAME, ENUMERATION_CONSTANT or IDENTIFIER. The typedef and enum options (see above) can help lazy mode choose between TYPEDEF_NAME and ENUMERATION_CONSTANT, while IDENTIFIER will always be pushed as an alternative. Default is a false value.

=item start

A string giving the starting point of the grammar. This should be used when you know that the source code to parse is not a full valid source, but a portion of it. This requires knowledge of the grammar rules. Default is the empty string '', i.e. let the grammar apply its default start rule.

Please note that giving any value other than 'translationUnit' will make the grammar emit warnings saying that some rules are not reachable.

=item logInfo

Reference to an array of lexeme names for which a log of level INFO will be issued.

=item lexemeCallback

Array reference containing a CODE ref and optional arguments. This callback will be triggered like this: &$CODE(@arguments, $lexemeHashp), where $lexemeHashp is a reference to a hash describing the current lexeme:

=over

=item name

Name of the lexeme. You have to refer to the grammar in use for its definition, although it is usually self-explanatory.

=item start

G1 (Marpa term) start location.

=item length

Length of the lexeme.

=item line

Line number in the source being parsed.

=item column

Column number in the source being parsed.

=item value

String containing the lexeme value.

=back

=back
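
The calling convention above can be illustrated with a minimal, self-contained sketch. It does not load this module: the callback is invoked by hand on a hand-built lexeme hash, and the sub C<onLexeme>, the C<@seen> array and all sample values are hypothetical. It only shows that the optional arguments come first and the lexeme hash reference comes last.

```perl

    use strict;
    use warnings;

    # Hypothetical callback: the extra arguments from the array
    # reference come first, the lexeme hash reference comes last.
    sub onLexeme {
        my ($seen, $lexemeHashp) = @_;
        push @{$seen}, sprintf('%s "%s" at %d:%d',
                               $lexemeHashp->{name}, $lexemeHashp->{value},
                               $lexemeHashp->{line}, $lexemeHashp->{column});
        return;
    }

    my @seen;

    # What would be given as the lexemeCallback option: CODE ref + arguments.
    my $lexemeCallback = [ \&onLexeme, \@seen ];

    # Hand-built lexeme hash using the documented keys (sample values only).
    my $lexemeHashp = { name => 'IDENTIFIER', start => 1, length => 4,
                        line => 1, column => 5, value => 'main' };

    # Invoke it as documented: &$CODE(@arguments, $lexemeHashp)
    my ($code, @arguments) = @{$lexemeCallback};
    &$code(@arguments, $lexemeHashp);

    print "$seen[0]\n";    # prints: IDENTIFIER "main" at 1:5

```
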

=head2 parse($self, $sourcep)

Does the parsing. Takes as parameter a reference to the C source code. Returns $self, so that chaining with the value method is natural, i.e. parse()->value().

=head2 scope($self)

Returns the MarpaX::Languages::C::AST::Scope object.

=head2 value($self, $optionalArrayOfValuesb)

Returns the blessed value. Takes as optional parameter a flag saying whether the return value should be an array of all values. If this flag is false, the module will croak if there is more than one parse tree value. If this flag is true, a reference to an array of values will be returned, even if there is a single parse tree value.

=head1 INCOMPATIBILITIES

Since version 0.30, the c2ast.pl script is named c2ast (i.e. without extension).

=head1 NOTES

C code can have inline ASM code. GCC inline assembly is fully supported; any other form falls back to a heuristic that should catch everything needed. CL inline assembly has been targeted in particular.
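
As a rough illustration of the single-line heuristic, assuming it amounts to treating the rest of the current line as opaque assembly (the lexer tests a C<\G[^\n]*> match at that point), the self-contained sketch below shows how such a match consumes everything from the current position up to, but not including, the next newline. The sample source and variable names are hypothetical, and this is not the module's actual ASM handling.

```perl

    use strict;
    use warnings;

    # Hypothetical source: a non-GCC inline assembly statement on one line.
    my $source = "void f(void) { __asm mov eax, 1\n  return; }\n";

    # Pretend the lexer has just consumed the __asm keyword.
    pos($source) = index($source, '__asm') + length('__asm');

    # Treat the rest of the current line as one opaque chunk;
    # /gc advances pos() past the match and keeps it on failure.
    my $opaque = '';
    if ($source =~ /\G([^\n]*)/gc) {
        $opaque = $1;
    }

    print "[$opaque]\n";    # prints: [ mov eax, 1]

```
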