SYNOPSIS

Compile it with:

$ eyapp -C SemanticInfoInTokens.eyp

Run it with:

$ ./SemanticInfoInTokens.pm -t -i -f inputforsemanticinfo.txt

Try also:

$ ./SemanticInfoInTokens.pm -t -i -f inputforsemanticinfo2.txt

THE TYPENAME-IDENTIFIER PROBLEM WHEN PARSING THE C LANGUAGE

The C language has a context dependency: the way an identifier is parsed depends on its current meaning. For example, consider this:

T(x);

This looks like a function call statement, but if T is a typedef name, then this is actually a declaration of x. How can a parser for C decide how to parse this input?
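The ambiguity is real C, not a contrived grammar. The sketch below compiles the same characters T(x) twice. In the first function the file-scope typedef is visible, so T(x); declares x. In the second, a block-scope function pointer named T hides the typedef, so T(x) is a call. The function names and the helper twice are hypothetical, chosen only for illustration.

```c
typedef int T;

/* Hypothetical helper so that T(x) has something to call. */
static int twice(int n) { return 2 * n; }

/* The typedef T is visible: "T(x);" declares x as an int.
   The parentheses around the declarator x are legal but redundant. */
int t_as_type(void) {
    T(x);
    x = 42;
    return x;
}

/* A block-scope pointer named T hides the typedef, so the same
   characters "T(x)" now form a function call. */
int t_as_function(void) {
    int x = 21;
    int (*T)(int) = twice;
    return T(x);
}
```

Both functions return 42, but only a parser that knows what T means at each point can build the right tree.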

Here is another example:

{
  T * x;
  ...
}

Is this a declaration of x as a pointer to T, or an expression statement that multiplies the variables T and x and discards the result?
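A minimal sketch showing that both readings of T * x; occur in valid C. The answer depends only on whether T currently names a type or a variable. The function names are hypothetical.

```c
typedef int T;

/* T is a typedef name here, so "T * x;" declares x as a pointer to int. */
int star_as_declaration(void) {
    int value = 7;
    T * x;
    x = &value;
    return *x;
}

/* Block-scope variables T and x hide the typedef, so "T * x;" is an
   expression statement: the product is computed and thrown away. */
int star_as_expression(void) {
    int T = 6, x = 7;
    T * x;              /* compilers typically warn: statement with no effect */
    return T * x;
}
```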

The usual way to solve this problem is to have two different token types, ID and TYPENAME. When the lexer finds an identifier, it looks up the identifier's current declaration in the symbol table to decide which token type to return: TYPENAME if the identifier is declared as a typedef name, ID otherwise.
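The lookup step can be sketched in C as follows. This is an illustration under simplifying assumptions, not the implementation used by this example: the symbol table is a flat array with no scopes, and the names declare_typedef and classify_identifier are hypothetical. The parser's semantic action for a typedef declaration records the name, and the lexer consults the table for every identifier it scans.

```c
#include <string.h>

enum token_type { ID, TYPENAME };

#define MAX_TYPEDEFS 64
/* Names declared by typedef so far; the strings must outlive the table. */
static const char *typedef_names[MAX_TYPEDEFS];
static int n_typedefs = 0;

/* Called from the parser's semantic action for a typedef declaration.
   Returns 0 if the table is full, 1 otherwise. */
int declare_typedef(const char *name) {
    if (n_typedefs == MAX_TYPEDEFS)
        return 0;
    typedef_names[n_typedefs++] = name;
    return 1;
}

/* Called by the lexer each time it scans an identifier. */
enum token_type classify_identifier(const char *name) {
    for (int i = 0; i < n_typedefs; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TYPENAME;
    return ID;
}
```

A real C front end needs one table per scope, because an inner declaration such as int T; hides an outer typedef T and must make the lexer return ID again.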

THIS EXAMPLE

One way to handle context dependency is the lexical tie-in: a flag, set by the semantic actions, that changes the way the lexer tokenizes its input.

In this "Calc"-like example the language has a special construct hex (hex-expr). After the keyword hex comes a parenthesized expression in which all integers are hexadecimal. In particular, a string matching /[A-F0-9]+/, such as A1B, must be treated as a hexadecimal integer unless it was previously declared.

Here the lexer checks the value of the hexflag attribute: when it is nonzero, all integers are read in hexadecimal, and tokens starting with a letter are read as integers whenever possible.
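The word-classification step of such a lexer might look like the C sketch below. It is not the Perl code of this example. Several details are assumptions: hexflag is modeled as a global int that the semantic actions are assumed to set on entering hex ( ... ) and clear on leaving it; declared names live in a flat table filled by a hypothetical declare_var; and the input is assumed to have already been cut into words of [A-Za-z0-9].

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum token_kind { NUM, VAR, ERR };

/* Lexical tie-in flag (assumed): set to 1 by the semantic action that
   opens hex ( ... ) and reset to 0 when that construct is finished. */
int hexflag = 0;

#define MAX_VARS 64
static const char *declared[MAX_VARS];   /* names must outlive the table */
static int n_declared = 0;

/* Hypothetical hook called from the semantic action of a declaration. */
void declare_var(const char *name) {
    if (n_declared < MAX_VARS)
        declared[n_declared++] = name;
}

static int is_declared(const char *name) {
    for (int i = 0; i < n_declared; i++)
        if (strcmp(declared[i], name) == 0)
            return 1;
    return 0;
}

/* Classify one word of [A-Za-z0-9]; on NUM, store its value in *value. */
enum token_kind classify_word(const char *word, long *value) {
    size_t len = strlen(word);
    if (len == 0)
        return ERR;
    if (hexflag) {
        /* Inside hex(...): [A-F0-9]+ is an integer unless already declared. */
        if (strspn(word, "0123456789ABCDEF") == len && !is_declared(word)) {
            *value = strtol(word, NULL, 16);
            return NUM;
        }
    } else if (strspn(word, "0123456789") == len) {
        *value = strtol(word, NULL, 10);
        return NUM;
    }
    /* Otherwise it must be a variable name, which cannot start with a digit. */
    return isdigit((unsigned char)word[0]) ? ERR : VAR;
}
```

With hexflag set, "10" yields 16 and an undeclared A1B yields 2587; after declare_var("A1B") the same word comes back as VAR. The lexer itself never decides when hex mode starts or ends; it only reads the flag the parser maintains.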

SEE ALSO