This program is what I use for modifying the regular expressions contained in lib/JE/Code.pm. If you are just going to install the JE module and use it, ignore this file. If you want to play with the module and modify it, then you may find this interesting.
Warning: This is not portable code. It works on Mac OS X, and should work on any Unix, but not on any weird OS like Windows.
The 'build_regex' function below replaces <str: with code that records the beginning of a 'str' (or whatever is between '<' and ':') in @A, and replaces :> (a birdie) or :token> with code that records the ending position of a given token. <:token:> is used for fixed-length tokens, such as 'new'; it records the ending position of the token. '<ident>' is replaced with (??{$_re_ident}), etc. $blahblahblah is replaced with $_re_blahblahblah, etc.
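To make the last two rewrites concrete, here is a minimal sketch. The name expand_placeholders is hypothetical and is not the real build_regex. It covers only the '<ident>' and '$name' substitutions, and leaves out the position-recording markers (<str:, :> and <:token:>) because the code they expand to is not shown in this description.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for part of build_regex.
# Assumption: placeholders are plain word characters, as in the
# examples above.
sub expand_placeholders {
    my ($src) = @_;

    # $blah becomes $_re_blah (names already prefixed are left alone)
    $src =~ s/\$(?!_re_)([A-Za-z]\w*)/\$_re_$1/g;

    # <ident> becomes (??{$_re_ident}), a deferred sub-pattern
    $src =~ s/<(\w+)>/(??{\$_re_$1})/g;

    return $src;
}

print expand_placeholders('<ident>$s=$s<expr>'), "\n";
```

The output is still only a string; it becomes a working pattern once it is compiled in a scope where the $_re_* variables are defined.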
Right now it also replaces '(?>' with '(?:', because I rely on variable localisation and backtracking. Currently (as of Perl 5.8.8) variable localisation done within atomic groups is undone when the group is exited. If/when this is fixed, I would like to go back to using atomic groups again, because it should theoretically speed things up, especially when there is a syntax error. Without atomic groups, sometimes this parser will backtrack and end up finding a match anyway (maybe this is actually a feature, not a bug).
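The effect of dropping atomic groups can be seen on a toy pattern. This is only a sketch: deatomise is a hypothetical helper name, and the pattern has nothing to do with the real grammar. It shows the rewrite itself and the extra backtracking it allows.

```perl
use strict;
use warnings;

# Hypothetical helper: turn atomic groups into plain non-capturing groups.
sub deatomise {
    my ($src) = @_;
    $src =~ s/\(\?>/(?:/g;
    return $src;
}

my $atomic = '(?>a+)ab';
my $plain  = deatomise($atomic);    # '(?:a+)ab'

# The atomic group swallows every 'a' and never gives one back, so the
# trailing 'ab' cannot match. The plain group backtracks and succeeds.
print "atomic: ", ('aaab' =~ /^$atomic\z/ ? "match" : "no match"), "\n";
print "plain:  ", ('aaab' =~ /^$plain\z/  ? "match" : "no match"), "\n";
```

The first line reports no match and the second reports a match, which is the "finding a match anyway" behaviour described above.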
$h takes care of horizontal white space and /* comments */ that do not contain line breaks. This can occur where the spec says "NoLineTerminatorHere."
$s is for all white space and comments.
$S is for mandatory white space or comments (e.g., between 'var' and the following identifier).
$ss is a single white space character.
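As a rough illustration of how these four patterns relate to each other, they could be defined along the following lines. These definitions are assumptions made for the example, not the ones in the module: they cover only ASCII white space, whereas the spec also lists Unicode white space and line terminators.

```perl
use strict;
use warnings;

# Illustrative definitions only (ASCII white space, no Unicode).
my $ss = qr{[ \t\n\r]};                          # one white space char
my $h  = qr{(?:[ \t]|/\*[^\n\r]*?\*/)*};         # horizontal space, one-line comments
my $s  = qr{(?:$ss|/\*.*?\*/|//[^\n\r]*)*}s;     # any white space or comments
my $S  = qr{(?:$ss|/\*.*?\*/|//[^\n\r]*)+}s;     # the same, but mandatory

for my $src ("var  /* c */ x", "varx") {
    print $src, " => ", ($src =~ /^var${S}\w/ ? "ok" : "no match"), "\n";
}
```

The only difference between $s and $S in this sketch is the quantifier, which is why 'varx' is rejected where 'var x' is accepted.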
I'm calling a 'term' what the spec calls a PrimaryExpression. It includes parenthesised expressions, as well as terms.
The special literals null, true and false are thrown by these regexps into the same category as identifiers. They get sorted out afterwards.
Though 'a || b = c' is a syntax error according to the spec, parsing is easier if I allow it. This could be construed as a feature if I make || return an lvalue, so that's what I've done:

    a || b = c    means    a ? a = c : b = c

Not bad, is it? And likewise,

    a && b = c    means    a ? b = c : a = c

Errors like '3 < 4 = 5' will be caught at run time, which is still according to the spec, since the spec says explicitly that the reporting of a syntax error may be deferred until execution of the statement in question.
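The same semantics can be imitated in Perl itself, because Perl's conditional operator yields an lvalue when both of its branches are lvalues. The sketch below is an analogy only, not how JE implements the operators; or_assign and and_assign are hypothetical names, and they take scalar references to stand in for the JavaScript variables.

```perl
use strict;
use warnings;

# 'a || b = c': assign to a if a is true, otherwise to b.
sub or_assign {
    my ($x, $y, $val) = @_;       # $x and $y are scalar references
    ($$x ? $$x : $$y) = $val;
}

# 'a && b = c': assign to b if a is true, otherwise to a.
sub and_assign {
    my ($x, $y, $val) = @_;
    ($$x ? $$y : $$x) = $val;
}

my ($p, $q) = (0, 5);
or_assign(\$p, \$q, 9);           # $p is false, so $q receives 9
print "p=$p q=$q\n";
```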