NAME

Image::Leptonica::Func::parseprotos

VERSION

version 0.03

parseprotos.c

parseprotos.c

      char             *parseForProtos()

   Static helpers
      static l_int32    getNextNonCommentLine()
      static l_int32    getNextNonBlankLine()
      static l_int32    getNextNonDoubleSlashLine()
      static l_int32    searchForProtoSignature()
      static char      *captureProtoSignature()
      static char      *cleanProtoSignature()
      static l_int32    skipToEndOfFunction()
      static l_int32    skipToMatchingBrace()
      static l_int32    skipToSemicolon()
      static l_int32    getOffsetForCharacter()
      static l_int32    getOffsetForMatchingRP()

FUNCTIONS

parseForProtos

char * parseForProtos ( const char *filein, const char *prestring )

parseForProtos()

    Input:  filein (output of cpp)
            prestring (<optional> string that prefaces each decl;
                      use NULL to omit)
    Return: parsestr (string of function prototypes), or NULL on error

Notes:
    (1) We parse the output of cpp:
            cpp -ansi <filein>
        Three plans were attempted, with success on the third.
    (2) Plan 1.  A cursory examination of the cpp output indicated that
        every function was preceeded by a cpp comment statement.
        So we just need to look at statements beginning after comments.
        Unfortunately, this is NOT the case.  Some functions start
        without cpp comment lines, typically when there are no
        comments in the source that immediately precede the function.
    (3) Plan 2.  Consider the keywords in the language that start
        parts of the cpp file.  Some, like 'typedef', 'enum',
        'union' and 'struct', are followed after a while by '{',
        and eventually end with '}, plus an optional token and a
        final ';'  Others, like 'extern' and 'static', are never
        the beginnings of global function definitions.   Function
        prototypes have one or more sets of '(' followed eventually
        by a ')', and end with ';'.  But function definitions have
        tokens, followed by '(', more tokens, ')' and then
        immediately a '{'.  We would generate a prototype from this
        by adding a ';' to all tokens up to the ')'.  So we use
        these special tokens to decide what we are parsing.  And
        whenever a function definition is found and the prototype
        extracted, we skip through the rest of the function
        past the corresponding '}'.  This token ends a line, and
        is often on a line of its own.  But as it turns out,
        the only keyword we need to consider is 'static'.
    (4) Plan 3.  Consider the parentheses and braces for various
        declarations.  A struct, enum, or union has a pair of
        braces followed by a semicolon.  They cannot have parentheses
        before the left brace, but a struct can have lots of parentheses
        within the brace set.  A function prototype has no braces.
        A function declaration can have sets of left and right
        parentheses, but these are followed by a left brace.
        So plan 3 looks at the way parentheses and braces are
        organized.  Once the beginning of a function definition
        is found, the prototype is extracted and we search for
        the ending right brace.
    (5) To find the ending right brace, it is necessary to do some
        careful parsing.  For example, in this file, we have
        left and right braces as characters, and these must not
        be counted.  Somewhat more tricky, the file fhmtauto.c
        generates code, and includes a right brace in a string.
        So we must not include braces that are in strings.  But how
        do we know if something is inside a string?  Keep state,
        starting with not-inside, and every time you hit a double quote
        that is not escaped, toggle the condition.  Any brace
        found in the state of being within a string is ignored.
    (6) When a prototype is extracted, it is put in a canonical
        form (i.e., cleaned up).  Finally, we check that it is
        not static and save it.  (If static, it is ignored).
    (7) The @prestring for unix is NULL; it is included here so that
        you can use Microsoft's declaration for importing or
        exporting to a dll.  See environ.h for examples of use.
        Here, we set: @prestring = "LEPT_DLL ".  Note in particular
        the space character that will separate 'LEPT_DLL' from
        the standard unix prototype that follows.

AUTHOR

Zakariyya Mughal <zmughal@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Zakariyya Mughal.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.