NAME
Syntax::Kamelon - A versatile and fully programmable textual content parser that is extremely well suited for syntax highlighting and code folding
SYNOPSIS
use Syntax::Kamelon;
my @attributes = Syntax::Kamelon->AvailableAttributes;
my %formtab = ();
for (@attributes) {
    $formtab{$_} = "<font class=\"$_\">";
}
my $textfilter = "[%~ text FILTER html FILTER replace('\\040', ' ') FILTER replace('\\t', ' ') ~%]";
my $hl = Syntax::Kamelon->new(
    xmlfolder => $xmldir,
    noindex => 1,
    formatter => ['Base',
        textfilter => \$textfilter,
        format_table => \%formtab,
        newline => "<br />\n",
        tagend => '</font>',
    ],
);
while (my $in = <IFILE>) {
    $hl->Parse($in);
}
print $hl->Format;
DESCRIPTION
Kamelon is based on the syntax highlighting and code folding algorithms used in the Kate text editor of the KDE desktop. It replaces and supersedes Syntax::Highlight::Engine::Kate.
This is a rewrite and not backwards compatible with Syntax::Highlight::Engine::Kate.
Instead of using plugin modules it loads Kate's syntax highlight definition XML files directly. That makes development and testing a lot easier. It also opens up a new field of applications, like creating your own highlight definitions to neatly format your reports. Tons of bugs have been removed. Testing has been redesigned. It runs about four times faster than version 0.10 and is up to spec with the latest Kate highlight definitions.
OPTIONS
Kamelon's constructor is called with a paired list of options as parameters. You can use the following options.
- commands => ref to hash
-
Specify commands to execute upon a specific match. Example:
mycommand => sub { my $match = shift; return '' }
You can specify the command in the rules of your own syntax XML file. Kamelon will pass the matched text to your sub as a parameter and will parse whatever your sub returns. Make sure it always returns at least an empty string.
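The callback contract described above can be sketched as follows. This is a minimal illustration that does not load Kamelon: the command names upcase and drop are hypothetical, and the direct call in the loop only stands in for the call Kamelon would make on a match.

```perl
use strict;
use warnings;

# Hypothetical command table, in the shape expected by the 'commands' option.
my %commands = (
    # Return the matched text in upper case; Kamelon would parse the result.
    upcase => sub { my $match = shift; return uc($match // '') },
    # Swallow the match, but still return an empty string rather than undef.
    drop   => sub { return '' },
);

# Stand-in for the call Kamelon makes when a rule names a command.
for my $name (sort keys %commands) {
    my $result = $commands{$name}->('select');
    print "$name => '$result'\n";
}
```

Such a hash would be handed to the constructor as commands => \%commands.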
- formatter => ['Name', @options],
-
A formatter can be any object that inherits from Syntax::Kamelon::Format::Base and lives in Syntax::Kamelon::Format::Name. By default 'Base' without options is loaded. This is convenient if you only use ParseRaw.
See also Syntax::Kamelon::Format::Base, Syntax::Kamelon::Format::ANSI, Syntax::Kamelon::Format::HTML4.
- indexfile => filename
-
Specifies the filename where Kamelon stores information about available syntax definitions.
By default it points to 'indexrc' in the xmlfolder. If the file does not exist Kamelon will load all xml files in the xmlfolder and attempt to create the indexfile.
Once the indexfile has been created it becomes static. If you add a syntax definition XML file to the xmlfolder afterwards, it will not be recognized. Delete the indexfile and reload Kamelon to fix that.
See also Syntax::Kamelon::Indexer
- logcall => ref to sub
-
By default Kamelon writes all errors to STDERR. Use this option to supply your own sub to handle them instead.
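A sketch of a handler that could be supplied as logcall, collecting messages in an array instead of printing them. That the sub receives the message text as its first argument is an assumption, inferred from LogWarning($message); @log and $collector are illustrative names, and the direct call stands in for Kamelon.

```perl
use strict;
use warnings;

my @log;    # collected messages (illustrative name)

# Assumption: the handler is called with the message as its first argument.
my $collector = sub {
    my $message = shift;
    push @log, $message;
    return;
};

# Stand-in for Kamelon reporting a problem.
$collector->('example warning');
print scalar(@log), " message(s) collected\n";
```

The sub would be passed as logcall => $collector, or installed later with LogCallSet.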
- noindex => boolean
-
By default 0. If you set this option, Kamelon will ignore any existing indexfile and build the index in memory on every start, without saving it. This gives you the liberty of adding and removing syntax highlight definition files.
This option comes with a considerable startup penalty.
See also Syntax::Kamelon::Indexer
- syntax => string
-
Specify the syntax definition you want to use. If you do not specify this option, Kamelon will start in blank mode. In blank mode the Highlight and HighlightAndFormat methods let text pass through without any highlighting being done.
- verbose => boolean
-
By default 0. If you set it, Kamelon will happily complain about all integrity errors it finds in syntax XML files. Otherwise it will only complain about the ones it cannot overcome.
- xmlfolder => folder
-
This is the place where Kamelon looks for syntax highlight definition XML files. By default it searches @INC for 'Syntax/Kamelon/XML'. Here you find the XML files used in the Kate text editor. They are specially crafted for this module.
See also Syntax::Kamelon::Indexer
PUBLIC METHODS
- AvailableAttributes
-
Returns a list of all available attribute tags. Can also be called before initializing Kamelon.
- AvailableSyntaxes
-
Returns a list of all available syntax definitions.
- ClearLexers
-
Empties the pool of loaded lexers. Every called lexer will be loaded from scratch.
- Column
-
Returns the column position in the line that is currently highlighted.
- FirstNonSpace($string);
-
Returns true if the current line has not contained a non-whitespace character so far and the first character in $string is a non-whitespace character.
- Format
-
Calls the Format method of the currently loaded formatter and returns the result.
- Formatter
-
Returns a reference to the formatter object.
- GetIndexer
-
Returns the Indexer object.
- GetLexer($syntax);
-
Returns the lexer data structure from the pool of loaded lexers. If it is not found it will create and return it.
- IsDeliminator($char);
-
Returns true if $char is a delimiter.
- LastcharBoundary
-
Returns true if the last character that was parsed was a word boundary.
- LastcharDeliminator
-
Returns true if the last character that was parsed was a delimiter.
- LastChar
-
Returns the last character that was parsed.
- LineNumber
-
Returns the line number of the next line that is to be parsed.
- LineStart
-
Returns true if the parser is at the beginning of a line.
- LogCallGet;
-
Returns a reference to the anonymous sub that handles warnings. See also the logcall option.
- LogCallSet($anonsub);
-
Sets the anonymous sub that handles warnings. See also the logcall option.
- LogWarning($message);
-
Sends a message to the warning mechanism of Kamelon.
- Parse($text);
-
Parses $text and returns a formatted text.
- ParseRaw($text);
-
Parses $text and returns a paired list of text fragments and the format information from the formatter's FormatTable.
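A paired list like the one ParseRaw returns can be walked two items at a time. The sketch below uses a hand-written list in place of a real ParseRaw result; the fragment texts and the format values 'Keyword' and 'Normal' are made up for illustration, and the order (text first, format second) follows the description above.

```perl
use strict;
use warnings;

# Hand-written stand-in for a ParseRaw result: text, format, text, format, ...
my @raw = ('my', 'Keyword', ' $x', 'Normal');

my @pairs;
# Take two items off the front until the list is empty.
while (my ($text, $format) = splice(@raw, 0, 2)) {
    push @pairs, [$text, $format];
    print "[$format] '$text'\n";
}
```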
- Reset
-
Clears all buffers and resets Kamelon to its initial state.
- StateCompare($state);
-
Returns true if the current stack is equal to a previously saved $state. $state contains a reference to a list.
- StateGet
-
Returns a copy of the stack in an array.
- StateSet(@state);
-
Set the state to a previously saved state.
- SuggestSyntax($filename);
-
Tries to come up with a suitable lexer for $filename. It matches the extension of the file against the extension database held by the Indexer. Returns undef if nothing is found.
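The idea of matching a file name against an extension database can be sketched in plain Perl. This is not the Indexer's code: the %extensions table, its glob-like patterns and the suggest_syntax helper are all hypothetical, chosen only to show the lookup and the undef result when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical extension database: file-name pattern => syntax name.
my %extensions = (
    '*.pl'  => 'Perl',
    '*.pm'  => 'Perl',
    '*.xml' => 'XML',
);

# Return the syntax whose pattern matches $filename, or undef if none does.
sub suggest_syntax {
    my $filename = shift;
    for my $pattern (sort keys %extensions) {
        # Turn the glob-like pattern into an anchored regular expression.
        my $re = join '.*', map { quotemeta($_) } split /\*/, $pattern, -1;
        return $extensions{$pattern} if $filename =~ /^$re$/;
    }
    return;
}

print suggest_syntax('Kamelon.pm') // 'undef', "\n";
print suggest_syntax('notes.txt')  // 'undef', "\n";
```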
- Syntax($syntax);
-
Switches to the lexer in $syntax and performs a reset.
PRIVATE METHODS
- Captured(\@captured)
-
Stores a list of items captured by the RegExpr rule in the stack with the current context. Used with dynamic rules.
- CapturedGet($number)
-
Returns the captured item indexed by number that is stored in the stack with the parent context. Used with dynamic rules.
- CapturedParseC($string)
-
$string should be exactly one character long and numeric. CapturedParseC will return the Nth captured element of the parent context.
Used with dynamic rules.
- CapturedParseR($string)
-
All occurrences of \[1-9] will be replaced by the corresponding captured elements of the parent context.
Used with dynamic rules.
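The replacement described for CapturedParseR can be sketched with a single substitution. The helper name expand_captures is hypothetical. That \1 maps to the first element of the capture list, and that a missing capture becomes an empty string, are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Replace \1 .. \9 in $string with the matching entry of @$captured.
# Assumption: \1 refers to the first captured element ($captured->[0]),
# and a capture that does not exist is replaced by an empty string.
sub expand_captures {
    my ($string, $captured) = @_;
    $string =~ s{\\([1-9])}{ $captured->[$1 - 1] // '' }ge;
    return $string;
}

# A parent context captured 'div'; a dynamic rule then looks for '</\1>'.
print expand_captures('</\1>', ['div']), "\n";
```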
- CommandExecute($cmdname, $matchedtext);
-
Calls the sub connected to $cmdname with $matchedtext as parameter. Returns whatever the sub returns.
- IncludeRules($text, $callbacklist, $debuginfo, $inclattr);
-
Called when an IncludeRules instruction is encountered in a lexer context.
- IncludeSyntax($text, $syntax, $context);
-
Same as IncludeRules, only the Rules refer to another syntax.
- IncludeSyntaxIA($text, $syntax, $context, $attr);
-
Same as IncludeSyntax but now with the includeAttribute flag set.
- InitFormatter($reftolist);
-
Tries to load and initialize a formatter module that lives in Syntax::Kamelon::Format::PluginName. The PluginName is the first item in the list. The other items are its options.
These formatter modules are not yet ready for deployment.
- LineEndContext($shifter);
-
Makes sure the context shift is done properly when a LineEndContext is specified.
- ParseContext($text, $callbacklist, $debuginfo);
-
The internal combustion engine of Kamelon. It parses $$text through all callbacks in $callbacklist. If a match is found it parses the result and stops. Returns true if a match was made.
- ParseLine($text);
-
Called by Parse. It is given one line (including the newline) as a parameter.
- ParseResultXXXX
-
These methods are called by the resultparsers. These are anonymous subs within the rules of the lexer. Sometimes multiple resultparsers are embedded in a rule.
- ParseResult($text, $match, $contextswitcher, $attr);
- ParseResultCommand($text, $match, $contextswitcher, $attr, @resultparsers, $command);
- ParseResultLookAhead($text, $match, $contextswitcher);
- ParseResultOverStrike($text, $match, $contextswitcher, $attr, @resultparsers, $command);
- ParseResultRegion($text, $match, $contextswitcher, $attr, $beginregion, $endregion);
-
Not yet functional. Should be called when a match indicates a region (code folding) event.
- ParseResultReplace($text, $match, $contextswitcher, $attr, @resultparsers, $string);
- SnippetForce
-
Forces the current text snippet into the output queue together with its attribute. Clears the snippet.
- SnippetParse($match, $attribute);
-
Tries to collect a snippet of text that is as long as possible under the same attribute. If $attribute differs from the current attribute, the snippet and current attribute are passed to the output. The snippet is emptied and the current attribute changed.
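The interplay of SnippetParse and SnippetForce can be sketched as below. This is an illustration of the described behaviour and not Kamelon's code: snippet_parse, snippet_force, @output and the attribute names are hypothetical stand-ins.

```perl
use strict;
use warnings;

my @output;                              # finished text, attribute pairs
my ($snippet, $current) = ('', undef);   # pending text and its attribute

# Flush the pending snippet to the output and clear it.
sub snippet_force {
    push @output, $snippet, $current if length $snippet;
    $snippet = '';
    return;
}

# Append $match to the snippet, flushing first when the attribute changes.
sub snippet_parse {
    my ($match, $attribute) = @_;
    if (defined $current and $current ne $attribute) {
        snippet_force();
    }
    $current = $attribute;
    $snippet .= $match;
    return;
}

snippet_parse('foo', 'Normal');
snippet_parse('bar', 'Normal');   # same attribute, so the snippet grows
snippet_parse('"x"', 'String');   # attribute changes, 'foobar' is flushed
snippet_force();                  # flush what is left at the end
print join('|', @output), "\n";
```

Collecting text this way keeps the output short, since adjacent matches with the same attribute end up in one fragment.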
- StackPush($lexer, $context);
-
Pushes $lexer and $context to the stack. Also adds the collected dynamic captures. This $lexer now becomes the acting lexer.
Stack items are stored in the stack as [$lexer, $context, \@captures].
- StackPull
-
Pulls the last lexer and context that was pushed from the stack and returns it.
- StackTop
-
Returns a reference to the top item on the stack.
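The three stack methods above can be pictured with a plain Perl array. This is a sketch only: the helper names, the lexer and context names, and passing the captures as extra arguments are assumptions made to keep the example self-contained. The item layout [$lexer, $context, \@captures] is taken from the description of StackPush.

```perl
use strict;
use warnings;

my @stack;    # each item is [$lexer, $context, \@captures]

# Push a lexer and context; the captures are passed in here for simplicity.
sub stack_push {
    my ($lexer, $context, @captures) = @_;
    push @stack, [$lexer, $context, \@captures];
    return;
}

# Remove and return the most recently pushed item.
sub stack_pull { return pop @stack }

# Return a reference to the item on top without removing it.
sub stack_top { return $stack[-1] }

stack_push('Perl', 'normal');
stack_push('Perl', 'here_document', 'EOT');
my $pulled = stack_pull();
print "pulled: $pulled->[1], captures: @{ $pulled->[2] }\n";
print "top is now: ", stack_top()->[1], "\n";
```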
- testXXXXXX
-
These are the methods called by the rules in the lexers. They return true if a match was found and take care that the result gets parsed. $text stands for a reference to the line that is currently parsed.
- testAnyChar($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case sensitive.
- testAnyCharI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case insensitive.
- testCommonColumn($text, $column, $nexttestmethod, @options);
- testCommonFirstNonSpace($text, $nexttestmethod, @options);
- testCommonLastCharBB($text, $nexttestmethod, @options);
- testCommonLastCharBb($text, $nexttestmethod, @options);
- testCommonLineStart($text, $nexttestmethod, @options);
- testDetectChar($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case sensitive.
- testDetectCharD($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic.
- testDetectCharDI($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic and case insensitive.
- testDetectCharI($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case insensitive.
- testDetect2Chars($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case sensitive.
- testDetect2CharsD($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic.
- testDetect2CharsDI($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic and case insensitive.
- testDetect2CharsI($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case insensitive.
- testDetectIdentifier($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
- testDetectSpaces($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
- testFloat($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
- testHlCChar($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
- testHlCHex($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
- testHlCOct($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
- testHlCStringChar($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
- testInt($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
- testKeyword($text, $list, $deliminators, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case sensitive.
- testKeywordI($text, $list, $deliminators, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case insensitive.
- testLineContinue($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
- testRangeDetect($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case sensitive.
- testRangeDetectI($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case insensitive.
- testRegExpr($text, $reg, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case sensitive.
- testRegExprD($text, $regstring, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic.
- testRegExprDI($text, $regstring, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic and case insensitive.
- testRegExprI($text, $reg, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case insensitive.
- testStringDetect($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case sensitive.
- testStringDetectD($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic.
- testStringDetectDI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic and case insensitive.
- testStringDetectI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case insensitive.
- testWordDetect($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case sensitive.
- testWordDetectD($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic.
- testWordDetectDI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Dynamic and case insensitive.
- testWordDetectI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
-
Case insensitive.
- UseAttribStackPush
- UseAttribStackPull
- UseAttribStackTop
-
Used with IncludeRules instructions. The UseAttribStack lets an included context use the attribute of the calling rule/context instead of its own native attribute.
ACKNOWLEDGEMENTS
All the people who wrote Kate and the syntax highlight xml files.
AUTHOR AND COPYRIGHT
This module is written and maintained by:
Hans Jeuken <hansjeuken at xs4all dot nl>
Copyright (c) 2017 by Hans Jeuken, all rights reserved.
Published under the GPLv3 license.
SEE ALSO
Syntax::Kamelon::Builder, Syntax::Kamelon::Debugger, Syntax::Kamelon::Diagnostics, Syntax::Kamelon::Indexer, Syntax::Kamelon::XMLData, Syntax::Kamelon::Format::Base, Syntax::Kamelon::Format::ANSI, Syntax::Kamelon::Format::HTML4, Syntax::Kamelon::Syntaxes.