NAME

Syntax::Kamelon - A versatile and fully programmable textual content parser that is extremely well suited for syntax highlighting and code folding

SYNOPSIS

use Syntax::Kamelon;

my @attributes = Syntax::Kamelon->AvailableAttributes;
my %formtab = ();
for (@attributes) {
   $formtab{$_} = "<font class=\"$_\">"
}

my $textfilter = "[%~ text FILTER html FILTER replace('\\040', '&nbsp;') FILTER replace('\\t', '&nbsp;&nbsp;&nbsp;') ~%]";
my $hl = Syntax::Kamelon->new(
   xmlfolder => $xmldir,
   noindex => 1,
   formatter => ['Base',
      textfilter => \$textfilter,
      format_table => \%formtab,
      newline => "<br>\n",
      tagend => '</font>',
   ],
);
while (my $in = <IFILE>) {
  $hl->Parse($in);
}
print $hl->Format;

DESCRIPTION

Kamelon is based on the syntax highlighting and code folding algorithms used in the Kate text editor of the KDE desktop. It replaces and supersedes Syntax::Highlight::Engine::Kate.

This is a rewrite and not backwards compatible with Syntax::Highlight::Engine::Kate.

Instead of using plugin modules it loads Kate's syntax highlight definition xml files directly. That makes development and testing a lot easier. It also opens up a new field of applications like creating your own highlight definitions to neatly format your reports. Tons of bugs have been removed. Testing has been redesigned. It runs about four times faster than version 0.10 and is up to spec with the latest Kate highlight definitions.

OPTIONS

Kamelon's constructor is called with a paired list of options as parameters. You can use the following options.

commands => ref to hash

Specify commands to execute upon a specific match. Example:

mycommand => sub { my $match = shift; return '' }

You can specify the command in the rules of your own syntax XML file. Kamelon passes the matched text to your sub as a parameter and parses whatever your sub returns. Always make it return at least an empty string.
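
The contract described above can be illustrated with a minimal sketch. The command names shout and drop and the helper run_command are hypothetical and only stand in for the call Kamelon makes on a match; they are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical command table in the shape the commands option expects:
# command name => code reference.
my %commands = (
   # Receives the matched text and returns text to be parsed in its place.
   shout => sub { my $match = shift; return uc $match },
   # A command with nothing to hand back still returns an empty string.
   drop  => sub { return '' },
);

# Stand-in for the call made on a match (illustration only, not Kamelon's code).
sub run_command {
   my ($name, $matched) = @_;
   my $cmd = $commands{$name} or return '';
   my $result = $cmd->($matched);
   return defined $result ? $result : '';
}

my $shouted = run_command('shout', 'todo');   # upper-cased text
my $dropped = run_command('drop', 'todo');    # empty string
print "[$shouted] [$dropped]\n";
```

A table like this would be handed to the constructor as commands => \%commands.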

formatter => ['Name', @options],

A formatter can be any object that inherits Syntax::Kamelon::Format::Base and lives in Syntax::Kamelon::Format::Name. By default 'Base' without options is loaded. This is convenient if you only use ParseRaw.

See also Syntax::Kamelon::Format::Base, Syntax::Kamelon::Format::ANSI, Syntax::Kamelon::Format::HTML4.

indexfile => filename

Specifies the filename where Kamelon stores information about available syntax definitions.

By default it points to 'indexrc' in the xmlfolder. If the file does not exist Kamelon will load all xml files in the xmlfolder and attempt to create the indexfile.

Once the indexfile has been created it becomes static. If you add a syntax definition XML file to the xmlfolder afterwards it will not be recognized. Delete the indexfile and reload Kamelon to fix that.
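
Deleting a stale indexfile could look like the sketch below. The helper remove_index is hypothetical; it assumes the indexfile sits inside the xmlfolder under the default name 'indexrc' unless another name is given.

```perl
use strict;
use warnings;
use File::Spec;

# Remove a stale index so that the next Kamelon start rebuilds it.
# $xmlfolder is the folder you pass as the xmlfolder option.
# Assumption: the indexfile lives in that folder ('indexrc' by default).
sub remove_index {
   my ($xmlfolder, $indexname) = @_;
   $indexname = 'indexrc' unless defined $indexname;
   my $file = File::Spec->catfile($xmlfolder, $indexname);
   return 0 unless -e $file;                      # nothing to remove
   unlink $file or die "Cannot remove $file: $!";
   return 1;
}
```

Call remove_index($xmldir) before constructing a new Kamelon object. If you set the indexfile option to a path outside the xmlfolder, remove that file directly instead.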

See also Syntax::Kamelon::Indexer

logcall => ref to sub

By default Kamelon writes all errors to STDERR. Use this option to supply a sub that handles them instead.
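
A handler that collects messages instead of printing them might be sketched as follows. It assumes the sub receives the message text as its first argument, which this document does not state explicitly.

```perl
use strict;
use warnings;

my @log;

# Assumption: the handler is called with the message text as first argument.
my $logcall = sub {
   my $message = shift;
   push @log, $message;
};

# Simulated call for illustration; Kamelon would invoke the handler itself.
$logcall->('example warning');
```

The reference would be passed to the constructor as logcall => $logcall.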

noindex => boolean

By default 0. If you set this option Kamelon will ignore any existing indexfile and build the index itself on every start, without saving it. This gives you the liberty of adding and removing syntax highlight definition files.

This option comes with a considerable startup penalty.

See also Syntax::Kamelon::Indexer

syntax => string

Specify the syntax definition you want to use. If you do not specify this option Kamelon will start in blank mode. In that mode the Highlight and HighlightAndFormat methods will allow text to pass without any highlighting being done.

verbose => boolean

By default 0. If you set it Kamelon will happily complain about all integrity errors it finds in syntax xml files. Otherwise it will only complain and crunch about the ones it cannot overcome.

xmlfolder => folder

This is the place where Kamelon looks for syntax highlight definition XML files. By default it searches @INC for 'Syntax/Kamelon/XML'. Here you find the XML files used in the Kate text editor. They are specially crafted for this module.

See also Syntax::Kamelon::Indexer

PUBLIC METHODS

AvailableAttributes

Returns a list of all available attribute tags. Can also be called before initializing Kamelon.

AvailableSyntaxes

Returns a list of all available syntax definitions.

ClearLexers

Empties the pool of loaded lexers. Every called lexer will be loaded from scratch.

Column

Returns the column position in the line that is currently highlighted.

FirstNonSpace($string);

Returns true if the current line has contained only whitespace so far and the first character in $string is a non-whitespace character.

Format

Calls the Format method of the currently loaded formatter and returns the result.

Formatter

Returns a reference to the formatter object.

GetIndexer

Returns the Indexer object.

GetLexer($syntax);

Returns the lexer data structure from the pool of loaded lexers. If it is not found it will create and return it.

IsDeliminator($char);

Returns true if $char is a deliminator.

LastcharBoundary

Returns true if the last character that was parsed was a word boundary.

LastcharDeliminator

Returns true if the last character that was parsed was a deliminator.

LastChar

Returns the last character that was parsed.

LineNumber

Returns the line number of the next line that is to be parsed.

LineStart

Returns true if the parser is at the beginning of a line.

LogCallGet;

Returns a reference to the anonymous sub that handles warnings. See also the logcall option.

LogCallSet($anonsub);

Sets the anonymous sub that handles warnings. See also the logcall option.

LogWarning($message);

Sends a message to the warning mechanism of Kamelon.

Parse($text);

Parses $text and returns a formatted text.

ParseRaw($text);

Parses $text and returns a paired list of text fragments and the format information from the formatter's FormatTable.
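
Walking such a paired list can be sketched as below. The sample data in @raw is hypothetical and only mimics the described shape; in real use the list would come from a ParseRaw call.

```perl
use strict;
use warnings;

# Hypothetical sample in the described shape:
# text fragment, format information, text fragment, format information, ...
my @raw = (
   'my', '<b>',
   ' ',  '',
   '$x', '<i>',
);

# Walk the list two items at a time.
my @pairs;
for (my $i = 0; $i < @raw; $i += 2) {
   my ($fragment, $format) = @raw[$i, $i + 1];
   push @pairs, [$fragment, $format];
}

# Joining the fragments alone gives back the text without formatting.
my $plain = join '', map { $_->[0] } @pairs;
```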

Reset

Clears all buffers and resets Kamelon to its initial state.

StateCompare($state);

Returns true if the current stack is equal to a previously saved $state. $state contains a reference to a list.

StateGet

Returns a copy of the stack in an array.

StateSet(@state);

Sets the state to a previously saved state.
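
One possible use of the three State methods is re-parsing only the lines affected by an edit. The sketch below is an illustration under assumptions: the helper reparse_from and the per-line state cache are hypothetical, and StateGet is assumed to return the stack as a list.

```perl
use strict;
use warnings;

# Re-parse from line $start until the parser state matches a recorded one.
# $states->[$n] holds the stack as it was before line $n was parsed.
# Assumption: StateGet returns the stack as a list.
sub reparse_from {
   my ($hl, $lines, $states, $start) = @_;
   $hl->StateSet(@{ $states->[$start] });
   my $count = 0;
   for my $n ($start .. $#$lines) {
      $hl->Parse($lines->[$n]);
      $count++;
      my $next = $n + 1;
      # Later lines are unaffected once the stack equals the recorded one.
      last if defined $states->[$next] && $hl->StateCompare($states->[$next]);
      $states->[$next] = [ $hl->StateGet ];
   }
   return $count;   # number of lines that had to be parsed again
}
```

$hl is expected to be an object offering the Parse, StateGet, StateSet and StateCompare methods listed in this section.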

SuggestSyntax($filename);

Tries to come up with a suitable lexer for $filename. It matches the extension of the file against the extension database held by the Indexer. Returns undef if nothing is found.
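
The idea of matching a filename against an extension database can be illustrated as follows. The table %extensions and the helper suggest_syntax are hypothetical; the real data is held by the Indexer and may be organised differently.

```perl
use strict;
use warnings;

# Hypothetical extension table; the real data is held by the Indexer.
my %extensions = (
   'C'    => '*.c;*.h',
   'Perl' => '*.pl;*.pm',
);

sub suggest_syntax {
   my $filename = shift;
   for my $syntax (sort keys %extensions) {
      for my $glob (split /;/, $extensions{$syntax}) {
         # Turn the glob into an anchored regex: '*' matches any run of characters.
         my $re = join '.*', map { quotemeta($_) } split /\*/, $glob, -1;
         return $syntax if $filename =~ /^$re$/;
      }
   }
   return;   # nothing found: undef in scalar context
}
```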

Syntax($syntax);

Switches to the lexer in $syntax and performs a reset.

PRIVATE METHODS

Captured(\@captured)

Stores a list of items captured by the RegExpr rule in the stack with the current context. Used with dynamic rules.

CapturedGet($number)

Returns the captured item indexed by number that is stored in the stack with the parent context. Used with dynamic rules.

CapturedParseC($string)

$string should only be one character long and numeric. CapturedParseC will return the Nth captured element of the parent context.

Used with dynamic rules.

CapturedParseR($string)

All occurrences of \[1-9] will be replaced by the corresponding captured elements of the parent context.

Used with dynamic rules.

CommandExecute($cmdname, $matchedtext);

Calls the sub connected to $cmdname with $matchedtext as parameter. Returns whatever the sub returns.

IncludeRules($text, $callbacklist, $debuginfo, $inclattr);

Called when an IncludeRules instruction is encountered in a lexer context.

IncludeSyntax($text, $syntax, $context);

Same as IncludeRules, only the Rules refer to another syntax.

IncludeSyntaxIA($text, $syntax, $context, $attr);

Same as IncludeSyntax but now with the includeAttribute flag set.

InitFormatter($reftolist);

Initializes the formatter module. Tries to load a formatter module that lives in Syntax::Kamelon::Format::PluginName. The PluginName is the first item in the list; the other items are its options.

These modules are not yet ready for deployment.

LineEndContext($shifter);

Makes sure the context shift is done properly when a LineEndContext is specified.

ParseContext($text, $callbacklist, $debuginfo);

The internal combustion engine of Kamelon. It parses $$text through all callbacks in $callbacklist. If a match is found it parses the result and stops. Returns true if a match was made.
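
The first-match-wins loop described above can be sketched in isolation. The helpers make_rule and parse_context and the attribute names are hypothetical and much simpler than Kamelon's real rules; they only show the control flow over a reference to the line.

```perl
use strict;
use warnings;

# Build a rule callback: it tries $regex at the start of the line and,
# on a match, consumes the text and records it with its attribute.
sub make_rule {
   my ($attribute, $regex, $out) = @_;
   return sub {
      my $text = shift;                       # reference to the remaining line
      return 0 unless $$text =~ s/^($regex)//;
      push @$out, [$1, $attribute];
      return 1;
   };
}

# Try each callback in order; stop at the first one that matches.
sub parse_context {
   my ($text, $callbacks) = @_;
   for my $callback (@$callbacks) {
      return 1 if $callback->($text);
   }
   return 0;
}

my @out;
my @rules = (
   make_rule('Number', qr/\d+/,    \@out),
   make_rule('Word',   qr/[a-z]+/, \@out),
);
my $line = '42abc';
my $matched = parse_context(\$line, \@rules);
```

After the call, the matched part has been removed from $line, so a caller can keep invoking parse_context until the line is empty or no rule matches.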

ParseLine($text);

Called by Parse. It is given one line (including the newline) as a parameter.

ParseResultXXXX

These methods are called by the resultparsers. These are anonymous subs within the rules of the lexer. Sometimes multiple resultparsers are embedded in a rule.

ParseResult($text, $match, $contextswitcher, $attr);
ParseResultCommand($text, $match, $contextswitcher, $attr, @resultparsers, $command);
ParseResultLookAhead($text, $match, $contextswitcher);
ParseResultOverStrike($text, $match, $contextswitcher, $attr, @resultparsers, $command);
ParseResultRegion($text, $match, $contextswitcher, $attr, $beginregion, $endregion);

Not yet functional. Should be called when a match indicates a region (code folding) event.

ParseResultReplace($text, $match, $contextswitcher, $attr, @resultparsers, $string);

SnippetForce

Forces the current text snippet into the output queue together with its attribute. Clears the snippet.

SnippetParse($match, $attribute);

Tries to collect a snippet of text that is as long as possible under the same attribute. If $attribute differs from the current attribute, the snippet and current attribute are sent to the output. The snippet is emptied and the current attribute changed.
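
The collecting behaviour of SnippetParse and SnippetForce can be sketched with two small subs. The names snippet_parse and snippet_force and the attribute names are hypothetical stand-ins, not the module's implementation.

```perl
use strict;
use warnings;

my @output;          # stand-in for the output queue
my $snippet = '';    # text collected so far
my $attribute;       # attribute of the collected text

# Push the collected text and its attribute to the output, then clear it.
sub snippet_force {
   return unless length $snippet;
   push @output, $snippet, $attribute;
   $snippet = '';
   return;
}

# Append $match to the snippet; flush first when the attribute changes.
sub snippet_parse {
   my ($match, $attr) = @_;
   snippet_force() if defined $attribute && $attr ne $attribute;
   $attribute = $attr;
   $snippet .= $match;
   return;
}

snippet_parse(@$_) for (
   ['my', 'Keyword'], [' ', 'Normal'], ['$', 'Variable'], ['x', 'Variable'],
);
snippet_force();     # flush whatever is left at the end
```

The two adjacent fragments with the same attribute end up as one entry in the output, which keeps the number of formatting tags low.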

StackPush($lexer, $context);

Pushes $lexer and $context to the stack. Also adds the collected dynamic captures. This $lexer now becomes the acting lexer.

Stack items are stored in the stack as [$lexer, $context, \@captures].

StackPull

Pulls the lexer and context that were pushed last from the stack and returns them.

StackTop

Returns a reference to the top item on the stack.
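
The stack item layout described under StackPush can be illustrated with a plain array. The subs below are hypothetical stand-ins; for brevity the captures are passed as extra arguments, whereas Kamelon adds the captures it has collected itself.

```perl
use strict;
use warnings;

my @stack;

# Store one item as [$lexer, $context, \@captures].
sub stack_push {
   my ($lexer, $context, @captures) = @_;
   push @stack, [$lexer, $context, [@captures]];
   return;
}
sub stack_top  { return $stack[-1] }      # reference to the top item
sub stack_pull { return pop @stack }      # removes and returns the top item

stack_push('Perl', 'normal');
stack_push('Perl', 'heredoc', 'EOT');     # 'EOT' stands for a dynamic capture
my ($lexer, $context, $captures) = @{ stack_top() };
my $pulled = stack_pull();
```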

testXXXXXX

These are the methods called by the rules in the lexers. They return true if a match was found and take care that the result gets parsed. $text stands for a reference to the line that is currently parsed.

testAnyChar($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case sensitive.

testAnyCharI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case insensitive.

testCommonColumn($text, $column, $nexttestmethod, @options);
testCommonFirstNonSpace($text, $nexttestmethod, @options);
testCommonLastCharBB($text, $nexttestmethod, @options);
testCommonLastCharBb($text, $nexttestmethod, @options);
testCommonLineStart($text, $nexttestmethod, @options);
testDetectChar($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case sensitive.

testDetectCharD($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic.

testDetectCharDI($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic and case insensitive.

testDetectCharI($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case insensitive.

testDetect2Chars($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case sensitive.

testDetect2CharsD($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic.

testDetect2CharsDI($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic and case insensitive.

testDetect2CharsI($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case insensitive.

testDetectIdentifier($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
testDetectSpaces($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
testFloat($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
testHlCChar($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
testHlCHex($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
testHlCOct($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
testHlCStringChar($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
testInt($text, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
testKeyword($text, $list, $deliminators, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case sensitive.

testKeywordI($text, $list, $deliminators, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case insensitive.

testLineContinue($text, $char, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);
testRangeDetect($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case sensitive.

testRangeDetectI($text, $char1, $char2, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case insensitive.

testRegExpr($text, $reg, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case sensitive.

testRegExprD($text, $regstring, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic.

testRegExprDI($text, $regstring, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic and case insensitive.

testRegExprI($text, $reg, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case insensitive.

testStringDetect($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case sensitive.

testStringDetectD($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic.

testStringDetectDI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic and case insensitive.

testStringDetectI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case insensitive.

testWordDetect($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case sensitive.

testWordDetectD($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic.

testWordDetectDI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Dynamic and case insensitive.

testWordDetectI($text, $string, $contextswitcher, $attribute, $beginregion, $endregion, @resultparsers);

Case insensitive.

UseAttribStackPush
UseAttribStackPull
UseAttribStackTop

Used with IncludeRules instructions. The UseAttribStack lets an included context use the attribute of the calling rule/context instead of its own native attribute.

ACKNOWLEDGEMENTS

All the people who wrote Kate and the syntax highlight xml files.

AUTHOR AND COPYRIGHT

This module is written and maintained by:

Hans Jeuken < hansjeuken at xs4all dot nl>

Copyright (c) 2017 by Hans Jeuken, all rights reserved.

Published under the GPLv3 license.

BUGS, ERRORS and DISCLAIMER

We know for a fact that the supplied xml files from Kate do not all produce accurate results. Most of them do though.

There are a few instances (Lilypond is one we know of) where Perl treats regular expressions slightly differently from Kate. This became obvious as of Perl 5.26.

We have also chosen to not integrate some of Kate's tag features. We think they are a poor design choice. Kate allows you to specify tags like 'bold', 'italic' or a colour. We have foregone those. So for some xml definition files you will get different output when comparing it with a Kate editor window.

If you bump into one of these, unfortunately you are on your own.

What you can do is use your own set of xml definitions in a folder of your choice. Then edit the xml to your liking.

SEE ALSO

Syntax::Kamelon::Builder, Syntax::Kamelon::Debugger, Syntax::Kamelon::Diagnostics, Syntax::Kamelon::Indexer, Syntax::Kamelon::XMLData, Syntax::Kamelon::Format::Base, Syntax::Kamelon::Format::ANSI, Syntax::Kamelon::Format::HTML4, Syntax::Kamelon::Syntaxes.