NAME

Text::VimColor - syntax color text in HTML or XML using Vim

SYNOPSIS

use Text::VimColor;
my $syntax = Text::VimColor->new(
   file => $0,
   filetype => 'perl',
);

print $syntax->html;
print $syntax->xml;

DESCRIPTION

This module tries to mark up text files according to their syntax. It can be used to produce web pages with pretty-printed, colourful source code samples. It can produce output in the following formats:

HTML

Valid XHTML 1.0, with the exact colouring and style left to a CSS stylesheet

XML

Pieces of text are marked with XML elements in a simple vocabulary, which can be converted to other formats, for example, using XSLT

Perl array

A simple Perl data structure, so that Perl code can be used to turn it into whatever is needed

This module works by running the Vim text editor and getting it to apply its excellent syntax highlighting (aka 'font-locking') to an input file, and mark pieces of text according to whether it thinks they are comments, keywords, strings, etc. The Perl code then reads back this markup and converts it to the desired output format.

This is an object-oriented module. To use it, create an object with the new() constructor (as shown in the SYNOPSIS above) and then call methods on it to get the markup out.

METHODS

new(options)

Returns a syntax highlighting object. Pass it a hash of options.

The following options are recognised:

file

The file to syntax highlight. Can be either a filename or an open file handle.

Note that using a filename might allow Vim to guess the file type from its name if none is specified explicitly.

If the file isn't specified while creating the object, it can be given later in a call to the syntax_mark_file method (see below), allowing a single Text::VimColor object to be used with multiple input files.

string

Use this to pass a string to be used as the input. This is an alternative to the file option.

The syntax_mark_string method (see below) is another way to use a string as input.

html_full_page

By default the html() output method returns a fragment of HTML, not a full file. To make useful output this must be wrapped in a <pre> element and a stylesheet must be included from somewhere. Setting the html_full_page option will instead make the html() method return a complete stand-alone HTML file.

Note that while this is useful for testing, most of the time you'll want to put the syntax highlighted source code in a page with some other content, in which case the default output of the html() method is more appropriate.
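
As a rough sketch of the wrapping described above, the helper below places a fragment inside a <pre> element in a minimal page that links to an external stylesheet. The function name wrap_fragment, the page title and the stylesheet URL are assumptions made for this example and are not part of the module; the sample fragment stands in for whatever html() returned.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::VimColor): wrap an HTML fragment,
# such as the string returned by html(), in a <pre> element inside a page
# that links to an external stylesheet.
sub wrap_fragment {
    my ($fragment, $css_url) = @_;

    return join '',
        qq{<html>\n<head>\n},
        qq{<title>Source listing</title>\n},
        qq{<link rel="stylesheet" type="text/css" href="$css_url" />\n},
        qq{</head>\n<body>\n},
        qq{<pre>$fragment</pre>\n},
        qq{</body>\n</html>\n};
}

# Stand-in for real html() output; the URL is an assumed location.
my $fragment = '<span class="synComment"># a comment</span>';
print wrap_fragment($fragment, '/css/light.css');
```

In a real page the fragment would usually sit alongside other content, so only the <pre> wrapper and the stylesheet reference matter here.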

html_inline_stylesheet

Turned on by default, but has no effect unless html_full_page is also enabled.

This causes the CSS stylesheet defining the colours used to render the markup to be included in the HTML output, in a <style> element. Turn it off to instead use a <link> to reference an external stylesheet (recommended if putting more than one page on the web).

html_stylesheet

Ignored unless html_full_page and html_inline_stylesheet are both enabled.

This can be set to a stylesheet to include inline in the HTML output (the actual CSS, not the filename of it).

html_stylesheet_file

Ignored unless html_full_page and html_inline_stylesheet are both enabled.

This can be the filename of a stylesheet to copy into the HTML output, or a file handle to read one from. If neither this nor html_stylesheet is given, the supplied stylesheet light.css will be used instead.

html_stylesheet_url

Ignored unless html_full_page is enabled and html_inline_stylesheet is disabled.

This can be used to supply the URL (relative or absolute) of the stylesheet to be referenced from the HTML <link> element in the header. If this isn't given it will default to using a file: URL to reference the supplied light.css stylesheet, which is only really useful for testing.

xml_root_element

By default this is true. If set to a false value, XML output will not be wrapped in a root element called <syn:syntax>, but will be otherwise the same. This could allow XML output for several files to be concatenated, but to make it valid XML a root element must be added. Disabling this option will also remove the binding of the namespace prefix syn:, so an xmlns:syn attribute would have to be added elsewhere.
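
One possible way to combine several root-less fragments is sketched below. The helper name combine_fragments and the wrapper element name "listings" are assumptions chosen for this example, not something the module provides; the sample strings stand in for xml() output produced with xml_root_element turned off.

```perl
use strict;
use warnings;

# Namespace URI documented for the syn: prefix.
my $NS = 'http://ns.laxan.com/text-vimcolor/1';

# Hypothetical helper: wrap several root-less XML fragments in one root
# element that binds the syn: prefix. The element name "listings" is an
# assumption made for this example, not something the module defines.
sub combine_fragments {
    my @fragments = @_;
    return qq{<listings xmlns:syn="$NS">}
         . join('', @fragments)
         . qq{</listings>\n};
}

# Stand-ins for xml() output produced with xml_root_element => 0.
my @fragments = (
    '<syn:Comment># first file</syn:Comment>',
    '<syn:Comment># second file</syn:Comment>',
);
print combine_fragments(@fragments);
```

The xmlns:syn attribute on the wrapper is what makes the prefixed elements inside it resolvable again.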

vim_command

The name of the executable which will be run to invoke Vim. The default is vim.

vim_options

A reference to an array of options to pass to Vim. The default options are:

qw( -RXZ -i NONE -u NONE -N )

syntax_mark_file(file)

Mark up the specified file. Subsequent calls to the output methods will then return the markup. It is not necessary to call this if a file or string option was passed to new().

Returns the object it was called on, so an output method can be called on it directly:

my $syntax = Text::VimColor->new(
   vim_command => '/usr/local/bin/special-vim',
);

foreach (@files) {
   print $syntax->syntax_mark_file($_)->html;
}

syntax_mark_string(string)

Does the same as syntax_mark_file (see above) but uses a string as input. Returns the object it was called on.

html()

Return XHTML markup based on the Vim syntax colouring of the input file.

Unless the html_full_page option is set, this will only return a fragment of HTML, which can then be incorporated into a full page. The fragment will be valid as either HTML or XHTML.

The only markup used for the actual text will be <span> elements wrapped round appropriate pieces of text. Each one will have a class attribute set to a name which can be tied to a foreground and background color in a stylesheet. The class names used will have the prefix syn, for example synComment. For the full list see the section HIGHLIGHTING TYPES below.

xml()

Returns markup in a simple XML vocabulary. Unless the xml_root_element option is turned off (it's on by default) this will produce a complete XML document, with all the markup inside a <syntax> element.

This XML output can be transformed into other formats, either using programs which read it with an XML parser, or using XSLT. See the text-highlight-vim(1) program for an example of how XSLT can be used with XSL-FO to turn this into PDF.

The markup will consist of mixed content with elements wrapping pieces of text which Vim recognized as being of a particular type. The names of the elements used are the ones listed in the HIGHLIGHTING TYPES section below.

The <syntax> element will declare the namespace for all the elements produced, which will be http://ns.laxan.com/text-vimcolor/1. It will also have an attribute called filename, which will be set to the value returned by the input_filename method, if that returns something other than undef.

The XML namespace is also available as $Text::VimColor::NAMESPACE_ID.

marked()

This output function returns the marked-up text in the format which the module stores it in internally. The data looks like this:

use Data::Dumper;
print Dumper($syntax->marked);

$VAR1 = [
   [ 'Statement', 'my' ],
   [ '', ' ' ],
   [ 'Identifier', '$syntax' ],
   [ '', ' = ' ],
    ...
];

The marked() method returns a reference to an array. Each item in the array is itself a reference to an array of two items: the first is one of the names listed in the HIGHLIGHTING TYPES section below (or the empty string if none apply), and the second is the actual piece of text.
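
To illustrate how this structure might be consumed, here is a small sketch that turns it into text with ANSI colour codes for a terminal. The function name marked_to_ansi and the colour mapping are assumptions made for this example; Term::ANSIColor is a standard Perl module. The sample data has the shape described above, and real code would pass in the result of $syntax->marked instead.

```perl
use strict;
use warnings;
use Term::ANSIColor qw(colored);

# Assumed mapping from highlighting type to terminal colour; adjust to taste.
my %colour_for = (
    Comment    => 'blue',
    Constant   => 'red',
    Identifier => 'cyan',
    Statement  => 'yellow',
);

# Render a marked()-style structure as a string with ANSI colour codes.
# Pieces with an empty or unmapped type are passed through unchanged.
sub marked_to_ansi {
    my ($marked) = @_;
    my $out = '';
    for my $piece (@$marked) {
        my ($type, $text) = @$piece;
        my $colour = $colour_for{$type};
        $out .= defined $colour ? colored($text, $colour) : $text;
    }
    return $out;
}

# Sample data in the documented shape; real code would use $syntax->marked.
my $marked = [
    [ 'Statement',  'my' ],
    [ '',           ' ' ],
    [ 'Identifier', '$syntax' ],
    [ '',           ' = ' ],
];
print marked_to_ansi($marked), "\n";
```

Because every piece of text appears exactly once and in order, joining the second item of each pair always gives back the original input.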

input_filename()

Returns the filename of the input file, or undef if a filename wasn't specified.

HIGHLIGHTING TYPES

The following list gives the names of highlighting types which will be set for pieces of text. For HTML output, these will appear as CSS class names, except that they will all have the prefix syn added. For XML output, these will be the names of elements which will all be in the namespace http://ns.laxan.com/text-vimcolor/1.

Here is the complete list:

  • Comment

  • Constant

  • Identifier

  • Statement

  • PreProc

  • Type

  • Special

  • Underlined

  • Error

  • Todo
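
As an illustration of how these names map to CSS classes, the sketch below generates one stylesheet rule per type using the syn prefix. The function name stylesheet_for and the colours are assumptions made for this example; they are not the values used by the supplied light.css.

```perl
use strict;
use warnings;

# The highlighting types listed above.
my @types = qw(
    Comment Constant Identifier Statement PreProc
    Type Special Underlined Error Todo
);

# Assumed colours for illustration only; these are not the values
# used by the supplied light.css.
my %colour_for = (
    Comment   => '#0000ff',
    Constant  => '#ff00ff',
    Statement => '#a52a2a',
);

# Build one CSS rule per type, using the documented "syn" class prefix.
# Types without an assigned colour fall back to "inherit".
sub stylesheet_for {
    my ($colours) = @_;
    return join '', map {
        my $colour = $colours->{$_} // 'inherit';
        sprintf ".syn%s { color: %s; }\n", $_, $colour;
    } @types;
}

print stylesheet_for(\%colour_for);
```

The resulting text could be saved as an external stylesheet or passed as the html_stylesheet option when producing a full page.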

SEE ALSO

text-highlight-vim(1)

A simple command line interface to this module's features. It can be used to produce HTML and XML output, and can also generate PDF output using an XSLT/XSL-FO stylesheet and the FOP processor.

http://www.vim.org/

Everything to do with the Vim text editor.

http://ungwe.org/blog/

The author's weblog, which uses this module. It is used to make the code samples look pretty.

BUGS

Quite a few, actually:

  • Things can break if there is already a Vim swapfile, but sometimes it seems to work.

  • The 'string' option to new() and the syntax_mark_string() function should both accept a reference to a string as well as just a string.

  • If Vim returns an exit code, it should be interpreted and a human-readable error message given.

  • There should be a way of getting a DOM object back instead of an XML string.

  • It should be possible to choose between HTML and XHTML, and perhaps there should be some control over the DOCTYPE declaration when a complete file is produced.

  • With Vim versions earlier than 6.2 there is a 2 second delay each time Vim is run.

  • It doesn't work on Windows. I am unlikely to fix this, but if anyone who knows Windows can sort it out let me know.

AUTHOR

Geoff Richards <qef@laxan.com>

The Vim script mark.vim is a crufted version of 2html.vim by Bram Moolenaar <Bram@vim.org> and David Nečas (Yeti) <yeti@physics.muni.cz>.

COPYRIGHT

Copyright 2002-2003, Geoff Richards.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.