NAME

Text::VCardFast - Perl extension for very fast parsing of VCards

SYNOPSIS

use Text::VCardFast;

my $hash = Text::VCardFast::vcard2hash($card, multival => ['adr', 'org', 'n']);
my $text = Text::VCardFast::hash2vcard($hash, "\r\n");

DESCRIPTION

Text::VCardFast is designed to parse VCards very quickly compared to pure-perl solutions. It has a perl and an XS version of the same API, accessible as vcard2hash_pp and vcard2hash_c, with the XS version being preferred.

Why would you care? We were writing the calendaring code for fastmail.fm, and it was taking over 6 seconds to respond to a request for calendar data. The bulk of that time was going to the perl middleware layer - and THAT profiled down to the vcard parser.

Two of us independently wrote better pure-perl implementations, each giving about a fivefold speedup. I figured it was worth checking whether XS would do much better. Here's the benchmark on the v4 example from Wikipedia:

Benchmark: timing 10000 iterations of fastxs, pureperl, vcardasdata...
    fastxs:  0 wallclock secs ( 0.16 usr +  0.01 sys =  0.17 CPU) @ 58823.53/s (n=10000)
            (warning: too few iterations for a reliable count)
  pureperl:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 9615.38/s (n=10000)
vcardasdata:  8 wallclock secs ( 7.35 usr +  0.00 sys =  7.35 CPU) @ 1360.54/s (n=10000)

(see bench.pl in the source tarball for the code)

EXPORT

vcard2hash
hash2vcard

API

Text::VCardFast::vcard2hash($card, %options);
Options:

* only_one - A flag which, if true, means parsing will stop after
  extracting a single VCard from the buffer.  This is very useful
  in cases where, for example, a disclaimer has been added after
  a calendar event in an email.

* multival - A list of entry names which will be considered to have
  multiple values.  Instead of having a 'value' field in the hash,
  entries with this key will have a 'values' field containing an
  arrayref of values - even if there is only one value.
  The value is split on semicolon, with escaped semicolons decoded
  correctly within each item.

  Default is the empty list.

* multiparam - As with multival, multiparam is a list of names, but of
  parameters rather than entries: each listed parameter can have
  multiple comma-separated values.  To see the difference you must
  consider something like this:

  EMAIL;TYPE="INTERNET,HOME";TYPE=PREF:example@example.com

  If 'multiparam' includes 'TYPE' then the result will be:
  ['INTERNET', 'HOME', 'PREF'], otherwise it will be:
  ['INTERNET,HOME', 'PREF'].

  Default is the empty list.

* barekeys - If set, a bare parameter (one with no '=') is treated as
  a parameter name with an undefined value, rather than as a value
  for the TYPE parameter.

  Consider:

  EMAIL;INTERNET;HOME:example@example.com

  barekeys off:

  {
    name => 'email',
    params => { type => ['INTERNET', 'HOME'] },
    value => 'example@example.com',
  }

  barekeys on:

  {
    name => 'email',
    params => { internet => [undef], home => [undef] },
    value => 'example@example.com',
  }

  Default is barekeys off.
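
To make the three options above concrete, here is a minimal pure-Perl sketch of how one already-unfolded content line could be split into name, params and value under each option. It is not Text::VCardFast's implementation: sketch_parse_line is a hypothetical name, and it assumes no ';' or ':' appears inside quoted parameter values, and ignores groups, escaped backslashes and RFC 6868 decoding.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Text::VCardFast's implementation. It shows what
# multival, multiparam and barekeys mean for one unfolded content line.
# Simplifications: no ';' or ':' inside quoted parameter values, no groups,
# no handling of escaped backslashes, no RFC 6868 decoding.
sub sketch_parse_line {
    my ($line, %opt) = @_;
    my %multival   = map { lc($_) => 1 } @{ $opt{multival}   || [] };
    my %multiparam = map { lc($_) => 1 } @{ $opt{multiparam} || [] };

    my ($head, $value) = split /:/, $line, 2;   # value is everything after the first ':'
    die "not a content line: $line" unless defined $value;
    my ($name, @params) = split /;/, $head;      # params are everything from ';' to ':'

    my %prop = (name => lc $name, params => {});
    for my $param (@params) {
        my ($key, $val) = split /=/, $param, 2;
        if (!defined $val) {
            # Bare parameter: a key with no value (barekeys), or a TYPE value.
            if ($opt{barekeys}) { push @{ $prop{params}{lc $key} }, undef }
            else                { push @{ $prop{params}{type} }, $key }
            next;
        }
        $val =~ s/^"(.*)"$/$1/;                      # drop surrounding quotes
        my @vals = $multiparam{lc $key} ? split(/,/, $val) : ($val);
        push @{ $prop{params}{lc $key} }, @vals;
    }

    if ($multival{lc $name}) {
        # Split on semicolons not preceded by a backslash, then decode "\;".
        $prop{values} = [ map { my $v = $_; $v =~ s/\\;/;/g; $v }
                          split /(?<!\\);/, $value, -1 ];
    } else {
        $prop{value} = $value;
    }
    return \%prop;
}

my $p = sketch_parse_line('EMAIL;TYPE="INTERNET,HOME";TYPE=PREF:example@example.com',
                          multiparam => ['TYPE']);
print join(',', @{ $p->{params}{type} }), "\n";   # INTERNET,HOME,PREF
```

The hashes it returns follow the shapes shown in the barekeys example above.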

The input is a scalar containing VFILE text, as per RFC 6350 or the various
earlier RFCs it replaces.  If the perl unicode flag is set on the scalar,
then it will be propagated to the output values.

The output is a hash reference containing a single key 'objects', which is
an array of all the cards within the source text.

Each object can have the following keys:
* type - the text after BEGIN: and END: of the card (lower cased)
* properties - a hash from name to array of instances within the card.
* objects - an array of sub cards within the card.

Properties are a hash with the following keys:
* group - optional - if the property name is 'foo.bar', this will be foo.
* name - a copy of the hash key that pointed to this property, so that
  this hash can be used without keeping the key around too
* params - a hash of the parameters on the entry.  This is everything from
  the ; to the :
* value - the property value as a scalar, for properties not listed in
  multival.  This is everything after the :
* values - for multival properties, an arrayref of values instead of value.
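
As a sketch of consuming this structure, the following walks the documented output shape recursively, including sub-objects and multival properties. walk_objects is a hypothetical helper, and the sample $hash is written by hand to match the documented shape rather than produced by vcard2hash.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the documented vcard2hash() output shape,
# where each object has 'type', 'properties' (name => [instances]) and
# optionally nested 'objects'. Returns one indented line per item.
sub walk_objects {
    my ($objects, $depth, $out) = @_;
    for my $obj (@$objects) {
        push @$out, ('  ' x $depth) . $obj->{type};
        my $props = $obj->{properties} || {};
        for my $name (sort keys %$props) {
            for my $prop (@{ $props->{$name} }) {
                # multival properties carry 'values' (arrayref) instead of 'value';
                # joined with ';' here for display only, without re-escaping.
                my $v = exists $prop->{values}
                      ? join(';', @{ $prop->{values} })
                      : $prop->{value};
                my $g = defined $prop->{group} ? "$prop->{group}." : '';
                push @$out, ('  ' x ($depth + 1)) . "$g$prop->{name}=$v";
            }
        }
        walk_objects($obj->{objects}, $depth + 1, $out) if $obj->{objects};
    }
    return $out;
}

# Hand-written sample in the documented shape (not real module output).
my $hash = {
    objects => [
        {
            type       => 'vcard',
            properties => {
                n => [ { name => 'n', values => [ 'Doe', 'John', '', '', '' ] } ],
            },
            objects => [
                {
                    type       => 'subcard',
                    properties => { key => [ { name => 'key', value => 'VALUE' } ] },
                },
            ],
        },
    ],
};

print "$_\n" for @{ walk_objects($hash->{objects}, 0, []) };
```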

Decoding is done where possible, including RFC 6868 handling of ^.
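
A minimal sketch of the RFC 6868 rules for parameter values: ^n decodes to a newline, ^' to a double quote and ^^ to a literal caret, while any other caret sequence is left as-is. decode_rfc6868 and encode_rfc6868 are hypothetical names; this is not the module's own decoder.

```perl
use strict;
use warnings;

# Minimal sketch of RFC 6868 parameter-value escaping (not the module's code):
#   ^n -> newline, ^' -> double quote, ^^ -> literal caret.
# Any other caret sequence is left untouched.
my %rfc6868 = ('n' => "\n", "'" => '"', '^' => '^');

sub decode_rfc6868 {
    my ($s) = @_;
    # A single left-to-right pass, so "^^n" becomes "^n", not a newline.
    $s =~ s/\^([n'^])/$rfc6868{$1}/g;
    return $s;
}

sub encode_rfc6868 {
    my ($s) = @_;
    $s =~ s/\^/^^/g;    # escape carets first so later escapes are not doubled
    $s =~ s/\n/^n/g;
    $s =~ s/"/^'/g;
    return $s;
}

my $raw = q{Kids^'^n^^ room};
print decode_rfc6868($raw), "\n";
```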

All names, both entry names and parameter names, are lowercased where the
RFC says they are not case significant.  This means that all hash keys are
lowercase within this API, as are card types.
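
The lowercasing of names, together with the group split described under Properties, can be sketched as follows. split_name is a hypothetical helper; note that it touches only the name and group, never the value.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw property name such as "GROUP1.KEY2"
# into the lowercased 'group' and 'name' keys described above.
# Values are deliberately not touched; they keep their original case.
sub split_name {
    my ($raw) = @_;
    my ($group, $name) = $raw =~ /^(?:([^.]+)\.)?(.+)$/
        or die "bad property name: $raw";
    my %prop = (name => lc $name);
    $prop{group} = lc $group if defined $group;   # only present when grouped
    return \%prop;
}

my $prop = split_name('GROUP1.KEY2');
print "$prop->{group} $prop->{name}\n";   # group1 key2
```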

Values, on the other hand, are left in their original case even where the
RFC says they are case insignificant, because doing otherwise would mean
tracking which vCard version and which parameters are in effect.

Text::VCardFast::hash2vcard($hash, $eol)
The inverse operation (as much as possible!)

Given a hash with an 'objects' key in it, output a scalar string containing
the VCARD representation.  Lines are separated with the $eol string given,
or the default "\n".  Use "\r\n" for files going to CalDAV/CardDAV servers.

In the inverse of the above case, where names are case insignificant, they
are generated in UPPERCASE in the card, for maximum compatibility with
other implementations.
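
A rough sketch of the per-property half of this inverse: names are uppercased, values keep their case, multival values are rejoined with escaped semicolons, and each line ends with the supplied $eol. sketch_property_line is hypothetical; it assumes the hash shapes documented above and skips parameter quoting, RFC 6868 encoding and line folding, which a complete serializer would also need to consider.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT Text::VCardFast's hash2vcard: serialize one
# property hash (in the documented shape) as a single content line.
# Skips parameter quoting, RFC 6868 encoding and line folding.
sub sketch_property_line {
    my ($prop, $eol) = @_;
    $eol = "\n" unless defined $eol;                 # same default as documented above
    my $line = defined $prop->{group} ? uc($prop->{group}) . '.' : '';
    $line .= uc $prop->{name};                        # names go out in UPPERCASE
    my $params = $prop->{params} || {};
    for my $key (sort keys %$params) {
        my @vals = @{ $params->{$key} };
        if (@vals == 1 && !defined $vals[0]) {        # bare key (barekeys on)
            $line .= ';' . uc $key;
            next;
        }
        $line .= ';' . uc($key) . '=' . join(',', @vals);
    }
    # multival: escape backslashes and semicolons in each item, then rejoin.
    my $value = exists $prop->{values}
        ? join(';', map { my $v = $_; $v =~ s/([\\;])/\\$1/g; $v } @{ $prop->{values} })
        : $prop->{value};
    return "$line:$value$eol";
}

print sketch_property_line(
    { group => 'group1', name => 'key', params => { pkey => ['PVALUE'] }, value => 'VALUE' },
    "\r\n",
);
```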

EXAMPLES

For more examples see the t/cases directory in the tarball, which contains
some sample VCARDs and JSON dumps of the hash representation.

BEGIN:VCARD
KEY;PKEY=PVALUE:VALUE
KEY2:VALUE2
END:VCARD

{
'objects' => [
  {
    'type' => 'vcard',
    'properties' => {
      'key2' => [
        {
          'value' => 'VALUE2',
          'name' => 'key2'
        }
      ],
      'key' => [
        {
          'params' => {
            'pkey' => [
              'PVALUE'
            ]
          },
          'value' => 'VALUE',
          'name' => 'key'
        }
      ]
    }
  }
]
}

BEGIN:VCARD
BEGIN:SUBCARD
KEY:VALUE
END:SUBCARD
END:VCARD

{
'objects' => [
  {
    'objects' => [
      {
        'type' => 'subcard',
        'properties' => {
          'key' => [
            {
              'value' => 'VALUE',
              'name' => 'key'
            }
          ]
        }
      }
    ],
    'type' => 'vcard',
    'properties' => {}
  }
]
}

BEGIN:VCARD
GROUP1.KEY:VALUE
GROUP1.KEY2:VALUE2
GROUP2.KEY:VALUE
END:VCARD

{
'objects' => [
  {
    'type' => 'vcard',
    'properties' => {
      'key2' => [
        {
          'group' => 'group1',
          'value' => 'VALUE2',
          'name' => 'key2'
        }
      ],
      'key' => [
        {
          'group' => 'group1',
          'value' => 'VALUE',
          'name' => 'key'
        },
        {
          'group' => 'group2',
          'value' => 'VALUE',
          'name' => 'key'
        }
      ]
    }
  }
]
}
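
Because grouped properties are stored under their own names, finding everything in one group means scanning each property list. A hypothetical helper for that, run against the card from the last example (copied by hand, not produced by the module):

```perl
use strict;
use warnings;

# Hypothetical helper: return every property instance in $card whose
# 'group' equals $group, scanning properties in sorted name order.
sub properties_in_group {
    my ($card, $group) = @_;
    my @found;
    for my $name (sort keys %{ $card->{properties} }) {
        push @found, grep { defined $_->{group} && $_->{group} eq $group }
                          @{ $card->{properties}{$name} };
    }
    return @found;
}

# The card from the example above, copied by hand.
my $card = {
    type       => 'vcard',
    properties => {
        key2 => [ { group => 'group1', value => 'VALUE2', name => 'key2' } ],
        key  => [
            { group => 'group1', value => 'VALUE', name => 'key' },
            { group => 'group2', value => 'VALUE', name => 'key' },
        ],
    },
};

for my $prop (properties_in_group($card, 'group1')) {
    print "$prop->{name}=$prop->{value}\n";
}
```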

SEE ALSO

There is a similar module Text::VFile::asData on CPAN, but it is much slower and doesn't do as much decoding.

Code is stored on github at

https://github.com/brong/Text-VCardFast/

AUTHOR

Bron Gondwana, <brong@fastmail.fm>

COPYRIGHT AND LICENSE

Copyright (C) 2014 by Bron Gondwana

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.