NAME
Text::VCardFast - Perl extension for very fast parsing of VCards
SYNOPSIS
use Text::VCardFast;
my $hash = Text::VCardFast::vcard2hash($card, multival => ['adr', 'org', 'n']);
my $card = Text::VCardFast::hash2vcard($hash, "\r\n");
DESCRIPTION
Text::VCardFast is designed to parse VCards very quickly compared to pure-perl solutions. It has a perl and an XS version of the same API, accessible as vcard2hash_pp and vcard2hash_c, with the XS version being preferred.
Why would you care? We were writing the calendaring code for fastmail.fm, and it was taking over 6 seconds to respond to a request for calendar data. The bulk of that time was going to the perl middleware layer - and THAT profiled down to the vcard parser.
Two of us independently wrote better pure perl implementations, leading to about a 5 times speedup in each case. I figured it was worth checking if XS would be much better. Here's the benchmark on the v4 example from Wikipedia:
Benchmark: timing 10000 iterations of fastxs, pureperl, vcardasdata...
fastxs: 0 wallclock secs ( 0.16 usr + 0.01 sys = 0.17 CPU) @ 58823.53/s (n=10000)
(warning: too few iterations for a reliable count)
pureperl: 1 wallclock secs ( 1.04 usr + 0.00 sys = 1.04 CPU) @ 9615.38/s (n=10000)
vcardasdata: 8 wallclock secs ( 7.35 usr + 0.00 sys = 7.35 CPU) @ 1360.54/s (n=10000)
(see bench.pl in the source tarball for the code)
EXPORT
vcard2hash
hash2vcard
API
- Text::VCardFast::vcard2hash($card, %options);

Options:

* only_one - A flag which, if true, means parsing will stop after extracting a single VCard from the buffer. This is very useful in cases where, for example, a disclaimer has been added after a calendar event in an email.

* multival - A list of entry names which will be considered to have multiple values. Instead of having a 'value' field in the hash, entries with this key will have a 'values' field containing an arrayref of values - even if there is only one value. The value is split on semicolon, with escaped semicolons decoded correctly within each item. Default is the empty list.

* multiparam - As with values - multiparam is a list of parameter names which can have multiple values. To see the difference here you must consider something like this:

    EMAIL;TYPE="INTERNET,HOME";TYPE=PREF:example@example.com

  If 'multiparam' includes 'TYPE' then the result will be: ['INTERNET', 'HOME', 'PREF'], otherwise it will be: ['INTERNET,HOME', 'PREF']. Default is the empty list.

* barekeys - if set, then a bare parameter will be considered to be a parameter name with an undefined value, rather than being a value for the parameter type. Consider:

    EMAIL;INTERNET;HOME:example@example.com

  barekeys off:

    {
      name => 'email',
      params => { type => ['INTERNET', 'HOME'] },
      value => 'example@example.com',
    }

  barekeys on:

    {
      name => 'email',
      params => { internet => [undef], home => [undef] },
      value => 'example@example.com',
    }

  Default is barekeys off.

The input is a scalar containing VFILE text, as per RFC 6350 or the various earlier RFCs it replaces. If the perl unicode flag is set on the scalar, then it will be propagated to the output values.

The output is a hash reference containing a single key 'objects', which is an array of all the cards within the source text.

Each object can have the following keys:

* type - the text after BEGIN: and END: of the card (lower cased)
* properties - a hash from name to array of instances within the card.
* objects - an array of sub cards within the card.

Each property is a hash with the following keys:

* group - optional - if the property name is 'foo.bar', this will be foo.
* name - a copy of the hash key that pointed to this property, so that this hash can be used without keeping the key around too
* params - a hash of the parameters on the entry. This is everything from the ; to the :
* value - either a scalar (if not a multival field) or an array of values. This is everything after the :

Decoding is done where possible, including RFC 6868 handling of ^.

All names, both entry names and parameter names, are lowercased where the RFC says they are not case significant. This means that all hash keys are lowercase within this API, as are card types.

Values, on the other hand, are left in their original case even where the RFC says they are case insignificant - due to the increased complexity of tracking which version and which parameters are in effect.
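The splitting rules described for multival and multiparam can be illustrated with a minimal pure-Perl sketch. This is not the module's implementation: split_multival and expand_multiparam are hypothetical helper names, and the set of backslash escapes handled here is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw value on unescaped semicolons and
# decode backslash escapes inside each item (multival-style).
sub split_multival {
    my ($raw) = @_;
    my @items = ('');
    # Walk the string token by token: an escape pair, a bare semicolon,
    # or a run of ordinary characters.
    while ( $raw =~ /\G(?:\\(.)|(;)|([^\\;]+))/gs ) {
        if ( defined $1 ) {
            # Assumption: \n and \N mean newline, any other escaped
            # character stands for itself.
            $items[-1] .= ( lc($1) eq 'n' ) ? "\n" : $1;
        }
        elsif ( defined $2 ) {
            push @items, '';    # unescaped semicolon starts a new item
        }
        else {
            $items[-1] .= $3;
        }
    }
    return \@items;
}

# Hypothetical helper: flatten repeated parameter values, splitting each
# one on commas (multiparam-style).
sub expand_multiparam {
    my (@values) = @_;
    return [ map { split /,/, $_ } @values ];
}

my $n    = split_multival('Doe;John;;Dr\; MD;');
my $type = expand_multiparam( 'INTERNET,HOME', 'PREF' );

print join( ' | ', @$n ),    "\n";
print join( ' | ', @$type ), "\n";
```

In this sketch the escaped semicolon stays inside its item ('Dr; MD'), while empty components between semicolons are kept as empty strings, so positional fields such as those of N or ADR keep their positions.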
- Text::VCardFast::hash2vcard($hash, $eol)

The inverse operation (as much as possible!)

Given a hash with an 'objects' key in it, output a scalar string containing the VCARD representation. Lines are separated with the $eol string given, or the default "\n". Use "\r\n" for files going to caldav/carddav servers.

In the inverse of the above case, where names are case insignificant, they are generated in UPPERCASE in the card, for maximum compatibility with other implementations.
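As a rough usage sketch, a hand-built hash in the documented shape can be passed to hash2vcard. The card contents (an 'fn' property with a made-up name) are illustrative only, and the load is guarded so the script still runs where Text::VCardFast is not installed. The exact output is not shown here because it depends on the module.

```perl
use strict;
use warnings;

# Hand-built input in the documented shape: one card with one property.
# The property name and value are made up for illustration.
my $hash = {
    objects => [
        {
            type       => 'vcard',
            properties => {
                fn => [ { name => 'fn', value => 'Jane Example' } ],
            },
        },
    ],
};

my $card;    # stays undef when the module is not installed

# Guarded load so the script still runs without Text::VCardFast.
if ( eval { require Text::VCardFast; 1 } ) {
    # "\r\n" line endings, as suggested for caldav/carddav servers.
    $card = Text::VCardFast::hash2vcard( $hash, "\r\n" );
    print $card;
}
else {
    print "Text::VCardFast not installed; skipping\n";
}
```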
EXAMPLES
For more examples see the t/cases directory in the tarball, which contains
some sample VCARDs and JSON dumps of the hash representation.
BEGIN:VCARD
KEY;PKEY=PVALUE:VALUE
KEY2:VALUE2
END:VCARD
{
  'objects' => [
    {
      'type' => 'vcard',
      'properties' => {
        'key2' => [
          {
            'value' => 'VALUE2',
            'name' => 'key2'
          }
        ],
        'key' => [
          {
            'params' => {
              'pkey' => [
                'PVALUE'
              ]
            },
            'value' => 'VALUE',
            'name' => 'key'
          }
        ]
      }
    }
  ]
}
BEGIN:VCARD
BEGIN:SUBCARD
KEY:VALUE
END:SUBCARD
END:VCARD
{
  'objects' => [
    {
      'objects' => [
        {
          'type' => 'subcard',
          'properties' => {
            'key' => [
              {
                'value' => 'VALUE',
                'name' => 'key'
              }
            ]
          }
        }
      ],
      'type' => 'vcard',
      'properties' => {}
    }
  ]
}
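Because sub cards live under their own 'objects' key, the nested structure above can be walked recursively. The following is a small sketch using a literal copy of that hash; describe_objects is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Literal copy of the nested structure shown above.
my $hash = {
    objects => [
        {
            type       => 'vcard',
            properties => {},
            objects    => [
                {
                    type       => 'subcard',
                    properties => {
                        key => [ { value => 'VALUE', name => 'key' } ],
                    },
                },
            ],
        },
    ],
};

# Hypothetical helper: depth-first walk returning one line per object,
# indented by nesting depth, listing its property names.
sub describe_objects {
    my ( $objects, $depth ) = @_;
    $depth //= 0;
    my $indent = '  ' x $depth;
    my @lines;
    for my $obj ( @{ $objects || [] } ) {
        my @names = sort keys %{ $obj->{properties} || {} };
        push @lines, $indent . $obj->{type} . ' [' . join( ',', @names ) . ']';
        # Recurse into sub cards, if any.
        push @lines, describe_objects( $obj->{objects}, $depth + 1 );
    }
    return @lines;
}

print "$_\n" for describe_objects( $hash->{objects} );
```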
BEGIN:VCARD
GROUP1.KEY:VALUE
GROUP1.KEY2:VALUE2
GROUP2.KEY:VALUE
END:VCARD
{
  'objects' => [
    {
      'type' => 'vcard',
      'properties' => {
        'key2' => [
          {
            'group' => 'group1',
            'value' => 'VALUE2',
            'name' => 'key2'
          }
        ],
        'key' => [
          {
            'group' => 'group1',
            'value' => 'VALUE',
            'name' => 'key'
          },
          {
            'group' => 'group2',
            'value' => 'VALUE',
            'name' => 'key'
          }
        ]
      }
    }
  ]
}
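Properties are keyed by name rather than by group, so regrouping them takes one pass over the hash. Here is a sketch using a literal copy of the grouped example above; properties_by_group is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Literal copy of the parsed structure from the grouped example above.
my $hash = {
    objects => [
        {
            type       => 'vcard',
            properties => {
                key2 => [
                    { group => 'group1', value => 'VALUE2', name => 'key2' },
                ],
                key => [
                    { group => 'group1', value => 'VALUE', name => 'key' },
                    { group => 'group2', value => 'VALUE', name => 'key' },
                ],
            },
        },
    ],
};

# Hypothetical helper: collect "name=value" strings per group.
sub properties_by_group {
    my ($card) = @_;
    my %by_group;
    for my $name ( sort keys %{ $card->{properties} } ) {
        for my $prop ( @{ $card->{properties}{$name} } ) {
            # Ungrouped entries have no 'group' key; file them under ''.
            my $group = $prop->{group} // '';
            push @{ $by_group{$group} }, $prop->{name} . '=' . $prop->{value};
        }
    }
    return \%by_group;
}

my $groups = properties_by_group( $hash->{objects}[0] );
for my $group ( sort keys %$groups ) {
    printf "%s: %s\n", $group, join( ' ', @{ $groups->{$group} } );
}
```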
SEE ALSO
There is a similar module Text::VFile::asData on CPAN, but it is much slower and doesn't do as much decoding.
Code is stored on github at
https://github.com/brong/Text-VCardFast/
AUTHOR
Bron Gondwana, <brong@fastmail.fm>
COPYRIGHT AND LICENSE
Copyright (C) 2014 by Bron Gondwana
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.