=head1 NAME

DateTime::Format::Builder - create DateTime parser objects.

=head1 SYNOPSIS

    use DateTime::Format::Builder;

    my $parser = DateTime::Format::Builder->parser(
        params => [ qw( year month day hour minute second ) ],
        regex  => qr/^(\d\d\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
    );

    # The regex expects exactly 14 digits: YYYYMMDDhhmmss.
    my $dt = $parser->parse_datetime( "19790716153300" );

=head1 DESCRIPTION

C<DateTime::Format::Builder> creates L<DateTime> parser objects. Many
string formats of dates and times are simple and just require a basic
regular expression to extract the relevant information. As such, they
don't require a full-blown module to be implemented. Hence, this module
was written. It allows you to create parser objects and classes with a
minimum of fuss.

=head1 FORMATTING vs PARSING

The name of this module is C<DateTime::Format::Builder>. This is,
perhaps, somewhat misleading. It should be noted that the word
C<Format> is being used as a noun, not a verb.

=head1 CONSTRUCTORS

=head2 new

Creates a new C<DateTime::Format::Builder> object. If called as an
object method, then it clones the object.

No arguments.

=head2 parser

If called as a class method, it creates a new
C<DateTime::Format::Builder> object with a specified parser.
Parameters are as for L</create_parser>.

If called as an object method, it creates a new parser for that
object. (Essentially a shortcut for C<create_parser> and
C<set_parser>.)

    # Class
    my $new_parser = DateTime::Format::Builder->parser( ... );

    # Object
    $new_parser->parser( ... );

As a side note, when called as an object method (e.g.
C<< $new_parser->parser(...) >>), the object itself is returned
(e.g. C<< $new_parser >>).

=head2 clone

For those who prefer to explicitly clone via a method called
C<clone()>. If called as a class method it will die.

    my $clone = $original->clone();

=head2 create_class

C<create_class> is different from the other constructors. It creates
a full class for the parser, not just an instance of
C<DateTime::Format::Builder>.
It takes two optional parameters and one required one.

=head3 OPTIONAL PARAMETERS

=over 4

=item *

C<class> is the name of the class to create. If not specified, then
it is inferred to be the current package. Generally best left
unspecified.

=item *

C<version> is the version of the class. Generally best left
unspecified unless C<class> is also specified (that is, you're not
just preparing the current package). Why? Because CPAN only picks up
a module's version when C<$VERSION> is declared in the form CPAN
expects. A version set any other way won't be seen, and the module
won't behave properly. The same goes for L<ExtUtils::MakeMaker>.

=back

=head3 REQUIRED PARAMETER

C<parsers> is the important parameter. It takes a hashref as an
argument. This hashref maps method names to arrayrefs of parser
specifications. For example (since the code is often clearer than my
writing):

    package DateTime::Format::Brief;

    use DateTime::Format::Builder;

    DateTime::Format::Builder->create_class(
        parsers => {
            parse_datetime => [
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                    params => [qw( year month day hour minute second )],
                },
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                    params => [qw( year month day )],
                },
            ],
        }
    );

If you just have one specification, you can just have it without the
list:

    parse_datetime => {
        regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
        params => [qw( year month day )],
    },

=head1 CLASS METHODS

These methods work on either our objects or as class methods.

=head2 create_parser

Creates a function to parse datetime strings and return L<DateTime>
objects.

    # Parse a 15 character ICal string
    my $parser_fn = DateTime::Format::Builder->create_parser({
        regex  => qr/^(\d\d\d\d)(\d\d)(\d\d)T(\d\d)(\d\d)(\d\d)$/,
        params => [qw( year month day hour minute second )],
        extra  => {},
    });

    # Parse an 8 character ICal string
    my $short_ical_parser = DateTime::Format::Builder->create_parser(
        {
            params => [ qw( year month day ) ],
            regex  => qr/^(\d\d\d\d)(\d\d)(\d\d)$/,
        }
    );

I call the arguments seen above 'specifications', or C<spec>.
A C<spec> is passed as a hashref, and I call that reference a
C<specref>. Pardon the introduction of terminology, but it does make
things simpler later on. I specify the layout of a C<spec>
L<below|/SPECIFICATIONS>.

C<create_parser> (and, because of this, most of the other routines)
can create a few different sorts of parser. For each type I'll have a
bit in parens that indicates the call style.

=head1 SPECIFICATIONS

A specification is typically a hashref (except for simple, single
parser creations, where it can be just a hash). For example, here we
have two specifications:

    my $inefficient_ical_parser = DateTime::Format::Builder->create_parser(
        {
            regex  => qr/^(\d\d\d\d)(\d\d)(\d\d)T(\d\d)(\d\d)(\d\d)$/,
            params => [qw( year month day hour minute second )]
        },
        {
            params => [ qw( year month day ) ],
            regex  => qr/^(\d\d\d\d)(\d\d)(\d\d)$/,
        },
    );

Right. And for further fun and games, any of these C<specrefs> can
also be a coderef. The routine will be given the C<$self> object (or
just a class name string) and a date string as input, and is expected
to return undef on failure, or a C<DateTime> object on success.

The keys of a specification hashref are:

=over 4

=item *

C<regex> will be applied to the input of the created function. This
argument is required.

=item *

C<params> is an arrayref that maps the results of C<regex> to
parameters of C<< DateTime->new() >>. The first element is C<$1>, the
second C<$2>, etc. This argument is required.

=item *

C<extra> is a hashref that lists what any extra arguments should be
set to. You can use it to specify parameters to
C<< DateTime->new() >>, such as C<time_zone>.

=item *

C<on_fail> is a reference to a subroutine (anonymous or otherwise)
that will be called in the event of a parse failing. It will be
passed a hash looking like:

=over 4

=item *

C<input>, being the input on which the parser failed

=item *

C<label>, being the label of the parser, if there is one

=back

=item *

C<on_match> is just like C<on_fail>, only it's called in the event of
success.


=item *

C<label> provides a name for the parser and is passed to C<on_fail>
and C<on_match>. If you specified a set of parsers with some form of
C<< X => Y >> hash style, then by default the label is the C<X>. That
will be overridden if you use this C<label> tag.

=item *

C<preprocess> is another callback. Its arguments are a hash consisting
of the keys C<input> (the datetime string given to the parser) and
C<parsed> (a hashref that is initially empty, unless your group of
parser specifications had a preprocessor that put something in it).
You may put what you like in the hashref, and it will be kept. This
callback is called I<after> length determination.

=item *

C<postprocess> is yet another callback. Its arguments are the same as
for C<preprocess>, except that the C<parsed> hashref has been filled
out with how the parse went. If parsing failed, it is not called. It
is free to modify the hashref; any changes will be reflected back. If
the callback returns false, then the parse is regarded as a failure.
B<Note>: ensure you return some true value if you don't want things to
fail mysteriously.

=back

If you have a series of specifications and want a common preprocessor,
it can be specified like this:

    my $brief_parser = DateTime::Format::Builder->create_parser(
        [
            # Common preprocessor: receives input and parsed, and
            # returns the string the specs below will see.
            preprocess => sub { my %args = @_; return $args{input} },
        ],
        {
            regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
            params => [qw( year month day hour minute second )],
        },
        {
            regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
            params => [qw( year month day )],
        },
    );

B<Note> that this works with the arrays of specs in C<create_class>
too. B<Note also> that the arrayref B<must> be the first argument.

The C<preprocess> sub is given a hash on input of the date to be
parsed and a hashref in which to place any pre-calculated values. The
hash keys are C<input> and C<parsed> respectively. The return value
should be the date string that the parsers will then go on to process.
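
To make the callback contract above concrete, here is a minimal,
hand-rolled sketch in plain Perl of the steps a single specification
describes: call C<preprocess>, match C<regex>, map the captures onto
C<params>, merge in C<extra>, then call C<postprocess>. It is not how
C<DateTime::Format::Builder> is implemented internally, and it ignores
length-based parser selection and coderef specs. C<run_spec> is a
hypothetical helper that returns the hashref of arguments that would
be passed to C<< DateTime->new() >> (or undef on failure) instead of
building a L<DateTime> object; letting parsed values take precedence
over C<extra> is an assumption.

```perl

    use strict;
    use warnings;

    # Hypothetical helper, for illustration only: apply one spec to a
    # string and return the would-be DateTime->new() arguments, or undef.
    sub run_spec {
        my ( $spec, $input ) = @_;
        my %parsed;

        # preprocess gets input and parsed, and returns the string to parse.
        if ( my $pre = $spec->{preprocess} ) {
            $input = $pre->( input => $input, parsed => \%parsed );
        }

        # Captures map onto params in order: $1 to the first name, etc.
        my @captures = $input =~ $spec->{regex} or return;
        @parsed{ @{ $spec->{params} } } = @captures;

        # Assumption: extra only fills keys the parse did not set.
        %parsed = ( %{ $spec->{extra} || {} }, %parsed );

        # postprocess may edit parsed; a false return means failure.
        if ( my $post = $spec->{postprocess} ) {
            $post->( input => $input, parsed => \%parsed ) or return;
        }
        return \%parsed;
    }

    my $spec = {
        regex  => qr/^(\d\d\d\d)(\d\d)(\d\d)$/,
        params => [qw( year month day )],
        extra  => { time_zone => 'floating' },
        # Strip surrounding whitespace before matching.
        preprocess => sub {
            my %args = @_;
            ( my $s = $args{input} ) =~ s/^\s+|\s+$//g;
            return $s;
        },
        # Reject impossible months; note the explicit true/false return.
        postprocess => sub {
            my %args = @_;
            return $args{parsed}{month} <= 12;
        },
    };

    my $ok  = run_spec( $spec, ' 20030716 ' );
    # $ok is { year => '2003', month => '07', day => '16',
    #          time_zone => 'floating' }
    my $bad = run_spec( $spec, '20031316' );    # undef: month 13

```

Returning the argument hash instead of a L<DateTime> object keeps the
sketch dependent on core Perl only; in real code the same spec
hashref would simply be handed to C<create_parser>.
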

A sample preprocessor (taken from L<DateTime::Format::ICal>) looks
like this:

    my $add_tz = sub {
        my %args = @_;
        my ($date, $p) = @args{qw( input parsed )};
        if ( $date =~ s/^TZID=([^:]+):// )
        {
            $p->{time_zone} = $1;
        }
        # Z at end means UTC
        elsif ( $date =~ s/Z$// )
        {
            $p->{time_zone} = 'UTC';
        }
        else
        {
            $p->{time_zone} = 'floating';
        }
        return $date;
    };

Any length calculations (for length parsers) are done after this
preprocessing.

=head1 OBJECT METHODS

If you actually create a C<DateTime::Format::Builder> object, then you
get the following methods on that object.

=head2 set_parser / get_parser

Set and get the object's parser function. Fairly straightforward and
of minimal use, except for subclasses.

=head2 parse_datetime

Given a date string, run it through the object's parser and return a
C<DateTime> object representing that date and time.

    # Having created our parser, somehow, we can:
    my $dt = $parser->parse_datetime( "1998-04-01 15:16:24" );

If you receive errors about things being undefined, then there was a
parse failure.

=head2 format_datetime

OK. We don't actually implement this. It's just here to make sure you
know we don't. It's implemented like an abstract method: it will die
if invoked. It will be available at some point.

=head1 THANKS

Dave Rolsky (DROLSKY) for kickstarting the DateTime project and some
much needed review.

Joshua Hoblitt (JHOBLITT) for the concept, some of the API, and more
much needed review.

Kellan Elliott-McCrea (KELLAN) for even more review!

Simon Cozens (SIMON) for saying it was cool.

=head1 SUPPORT

Support for this module is provided via the datetime@perl.org email
list. See http://lists.perl.org/ for more details.

Alternatively, log bug reports via the CPAN RT system, on the web or
by email:

    http://perl.dellah.org/rt/dtbuilder
    bug-datetime-format-builder@rt.cpan.org

This makes it much easier for me to track things and thus means your
problem is less likely to be neglected.

=head1 LICENSE AND COPYRIGHT

Copyright E<copy> Iain Truskett, 2003. All rights reserved.
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

The full text of the licenses can be found in the F<Artistic> and
F<COPYING> files included with this module.

=head1 AUTHOR

Iain Truskett <spoon@cpan.org>

=head1 TODO

=over 4

=item *

More tests.

=item *

strptime compatible parsing

=item *

strftime compatible formatting

=back

=head1 SEE ALSO

C<datetime@perl.org> mailing list.

L<http://datetime.perl.org/>

L<perl>, L<DateTime>

=cut