NAME
Text::Extract::MaketextCallPhrases - Extract phrases from maketext-call-looking text
VERSION
This document describes Text::Extract::MaketextCallPhrases version 0.4
SYNOPSIS
use Text::Extract::MaketextCallPhrases;
my $results_ar = get_phrases_in_text($text);
use Text::Extract::MaketextCallPhrases;
my $results_ar = get_phrases_in_file($file);
DESCRIPTION
Well-designed systems use consistent calls for localization. If you're really smart, you've also used Locale::Maketext!!
You will probably have a collection of data that contains things like this:
$locale->maketext( ... ); (perl)
[% locale.maketext( ..., arg1 ) %] (TT)
!!* locale%greetings+programs | ... , arg1 | *!! (some bizarre thing you've invented)
This module looks for the first argument to things that look like maketext() calls (see "SEE ALSO") so that you can process them as needed (lint check, add to a lexicon management system, etc.).
By default it looks for calls to maketext(), maketext_*_context(), lextext(), and translatable() (à la Locale::Maketext::Utils::MarkPhrase). If you use a shortcut (e.g. _()) or an unperlish format, it can handle that too (you might also want to look at "SEE ALSO" for an alternative to this module).
EXPORTS
get_phrases_in_text() and get_phrases_in_file() are exported by default unless you load the module with require() or a no-import use():
require Text::Extract::MaketextCallPhrases;
use Text::Extract::MaketextCallPhrases ();
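When nothing is imported, the functions should still be reachable by their fully qualified names, as with any Perl package function. A minimal sketch, with sample text made up for illustration:

```perl
use strict;
use warnings;

# Load the module without importing anything into this namespace.
require Text::Extract::MaketextCallPhrases;

# Illustrative sample text; any maketext-looking call would do.
my $text = q{$locale->maketext('Hello World');};

# Call the function via its full package name.
my $results_ar = Text::Extract::MaketextCallPhrases::get_phrases_in_text($text);

print scalar( @{$results_ar} ), " result(s) found\n";
```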
INTERFACE
These functions return an array ref containing a "result hash" (described below) for each phrase found, in the order they appear in the original text.
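As a rough illustration of walking that array ref, the sketch below parses a small made-up snippet and prints each phrase with its offset. The sample text and variable names are assumptions for the example only; 'phrase' and 'offset' are keys of the result hash described later in this document.

```perl
use strict;
use warnings;
use Text::Extract::MaketextCallPhrases;

# Illustrative text containing two maketext() calls. The single-quoted
# heredoc keeps $locale and $name from being interpolated.
my $text = <<'END_TEXT';
print $locale->maketext('Hello World');
print $locale->maketext('Goodbye [_1]', $name);
END_TEXT

my $results_ar = get_phrases_in_text($text);

# Results come back in the order they appear in the text.
for my $result ( @{$results_ar} ) {
    my $phrase = defined $result->{'phrase'} ? $result->{'phrase'} : '(no phrase)';
    print "offset $result->{'offset'}: $phrase\n";
}
```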
get_phrases_in_text()
The first argument is the text you want to parse for phrases.
The second, optional argument is a hashref of options. Its keys can be as follows:
- 'regexp_conf'
-
This should be an array reference. Each item in it should be an array reference with the following 2 items:
A regex object (i.e. qr()) that matches the beginning of the thing you are looking for.
The regex should simply match and remain simple as it gets used by the parser where and as needed. Do not anchor or capture in it!
qr/\<cptext/
A regex object (i.e. qr()) that matches the end of the thing you are looking for.
It can also be a coderef that gets passed the string matched by item 1 and returns the appropriate regex object (i.e. qr()) that matches the end of the thing you are looking for.
The regex should simply match and remain simple as it gets used by the parser where and as needed. Do not anchor or capture in it! If it is possible that there is space before the closing "whatever" you should include that too.
qr/\s*\>/
'regexp_conf' => [
    [ qr/greetings\+programs \|/, qr/\s*\|/ ],
    [ qr/\_\(?/, sub { return substr( $_[0], -1, 1 ) eq '(' ? qr/\s*\)/ : qr/\s*\;/ } ],
],
- 'no_default_regex'
-
If you are using 'regexp_conf' then setting this to true will avoid using the default maketext() lookup. (i.e. only use 'regexp_conf')
- 'encode_unicode_slash_x'
-
Boolean (default is false) that when true will turn Unicode string notation \x{....} into a non-grapheme byte string. This will cause Encode to be loaded if needed.
Otherwise \x{....} are left in the phrase as-is.
- 'debug_ignored_matches'
-
This should be an array reference. It collects debug info about matches that did not look like something that should have a phrase associated with it.
Some examples of things that might match but would not be treated as phrases:
sub i_heart_maketext { 1 }
*i_heart_maketext = "foo";
goto &xyz::maketext;
print $locale->Maketext("Hello World"); # maketext() is cool
- 'ignore_perlish_statement'
-
Boolean (default is false) that when true will cause matches that look like a statement to be put in 'debug_ignored_matches' instead of a result with a 'type' of 'no_arg'.
- 'ignore_perlish_comment'
-
Boolean (default is false) that when true will cause matches that look like a perl comment to be put in 'debug_ignored_matches' instead of a result.
Since this is parsing arbitrary text and thus there is no real context, interpreting what is a comment or not becomes very complex and context sensitive.
If you do not want to grab phrases from commented-out data and this check does not work with this text's commenting scheme, then you could instead strip comments out of the text before parsing.
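The options above might be combined along these lines. This is only a sketch: the _() shortcut and the sample text are hypothetical, and passing an array reference for 'debug_ignored_matches' is an assumption based on the description above rather than something this document states outright.

```perl
use strict;
use warnings;
use Text::Extract::MaketextCallPhrases;

# Illustrative text: one default-style call and one call that uses a
# hypothetical _() shortcut.
my $text = <<'END_TEXT';
print $locale->maketext('Found by the default lookup');
print _('Found by the custom lookup');
END_TEXT

# Assumption: the module pushes debug info onto this array via the reference.
my @ignored;

my $results_ar = get_phrases_in_text(
    $text,
    {
        # Start/end regex pair for the _( ... ) shortcut; no anchors, no captures.
        'regexp_conf' => [ [ qr/_\(/, qr/\s*\)/ ] ],

        # Only use 'regexp_conf'; skip the default maketext() lookup.
        'no_default_regex' => 1,

        'debug_ignored_matches' => \@ignored,
    }
);

for my $result ( @{$results_ar} ) {
    my $phrase = defined $result->{'phrase'} ? $result->{'phrase'} : '(no phrase)';
    print "matched '$result->{'matched'}': $phrase\n";
}
print scalar(@ignored), " match(es) ignored\n";
```

Keeping the start and end patterns this small follows the advice above that the regexes should simply match and stay simple.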
get_phrases_in_file()
Same as get_phrases_in_text() except it takes a path whose contents you want to process instead of text you want to process.
If the file can't be opened, it returns false:
my $results = get_phrases_in_file($file) || die "Could not read '$file': $!";
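When several files are processed, the false return value can be used to skip unreadable files instead of dying. The sketch below writes a throwaway file with File::Temp so that it is self-contained; the file contents and the deliberately missing second path are made up for illustration. The 'file' and 'line' keys are described in the result hash section below.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::Extract::MaketextCallPhrases;

# Create a small throwaway file so the sketch can run on its own.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} "print \$locale->maketext('Hello World');\n";
close $fh or die "Could not close '$file': $!";

my @all_results;
for my $path ( $file, "$file.does-not-exist" ) {
    my $results_ar = get_phrases_in_file($path);
    if ( !$results_ar ) {
        # Unreadable file: report it and carry on with the next one.
        warn "Skipping '$path': $!\n";
        next;
    }
    push @all_results, @{$results_ar};
}

# 'file' and 'line' are only set by get_phrases_in_file().
for my $result (@all_results) {
    my $phrase = defined $result->{'phrase'} ? $result->{'phrase'} : '(no phrase)';
    print "$result->{'file'}:$result->{'line'}: $phrase\n";
}
```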
The "result hash"
This hash contains the following keys that describe the phrase that was parsed.
- 'phrase'
-
The phrase in question.
- 'offset'
-
The offset in the text where the phrase started.
- 'line'
-
Available via get_phrases_in_file() only, not get_phrases_in_text().
The line number the offset applies to. If a phrase spans more than one line it should be the line it starts on - but you're too smart to let the phrase dictate output format right ;p?
- 'file'
-
Available via get_phrases_in_file() only, not get_phrases_in_text().
The file the result is from. Useful when aggregating results from multiple files.
- 'matched'
-
Chunk that matched the "maketext call" regex.
- 'regexp'
-
The array reference used to match this call/phrase. It is the same thing as each array ref passed in the regexp_conf list.
- 'quotetype'
-
If the match was in double quote context it will be 'double'. Specials like \t and \n are interpolated.
If the match was in single quote context it will be 'single'. Specials like \t and \n remain literal.
Otherwise it won't exist.
- 'heredoc'
-
If the match was a here doc, it will contain the
- 'is_warning'
-
The phrase we found wasn't a string, which is odd.
- 'is_error'
-
The phrase we found looks like a mistake was made.
- 'type'
-
If the phrase is a warning or error this is a keyword that highlights why the parser wants you to look at it further.
The value can be:
- undef/non-existent
-
Was a normal string, all is well.
- 'command'
-
The phrase was a backtick or qx() expression.
- 'pattern'
-
The phrase was a regex or transliteration expression.
- 'empty'
-
The phrase was a hardcoded empty value.
- 'bareword'
-
The phrase was a bare word expression.
- 'perlish'
-
The phrase was a perl-like expression (e.g. a variable).
- 'no_arg'
-
The call had no arguments
- 'multiline'
-
The call's argument did not contain a full entity. Probably due to a multiline phrase that is cut off at the end of the text being parsed.
This should only happen in the last item and means that some data needs to be prepended to the next chunk you will be parsing, in an effort to get a complete, parsable argument.
my $string_1 = "maketext('I am the very model of ";
my $string_2 = "a modern major general.')";
my $results = get_phrases_in_text($string_1);

# If the last result is a cut-off phrase, drop it and prepend the unparsed tail to the next chunk.
if ( @{$results} && ( $results->[-1]->{'type'} || '' ) eq 'multiline' ) {
    my $trailing_partial = pop @{$results};
    $string_2 = $trailing_partial->{'matched'} . substr( $string_1, $trailing_partial->{'offset'} ) . $string_2;
}

push @{$results}, @{ get_phrases_in_text($string_2) };
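One way to act on the 'is_warning', 'is_error', and 'type' keys is to sort results into buckets before further processing. The helper name triage_results() is hypothetical. The sample data is hand-written to look like result hashes and is not real parser output; in particular, which 'type' values go with a warning versus an error is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split results into normal phrases and ones the
# parser flagged, using only the keys described above.
sub triage_results {
    my ($results_ar) = @_;
    my %bucket = ( ok => [], warning => [], error => [] );
    for my $result ( @{$results_ar} ) {
        my $name =
            $result->{'is_error'}   ? 'error'
          : $result->{'is_warning'} ? 'warning'
          :                           'ok';
        push @{ $bucket{$name} }, $result;
    }
    return \%bucket;
}

# Hand-written sample data shaped like result hashes (not real output).
# The flag/type pairings here are assumptions for illustration.
my $sample_ar = [
    { 'phrase' => 'Hello World', 'offset' => 10, 'quotetype'  => 'single' },
    { 'phrase' => '$greeting',   'offset' => 42, 'is_warning' => 1, 'type' => 'perlish' },
    { 'phrase' => undef,         'offset' => 77, 'is_error'   => 1, 'type' => 'no_arg' },
];

my $bucket_hr = triage_results($sample_ar);
for my $name (qw(ok warning error)) {
    print "$name: ", scalar( @{ $bucket_hr->{$name} } ), "\n";
}
```

Checking 'is_error' before 'is_warning' means a result carrying both flags would be reported once, as an error.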
DIAGNOSTICS
This module throws no warnings or errors of its own.
CONFIGURATION AND ENVIRONMENT
Text::Extract::MaketextCallPhrases requires no configuration files or environment variables.
DEPENDENCIES
Module::Want (In order to re-use the "name space" regex it has - hate to maintain it in more than one place)
INCOMPATIBILITIES
None reported.
CAVEATS
If the first thing following the "call" is a comment, the phrase will not be found.
This is because these are maketext-looking calls, not necessarily perl code. Thus interpreting what is a comment or not becomes very complex and context sensitive.
See "SEE ALSO" if you really need to support that convention (said convention seems rather silly but hey, it's your code).
The result hash's values for that call are unknown (probably 'multiline' type and undef phrase). If that holds true then detecting one in the middle of your results stack is a sign of that condition.
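If that guess holds, a scan for 'multiline' results that are not the last item could flag this condition. The helper name find_suspect_indexes() is hypothetical and the sample data is hand-written for illustration, not real parser output.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indexes of 'multiline' results that are
# NOT the last item. A trailing 'multiline' result is expected when text is
# cut off, so only earlier ones are treated as suspicious.
sub find_suspect_indexes {
    my ($results_ar) = @_;
    my $last = $#{$results_ar};
    return grep {
             $_ != $last
          && defined $results_ar->[$_]{'type'}
          && $results_ar->[$_]{'type'} eq 'multiline'
    } 0 .. $last;
}

# Hand-written sample data shaped like result hashes (not real output).
my $sample_ar = [
    { 'phrase' => 'First phrase', 'offset' => 10 },
    { 'phrase' => undef,          'offset' => 40, 'type' => 'multiline' },
    { 'phrase' => 'Last phrase',  'offset' => 90 },
];

my @suspects = find_suspect_indexes($sample_ar);
print "Suspect result at index $_ (offset $sample_ar->[$_]{'offset'})\n" for @suspects;
```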
BUGS AND LIMITATIONS
No bugs have been reported.
Please report any bugs or feature requests to bug-text-extract-maketextcallphrases@rt.cpan.org, or through the web interface at http://rt.cpan.org.
SEE ALSO
Locale::Maketext::Extract is a driver-based OO parser with a more complex and extensible interface that may serve your needs better.
AUTHOR
Daniel Muey <http://drmuey.com/cpan_contact.pl>
LICENCE AND COPYRIGHT
Copyright (c) 2011, Daniel Muey <http://drmuey.com/cpan_contact.pl>. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.