NAME
VCP::Filter::stringedit - alter any field character by character
SYNOPSIS
StringEdit:
## Convert illegal p4 characters to ^NN hex escapes and the
## p4 wildcard "..." to a safe string. The "^" is not an illegal
## char, it's replaced with an escape to allow us to use it as
## an escape character without the (extremely small) risk of
## running across a file name that actually uses it.
## Order is significant in this ruleset.
# field(s) match replacement
name,labels /([\s@#*%^])/ ^%02x
name,labels "..." ^___
StringEdit:
## underscorify each unwanted character to a single "_"
name,labels /[\s@#*%^]/ _
StringEdit:
## underscorify each run of unwanted characters to a single "_"
name,labels /[\s@#*%^]+/ _
StringEdit:
## prefix labels that don't start with a letter or underscore:
labels /^([^a-zA-Z_])/ _%c
DESCRIPTION
Allows field-by-field string editing, using Perl regular expressions to match characters and substrings, and sprintf-like replacement strings to rewrite them.
Rules
A rule is a triplet of expressions specifying (1) a set of fields to match, (2) a pattern to match against those fields' contents (matching contents are removed), and (3) a string to replace each of the removed bits with.
NOTE 1: the "match" expression uses perl5 regular expressions, not the filename wildcards used in most other places in VCP configurations.
The list of rules is evaluated top down and all rules are applied to each string.
NOTE 2: The all-rules-apply nature of this filter is different from the behaviors of the ...Map: filters, which stop after the first matching rule. This is because ...Map: filters are rewriting entire strings and there can be only one result string, while the StringEdit filter may be rewriting pieces of string and multiple rewrites may be combined to good effect.
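The all-rules-apply behavior can be illustrated with a minimal Python sketch. This is not VCP's own (Perl) implementation; the RULES list and the apply_rules name are hypothetical, and the two rules simply mirror the first ruleset in the SYNOPSIS. Each rule runs against the output of the rule before it, which is why order matters.

```python
import re

# Hypothetical rule list mirroring the first SYNOPSIS ruleset:
# (compiled pattern, replacement callback) pairs, in top-down order.
RULES = [
    # Escape each unwanted character as ^NN (two hex digits).
    (re.compile(r"([\s@#*%^])"), lambda m: "^%02x" % ord(m.group(0))),
    # Replace the literal string "..." with a safe token.
    (re.compile(re.escape("...")), lambda m: "^___"),
]

def apply_rules(value, rules=RULES):
    """Apply every rule in order; no rule stops the evaluation."""
    for pattern, replacement in rules:
        value = pattern.sub(replacement, value)
    return value

print(apply_rules("a b...c"))
# With the rules reversed, the "^" introduced by the "..." rule is
# itself escaped by the character rule, which is usually not wanted.
print(apply_rules("a b...c", list(reversed(RULES))))
```

Running the rules in the documented order leaves the "^___" token intact; reversing them shows how an earlier rewrite can be rewritten again by a later rule.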
The Fields List
A comma-separated list of field names. Any field may be edited except those that begin with "source_".
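As a rough illustration of the restriction above, the following Python sketch splits a fields list and rejects protected names. The function name parse_fields and the field name "source_name" used in the example are hypothetical; how VCP itself reports the error is not specified here.

```python
def parse_fields(spec):
    """Split a comma-separated fields list, rejecting source_* fields."""
    fields = spec.split(",")
    for name in fields:
        if name.startswith("source_"):
            # Assumption: an exception stands in for VCP's error handling.
            raise ValueError("field %r may not be edited" % name)
    return fields

print(parse_fields("name,labels"))
```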
The Match Expression
For each field, the match expression is run against the field's contents and, if it matches, all matching portions of the string are replaced.
The match expression is a full perl5 regular expression enclosed in /.../ delimiters or a plain string, either of which may be enclosed in '' or "" delimiters if inline spaces are needed (rare, we hope).
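The two forms of match expression can be sketched in Python as follows. This is only an illustration: compile_match is a hypothetical name, the sketch assumes quotes are stripped before the /.../ delimiters are checked, and Python's re syntax is close to but not identical to perl5 regular expressions.

```python
import re

def compile_match(expr):
    """Turn a match expression into a compiled pattern."""
    # Strip one layer of matching '' or "" delimiters, if present.
    if len(expr) >= 2 and expr[0] == expr[-1] and expr[0] in "'\"":
        expr = expr[1:-1]
    # /.../ means a regular expression.
    if len(expr) >= 2 and expr.startswith("/") and expr.endswith("/"):
        return re.compile(expr[1:-1])
    # Anything else is a plain string, matched literally.
    return re.compile(re.escape(expr))

print(compile_match('"..."').sub("_", "a...b"))
print(compile_match("/[\\s@#*%^]+/").sub("_", "a @ b"))
```

Note that a plain string such as "..." is matched literally, so the dots are not treated as regular expression wildcards.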
The Replacement Expression
Each match is replaced by one instance of the replacement expression, optionally enclosed in single or double quotation marks.
The replacement expression provides a limited list of C sprintf style macros:
%d The decimal codes for each character in the match
%o The octal codes for each character in the match
%x The hex codes for each character in the match
Any non-letter preceded by a backslash "\" character is replaced by itself. Some more or less useful examples:
\% \\ \" \' \` \{ \} \$ \* \+ \? \1
If a punctuation character other than a period (.) or slash "/" follows a letter macro, it must be escaped using the backslash character (this is to reserve room in the spec for postfix modifiers like "*", "+", and "?"). So, to put a literal star (*) after a hex code, you would do something like "%02x\*".
The "normal" perl5 letter abbreviations are also allowed:
\t tab (HT, TAB)
\n newline (NL)
\r return (CR)
\f form feed (FF)
\b backspace (BS)
\a alarm (bell) (BEL)
\e escape (ESC)
\033 octal char (ESC)
\x1b hex char (ESC)
\x{263a} wide hex char (SMILEY)
\c[ control char (ESC)
\N{name} named Unicode character
The following escape sequences, which modify what follows them, are also available:
\l lowercase next char
\u uppercase next char
\L lowercase till \E
\U uppercase till \E
\E end case modification
\Q quote non-word characters till \E
As shown above, normal sprintf-style options may be included (and are recommended), so %02x produces results like "09" (if the match was a single TAB character) or "20" (if the match was a SPACE character); with the "^%02x" replacement from the SYNOPSIS these appear as "^09" and "^20". The dot precision modifiers (".3") are not supported, just the leading 0 and the field width specifier.
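The macro expansion described in this section can be approximated with the short Python sketch below. It is an illustration under stated assumptions, not VCP's implementation: expand is a hypothetical name, only the %d, %o and %x macros with an optional leading 0 and field width are handled, and backslash escapes are ignored entirely. A multi-character match yields one formatted code per character, concatenated.

```python
import re

# %, optional leading 0, optional field width, then d, o or x.
_MACRO = re.compile(r"%(0?)(\d*)([dox])")

def expand(template, matched):
    """Expand %d/%o/%x macros using the character codes of the match."""
    def one_macro(m):
        zero, width, conv = m.groups()
        spec = "%" + zero + width + conv
        # One formatted code per character in the matched text.
        return "".join(spec % ord(ch) for ch in matched)
    return _MACRO.sub(one_macro, template)

print(expand("^%02x", "\t"))
print(expand("^%02x", " "))
print(expand("%d", "ab"))
```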
Case sensitivity
By default, all patterns are case sensitive. There is no way to override this at present; one will be added.
Command Line Parsing
For large stringedits or repeated use, the stringedit is best specified in a .vcp file. For quick one-offs or scripted situations, however, the stringedit: scheme may be used on the command line. In this case, each parameter is a "word" and every triple of words is a ( fields, pattern, result ) rule.
Because vcp command line parsing is performed incrementally and the next filter or destination specifications can look exactly like a pattern or result, the special token "--" is used to terminate the list of patterns if StringEdit: is used on the command line. This may also be the last word in the StringEdit: section of a .vcp file, but that is superfluous. It is an error to use "--" before the last word in a .vcp file.
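The grouping of command line words into rules can be pictured with the following Python sketch. It is a simplification with hypothetical names (split_rules, and the placeholder word "next-spec" standing in for whatever specification follows); the real vcp parser works incrementally and is not reproduced here.

```python
def split_rules(words):
    """Group words into (fields, pattern, result) triples up to "--"."""
    if "--" in words:
        end = words.index("--")
        own, rest = words[:end], words[end + 1:]
    else:
        own, rest = list(words), []
    if len(own) % 3 != 0:
        # Assumption: a leftover word or two is treated as an error.
        raise ValueError("incomplete rule: %r" % (own[-(len(own) % 3):],))
    rules = [tuple(own[i:i + 3]) for i in range(0, len(own), 3)]
    return rules, rest

rules, rest = split_rules(
    ["name,labels", "/[\\s@#*%^]/", "_", "--", "next-spec"]
)
print(rules)
print(rest)
```

Everything after the "--" token is handed back untouched, so a following filter or destination specification is never mistaken for a pattern or result.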
LIMITATIONS
There is no way (yet) of telling the stringeditor to continue processing the rules list. We could implement labels like <<label>> to be allowed before pattern expressions (but not between pattern and result), and we could then implement <<goto label>>. And a <<next>> could be used to fall through to the next label. All of which is wonderful, but I want to gain some real world experience with the current system and find a use case for gotos and fallthroughs before I implement them. This comment is here to solicit feedback :).
AUTHOR
Barrie Slaymaker <barries@slaysys.com>
COPYRIGHT
Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights reserved.
See VCP::License (vcp help license) for the terms of use.