Normalization
The only white space characters allowed are the normal space and the non-break-space, and each may only appear singly (never in a run of two or more).
Rationale
A tiny change in white-space[-ish] characters will make a phrase lookup fail erroneously.
The only other purpose of allowing characters like this would be formatting, which should not be part of a phrase:
- Such formatting is not applicable to all contexts (e.g. HTML).
- Since it is not a translatable entity, translators are likely to miss it and break your format.
- The same text with different formatting becomes a new, redundant phrase.
Doing internal formatting via bracket notation’s output() methods addresses the first two completely and the third most of the time (it can be “completely” if you give it a little thought first).
It is easy for a developer to miss the subtle difference and get it wrong.
Surrounding whitespace is likely a sign that partial phrases are in use.
That being the case, we simplify consistently by allowing only single space and non-break-space characters inside the string (and a single normal space at the beginning if it starts with an ellipsis).
possible violations
- Invalid whitespace-like characters
  The string contains white space characters other than space and non-break-space, invisible characters, or control characters.
  These will be turned into “[comment,invalid char UxNNNN]” (where NNNN is the Unicode code point) so you can find them visually.
- Beginning white space
  These are removed.
  This takes into account strings beginning with an ellipsis, which should be preceded by one space.
- Beginning ellipsis space should be a normal space
  If a string starts with an ellipsis, the space preceding it should be a normal space. A non-break-space implies formatting or concatenation of 2 partial phrases, ick!
- Trailing white space
  These are removed.
- Multiple internal white space
  These are collapsed into a single space.
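The rules above can be sketched roughly as follows. This is an illustration in Python, not this module's actual implementation, and it rests on several assumptions: the names normalize_whitespace and _is_invalid are hypothetical; "invisible" and "control" characters are approximated by the Unicode categories Cf and Cc; NNNN is rendered as uppercase hex padded to four digits; the ellipsis is taken to be U+2026; a mixed run of spaces and non-break-spaces collapses to one normal space; and a leading space is kept before an ellipsis only if the string already had leading white space.

```python
import re
import unicodedata

SPACE = " "
NBSP = "\u00a0"
ELLIPSIS = "\u2026"  # assumption: only the single-character ellipsis is handled


def _is_invalid(ch):
    """True for whitespace-ish, invisible, or control characters (hypothetical helper)."""
    if ch in (SPACE, NBSP):
        return False
    # Zs/Zl/Zp: separators; Cc: control; Cf: format (zero width space, etc.)
    return unicodedata.category(ch) in {"Zs", "Zl", "Zp", "Cc", "Cf"}


def normalize_whitespace(phrase):
    """Return (normalized_phrase, violations) for one phrase (illustrative only)."""
    violations = []

    # 1. Make invalid characters visible instead of silently dropping them.
    marked = "".join(
        "[comment,invalid char Ux%04X]" % ord(ch) if _is_invalid(ch) else ch
        for ch in phrase
    )
    if marked != phrase:
        violations.append("Invalid whitespace-like characters")

    # 2. Beginning white space, with the leading-ellipsis special case.
    lead = re.match(r"[ \u00a0]*", marked).group(0)
    body = marked[len(lead):]
    if lead and body.startswith(ELLIPSIS):
        if NBSP in lead:
            violations.append("Beginning ellipsis space should be a normal space")
        elif lead != SPACE:
            violations.append("Beginning white space")
        body = SPACE + body  # exactly one normal space before the ellipsis
    elif lead:
        violations.append("Beginning white space")

    # 3. Trailing white space.
    stripped = body.rstrip(SPACE + NBSP)
    if stripped != body:
        violations.append("Trailing white space")

    # 4. Runs of two or more spaces/non-break-spaces become one normal space.
    collapsed = re.sub(r"[ \u00a0]{2,}", SPACE, stripped)
    if collapsed != stripped:
        violations.append("Multiple internal white space")

    return collapsed, violations
```

For example, under these assumptions normalize_whitespace("a  b ") returns "a b" along with the “Trailing white space” and “Multiple internal white space” violations, while a phrase containing a single internal non-break-space comes back unchanged with no violations.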
possible warnings
None