Normalization
Graphemes (byte escape sequences such as \xe2\x80\xa6) are not very human readable and require interpolation. We can avoid both issues by not using them!
Rationale
This helps give consistency, clarity, and simplicity.
If we parse a string and find 'Commencing compilation \xe2\x80\xa6' then we have to interpolate that string into 'Commencing compilation …' before we can look it up to see if it exists in a hash.
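The lookup problem above can be sketched as follows. This is a minimal illustration in Python, not this module's implementation: the `lexicon` dictionary, its placeholder value, and the `interpolate` helper are all hypothetical names, and the sketch assumes the escapes encode UTF-8 bytes.

```python
import re

# Hypothetical lexicon keyed by the phrase as a human would type it.
lexicon = {"Commencing compilation …": "(translated phrase)"}

# The phrase as found while parsing source: the backslashes are literal characters.
parsed = r"Commencing compilation \xe2\x80\xa6"


def interpolate(text):
    """Turn each run of literal \\xNN escapes into the characters it encodes (assumes UTF-8)."""
    def decode_run(match):
        hex_digits = match.group(0).replace("\\x", "")
        return bytes.fromhex(hex_digits).decode("utf-8")

    return re.sub(r"(?:\\x[0-9a-fA-F]{2})+", decode_run, text)


found_raw = parsed in lexicon                        # the escaped form is not a key
found_interpolated = interpolate(parsed) in lexicon  # the interpolated form is
```

Writing the character itself in the phrase removes the need for the extra `interpolate` step entirely.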
Graphemes also add a layer of complexity that hinders translators and thus makes room for lower-quality translations.
Developers have it slightly better in that they'll recognize the notation, but it still takes effort to figure out exactly which character a sequence represents and to determine which sequence is needed for a given character.
You can simply use the character itself, or a bracket notation method for the handful of markup-related or visually special characters.
possible violations
If you get false positives, that only helps highlight how ambiguity adds to the reasons to avoid non-bytes strings!
- Contains grapheme notation
  A sequence of \xe2\x98\xba\xe2\x80\xa6 will be replaced with [comment,grapheme “\xe2\x98\xba\xe2\x80\xa6”]
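The replacement described above can be sketched roughly like this. It is an illustration in Python under stated assumptions, not the filter's actual code: the `GRAPHEME_RUN` pattern and the `flag_graphemes` helper are hypothetical, and the real filter may match grapheme notation differently.

```python
import re

# Assumed pattern: one or more consecutive literal \xNN escapes.
GRAPHEME_RUN = re.compile(r"(?:\\x[0-9a-fA-F]{2})+")


def flag_graphemes(phrase):
    """Wrap each run of grapheme notation in a comment; return (new phrase, violation count)."""
    return GRAPHEME_RUN.subn(
        lambda match: "[comment,grapheme “" + match.group(0) + "”]",
        phrase,
    )


normalized, violations = flag_graphemes(r"Done \xe2\x98\xba\xe2\x80\xa6")
```

Treating the whole run as one unit keeps multi-byte characters together, so a sequence like the one above yields a single comment rather than one per byte.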
possible warnings
None
Entire filter only runs under extra filter mode
See "extra filters" in Locale::Maketext::Utils::Phrase::Norm for more details.